WO2023127614A1 - Information processing device, information processing method, information processing program, and information processing system
- Publication number: WO2023127614A1
- Application number: PCT/JP2022/047000
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
Definitions
- the present disclosure relates to an information processing device, an information processing method, an information processing program, and an information processing system.
- The above-described recognizer in the recognition-specialized sensor, which controls the pixel readout unit, differs significantly in configuration from a general recognizer that performs recognition processing based on image data for one to several frames. Likewise, the learning data and evaluation data applied to the recognition-specialized sensor differ from those applied to a general recognizer. As a result, the situations in which the recognition-specialized sensor can be used may be limited.
- An object of the present disclosure is to provide an information processing device, an information processing method, an information processing program, and an information processing system that enable wider utilization of the recognition specialized sensor.
- According to the present disclosure, the information processing apparatus includes a generation unit that generates, from first learning data used for training a first recognizer that performs recognition processing based on a first signal read from a first sensor in a first readout unit, second learning data for training a second recognizer that performs recognition processing based on a second signal read from a second sensor that differs from the first sensor in at least one of a readout unit, a signal characteristic, and a pixel characteristic.
- FIG. 1 is a schematic diagram showing a configuration of an example of an information processing system commonly applicable to each embodiment;
- FIG. 1 is a block diagram showing an example configuration of a recognition system applicable to an embodiment;
- FIG. 1 is a functional block diagram of an example for explaining functions of a learning system applicable to an embodiment;
- FIG. 3 is a block diagram showing an example configuration of the imaging unit applicable to each embodiment;
- FIG. 4A is a diagram showing an example in which the recognition system according to each embodiment is formed by a stacked CIS with a two-layer structure;
- FIG. 4B is a diagram showing an example in which the recognition system according to each embodiment is formed by a stacked CIS with a three-layer structure;
- FIG. 5 is a block diagram showing an example configuration of an information processing device 3100 for realizing a learning system applicable to the embodiment;
- FIG. 1 is a block diagram showing an example configuration of a recognition system applicable to an embodiment
- FIG. 1 is a block diagram showing an example configuration of a recognition system applicable to an embodiment
- FIG. 4 is a diagram for schematically explaining image recognition processing by CNN;
- FIG. 4 is a diagram for schematically explaining image recognition processing for obtaining a recognition result from a part of an image to be recognized;
- FIG. 10 is a diagram schematically showing an example of identification processing by DNN when time-series information is not used;
- FIG. 10 is a diagram schematically showing an example of identification processing by DNN when time-series information is not used;
- FIG. 4 is a diagram schematically showing a first example of identification processing by DNN when using time-series information;
- FIG. 4 is a diagram schematically showing a first example of identification processing by DNN when using time-series information;
- FIG. 10 is a diagram schematically showing a second example of identification processing by DNN when time-series information is used;
- FIG. 10 is a diagram schematically showing a second example of identification processing by DNN when time-series information is used;
- FIG. 4 is a schematic diagram for schematically explaining recognition processing applicable to each embodiment of the present disclosure;
- FIG. 4 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the first example of the first embodiment;
- FIG. 4 is a schematic diagram showing a first example of generating learning data from specialized image data applicable to the first example of the first embodiment;
- FIG. 9 is a schematic diagram showing a second example of learning data generation applicable to the first example of the first embodiment;
- FIG. 11 is a schematic diagram showing a third example of learning data generation applicable to the first example of the first embodiment;
- FIG. 12 is a schematic diagram showing a fourth example of learning data generation applicable to the first example of the first embodiment;
- FIG. 12 is a schematic diagram showing a fifth example of learning data generation applicable to the first example of the first embodiment;
- FIG. 11 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the second example of the first embodiment;
- FIG. 9 is a schematic diagram showing a first example of learning data generation applicable to a second example of the first embodiment;
- FIG. 12 is a schematic diagram showing a second example of learning data generation applicable to the second example of the first embodiment;
- FIG. 12 is a schematic diagram showing a third example of learning data generation applicable to the second example of the first embodiment;
- FIG. 11 is a schematic diagram showing a fourth example of learning data generation applicable to the second example of the first embodiment
- FIG. 12 is a schematic diagram showing a fifth example of learning data generation applicable to the second example of the first embodiment
- FIG. 12 is a schematic diagram showing a sixth example of learning data generation applicable to the second example of the first embodiment
- FIG. 13 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the third example of the first embodiment
- FIG. 11 is a schematic diagram for more specifically explaining the generation of learning data according to the third example of the first embodiment
- FIG. 14 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the fourth example of the first embodiment
- FIG. 11 is a schematic diagram for explaining interpolation image generation processing according to a third example of the first embodiment
- FIG. 12 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the fifth example of the first embodiment
- FIG. 12 is a schematic diagram for explaining interpolation image generation processing according to a fifth example of the first embodiment
- FIG. 11 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the first example of the second embodiment
- FIG. 11 is a schematic diagram showing a first example of existing evaluation data generation applicable to the first example of the second embodiment
- FIG. 11 is a schematic diagram showing a second example of existing evaluation data generation applicable to the first example of the second embodiment
- FIG. 12 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the second example of the second embodiment;
- FIG. 11 is a schematic diagram showing a first example of existing evaluation data generation applicable to a second example of the second embodiment;
- FIG. 11 is a schematic diagram showing a second example of existing evaluation data generation applicable to the second example of the second embodiment;
- FIG. 11 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the third example of the second embodiment;
- FIG. 11 is a schematic diagram showing a first example of existing evaluation data generation applicable to the third example of the second embodiment;
- FIG. 13 is a schematic diagram showing a second example of existing evaluation data generation applicable to the third example of the second embodiment;
- FIG. 14 is a functional block diagram of an example for explaining functions of a conversion unit in the learning system according to the fifth example of the second embodiment;
- FIG. 20 is a schematic diagram for explaining a first example of output timing of existing evaluation data according to the fifth example of the second embodiment;
- FIG. 14 is a schematic diagram for explaining a second example of output timing of existing evaluation data according to the fifth example of the second embodiment;
- FIG. 16 is a schematic diagram for explaining a third example of output timing of existing evaluation data according to the fifth example of the second embodiment;
- FIG. 10 is a schematic diagram for explaining a case where the cycle of outputting existing learning data and the cycle of inputting specialized evaluation data for one frame do not have an integral multiple relationship;
- FIG. 11 is a schematic diagram for schematically explaining each processing pattern according to the third embodiment;
- FIG. 11 is a schematic diagram for explaining a distillation process applicable to the third embodiment;
- FIG. 11 is a schematic diagram showing classified processes according to the third embodiment;
- It is a schematic diagram for explaining general distillation processing;
- FIG. 11 is a schematic diagram for explaining a distillation process according to a third embodiment;
- FIG. 11 is a schematic diagram for explaining processing according to the first example of the third embodiment;
- FIG. 12 is a schematic diagram for explaining processing according to a second example of the third embodiment;
- FIG. 12 is a schematic diagram for explaining processing according to a third example of the third embodiment;
- FIG. 12 is a schematic diagram for explaining processing according to a fourth example of the third embodiment;
- FIG. 4 is a schematic diagram for explaining Dream Distillation;
- FIG. 11 is a schematic diagram for explaining a distillation process applicable to the third embodiment;
- FIG. 11 is a schematic diagram showing classified processes according to the third embodiment;
- It is a schematic diagram for explaining general distillation processing;
- FIG. 12 is a schematic diagram for explaining processing according to the fifth example of the third embodiment
- FIG. 13 is a functional block diagram of an example for explaining functions of a conversion unit according to the fourth embodiment
- FIG. 4 is a schematic diagram for explaining the principle of filter conversion processing in a filter conversion unit
- FIG. 4 is a schematic diagram showing a comparison between processing by an existing NW and processing by a specialized NW
- FIG. 11 is a schematic diagram for explaining processing according to the first example of the fourth embodiment
- FIG. 20 is a schematic diagram for explaining processing according to the first modification of the first example of the fourth embodiment
- FIG. 12 is a functional block diagram of an example for explaining functions of a conversion unit according to the second example of the fourth embodiment
- FIG. 4 is a schematic diagram for explaining the principle of filter conversion processing by a filter conversion unit;
- FIG. 4 is a schematic diagram showing a comparison between processing by an existing NW and processing by a specialized NW;
- FIG. 14 is a schematic diagram for explaining processing according to a second example of the fourth embodiment;
- FIG. 20 is a schematic diagram for explaining processing according to the first modification of the second example of the fourth embodiment;
- FIG. 20 is a schematic diagram for explaining processing according to the second modification of the second example of the fourth embodiment;
- FIG. 14 is a functional block diagram of an example for explaining functions of a conversion unit according to the third example of the fourth embodiment;
- FIG. 4 is a schematic diagram for explaining a receptive field;
- FIG. 14 is a schematic diagram for explaining processing according to a third example of the fourth embodiment;
- FIG. 11 is a schematic diagram schematically showing layer conversion according to the first to third examples of the fourth embodiment;
- FIG. 12 is a schematic diagram for explaining a first example of a fourth example of the fourth embodiment;
- FIG. 14 is a schematic diagram for explaining a second example of the fourth example of the fourth embodiment;
- FIG. 21 is a functional block diagram of an example for explaining functions of a conversion unit commonly applicable to each example of the fifth embodiment;
- FIG. 12 is a schematic diagram for explaining a conversion process of optical linearity applicable to the first example of the fifth embodiment;
- FIG. 12 is a schematic diagram for explaining an example of conversion processing of an SNR curve that can be applied to the first example of the fifth embodiment;
- FIG. 12 is a schematic diagram for explaining another example of conversion processing of an SNR curve that can be applied to the first example of the fifth embodiment;
- FIG. 21 is a schematic diagram for explaining noise histogram conversion processing applicable to the first example of the fifth embodiment;
- FIG. 20 is a schematic diagram for explaining bit length conversion processing applicable to the second example of the fifth embodiment;
- FIG. 21 is a schematic diagram for explaining conversion processing for converting image data before HDR synthesis into image data after HDR synthesis, which is applicable to the second example of the fifth embodiment;
- FIG. 12 is a schematic diagram for explaining conversion processing for converting image data after HDR synthesis into image data before HDR synthesis, which is applicable to the second example of the fifth embodiment;
- FIG. 21 is a schematic diagram showing an example of static tone conversion applicable to the second example of the fifth embodiment;
- FIG. 14 is a schematic diagram showing an example of shading correction applicable to the second example of the fifth embodiment;
- FIG. 21 is a schematic diagram for schematically explaining the processing according to the second example of the eighth embodiment;
- FIG. 21 is a functional block diagram of an example for explaining functions of a NW converter applicable to the second example of the eighth embodiment;
- FIG. 20 is a schematic diagram for schematically explaining the processing according to the third example of the eighth embodiment;
- FIG. 22 is a functional block diagram of an example for explaining functions of a NW conversion unit applicable to the third example of the eighth embodiment;
- FIG. 22 is a schematic diagram for schematically explaining processing according to the first example of the ninth embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the first example of the first example of the ninth embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the second example of the first example of the ninth embodiment;
- FIG. 21 is a schematic diagram for schematically explaining control processing according to a second example of the ninth embodiment;
- FIG. 20 is a schematic diagram for explaining processing according to the first example of the second example of the ninth embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to a second example of the second example of the ninth embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the third example of the second example of the ninth embodiment;
- FIG. 10 is a schematic diagram for explaining a region in which target objects appear frequently, which is indicated by statistics;
- FIG. 10 is a schematic diagram for explaining a region in which target objects appear frequently, which is indicated by statistics;
- FIG. 22 is a sequence diagram for explaining read control applicable to the third example of the second example of the ninth embodiment;
- FIG. 21 is a schematic diagram for explaining the principle of processing according to the third example of the ninth embodiment;
- FIG. 22 is a schematic diagram for more specifically explaining the processing according to the third example of the ninth embodiment;
- FIG. 21 is a schematic diagram for explaining control information generated by a control generation unit in the third example of the ninth embodiment;
- FIG. 22 is a schematic diagram for explaining learning processing in the third example of the ninth embodiment;
- FIG. 21 is a schematic diagram for explaining processing according to the fourth example of the ninth embodiment;
- FIG. 20 is a schematic diagram schematically showing learning processing by an existing recognizer according to the first example of the tenth embodiment;
- FIG. 22 is a schematic diagram schematically showing processing regarding evaluation data by an existing recognizer according to the first example of the tenth embodiment
- FIG. 21 is a functional block diagram of an example for explaining functions of an existing recognizer according to the first example of the tenth embodiment
- FIG. 22 is a schematic diagram more specifically showing the processing of the evaluation data by the recognizer according to the first example of the tenth embodiment
- FIG. 22 is a schematic diagram for more specifically explaining the processing by the attention area selection unit according to the first example of the tenth embodiment
- FIG. 12B is a schematic diagram schematically showing a process related to evaluation data by an existing recognizer according to the second example of the tenth embodiment
- FIG. 20 is a schematic diagram showing classified processes according to the eleventh embodiment
- FIG. 22 is a schematic diagram for explaining processing according to the first example of the eleventh embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the second example of the eleventh embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the third example of the eleventh embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the fourth example of the eleventh embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the fifth example of the eleventh embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the sixth example of the eleventh embodiment;
- FIG. 32 is a schematic diagram for explaining processing according to a modification of the sixth example of the eleventh embodiment;
- FIG. 22 is a schematic diagram schematically showing processing according to the twelfth embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the first example of the twelfth embodiment;
- FIG. 22 is a schematic diagram for explaining processing according to the second example of the twelfth embodiment;
- 5-3. Third example of the second embodiment
- 5-3-1. First example of generating evaluation data by format conversion
- 5-3-2.
- 5-4. Fourth example of the second embodiment
- 5-5. Fifth example of the second embodiment
- 5-5-1.
- 7. Fourth embodiment
- 7-1. First example of the fourth embodiment
- 7-1-1. First modification of the first example
- 7-1-2. Second modification of the first example
- 7-2. Second example of the fourth embodiment
- 7-2-1. First modification of the second example
- 7-2-2. Second modification of the second example
- 7-3. Third example of the fourth embodiment
- 7-4. Fourth example of the fourth embodiment
- 8. Fifth embodiment
- 8-1. Outline of conversion processing by the conversion unit
- 8-2. First example of the fifth embodiment
- 8-3. Second example of the fifth embodiment
- 9. Sixth embodiment
- 10. Seventh embodiment
- 11. Eighth embodiment
- 11-1. First example of the eighth embodiment
- 11-2. Second example of the eighth embodiment
- 11-3. Third example of the eighth embodiment
- 12. Ninth embodiment
- 12-1.
- The present disclosure relates to image recognition processing by a sensor incorporating a configuration for realizing an image recognition function (referred to as a recognition-specialized sensor), and to image recognition by a sensor based on existing technology that does not have such a configuration (referred to as an existing sensor).
- In an existing sensor, pixel signals are read out in units of one frame (frame-based). Processing of pixel signals on a frame-by-frame basis is referred to as frame-based processing.
- a recognizer corresponding to an existing sensor (referred to as an existing recognizer) performs recognition processing on a frame basis in units of one frame of image data read from an imaging device in the existing sensor.
- On the other hand, the recognition-specialized sensor can process pixel signals in readout units smaller than one frame (non-frame-based).
- The recognition-specialized sensor can also have signal characteristics specialized for recognition processing.
- That is, the recognition-specialized sensor can perform processing on pixel signals on a non-frame basis, with signal characteristics specialized for recognition processing.
- Non-frame-based processing units include line units and sub-sample units.
- Sub-sampling means, for example, extracting from one frame a predetermined number of pixels that is less than the total number of pixels in the frame. In sub-sampling, pixels are extracted from one frame in units of one or more pixels, and pixel signals are obtained from the extracted pixels.
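- As a rough illustration of these non-frame-based readout units, the following Python sketch slices a frame-based image into line units and sub-sample units. The array shapes, the group size of four lines, and the stride-based sub-sampling pattern are assumptions made for illustration only; they are not the actual readout order of the recognition-specialized sensor.

```python
import numpy as np

def split_into_line_units(frame: np.ndarray, lines_per_unit: int = 4):
    """Divide a (H, W) frame into consecutive groups of rows (line units)."""
    height = frame.shape[0]
    return [frame[y:y + lines_per_unit] for y in range(0, height, lines_per_unit)]

def split_into_subsample_units(frame: np.ndarray, step: int = 2):
    """Extract sub-sample units: each unit takes every step-th pixel,
    starting from a different (row, column) phase offset."""
    units = []
    for dy in range(step):
        for dx in range(step):
            units.append(frame[dy::step, dx::step])
    return units

frame = np.random.randint(0, 256, size=(1080, 1920), dtype=np.uint8)
line_units = split_into_line_units(frame)            # 270 units of 4 lines each
subsample_units = split_into_subsample_units(frame)  # 4 units of 540x960 pixels
```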
- In addition, the recognition-specialized sensor can terminate recognition processing when a sufficient recognition result is obtained before the pixel signals for one frame have been read out.
- The existing recognizer is trained using frame-based image data as training data, and the evaluation data for the existing recognizer is likewise based on frame-based image data. On the other hand, the specialized recognizer is trained using non-frame-based image data as teacher data, and the evaluation data for the specialized recognizer is also based on non-frame-based image data.
- a user who uses an existing recognizer has a frame-based data set with frame-based learning data and evaluation data.
- learning data is sometimes called teacher data.
- evaluation data is sometimes called test data.
- existing recognizers generally perform recognition processing on frame-based image data using a CNN (Convolutional Neural Network).
- On the other hand, the specialized recognizer regards the sequentially input non-frame-based image data as time-series image data, and performs recognition processing using an RNN (Recurrent Neural Network) together with a CNN.
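- As a minimal sketch of this CNN-plus-RNN structure, assuming PyTorch, a single-channel input, and arbitrary layer sizes (none of which are specified in the present disclosure): a small CNN extracts a fixed-size feature from each sequentially supplied readout unit, and a GRU accumulates those features over time before a classification head produces a recognition result.

```python
import torch
import torch.nn as nn

class NonFrameRecognizer(nn.Module):
    """Toy recognizer: CNN features per readout unit, GRU over the unit sequence."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size feature map per readout unit
            nn.Flatten(),
        )
        self.rnn = nn.GRU(input_size=8 * 4 * 4, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, units):                 # units: (batch, time, 1, H, W)
        b, t = units.shape[:2]
        feats = self.cnn(units.flatten(0, 1)).view(b, t, -1)
        _, hidden = self.rnn(feats)           # hidden: (1, batch, 64)
        return self.head(hidden[-1])          # recognition result after the last unit

model = NonFrameRecognizer()
line_units = torch.randn(2, 8, 1, 4, 64)      # 8 line units of 4 lines x 64 pixels
scores = model(line_units)                    # shape: (2, 10)
```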
- The embodiments described below combine each of the items (1) and (2) relating to the data set and (3) and (4) relating to the network with each of the items (A) subsample (including line division), (B) characteristics, and (C) control of the recognizer.
- the network refers to a neural network, and may be described as "NW”.
- the data set is divided into (1) conversion processing for learning data as input data for the recognizer and (2) conversion processing for evaluation data as input data for the recognizer.
- (1) For the training data, the frame-based training data is converted into non-frame-based training data so that the specialized recognizer can be trained.
- (2) For the evaluation data, when the specialized recognizer executes recognition processing, data equivalent to frame-based data is generated from the non-frame-based data output from the recognition-specialized sensor.
- The network is divided into (3) conversion processing for the entire network included in the recognizer and (4) conversion processing for individual configurations (layers, etc.) included in the network. (3) For the entire network, a specialized recognizer is trained based on the outputs of the existing recognizer. (4) For individual configurations, the processing parameters of the specialized recognizer are converted, based on the output of the existing recognizer, so that the output of the specialized recognizer approximates the output of the existing recognizer.
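- As a hedged sketch of conversion (3), the following distillation-style training step, assuming PyTorch and placeholder models, updates a specialized (non-frame-based) recognizer so that its output on a sequence of readout units approximates the output of a frozen existing (frame-based) recognizer on the corresponding full frame. The KL-divergence loss and the function names are illustrative assumptions, not the method prescribed by the present disclosure.

```python
import torch
import torch.nn.functional as F

def distill_step(existing_nw, specialized_nw, optimizer, frame, units):
    """One training step: match the specialized recognizer's output (from
    readout units) to the frozen existing recognizer's output (from the frame)."""
    with torch.no_grad():
        teacher_logits = existing_nw(frame)      # frame-based "teacher" output
    student_logits = specialized_nw(units)       # non-frame-based "student" output
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```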
- For (A) subsample, conversion is performed between the data set or network related to the existing recognizer and the data set or network related to the specialized recognizer.
- For (B) characteristics, conversion is performed between the characteristics of the data set related to the existing recognizer and the characteristics of the data set used for performing recognition processing with the specialized recognizer.
- For (C) control, a control rule for performing recognition processing with the specialized recognizer is generated.
- The first embodiment is an example in which item (A) and item (1) are combined. More specifically, the first embodiment is an example of converting frame-based image data (learning data) related to an existing recognizer into sub-sampled or line-divided image data (learning data) corresponding to a specialized recognizer.
- The second embodiment is an example in which item (A) and item (2) are combined. More specifically, the second embodiment is an example of converting non-frame-based image data (evaluation data) related to a recognition-specialized sensor into frame-based image data (evaluation data) corresponding to an existing recognizer.
- The third embodiment is an example in which item (A) and item (3) are combined. More specifically, the third embodiment is an example of training a specialized recognizer so that an equivalent output can be obtained from the network of the existing recognizer (for example, a frame-based network) and the network of the specialized recognizer (a non-frame-based network).
- The fourth embodiment is an example in which item (A) and item (4) are combined. More specifically, the fourth embodiment converts the network of an existing recognizer (a frame-based network) into the network of a specialized recognizer (a non-frame-based network). In the fourth embodiment, for example, the conversion of the frame-based network into the non-frame-based network is realized by converting at least one of the layers and filters included in the network.
- the fifth embodiment is an example in which item (B) and item (1) are combined. More specifically, the fifth embodiment transforms properties of training data for existing recognizers into properties expected of a network of specialized recognizers.
- the sixth embodiment is an example of combining item (B) and item (2). More specifically, the sixth embodiment converts the characteristics of evaluation data input to a network of existing recognizers into characteristics assumed for the network.
- the seventh embodiment is an example in which item (B) and item (3) are combined. More specifically, the seventh embodiment is an example of generating a network of specialized recognizers based on a network of existing recognizers.
- The eighth embodiment is an example in which item (B) and item (4) are combined. More specifically, the eighth embodiment is an example of converting the network of an existing recognizer into the network of a specialized recognizer. In the eighth embodiment, the conversion of the existing recognizer network into the specialized recognizer network is realized by adding preprocessing to the existing recognizer or by transforming at least one of the layers and filters included in the network.
- the ninth embodiment is an example in which item (C) and item (1) are combined. More specifically, the ninth embodiment generates a control rule for executing recognition processing by a specialized recognizer based on learning data for an existing recognizer.
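- One possible illustration of such a control rule, consistent with the statistics on regions in which target objects appear frequently mentioned in the description of the drawings: bounding-box annotations in the frame-based learning data are accumulated into a heatmap, and lines are then read in descending order of object frequency. The annotation format (x, y, w, h) and the line-priority policy below are assumptions for this sketch, not the control scheme of the present disclosure.

```python
import numpy as np

def line_readout_priority(annotations, height, width):
    """Derive a per-line readout order from bounding boxes (x, y, w, h)
    found in the frame-based learning data: lines where objects appear
    more often are read earlier."""
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x, y, w, h in annotations:
        heatmap[y:y + h, x:x + w] += 1.0
    line_scores = heatmap.sum(axis=1)   # object frequency per line
    return np.argsort(-line_scores)     # line indices, most frequent first

# e.g. two annotated objects in a 1080x1920 frame
priority = line_readout_priority([(100, 200, 50, 80), (120, 210, 40, 60)], 1080, 1920)
```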
- the tenth embodiment is an example in which item (C) and item (2) are combined. More specifically, the tenth embodiment generates a control rule for executing recognition processing by a specialized recognizer based on output data of a specialized recognition sensor.
- the eleventh embodiment is an example in which item (C) and item (3) are combined. More specifically, the eleventh embodiment generates a control rule for executing recognition processing by a specialized recognizer based on the output of an existing recognizer.
- The twelfth embodiment is an example of combining item (C) and item (4). More specifically, in the twelfth embodiment, a specialized recognizer is generated by transforming at least one processing unit (layer, filter, etc.) of the existing recognizer's network so that the output of each processing unit matches or approximates between when an existing sensor is used and when a recognition-specialized sensor is used.
- FIG. 1 is a schematic diagram showing a configuration of an example of an information processing system commonly applicable to each embodiment.
- information processing system 1 includes recognition system 2 and learning system 3 .
- the recognition system 2 includes a sensor section 10 and a recognition section 20 .
- the sensor unit 10 includes at least an imaging device that captures an image of a subject and outputs image data.
- the recognition unit 20 performs recognition processing based on the image data output from the sensor unit 10 by a recognizer using, for example, a neural network.
- the recognizer is stored as a program, for example, in a memory (not shown) of the recognizer 20 .
- Although FIG. 1 shows the sensor unit 10 and the recognition unit 20 as separate blocks for the sake of explanation, the configuration is not limited to this example.
- the recognition section 20 may be included in the sensor section 10 .
- the imaging device is capable of imaging and outputting image data on a non-frame basis, such as line division and sub-sampling.
- the recognition unit 20 is also capable of recognition processing based on non-frame-based image data.
- the sensor unit 10 and the recognition unit 20 function as a specialized recognition sensor and a specialized recognizer, respectively.
- the learning system 3 includes a configuration for learning the recognizer in the recognition unit 20 .
- the learning system 3 may include a database of datasets with training data and evaluation data for the recognizer to learn.
- the learning system 3 can also train the recognizer based on the dataset.
- the learned recognizer is transferred to the recognition system 2 via a predetermined interface, for example, and applied to the recognition section 20 .
- the learning system 3 is capable of conversion processing between different types of data sets. For example, learning system 3 may convert frame-based training data to non-frame-based data. Furthermore, the learning system 3 is capable of converting between different types of recognizers. For example, learning system 3 can transform an existing recognizer for frame-based image data into a specialized recognizer for non-frame-based image data.
- the recognition unit 20 can be an existing recognizer that performs recognition processing on a frame basis.
- the recognition system 2 can convert the non-frame-based image data output from the sensor unit 10 as a recognition specialized sensor into frame-based image data corresponding to the existing recognizer.
- the recognition system 2 and learning system 3 do not need to be connected all the time.
- the recognition system 2 and the learning system 3 are connected via a predetermined interface when a recognizer trained in the learning system 3 is transferred to the recognition system 2 .
- the learning system 3 is shown configured on a stand-alone device, but this is not limited to this example.
- the learning system 3 can be composed of an information processing device and another information processing device (for example, a server) connected to the information device via a communication network.
- the recognition system 2 and the learning system 3 can also be configured on one device.
- As described above, the information processing system 1 performs conversion between a frame-based dataset corresponding to an existing recognizer and a non-frame-based dataset corresponding to a specialized recognizer, conversion of an existing recognizer into a specialized recognizer, and the like. Therefore, a wider range of utilization of the recognition-specialized sensor becomes possible.
- FIG. 2A is an example functional block diagram for explaining the functions of the recognition system 2 applicable to the embodiment.
- the recognition system 2 includes an imaging unit 11 , a conversion unit 12 , an imaging control unit 13 and a recognition unit 20 .
- For example, the conversion unit 12, the imaging control unit 13, and the recognition unit 20 are configured by predetermined logic circuits. Alternatively, each of these units may be configured by a processor such as an MPU (Micro Processing Unit) or a DSP (Digital Signal Processor). The configuration of the imaging unit 11 will be described later.
- the imaging unit 11 includes an imaging device that images a subject and outputs pixel signals.
- The image sensor includes a pixel array in which a plurality of pixels that output pixel signals corresponding to incident light are arranged in a matrix, and a control circuit that controls readout of the pixel signals from the pixels of the pixel array in accordance with instructions from the imaging control unit 13.
- the pixel signals read out from the pixel array are converted into digital signals and output from the imaging section 11 as image data for each predetermined readout unit.
- a specific configuration example of the imaging element will be described later.
- The conversion unit 12 converts the image data output from the imaging unit 11 into image data in a format compatible with the recognition unit 20 as necessary. For example, when the recognition unit 20 is an existing recognizer and the imaging unit 11 outputs non-frame-based image data, the conversion unit 12 converts the non-frame-based image data output from the imaging unit 11 into frame-based image data and supplies it to the recognition unit 20.
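- A minimal sketch of this direction of conversion, assuming line-unit input and NumPy arrays: incoming line units are written into a frame buffer, and the buffer is handed to the frame-based recognizer once all lines of the frame have arrived. The buffering policy and single-channel format are illustrative assumptions.

```python
import numpy as np

class LineToFrameConverter:
    """Accumulate non-frame-based line units into a frame-based image."""
    def __init__(self, height: int, width: int):
        self.frame = np.zeros((height, width), dtype=np.uint8)
        self.filled = np.zeros(height, dtype=bool)

    def push(self, first_line: int, lines: np.ndarray):
        n = lines.shape[0]
        self.frame[first_line:first_line + n] = lines
        self.filled[first_line:first_line + n] = True
        if self.filled.all():                       # full frame accumulated
            out = self.frame.copy()
            self.filled[:] = False
            return out                              # supply to the existing recognizer
        return None
```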
- the recognition unit 20 has, for example, a memory, and a recognizer is stored in the memory as, for example, a program.
- the recognition unit 20 performs recognition processing using the recognizer based on the image data supplied from the conversion unit 12 .
- the recognition result by the recognition unit 20 is output to the outside of the recognition system 2, for example. Further, the recognition result by the recognition unit 20 is also supplied to the imaging control unit 13 .
- the recognition unit 20 can apply either an existing recognizer that performs frame-based recognition processing or a specialized recognizer that performs non-frame-based recognition processing. Further, recognition processing in the recognition unit 20 can be controlled by a predetermined control command.
- the imaging control unit 13 generates imaging control signals for controlling the operation of the imaging unit 11 .
- The imaging control unit 13 generates imaging control signals for controlling, for example, imaging by the imaging unit 11, reading of pixel signals from the pixel array, and output of image data from the imaging unit 11.
- the imaging control section 13 can generate an imaging control signal according to the recognition result by the recognition section 20 .
- The imaging control unit 13 can also generate an imaging control signal for controlling the operation of the imaging unit 11 to be either a frame-based imaging operation or a non-frame-based imaging operation in accordance with a predetermined control command.
- The sensor unit 10 shown in FIG. 1 may include only the imaging unit 11, as shown as the sensor unit 10a in FIG. 2A, or may include the imaging unit 11 and the imaging control unit 13, as shown as the sensor unit 10b. Further, the sensor unit 10 may include the imaging unit 11 and the conversion unit 12, as shown as the sensor unit 10c in FIG. 2A, or may include the imaging unit 11, the conversion unit 12, and the imaging control unit 13, as shown as the sensor unit 10d. Not limited to these, the sensor unit 10 may include the imaging unit 11, the conversion unit 12, the imaging control unit 13, and the recognition unit 20, as shown as the sensor unit 10e in FIG. 2A.
- these sensor units 10a to 10e are configured on the same chip as the imaging unit 11.
- FIG. 2B is an example functional block diagram for explaining the functions of the learning system 3 applicable to the embodiment.
- the learning system 3 includes a data generator 30 and a recognizer generator 31 that implement functions independent of each other.
- the data generation unit 30 includes a conversion unit 301.
- a conversion unit 301 converts existing learning data 300, which is learning data based on frame-based image data, into specialized learning data 302, which is learning data based on non-frame-based image data.
- the conversion unit 301 also converts specialized evaluation data 304, which is evaluation data based on non-frame-based image data, into existing evaluation data 303, which is evaluation data based on frame-based image data.
- Further, the conversion unit 301 generates a specialized control rule 313, which is a control rule for the specialized recognizer that performs recognition processing based on non-frame-based image data, based on any of the existing learning data 300, the specialized learning data 302, the existing evaluation data 303, and the specialized evaluation data 304.
- the recognizer generation unit 31 includes a NW (network) conversion unit 311.
- the NW conversion unit 311 generates a specialized recognizer 312 that performs recognition processing using non-frame-based image data based on the existing recognizer 310 that performs recognition processing using frame-based image data.
- the NW conversion unit 311 generates an existing recognizer 310 based on the specialized recognizer 312 .
- the NW conversion unit 311 generates a specialized control rule 313 that is a control rule for the specialized recognizer 312 based on the existing recognizer 310 or the specialized recognizer 312 .
- For example, the conversion unit 301 and the NW conversion unit 311 are implemented by a program that runs on a CPU (Central Processing Unit) of the information processing device.
- the existing learning data 300 and the existing recognizer 310 may be stored in advance, for example, in the storage device of the information processing device. Not limited to this, the existing learning data 300 and the existing recognizer 310 may be acquired from another information processing device (server or the like) via a communication network connected to the information processing device.
- The specialized learning data 302, the existing evaluation data 303, and the specialized control rule 313 converted or generated by the conversion unit 301, and the specialized recognizer 312 and the specialized control rule 313 generated by the NW conversion unit 311 are stored, for example, in a storage device or memory included in the information processing device. The learning system 3 transfers the generated specialized recognizer 312 to the recognition system 2 via, for example, a predetermined interface. Similarly, the learning system 3 transfers the generated specialized control rule 313 to the recognition system 2 via, for example, a predetermined interface.
- The conversion unit 301 and the NW conversion unit 311 can additionally apply HITL (Human-in-the-Loop) using real sensors to the conversion processing.
- FIG. 3 is a block diagram showing an example configuration of the imaging unit 11 applicable to each embodiment.
- The imaging unit 11 includes a pixel array unit 101, a vertical scanning unit 102, an AD (Analog to Digital) conversion unit 103, a pixel signal line 106, a vertical signal line VSL, a control unit 1100, and a signal processing unit 1101.
- the pixel array unit 101 includes a plurality of pixel circuits 100 including photoelectric conversion elements, for example, photodiodes that perform photoelectric conversion according to received light, and circuits that read out charges from the photoelectric conversion elements.
- the plurality of pixel circuits 100 are arranged in a matrix in the horizontal direction (row direction) and vertical direction (column direction).
- the arrangement of the pixel circuits 100 in the row direction is called a line.
- the pixel array section 101 includes at least 1080 lines each including at least 1920 pixel circuits 100 .
- An image (image data) of one frame is formed by pixel signals read from the pixel circuits 100 included in the frame.
- In the pixel array section 101, the pixel signal line 106 is connected to each row of the pixel circuits 100, and the vertical signal line VSL is connected to each column.
- the ends of the pixel signal lines 106 that are not connected to the pixel array section 101 are connected to the vertical scanning section 102 .
- the vertical scanning unit 102 transmits control signals such as drive pulses for reading out pixel signals from pixels to the pixel array unit 101 via the pixel signal lines 106 under the control of the control unit 1100 to be described later.
- An end of the vertical signal line VSL that is not connected to the pixel array unit 101 is connected to the AD conversion unit 103 .
- a pixel signal read from the pixel is transmitted to the AD conversion unit 103 via the vertical signal line VSL.
- Pixel signals are read out from the pixel circuit 100 by transferring the charge accumulated in the photoelectric conversion element during exposure to a floating diffusion layer (FD) and converting the transferred charge into a voltage in the floating diffusion layer. The voltage resulting from the charge conversion in the floating diffusion layer is output to the vertical signal line VSL via an amplifier.
- More specifically, during exposure, the connection between the photoelectric conversion element and the floating diffusion layer is turned off (opened), and charge generated by photoelectric conversion according to the incident light is accumulated in the photoelectric conversion element.
- the floating diffusion layer and the vertical signal line VSL are connected according to the selection signal supplied through the pixel signal line 106 . Further, the floating diffusion layer is connected to the power supply voltage VDD or the black level voltage supply line for a short period of time in response to a reset pulse supplied through the pixel signal line 106 to reset the floating diffusion layer.
- a reset level voltage (assumed to be voltage A) of the floating diffusion layer is output to the vertical signal line VSL.
- A transfer pulse supplied through the pixel signal line 106 turns on (closes) the connection between the photoelectric conversion element and the floating diffusion layer, thereby transferring the charge accumulated in the photoelectric conversion element to the floating diffusion layer.
- a voltage (referred to as voltage B) corresponding to the charge amount of the floating diffusion layer is output to the vertical signal line VSL.
- The AD conversion unit 103 includes an AD converter 107 provided for each vertical signal line VSL, a reference signal generation unit 104, and a horizontal scanning unit 105.
- the AD converter 107 is a column AD converter that performs AD conversion processing on each column of the pixel array unit 101 .
- The AD converter 107 performs AD conversion processing on the pixel signal supplied from the pixel circuit 100 via the vertical signal line VSL, and generates two digital values (values corresponding to voltage A and voltage B, respectively) for correlated double sampling (CDS) processing for noise reduction.
- the AD converter 107 supplies the two generated digital values to the signal processing section 1101 .
- the signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 107 to generate pixel signals (pixel data) as digital signals. Pixel data generated by the signal processing unit 1101 is output to the outside of the imaging unit 11 .
- Based on the control signal input from the control unit 1100, the reference signal generation unit 104 generates, as a reference signal, a ramp signal that each AD converter 107 uses to convert the pixel signal into two digital values.
- a ramp signal is a signal whose level (voltage value) decreases with a constant slope with respect to time, or a signal whose level decreases stepwise.
- the reference signal generator 104 supplies the generated ramp signal to each AD converter 107 .
- the reference signal generator 104 is configured using, for example, a DAC (Digital to Analog Converter).
- the counter starts counting according to the clock signal.
- The comparator compares the voltage of the pixel signal supplied from the vertical signal line VSL with the voltage of the ramp signal, and stops the counting by the counter when the voltage of the ramp signal crosses the voltage of the pixel signal.
- the AD converter 107 converts the analog pixel signal into a digital value by outputting a value corresponding to the count value of the time when the counting is stopped.
- the AD converter 107 supplies the two generated digital values to the signal processing section 1101 .
- the signal processing unit 1101 performs CDS processing based on the two digital values supplied from the AD converter 107 to generate pixel signals (pixel data) as digital signals.
- a pixel signal that is a digital signal generated by the signal processing unit 1101 is output to the outside of the imaging unit 11 .
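- A toy numerical illustration of the two conversions and the CDS processing described above, with arbitrary voltages and ramp parameters assumed for this sketch: the clock count at which the falling ramp crosses the sampled voltage serves as the digital value, and the difference between the digital value for the signal level (voltage B) and the one for the reset level (voltage A) gives the noise-reduced pixel value.

```python
def single_slope_adc(voltage: float, ramp_start: float = 1.0,
                     step: float = 0.001, max_count: int = 1024) -> int:
    """Count clock cycles until the falling ramp crosses the sampled voltage."""
    ramp = ramp_start
    for count in range(max_count):
        if ramp <= voltage:
            return count
        ramp -= step
    return max_count

voltage_a = 0.80   # reset level of the floating diffusion (voltage A)
voltage_b = 0.55   # level after charge transfer (voltage B); more light -> lower voltage
digital_a = single_slope_adc(voltage_a)
digital_b = single_slope_adc(voltage_b)
pixel_value = digital_b - digital_a  # CDS: the difference cancels the reset offset
```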
- The horizontal scanning unit 105 selects each AD converter 107 in a predetermined order, thereby sequentially outputting the digital values temporarily held by each AD converter 107 to the signal processing unit 1101.
- the horizontal scanning unit 105 is configured using, for example, a shift register and an address decoder.
- the control unit 1100 drives and controls the vertical scanning unit 102, the AD conversion unit 103, the reference signal generation unit 104, the horizontal scanning unit 105, etc. according to the imaging control signal supplied from the imaging control unit 13.
- the control unit 1100 generates various drive signals that serve as references for the operations of the vertical scanning unit 102 , AD conversion unit 103 , reference signal generation unit 104 and horizontal scanning unit 105 .
- For example, based on the vertical synchronization signal or external trigger signal and the horizontal synchronization signal included in the imaging control signal, the control unit 1100 generates control signals that the vertical scanning unit 102 supplies to the pixel circuits 100 via the pixel signal lines 106.
- the control unit 1100 supplies the generated control signal to the vertical scanning unit 102 .
- control unit 1100 passes information indicating the analog gain included in the imaging control signal supplied from the imaging control unit 13 to the AD conversion unit 103, for example.
- The AD conversion unit 103 controls the gain of the pixel signal input to each AD converter 107 included in the AD conversion unit 103 via the vertical signal line VSL according to the information indicating the analog gain.
- Based on the control signals supplied from the control unit 1100, the vertical scanning unit 102 applies various signals, including drive pulses, to the pixel signal line 106 of the selected pixel row of the pixel array unit 101, to each pixel circuit 100 line by line, so that each pixel circuit 100 outputs a pixel signal to the vertical signal line VSL.
- the vertical scanning unit 102 is configured using, for example, shift registers and address decoders. Also, the vertical scanning unit 102 controls exposure in each pixel circuit 100 according to information indicating exposure supplied from the control unit 1100 .
- The control unit 1100 controls the vertical scanning unit 102 and the horizontal scanning unit 105 based on the imaging control signal supplied from the imaging control unit 13, and can thereby control the readout operation by each pixel circuit 100 included in the pixel array unit 101 and the operation of each AD converter 107. As a result, the imaging unit 11 can output non-frame-based image data such as line-divided or sub-sampled image data.
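- As an illustration of the kind of readout control this enables, the following sketch builds the per-unit row schedules a controller might issue for line division and for sub-sampling; the unit sizes and the simple row-phase sub-sampling pattern are assumptions, not the control scheme of the present disclosure.

```python
def line_division_schedule(num_lines: int, lines_per_unit: int):
    """Rows to drive for each readout unit when reading consecutive line groups."""
    return [list(range(y, min(y + lines_per_unit, num_lines)))
            for y in range(0, num_lines, lines_per_unit)]

def subsample_schedule(num_lines: int, step: int):
    """Rows to drive for each readout unit when sub-sampling every step-th line;
    cycling through the phase offsets covers the full frame over `step` units."""
    return [list(range(offset, num_lines, step)) for offset in range(step)]

# e.g. 1080 lines: 270 units of 4 consecutive lines, or 4 units of every 4th line
units_line = line_division_schedule(1080, 4)
units_sub = subsample_schedule(1080, 4)
```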
- The imaging unit 11 configured in this way is a column-AD type CMOS (Complementary Metal Oxide Semiconductor) image sensor in which an AD converter 107 is arranged for each column.
- the recognition system 2 can be formed on one substrate.
- The recognition system 2 may be a stacked CIS (CMOS Image Sensor) integrally formed by stacking a plurality of semiconductor chips.
- In the following description, the sensor unit 10 in the recognition system 2 is assumed to be the sensor unit 10e including the imaging unit 11, the conversion unit 12, the imaging control unit 13, and the recognition unit 20 shown in FIG. 2A.
- the recognition system 2 can be formed with a two-layer structure in which semiconductor chips are stacked in two layers.
- FIG. 4A is a diagram showing an example in which the recognition system 2 according to each embodiment is formed by a laminated CIS having a two-layer structure.
- the stacked CIS has the pixel section 2010 formed in the semiconductor chip of the first layer and the memory+logic section 2020 formed in the semiconductor chip of the second layer.
- a pixel unit 2010 includes at least the pixel array unit 101 in the imaging unit 11 .
- the memory+logic unit 2020 includes, for example, the conversion unit 12, the imaging control unit 13, the recognition unit 20, and an interface (not shown) for communicating between the recognition system 2 and the outside.
- the memory+logic unit 2020 further includes part or all of the driving circuit that drives the pixel array unit 101 in the imaging unit 11 .
- The memory+logic unit 2020 can further include a memory used by the conversion unit 12 and the recognition unit 20 to process image data, and a memory for storing a recognizer used by the recognition unit 20.
- The recognition system 2 is configured as one solid-state imaging device 2000a by bonding the semiconductor chip of the first layer and the semiconductor chip of the second layer while keeping them in electrical contact with each other.
- the recognition system 2 can be formed with a three-layer structure in which semiconductor chips are stacked in three layers.
- FIG. 4B is a diagram showing an example in which the recognition system 2 according to each embodiment is formed by a stacked CIS having a three-layer structure.
- the pixel section 2010 is formed in the semiconductor chip of the first layer
- the memory section 2021 is formed in the semiconductor chip of the second layer
- the logic section 2022 is formed in the semiconductor chip of the third layer.
- the logic unit 2022 includes, for example, the conversion unit 12, the imaging control unit 13, the recognition unit 20, and an interface for communicating between the recognition system 2 and the outside.
- the memory unit 2021 can further include, for example, a memory used by the conversion unit 12 and the recognition unit 20 to process image data, and a memory for storing recognizers used by the recognition unit 20 .
- The recognition system 2 is configured as one solid-state imaging device 2000b by bonding the semiconductor chips of the first, second, and third layers while keeping them in electrical contact with one another.
- FIG. 5 is a block diagram showing an example configuration of an information processing device 3100 for realizing the learning system 3 applicable to the embodiment.
- An information processing device 3100 includes a CPU 3000, a ROM (Read Only Memory) 3001, a RAM (Random Access Memory) 3002, a display control unit 3003, a storage device 3004, an input device 3005, a data I/F (interface) 3006, and a communication I/F 3007, which are communicably connected to one another via a bus 3010.
- the storage device 3004 is a storage medium that can store data in a nonvolatile manner, such as a hard disk drive or flash memory.
- the CPU 3000 operates according to programs stored in the storage device 3004 and the ROM 3001 using the RAM 3002 as a work memory, and controls the overall operation of the information processing device 3100 .
- the display control unit 3003 generates a display signal that can be displayed by the display 3020 based on the display control signal generated by the CPU 3000 according to the program.
- Display 3020 displays a screen according to a display signal supplied from display control section 3003 .
- the input device 3005 receives user operations, and includes a pointing device such as a mouse and a keyboard.
- the data I/F 3006 is an interface for the information processing apparatus 3100 to input/output data with an external device, and USB (Universal Serial Bus) or Bluetooth (registered trademark), for example, can be applied.
- a communication I/F 3007 controls communication via a communication network such as a LAN (Local Area Network) or the Internet.
- The CPU 3000 executes the information processing program for realizing the learning system 3 according to the embodiment, whereby the conversion unit 301 and the NW conversion unit 311 are configured, for example, as modules in the main storage area of the RAM 3002.
- the information processing program can be acquired from the outside via a communication network, for example, by communication via the communication I/F 3007 and installed on the information processing apparatus 3100 .
- the information processing program may be stored in a removable storage medium such as a CD (Compact Disk), a DVD (Digital Versatile Disk), or a USB (Universal Serial Bus) memory and provided.
- In this example, the data generation unit 30 and the recognizer generation unit 31 included in the learning system 3 are configured on the same information processing device 3100, but the configuration is not limited to this example.
- The data generation unit 30 and the recognizer generation unit 31 may be configured on separate information processing devices 3100, or only one of the data generation unit 30 and the recognizer generation unit 31 may be configured on the information processing device 3100.
- In the following description, DNN stands for Deep Neural Network, and RNN stands for Recurrent Neural Network.
- FIG. 6 is a diagram for schematically explaining image recognition processing by CNN.
- A CNN (Convolutional Neural Network) 52 trained in a predetermined manner performs processing on pixel information 51 of the entire image 50 in which the number "8", which is the object to be recognized, is drawn. As a result, the number "8" is recognized as the recognition result 53.
- FIG. 7 is a diagram for schematically explaining image recognition processing for obtaining a recognition result from a part of the image to be recognized.
- an image 50' is obtained by partially acquiring the number "8", which is the object to be recognized, line by line.
- pixel information 54a, 54b and 54c for each line forming pixel information 51' of this image 50' is sequentially processed by a CNN 52' which has been learned in a predetermined manner.
- a valid recognition result means, for example, a recognition result whose score indicating the degree of reliability of the recognized result is equal to or higher than a predetermined value.
- the CNN 52' updates the internal state 55 based on this recognition result 53a.
- The CNN 52', whose internal state 55 has been updated based on the previous recognition result 53a, performs recognition processing on the pixel information 54b of the second line.
- a recognition result 53b indicating that the number to be recognized is either "8" or "9" is obtained.
- The internal state 55 of the CNN 52' is updated.
- Recognition processing is performed on the pixel information 54c of the third line by the CNN 52', whose internal state 55 has been updated based on the previous recognition result 53b.
- the number to be recognized is narrowed down to "8" out of "8" and "9".
- the internal state of the CNN is updated using the result of the previous recognition processing.
- Recognition processing is performed using the pixel information of the line to be read. That is, the recognition processing shown in FIG. 7 is executed line by line with respect to the image while updating the internal state of the CNN based on the previous recognition result. Therefore, the recognition process shown in FIG. 7 is a process that is recursively executed line by line, and can be considered to have a structure corresponding to RNN.
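- As a rough illustration of this recursive, line-by-line flow, the sketch below feeds one line of pixels at a time into a minimal recurrent cell that keeps an internal state and stops once a score exceeds a confidence threshold. The cell structure, its dimensions, and the random weights are illustrative assumptions only, not the recognizer of the embodiment.

```python
import numpy as np

class LineRecurrentRecognizer:
    """Toy recurrent recognizer that consumes one image line per step
    and keeps an internal state, mimicking the line-by-line flow of FIG. 7."""

    def __init__(self, line_width, state_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized weights stand in for a trained model.
        self.w_in = rng.standard_normal((state_dim, line_width)) * 0.01
        self.w_state = rng.standard_normal((state_dim, state_dim)) * 0.01
        self.w_out = rng.standard_normal((num_classes, state_dim)) * 0.01
        self.state = np.zeros(state_dim)

    def step(self, line):
        # Update the internal state from the new line and the previous state.
        self.state = np.tanh(self.w_in @ line + self.w_state @ self.state)
        logits = self.w_out @ self.state
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()   # per-class scores after this line

# Example: process a 28x28 image one line at a time and stop early
# once the top score exceeds a confidence threshold.
image = np.random.rand(28, 28)
recognizer = LineRecurrentRecognizer(line_width=28, state_dim=16, num_classes=10)
for line_no, line in enumerate(image):
    scores = recognizer.step(line)
    if scores.max() > 0.9:          # "valid recognition result" threshold
        print(f"stopped after line {line_no}, class {scores.argmax()}")
        break
```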
- FIGS. 8A and 8B are diagrams schematically showing examples of identification processing (recognition processing) by DNN when time-series information is not used.
- As shown in FIG. 8A, one image is input to the DNN.
- identification processing is performed on the input image, and the identification result is output.
- FIG. 8B is a diagram for explaining the processing of FIG. 8A in more detail.
- the DNN performs feature extraction processing and identification processing.
- feature amounts are extracted from the input image by feature extraction processing.
- identification processing is performed on the extracted feature quantity to obtain identification results.
- FIGS. 9A and 9B are diagrams schematically showing a first example of identification processing by DNN when time-series information is used.
- In this first example, identification processing by the DNN is performed using a fixed number of pieces of past information in the time series.
- The image (T) at time T, the image (T-1) at time T-1 before time T, and the image (T-2) at time T-2 before time T-1 are input to the DNN (the figure shows the case of N = 2).
- Identification processing is performed on each of the input images (T), (T-1), and (T-2), and the identification result (T) at time T is obtained.
- FIG. 9B is a diagram for explaining the processing of FIG. 9A in more detail.
- Each of the input images (T), (T-1), and (T-2) is subjected to the feature extraction processing described above, and feature amounts corresponding to the images (T), (T-1), and (T-2) are extracted, respectively.
- The feature amounts obtained from these images (T), (T-1), and (T-2) are integrated, identification processing is performed on the integrated feature amount, and the identification result (T) at time T is obtained.
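- A minimal sketch of this fixed-window scheme is shown below, assuming a stand-in histogram feature extractor and a linear classifier (neither is part of the embodiment): a shared extractor is applied to the current image and to the past images, the feature vectors are concatenated, and identification runs on the concatenation.

```python
import numpy as np

def extract_features(image, num_bins=16):
    # Stand-in feature extractor: a coarse intensity histogram per image.
    hist, _ = np.histogram(image, bins=num_bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def classify(feature_vector, weight_matrix):
    # Stand-in classifier: a single linear layer followed by argmax.
    return int(np.argmax(weight_matrix @ feature_vector))

# Current image (T) plus N = 2 past images (T-1), (T-2).
images = [np.random.rand(32, 32) for _ in range(3)]
features = [extract_features(img) for img in images]      # one extraction per image
integrated = np.concatenate(features)                      # integrate the feature amounts

num_classes = 10
weights = np.random.default_rng(0).standard_normal((num_classes, integrated.size))
print("identification result (T):", classify(integrated, weights))
```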
- However, the method of FIGS. 9A and 9B requires a separate configuration for extracting the feature amount for each of the available past images, so the DNN configuration may become large.
- FIGS. 10A and 10B are diagrams schematically showing a second example of identification processing by DNN when using time-series information.
- an image (T) at time T is input to the DNN whose internal state has been updated to the state at time T-1, and the identification result (T) at time T is obtained.
- FIG. 10B is a diagram for explaining the processing of FIG. 10A in more detail.
- The feature amount of the image (T) is extracted by the feature extraction processing described above. In the DNN, the internal state has been updated using images before time T, and the feature amount related to the updated internal state is stored.
- The feature amount related to the stored internal state and the feature amount of the image (T) are integrated, and identification processing is performed on the integrated feature amount.
- the identification processing shown in FIGS. 10A and 10B is performed using, for example, a DNN whose internal state has been updated using the previous identification result, and is a recursive process.
- a DNN that performs recursive processing in this way is called an RNN.
- Identification processing by an RNN is generally used for moving-image recognition and the like; for example, identification accuracy can be improved by sequentially updating the internal state of the DNN with frame images updated in time series.
- FIG. 11 is a schematic diagram for schematically explaining recognition processing applicable to each embodiment of the present disclosure.
- In step S1, the imaging unit 11 (see FIG. 2A) starts imaging a target image to be recognized.
- the target image is, for example, an image in which the number "8" is drawn by handwriting.
- The recognition unit 20 stores in advance, as a program in its memory, a learning model trained with predetermined teacher data so as to be able to identify numbers, and is thus capable of identifying the numbers contained in an image.
- the imaging unit 11 performs imaging by a rolling shutter method. Note that even when the imaging unit 11 performs imaging by the global shutter method, the following processing can be applied in the same manner as in the case of the rolling shutter method.
- In step S2, the imaging unit 11 sequentially reads the frame line by line from the upper end side to the lower end side of the frame.
- the recognizing unit 20 identifies the number “8" or “9” from the image of the read line (step S3).
- The numbers "8" and "9" share a common characteristic portion in their upper halves, so when the lines are read in order from the top and this characteristic portion is recognized, the recognized object can be identified as either the number "8" or the number "9".
- In step S4a, the whole picture of the recognized object appears once the bottom line or a line near the bottom of the frame has been read, and the object identified in step S2 as either the number "8" or "9" is determined to be the number "8".
- The processing in step S4a is, for example, processing by a recognizer (existing recognizer) that performs recognition processing on a frame basis.
- steps S4b and S4c are processes related to the present disclosure. That is, the processing in steps S4b and S4c is processing by a recognizer (specialized recognizer) that performs recognition processing on a non-frame basis, for example.
- In step S4b, lines are further read from the line position read in step S3, and the recognized object can be identified as the number "8" even before the lower end of the number "8" is reached.
- the lower half of the number "8" and the lower half of the number "9" have different characteristics.
- In step S4c, it is also conceivable to jump from the line position of step S3 to a line position where the object identified in step S3 is likely to be distinguished as either the number "8" or the number "9". By reading this jump-destination line, it is possible to determine whether the object identified in step S3 is the number "8" or "9". Note that the jump-destination line position can be determined based on a learning model trained in advance with predetermined teacher data.
- When the object is identified in step S4b or step S4c, the imaging unit 11 can terminate the recognition process. As a result, it is possible to reduce the time required for recognition processing in the imaging unit 11 and to save power.
- the recognizer is trained using a data set that holds a plurality of combinations of input signals and output signals for each readout unit.
- data for each readout unit (line data, sub-sampled data, etc.) is applied as the input signal, and data indicating the "correct number" is applied as the output signal.
- Alternatively, data for each readout unit (line data, sub-sampled data, etc.) is applied as the input signal, and the object class (human / vehicle / non-object), the object coordinates (x, y, h, w), or the like can be applied as the output signal.
- self-supervised learning may be used to generate an output signal only from an input signal.
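- As one possible illustration of such a data set, the sketch below pairs each readout unit (here a line of a frame) with an output signal; the dataclass layout and field names are assumptions, while the label formats (correct class, object class, coordinates (x, y, h, w)) follow the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import numpy as np

@dataclass
class ReadoutSample:
    """One (input signal, output signal) combination for a readout unit."""
    readout_data: np.ndarray                 # e.g. one line or one sub-sampled pattern
    frame_id: int                            # which source frame it came from
    readout_index: int                       # which line / pattern within the frame
    object_class: Optional[str] = None       # e.g. "human", "vehicle", or None
    bbox: Optional[Tuple[float, float, float, float]] = None  # (x, y, h, w)

def build_line_dataset(frames: List[np.ndarray],
                       labels: List[Tuple[str, Tuple[float, float, float, float]]]
                       ) -> List[ReadoutSample]:
    """Turn frame-based labelled images into per-line training samples."""
    dataset = []
    for frame_id, (frame, (cls, bbox)) in enumerate(zip(frames, labels)):
        for line_no, line in enumerate(frame):
            dataset.append(ReadoutSample(line, frame_id, line_no, cls, bbox))
    return dataset

# Example: two labelled frames become per-line samples.
frames = [np.random.rand(8, 8), np.random.rand(8, 8)]
labels = [("human", (1.0, 2.0, 4.0, 2.0)), ("vehicle", (0.0, 0.0, 8.0, 8.0))]
samples = build_line_dataset(frames, labels)
print(len(samples), "samples, first:", samples[0].object_class, samples[0].readout_index)
```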
- the frame-based image data associated with the existing recognizer is converted into non-frame-based image data by sub-sampling or line division corresponding to the specialized recognizer.
- Consider a user who has an existing recognizer that performs frame-based recognition processing and learning data based on existing frame-based image data corresponding to the existing recognizer.
- Suppose this user wants to use a specialized recognizer that performs recognition processing based on image data obtained by line-dividing or sub-sampling frame-based image data.
- the user needs to prepare learning data corresponding to the specialized recognizer, which is line-divided or sub-sampled non-frame-based specialized image data, in order to train the specialized recognizer.
- the first embodiment provides a method for easily generating learning data based on line-divided or sub-sampled specialized image data from learning data based on existing image data.
- When sub-sampling is defined as, for example, extracting from one frame a predetermined number of pixels that is less than the total number of pixels in that frame, line division can also be said to be a concept included in sub-sampling.
- frame-based image data related to existing recognizers may be referred to as "existing image data”
- non-frame-based image data corresponding to specialized recognizers may be referred to as "specialized image data”.
- a first example of the first embodiment is an example of converting existing image data into specialized image data by line division.
- the processing according to each example of the first embodiment corresponds to the processing of converting the existing learning data 300 into the specialized learning data 302 by the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B. .
- FIG. 12 is a functional block diagram of an example for explaining the functions of the conversion unit 301a in the learning system 3 according to the first example of the first embodiment.
- the conversion unit 301a includes a frame data division unit 320a.
- the frame data division unit 320a divides the existing learning data 300 based on the existing image data into lines to generate specialized learning data 302 as specialized image data.
- The frame data division unit 320a can associate the generated specialized learning data 302 with information indicating the existing image data on which the specialized learning data 302 is based and information indicating the line corresponding to the specialized learning data 302.
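- A minimal sketch of the line-division function of the frame data division unit 320a, under the assumption that the association is stored as simple per-piece metadata, is given below.

```python
import numpy as np

def divide_frame_into_lines(existing_image: np.ndarray, source_id: str):
    """Split one frame of existing learning data into per-line specialized data,
    keeping track of the source image and the corresponding line."""
    specialized = []
    for line_index, line in enumerate(existing_image):
        specialized.append({
            "data": line.copy(),          # specialized image data for this line
            "source_image": source_id,    # which existing image it is based on
            "line": line_index,           # which line it corresponds to
        })
    return specialized

existing = np.arange(6 * 4).reshape(6, 4).astype(float)   # toy 6-line frame
pieces = divide_frame_into_lines(existing, source_id="frame_0001")
print(len(pieces), pieces[0]["line"], pieces[-1]["line"])
```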
- FIG. 13A is a schematic diagram showing a first example of generation of specialized learning data 302 applicable to the first example of the first embodiment.
- a specialized recognizer to which specialized image data based on existing image data is to be applied performs recognition processing based on specialized image data obtained by dividing one frame of image data into line units.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is composed of one frame of lines L#1, L#2, L#3, . . . , as schematically shown on the left side of the figure.
- the arrows indicating time correspond to the passage of time when the specialized image data is read line by line from the recognition specialized sensor in the processing by the specialized recognizer.
- The frame data division unit 320a divides the existing learning data 300 into the lines L#1, L#2, L#3, . . . , and generates specialized learning data 302L#1, 302L#2, 302L#3, . . . as specialized image data for the respective lines.
- The order in which each piece of specialized image data is generated by the frame data division unit 320a is not limited to the order shown in the drawing.
- FIG. 13B is a schematic diagram showing a second example of generation of specialized learning data 302 applicable to the first example of the first embodiment.
- In the second example, the specialized recognizer to which the specialized image data based on the existing image data is applied performs recognition processing based on specialized image data obtained by dividing one frame of image data into units of a plurality of adjacent lines.
- the existing learning data 300 based on the existing image data in the figure is assumed to be the same as in FIG. 13A.
- The arrows indicating time in the figure correspond to the passage of time when the specialized image data is read out from the recognition specialized sensor in the processing by the specialized recognizer to which the specialized image data based on the existing image data is applied.
- The frame data division unit 320a divides the existing learning data 300 into the line groups Ls#1, Ls#2, Ls#3, . . . , and generates specialized learning data 302Ls#1, 302Ls#2, 302Ls#3, . . . as specialized image data for the respective line groups.
- Each of the specialized learning data 302Ls#1, 302Ls#2, 302Ls#3, . . . may be data including only the corresponding line group, or may be frame data in which the portions other than the corresponding line group are filled with predetermined data.
- The order in which each piece of specialized image data is generated by the frame data division unit 320a is not limited to the order shown in the drawing.
- FIG. 13C is a schematic diagram showing a third example of generation of specialized learning data 302 applicable to the first example of the first embodiment.
- In the third example, the specialized recognizer to which specialized image data based on existing image data is applied performs recognition processing based on specialized image data obtained by dividing one frame of image data into units of parts of the lines L#1, L#2, L#3, . . . .
- the existing learning data 300 based on the existing image data is the same as in FIG. 13A.
- the arrows indicating time in the figure correspond to the passage of time when the specialized image data is read line by line from the recognition specialized sensor in the processing by the specialized recognizer.
- The frame data division unit 320a divides the existing learning data 300 into partial lines Lp#1, Lp#2, Lp#3, . . . , and generates specialized learning data as specialized image data for the respective partial lines.
- The order in which each piece of specialized image data is generated by the frame data division unit 320a is not limited to the order shown in the drawing.
- FIG. 13D is a schematic diagram showing a fourth example of generation of specialized learning data 302 applicable to the first example of the first embodiment.
- In the fourth example, the specialized recognizer to which specialized image data based on existing image data is applied performs recognition processing based on specialized image data obtained by dividing one frame of image data into the lines L#1, L#2, L#3, . . . , one line at a time, at predetermined intervals.
- The existing learning data 300 based on the existing image data is assumed to include n lines (n is an even number) L#1, L#2, L#3, . . . , L#n. Also, the arrows indicating time in section (b) correspond to the passage of time when the specialized image data is read line by line from the recognition specialized sensor in the processing by the specialized recognizer.
- In the fourth example, the frame data division unit 320a divides the existing learning data 300 into the set of lines with odd line numbers and the set of lines with even line numbers in one frame, and in each set, two lines separated by a distance of 1/2 of the number of lines in the frame of the existing learning data 300 are associated with each other. The frame data division unit 320a sequentially divides each line of the set of odd line numbers, and then sequentially divides each line of the set of even line numbers.
- Whereas the existing image data as the existing learning data 300 is read out from the existing sensor in the order of the lines L#1, L#2, . . . , the recognition specialized sensor reads the specialized image data in the order of the lines L#1, L#(1+n/2), L#3, L#(3+n/2), . . . of the odd line numbers, followed by the lines L#2, L#(2+n/2), L#4, L#(4+n/2), . . . of the even line numbers.
- the frame data division unit 320a divides the existing learning data 300 into lines L#1, L#2, . . . , L#n.
- That is, the frame data division unit 320a generates, as specialized image data, specialized learning data 302L#1, 302L#(1+n/2), 302L#3, 302L#(3+n/2), . . . from the lines L#1, L#(1+n/2), L#3, L#(3+n/2), . . . of the odd line numbers.
- Next, the frame data division unit 320a generates, as specialized image data, specialized learning data 302L#2, 302L#(2+n/2), 302L#4, 302L#(4+n/2), . . . from the lines L#2, L#(2+n/2), L#4, L#(4+n/2), . . . of the even line numbers.
- The order in which each piece of specialized image data is generated by the frame data division unit 320a is not limited to the order shown in the figure.
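- The readout order assumed in this fourth example can be sketched as below for a frame of n lines (n even): the odd-numbered lines are emitted interleaved with the lines n/2 further down, followed by the even-numbered lines treated the same way. The helper name and the 1-based indexing are assumptions for illustration.

```python
def interleaved_line_order(n: int):
    """Return the 1-based line order of FIG. 13D: odd lines paired with the line
    n/2 below them, then even lines paired the same way (n must be even)."""
    assert n % 2 == 0
    half = n // 2
    order = []
    for start in (1, 2):                       # odd-numbered set first, then even
        for line in range(start, half + 1, 2): # 1, 3, 5, ... within the upper half
            order.append(line)
            order.append(line + half)          # the line n/2 further down
    return order

print(interleaved_line_order(8))   # [1, 5, 3, 7, 2, 6, 4, 8]
```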
- FIG. 13E is a schematic diagram showing a fifth example of generation of specialized learning data 302 applicable to the first example of the first embodiment.
- In the fifth example, the specialized recognizer to which specialized image data based on existing image data is applied performs recognition processing based on specialized image data that includes, from the lines L#1, L#2, L#3, . . . of one frame of image data, two lines separated by a predetermined interval.
- The existing learning data 300 based on the existing image data is assumed to include n lines (n is an even number) L#1, L#2, L#3, . . . , L#n. Also, the arrows indicating time in section (b) correspond to the passage of time when the specialized image data is read from the recognition specialized sensor in the processing by the specialized recognizer.
- In the fifth example, the frame data division unit 320a divides the existing learning data 300 into the lines with odd line numbers and the lines with even line numbers in one frame, and pairs two lines separated by half the number of lines in the frame of the existing learning data 300.
- the frame data dividing unit 320a divides the existing learning data 300 for each set.
- Whereas the existing image data as the existing learning data 300 is read out from the existing sensor in the order of the lines L#1, L#2, . . . , the recognition specialized sensor reads out the specialized image data in the order of the pair of line L#1 and line L#(1+n/2), the pair of line L#3 and line L#(3+n/2), . . . .
- That is, the frame data division unit 320a generates, as specialized image data, specialized learning data 302Lpr#1, 302Lpr#2, 302Lpr#3, 302Lpr#4, . . . from the pair of line L#1 and line L#(1+n/2), the pair of line L#3 and line L#(3+n/2), . . . of the odd line numbers, and the pair of line L#2 and line L#(2+n/2), the pair of line L#4 and line L#(4+n/2), . . . of the even line numbers.
- Each of the specialized learning data 302Lpr#1, 302Lpr#2, 302Lpr#3, 302Lpr#4, . . . may be data including only the corresponding pair of lines, or may be frame data in which the portions other than the corresponding pair of lines are filled with predetermined data.
- The order in which each piece of specialized image data is generated by the frame data division unit 320a is not limited to the order shown in the figure.
- As described above, in the first example of the first embodiment, the existing learning data 300 based on the existing image data is divided on the basis of the lines L#1, L#2, L#3, . . . , and each specialized learning data 302 is generated from the resulting specialized image data. Therefore, for example, a user who holds existing learning data 300 corresponding to an existing sensor does not need to newly prepare each specialized learning data 302 based on specialized image data, even when using the recognition system 2 including the sensor unit 10 as a recognition specialized sensor that supports line division.
- a second example of the first embodiment is an example of converting existing image data into specialized image data by sub-sampling.
- FIG. 14 is a functional block diagram of an example for explaining the functions of the conversion unit 301b in the learning system 3 according to the second example of the first embodiment.
- the conversion section 301b includes a frame data division section 320b.
- the frame data division unit 320b performs sub-sampling on the existing learning data 300 based on the existing image data to generate specialized learning data 302 as specialized image data.
- The frame data division unit 320b can associate the generated specialized learning data 302 with information indicating the existing image data on which the specialized learning data 302 is based and information indicating the pixels corresponding to the specialized learning data 302.
- FIG. 15A is a schematic diagram showing a first example of generation of specialized learning data 302 applicable to the second example of the first embodiment.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is, as schematically shown in section (a) of the figure, composed of pixels arranged in a matrix.
- In the first example, the specialized recognizer performs recognition processing on one frame of image data based on specialized image data sub-sampled in units of a pattern Pφ#xy composed of a plurality of pixels px arranged discretely and periodically in the line direction and in the vertical direction. More specifically, the specialized recognizer performs recognition processing based on specialized image data sub-sampled from the recognition specialized sensor while the position of the pattern Pφ#xy is shifted by one pixel at a time in the line direction.
- The operation of shifting the pattern Pφ#xy by one pixel corresponds to the operation of shifting the phase of the pattern Pφ#xy.
- That is, the recognition specialized sensor reads out each pattern Pφ#xy while shifting the pattern Pφ#xy in the line direction by a phase Δφ at a time.
- The pattern Pφ#xy is moved in the vertical direction, for example, by shifting the phase Δφ′ in the vertical direction with respect to the position of the first pattern Pφ#1-y in the line direction.
- The frame data division unit 320b performs sub-sampling on the existing learning data 300 in units of the aforementioned pattern Pφ#xy.
- In this example, the pattern Pφ#xy consists of six periodically arranged pixels: three pixels arranged at predetermined intervals in the line direction, and three pixels arranged at predetermined intervals in the vertical direction at positions corresponding to those three pixels in the line direction.
- The frame data division unit 320b performs sub-sampling for each of the patterns Pφ#1-1, Pφ#2-1, . . . , Pφ#1-2, . . . .
- The frame data division unit 320b generates, as specialized image data, specialized learning data 302Pφ#1-1, 302Pφ#2-1, . . . , 302Pφ#1-2, . . . according to the respective patterns Pφ#1-1, Pφ#2-1, . . . , Pφ#1-2, . . . .
- The order in which each piece of specialized image data is generated by the frame data division unit 320b is not limited to the order shown in the figure.
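- The phase-shifted periodic sub-sampling of this first example can be sketched as below; the 2 x 3 layout of the pattern, the pixel pitch, and the wrap-around handling at the frame border are illustrative assumptions rather than the exact pattern Pφ#xy of FIG. 15A.

```python
import numpy as np

def periodic_pattern_offsets(pitch_x=4, pitch_y=4, cols=3, rows=2):
    """Pixel offsets of a pattern of rows x cols pixels spaced periodically."""
    return [(r * pitch_y, c * pitch_x) for r in range(rows) for c in range(cols)]

def subsample_with_phase(frame: np.ndarray, offsets, phase_x: int, phase_y: int = 0):
    """Sample the frame at the pattern offsets shifted by the given phase."""
    h, w = frame.shape
    ys = [(dy + phase_y) % h for dy, _ in offsets]
    xs = [(dx + phase_x) % w for _, dx in offsets]
    return frame[ys, xs]

frame = np.arange(16 * 16).reshape(16, 16).astype(float)
offsets = periodic_pattern_offsets()
# Shift the pattern one pixel at a time in the line direction (phase shift).
specialized = [subsample_with_phase(frame, offsets, phase_x=dx) for dx in range(4)]
print(len(specialized), specialized[0].shape)   # 4 pieces of 6 sampled pixels each
```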
- FIG. 15B is a schematic diagram showing a second example of generating specialized learning data 302 applicable to the second example of the first embodiment.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is, as schematically shown in section (a) of the figure, composed of pixels arranged in a matrix.
- In the second example, the specialized recognizer sets, for one frame of image data, a pattern Pφ#z having the same configuration as the pattern Pφ#xy in the first example described above, and performs recognition processing based on specialized image data obtained by discretely designating the positions of the pattern Pφ#z in the one-frame image and performing sub-sampling.
- More specifically, the specialized recognizer first performs recognition processing based on specialized image data sub-sampled with the pattern Pφ#1 located at the upper left corner of the one-frame image. Next, recognition processing is performed based on specialized image data sub-sampled with the pattern Pφ#2, which is shifted from the position of the pattern Pφ#1 by 1/2 of the pixel interval in each of the line direction and the vertical direction. Next, recognition processing is performed based on specialized image data sub-sampled with the pattern Pφ#3, which is shifted from the position of the pattern Pφ#1 by 1/2 of the interval in the line direction.
- Next, recognition processing is performed based on specialized image data sub-sampled with the pattern Pφ#4, which is shifted from the position of the pattern Pφ#1 by 1/2 of the interval in the vertical direction.
- The specialized recognizer repeatedly executes the sub-sampling and recognition processing for these patterns Pφ#1 to Pφ#4 while shifting the position of the pattern Pφ#1, for example, by one pixel at a time in the line direction.
- In the second example, the frame data division unit 320b sub-samples the existing learning data 300 for each of the patterns Pφ#1, Pφ#2, Pφ#3, Pφ#4, . . . .
- The frame data division unit 320b generates, as specialized image data, specialized learning data 302Pφ#1, 302Pφ#2, 302Pφ#3, 302Pφ#4, . . . according to the respective patterns.
- Each of the specialized learning data 302Pφ#1, 302Pφ#2, 302Pφ#3, 302Pφ#4, . . . may be data including only the sub-sampled pixels, or may be frame data in which the portions other than the sub-sampled pixels are filled with predetermined data.
- The order in which each piece of specialized image data is generated by the frame data division unit 320b is not limited to the order shown in the figure.
- FIG. 15C is a schematic diagram showing a third example of generation of specialized learning data 302 applicable to the second example of the first embodiment.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is, as schematically shown in section (a) of the figure, composed of pixels arranged in a matrix.
- In the third example, the specialized recognizer performs recognition processing on one frame of image data based on specialized image data sub-sampled in units of areas Ar#xy of a predetermined size, each containing a plurality of pixels adjacent to one another in the line direction and in the vertical direction. As a more specific example, the specialized recognizer sequentially sub-samples the areas Ar#xy from the recognition specialized sensor in the line direction, and performs recognition processing based on each piece of specialized image data obtained by repeating this sequential sub-sampling in the vertical direction.
- In the third example, the frame data division unit 320b sub-samples the existing learning data 300 in units of the areas Ar#1-1, Ar#2-1, . . . , Ar#1-2, Ar#2-2, . . . .
- The frame data division unit 320b generates, as specialized image data, specialized learning data 302Ar#1-1, 302Ar#2-1, . . . , 302Ar#1-2, 302Ar#2-2, . . . according to the respective areas Ar#1-1, Ar#2-1, . . . , Ar#1-2, Ar#2-2, . . . .
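- A minimal sketch of this area-based sub-sampling, assuming a fixed 4 x 4 area size and a simple left-to-right, top-to-bottom scan, is given below.

```python
import numpy as np

def divide_into_areas(frame: np.ndarray, area_h: int = 4, area_w: int = 4):
    """Yield (area_row, area_col, block) in line-direction order, then downwards,
    mirroring the sequential area sub-sampling of FIG. 15C."""
    h, w = frame.shape
    for top in range(0, h - area_h + 1, area_h):
        for left in range(0, w - area_w + 1, area_w):
            block = frame[top:top + area_h, left:left + area_w]
            yield top // area_h, left // area_w, block.copy()

frame = np.arange(8 * 8).reshape(8, 8).astype(float)
areas = list(divide_into_areas(frame))
print(len(areas))                    # 4 areas of 4x4 pixels for an 8x8 frame
print(areas[0][:2], areas[-1][:2])   # first and last (row, col) indices
```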
- FIG. 15D is a schematic diagram showing a fourth example of generation of specialized learning data 302 applicable to the second example of the first embodiment.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is, as schematically shown in section (a) of the figure, composed of pixels arranged in a matrix.
- In the fourth example, the specialized recognizer performs recognition processing on one frame of image data in units of the areas Ar#xy described with reference to FIG. 15C, based on specialized image data obtained by discretely designating the positions of the areas Ar#xy in the one-frame image and performing sub-sampling.
- the specialized recognizer first performs subsampling and recognition processing in the upper left corner area Ar#1-1 of one frame.
- Next, the specialized recognizer performs sub-sampling and recognition processing in the area Ar#3-1, which includes the same lines as the area Ar#1-1 and is located at the center in the line direction.
- The specialized recognizer then performs sub-sampling and recognition processing in the area Ar#1-3 at the upper left corner of the lower half of the frame, and then in the area Ar#3-3, which includes the same lines as the area Ar#1-3 and is located at the center in the line direction.
- Sub-sampling and recognition processing are similarly performed for the areas Ar#2-2 and Ar#4-2, and for the areas Ar#2-4 and Ar#4-4.
- In the fourth example, the frame data division unit 320b sub-samples the existing learning data 300 in units of the areas Ar#1-1, Ar#3-1, . . . , Ar#1-3, Ar#3-3, . . . .
- The frame data division unit 320b generates, as specialized image data, specialized learning data 302Ar#1-1, 302Ar#3-1, . . . , 302Ar#1-3, 302Ar#3-3, . . . according to the respective areas Ar#1-1, Ar#3-1, . . . , Ar#1-3, Ar#3-3, . . . .
- The frame data division unit 320b similarly sub-samples the areas Ar#2-2, Ar#4-2, . . . , Ar#2-4, Ar#4-4, . . . and generates the corresponding specialized learning data.
- FIG. 15E is a schematic diagram showing a fifth example of generating specialized learning data 302 applicable to the second example of the first embodiment.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is, as schematically shown in section (a) of the figure, composed of pixels arranged in a matrix.
- In the fifth example, the specialized recognizer performs recognition processing based on specialized image data sub-sampled, from one frame of image data, in units of a pattern Pt#xy.
- the pattern Pt#xy can be a pattern in which pixels are arranged according to, for example, the shape of an assumed or separately recognized recognition object.
- As a more specific example, the specialized recognizer sequentially sub-samples the pattern Pt#xy from the recognition specialized sensor while shifting it by one pixel at a time in the line direction, and performs recognition processing based on each piece of specialized image data obtained by repeating this sequential sub-sampling in the vertical direction.
- In the fifth example, the frame data division unit 320b sub-samples the existing learning data 300 for each of the patterns Pt#1-1, Pt#2-1, . . . , Pt#1-2, Pt#2-2, . . . .
- The frame data division unit 320b generates, as specialized image data, specialized learning data 302Pt#1-1, 302Pt#2-1, . . . , 302Pt#1-2, 302Pt#2-2, . . . according to the respective patterns Pt#1-1, Pt#2-1, . . . .
- The order in which each piece of specialized image data is generated by the frame data division unit 320b is not limited to the order shown in the figure.
- FIG. 15F is a schematic diagram showing a sixth example of generation of specialized learning data 302 applicable to the second example of the first embodiment.
- The existing learning data 300 based on the existing image data corresponding to recognition processing using an existing sensor is, as schematically shown in section (a) of the figure, composed of pixels arranged in a matrix.
- In the sixth example, the specialized recognizer performs recognition processing on one frame of image data based on specialized image data sub-sampled in units of a pattern Rd#m_x composed of discretely and aperiodically arranged pixels. As an example, the specialized recognizer selects (s/D) pixels arranged discretely and aperiodically within the frame, where s is the total number of pixels contained in one frame and D is the number of divisions of the frame period, to form a pattern Rd#m_1.
- More specifically, in the first period of the divided frame period of the frame (m) read from the recognition specialized sensor, the specialized recognizer selects a predetermined number of pixels, based on a pseudo-random number, from all pixels included in the frame (m) to determine a pattern Rd#m_1 as a sub-sampling unit. In the next period, the specialized recognizer selects a predetermined number of pixels, based on a pseudo-random number, from all pixels included in the frame (m) excluding, for example, the pixels already selected for the pattern Rd#m_1, and determines the next pattern Rd#m_2. Alternatively, the specialized recognizer may again select a predetermined number of pixels from all pixels included in the frame (m) based on pseudo-random numbers to determine the next pattern Rd#m_2.
- In the sixth example, the frame data division unit 320b sub-samples the existing learning data 300 for each of the patterns Rd#m_1, Rd#m_2, . . . , Rd#m_n, Rd#(m+1)_1, . . . , and generates, as specialized image data, specialized learning data 302Rd#m_1, 302Rd#m_2, . . . , 302Rd#m_n, 302Rd#(m+1)_1, . . . , respectively.
- Each of the specialized learning data 302Rd#m_1, 302Rd#m_2, . . . , 302Rd#m_n, 302Rd#(m+1)_1, . . . may be data including only the sub-sampled pixels, or may be frame data in which the portions other than the sub-sampled pixels are filled with predetermined data.
- The order in which each piece of specialized image data is generated by the frame data division unit 320b is not limited to the order shown in the drawing.
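- The pseudo-random pattern selection of this sixth example can be sketched as below under the first variant described above (pixels already used are not selected again), so that no pixel is selected twice across the D patterns; the generator seed and the array layout are assumptions.

```python
import numpy as np

def random_patterns(frame_shape, divisions, seed=0):
    """Split all pixel positions of a frame into `divisions` pseudo-random,
    non-overlapping patterns Rd#m_1 ... Rd#m_D (first variant: no reuse)."""
    h, w = frame_shape
    rng = np.random.default_rng(seed)
    positions = rng.permutation(h * w)            # pseudo-random order of all pixels
    per_pattern = (h * w) // divisions            # s / D pixels per pattern
    patterns = []
    for d in range(divisions):
        chunk = positions[d * per_pattern:(d + 1) * per_pattern]
        patterns.append(np.stack(np.unravel_index(chunk, (h, w)), axis=1))  # (y, x) pairs
    return patterns

frame = np.random.rand(16, 16)
patterns = random_patterns(frame.shape, divisions=4)
specialized = [frame[p[:, 0], p[:, 1]] for p in patterns]   # sub-sampled pixel values
print([p.shape for p in patterns], specialized[0].shape)
```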
- As described above, in the second example of the first embodiment, the existing learning data 300 based on the existing image data is sub-sampled on a pixel basis, and each specialized learning data 302 is generated from the resulting specialized image data. Therefore, for example, a user who holds existing learning data 300 corresponding to an existing sensor does not need to newly prepare each specialized learning data 302 based on specialized image data, even when using the recognition system 2 including the sensor unit 10 as a recognition specialized sensor that supports sub-sampling.
- FIG. 16A is a functional block diagram of an example for explaining functions of the conversion unit 301c in the learning system 3 according to the third example of the first embodiment.
- the conversion unit 301c includes an interpolated image generation unit 321a and a frame data division unit 320.
- Existing learning data 300a and 300b at different times based on existing image data are input to the conversion unit 301c.
- the existing learning data 300b can be existing image data captured one frame to several frames after the existing learning data 300a.
- the interval between existing learning data 300a and 300b may be even longer.
- Based on these existing learning data 300a and 300b, the interpolated image generation unit 321a generates an interpolated image whose time differs from those of the existing learning data 300a and 300b.
- For example, the interpolated image generation unit 321a generates interpolated images at times between the existing learning data 300a and 300b based on the existing learning data 300a and 300b. Not limited to this, the interpolated image generation unit 321a can also generate, by interpolation processing, an interpolated image temporally later than the existing learning data 300b or an interpolated image temporally earlier than the existing learning data 300a.
- The frame data division unit 320 performs line division or sub-sampling on the existing learning data 300a and 300b and on the interpolated image generated by the interpolated image generation unit 321a, and generates specialized learning data 302 based on specialized image data. For the generation of the specialized learning data 302 by the frame data division unit 320, for example, the methods described in the first and second examples of the first embodiment can be applied.
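- The overall flow of this third example can be sketched as below under a strong simplification: instead of true motion interpolation or a learned model, the intermediate images are produced by linear blending of the two existing images, and successive lines are then taken from successive time-ordered images. The blending and the one-line-per-image division are assumptions for illustration only.

```python
import numpy as np

def blend_interpolate(img_a: np.ndarray, img_b: np.ndarray, steps: int):
    """Crude stand-in for motion interpolation: linear blends between img_a and img_b."""
    return [(1 - t) * img_a + t * img_b
            for t in np.linspace(0, 1, steps + 2)[1:-1]]   # exclude the endpoints

def time_series_lines(images, lines_per_image=1):
    """Take successive lines from successive images so each line comes from a
    (pseudo) different time, as in section (a) of FIG. 16B."""
    pieces = []
    line_no = 0
    for img in images:
        for _ in range(lines_per_image):
            pieces.append(img[line_no % img.shape[0]].copy())
            line_no += 1
    return pieces

existing_a = np.random.rand(8, 8)          # existing learning data 300a
existing_b = np.random.rand(8, 8)          # existing learning data 300b (later frame)
interpolated = blend_interpolate(existing_a, existing_b, steps=3)
ordered = [existing_a, *interpolated, existing_b]          # time-ordered images
specialized = time_series_lines(ordered)
print(len(specialized), specialized[0].shape)              # 5 per-line pieces, one per image
```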
- FIG. 16B is a schematic diagram for more specifically explaining the generation of specialized learning data 302 according to the third example of the first embodiment.
- Section (a) of FIG. 16B shows an example in which the interpolated image generator 321a generates interpolated images at times between the existing learning data 300a and 300b based on the existing learning data 300a and 300b.
- In this case, the interpolated image generation unit 321a generates interpolated images 61 1 , 61 2 and 61 3 that are temporally positioned between the existing learning data 300a and 300b and arranged in time series, based on the existing learning data 300a and 300b.
- the interpolated image generator 321a can use a known method such as motion interpolation to generate the interpolated images 61 1 , 61 2 , and 61 3 .
- the interpolation image generation unit 321a may predict and generate the interpolation images 61 1 , 61 2 , and 61 3 using a model learned by machine learning or the like.
- the interpolated image generator 321 a passes the existing learning data 300 a and 300 b and the generated interpolated images 61 1 , 61 2 and 61 3 to the frame data divider 320 .
- the frame data division unit 320 performs line division or sub-sampling on the existing learning data 300a and 300b passed from the interpolation image generation unit 321a and the interpolation images 61 1 , 61 2 and 61 3 .
- In the example of the figure, the frame data division unit 320 performs line division and extracts, from the existing learning data 300a and 300b and the interpolated images 61 1 , 61 2 and 61 3 , lines 62 1 to 62 5 arranged in time series. Based on these lines 62 1 to 62 5 , the frame data division unit 320 generates five pieces of specialized learning data 302 (not shown) arranged in time series.
- Section (b) of FIG. 16B shows an example in which the interpolated image generator 321a generates an interpolated image at a time between the existing learning data 300a and 300b and generates an interpolated image temporally later than the existing learning data 300b. showing.
- The interpolated image generation unit 321a generates interpolated images 63 1 and 63 2 that are temporally positioned between the existing learning data 300a and 300b and arranged in time series, based on the existing learning data 300a and 300b. Further, the interpolated image generation unit 321a generates interpolated images 64 1 and 64 2 that are temporally later than the existing learning data 300b and arranged in time series, for example, based on the existing learning data 300a and 300b.
- the interpolated image generator 321a can use a known technique such as motion prediction to generate the interpolated images 64 1 and 64 2 .
- the interpolation image generator 321a may predict and generate the interpolation images 64 1 and 64 2 using a model learned by machine learning or the like.
- the interpolated image generator 321 a passes the existing learning data 300 a and 300 b and the generated interpolated images 63 1 , 63 2 , 64 1 and 64 2 to the frame data divider 320 .
- The frame data division unit 320 performs line division or sub-sampling on the existing learning data 300a and 300b passed from the interpolated image generation unit 321a and on the generated interpolated images 63 1 , 63 2 , 64 1 and 64 2 .
- In the example of the figure, the frame data division unit 320 performs line division and extracts, from the existing learning data 300a and 300b and the generated interpolated images 63 1 , 63 2 , 64 1 and 64 2 , lines 62 11 to 62 16 arranged in time series. Based on these lines 62 11 to 62 16 , the frame data division unit 320 generates six pieces of specialized learning data 302 (not shown) arranged in time series.
- In the first and second examples of the first embodiment described above, one image, that is, one existing learning data 300 based on existing image data, is subjected to line division or sub-sampling, and a plurality of specialized learning data 302 are generated from the resulting specialized image data.
- the recognition specialized sensor performs line division or subsampling at different times.
- For example, in line division, an operation of extracting line L#1 at the timing of a first frame and extracting line L#2 at the timing of the next, second frame can be considered.
- the specialized recognizer learns based on data extracted at different times.
- On the other hand, the specialized recognizer described above is trained based on specialized learning data 302 generated by line division or sub-sampling from one image (existing learning data 300) acquired at a single time. Therefore, the specialized recognizer may be trained differently from the case where the actual recognition specialized sensor is used.
- In the third example of the first embodiment, two images (existing learning data 300) at different times are used, and training can be performed based on data extracted at pseudo-different times by motion interpolation or the like. Therefore, by applying the third example of the first embodiment, it becomes possible to train with higher accuracy than in the first and second examples of the first embodiment described above.
- a fourth example of the first embodiment will be described.
- In the fourth example of the first embodiment, a plurality of interpolated images at different times are generated from one frame image (existing learning data 300), and line division or sub-sampling is performed on the generated interpolated images.
- the plurality of interpolated images are generated based on the movement of the camera when capturing the frame image.
- FIG. 17A is a functional block diagram of an example for explaining the functions of the conversion unit 301d in the learning system 3 according to the fourth example of the first embodiment.
- the conversion unit 301d includes an interpolated image generation unit 321b and a frame data division unit 320.
- the image 60 which is a frame image as the existing learning data 300 corresponding to the existing recognizer, and the camera motion information 41 included in the camera information 40 are input to the interpolated image generation unit 321b.
- the camera information 40 is, for example, information about a camera that includes the imaging unit 11 according to the present disclosure, and includes camera motion information 41 that indicates the motion of the camera during imaging. If the camera has an IMU (Inertial Measurement Unit), the camera motion information 41 can be obtained based on the output of this IMU.
- the interpolated image generation unit 321b estimates a future image for the image 60 based on the input image 60 and the camera motion information 41, and generates an interpolated image after the time when the image 60 was captured.
- FIG. 17B is a schematic diagram for explaining interpolation image generation processing according to the fourth example of the first embodiment.
- In the example of FIG. 17B, the camera (imaging unit 11) is rotated counterclockwise as indicated by an arrow 43, and performs imaging while changing the imaging direction 42 counterclockwise according to the rotation.
- Information indicating the rotation of the camera is passed as the camera motion information 41 to the interpolated image generator 321b.
- the interpolated image generator 321b estimates the future motion of the subject 56 with respect to the imaging time point in the frame image by, for example, global shift.
- the interpolated image generation unit 321b generates interpolated images 66 1 , 66 2 , and 66 3 that are future images with respect to the image 60 and that change in time series based on the estimated movement of the subject 56 within the frame image. Generate.
- In the figure, time elapses in the order of the image 60 and the interpolated images 66 1 , 66 2 and 66 3 .
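- A rough sketch of future-image estimation from camera motion is given below: the camera rotation reported in the camera motion information 41 is converted into a per-step horizontal pixel shift and the image content is translated accordingly (a global shift). The pixels-per-degree factor and the wrap-around roll are simplifying assumptions, not the estimation method of the embodiment.

```python
import numpy as np

def future_images_from_camera_motion(image: np.ndarray,
                                     yaw_deg_per_frame: float,
                                     num_future: int,
                                     pixels_per_degree: float = 2.0):
    """Estimate future frames by globally shifting the image according to an
    assumed camera yaw rate (a counterclockwise pan makes the content move right)."""
    futures = []
    for step in range(1, num_future + 1):
        shift_px = int(round(step * yaw_deg_per_frame * pixels_per_degree))
        futures.append(np.roll(image, shift_px, axis=1))   # horizontal global shift
    return futures

image_60 = np.random.rand(32, 32)                  # frame image (existing learning data)
camera_motion = 1.5                                # yaw in degrees per frame (e.g. from an IMU)
interpolated = future_images_from_camera_motion(image_60, camera_motion, num_future=3)
print(len(interpolated), interpolated[0].shape)    # three time-series future images
```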
- the interpolated image generation unit 321 b passes the image 60 and the interpolated images 66 1 , 66 2 and 66 3 to the frame data division unit 320 .
- The frame data division unit 320 performs line division or sub-sampling on the image 60 and the interpolated images 66 1 , 66 2 and 66 3 passed from the interpolated image generation unit 321b, as described in the first example or the second example of the first embodiment.
- the frame data division unit 320 generates four pieces of specialized learning data 302 (not shown) that are arranged in time series in the future direction, starting from the time when the image 60 was captured.
- Although the camera motion information 41 is obtained based on the output of the IMU in the above description, this is not limited to this example.
- the camera movement may be set manually, and the camera movement information 41 may be obtained based on this setting information.
- In this way, in the fourth example of the first embodiment, a plurality of images that change in time series are generated from one existing learning data 300. Then, based on the existing learning data 300 and the plurality of images, it is possible to generate a plurality of specialized learning data 302 that change in time series, each based on specialized image data. Therefore, even if the amount of existing learning data 300 based on existing image data is small, it is possible to sufficiently train the specialized recognizer.
- a fifth example of the first embodiment will be described.
- In the fifth example of the first embodiment, a plurality of interpolated images at different times are generated from one frame image (existing learning data 300), and line division or sub-sampling is performed on the generated interpolated images.
- the plurality of interpolated images are generated by estimating the movement of the subject in the frame image.
- FIG. 18A is a functional block diagram of an example for explaining the functions of the conversion unit 301e in the learning system 3 according to the fifth example of the first embodiment.
- the conversion unit 301e includes an interpolated image generation unit 321c and a frame data division unit 320.
- the image 60 as the existing learning data 300 corresponding to the existing recognizer and the subject movement information 75 acquired based on the other sensor information 74 are input to the interpolated image generation unit 321c.
- Other sensor information 74 is information based on the output of a sensor capable of detecting the movement of the subject. As such a sensor, for example, radar or LiDAR (Laser Imaging Detection and Ranging) can be applied.
- For example, assume that the recognition system 2 is configured as an in-vehicle system, and the vehicle on which the recognition system 2 is mounted is further provided with sensors such as radar and LiDAR.
- the outputs of these radars and LiDAR can be used as other sensor information 74 .
- the interpolated image generation unit 321c estimates the movement of the subject in the image 60 based on the input image 60 and the subject movement information 75.
- the interpolated image generation unit 321c generates frame images after the time when the image 60 was captured as an interpolated image based on the estimated motion of the subject.
- FIG. 18B is a schematic diagram for explaining interpolation image generation processing according to the fifth example of the first embodiment. As shown, image 60 includes subjects 58 and 59 .
- the interpolated image generator 321 c estimates the motion of the subjects 58 and 59 included in the image 60 based on the subject motion information 75 . In the example of FIG. 18B, it is assumed that subject 58 is stationary, while subject 59 is moving from left to right in the image.
- the interpolated image generation unit 321c generates interpolated images 67 1 , 67 2 , and 67 3 that are future images with respect to the image 60 and that change in time series according to the estimated movement of the subject 59 .
- In the figure, time elapses in the order of the image 60 and the interpolated images 67 1 , 67 2 and 67 3 , and the subject 59 moves from left to right accordingly.
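- The per-subject interpolation of this fifth example can be sketched as below under simplifying assumptions: a bounding box around the moving subject is cut out and pasted back at positions advanced along its estimated velocity, one interpolated image per time step, while the stationary background is left unchanged. The box-and-paste scheme and the fill value for the vacated region are assumptions for illustration.

```python
import numpy as np

def interpolate_moving_subject(image, box, velocity, num_steps, fill=0.0):
    """Generate future images by moving the subject inside `box` (top, left, h, w)
    by `velocity` = (dy, dx) pixels per step; everything else stays static."""
    top, left, h, w = box
    patch = image[top:top + h, left:left + w].copy()
    futures = []
    for step in range(1, num_steps + 1):
        frame = image.copy()
        frame[top:top + h, left:left + w] = fill        # clear the old position
        ny = top + int(round(step * velocity[0]))
        nx = left + int(round(step * velocity[1]))
        ny = max(0, min(ny, image.shape[0] - h))        # keep the subject inside the frame
        nx = max(0, min(nx, image.shape[1] - w))
        frame[ny:ny + h, nx:nx + w] = patch
        futures.append(frame)
    return futures

image_60 = np.zeros((24, 24))
image_60[10:14, 2:6] = 1.0                              # moving subject (e.g. subject 59)
interpolated = interpolate_moving_subject(image_60, box=(10, 2, 4, 4),
                                          velocity=(0, 3), num_steps=3)
print(len(interpolated))                                # three future images, moving left to right
```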
- The interpolated image generation unit 321c passes the image 60 and the interpolated images 67 1 , 67 2 and 67 3 to the frame data division unit 320.
- The frame data division unit 320 performs line division or sub-sampling on the image 60 and the interpolated images 67 1 , 67 2 and 67 3 passed from the interpolated image generation unit 321c, as described in the first example or the second example of the first embodiment.
- the frame data division unit 320 generates four pieces of specialized learning data 302 (not shown) that are arranged in time series in the future direction, starting from the time when the image 60 was captured.
- the interpolated image generator 321 c may estimate the motion of the subjects 58 and 59 based on the image 60 .
- the interpolated image generator 321c can estimate the movement of the vehicle based on the traveling direction of the vehicle estimated from the image 60, the blurring of the image of the vehicle in the image 60, and the like.
- The interpolated image generation unit 321c can generate a plurality of interpolated images 67 1 , 67 2 and 67 3 that change in time series by changing the position of the vehicle according to the estimated movement with respect to a fixed object (for example, the subject 58) in the image 60.
- the movement of the subject may be manually set, and based on this setting information, a plurality of interpolated images 67 1 , 67 2 , 67 3 that change in time series may be generated.
- In this way, in the fifth example of the first embodiment as well, a plurality of images that change in time series are generated from one existing learning data 300. Then, based on the existing learning data 300 and the plurality of images, it is possible to generate a plurality of specialized learning data 302 that change in time series, each based on specialized image data. Therefore, even if the amount of existing learning data 300 based on existing image data is small, it is possible to sufficiently train the specialized recognizer.
- As described above, each of the conversion units 301a to 301e acts as a conversion unit that converts a first data set or a first recognizer for performing recognition processing based on a first signal read from a first sensor that performs readout in a first readout unit, into a second data set or a second recognizer for performing recognition processing based on a second signal read from a second sensor that performs readout in a second readout unit different from the first readout unit.
- More specifically, each of the conversion units 301a to 301e converts the first data set, which is for training a first recognizer that performs recognition processing based on the first signal read out from the first sensor in the first readout unit, into the second data set for training the second recognizer.
- In the second embodiment, evaluation data based on the non-frame-based image data related to the recognition specialized sensor is converted into evaluation data based on the frame-based image data related to the existing recognizer.
- the provider of the specialized recognition sensor can provide conversion means for converting specialized evaluation data into existing evaluation data, thereby improving user convenience. That is, by using the converting means, the user can evaluate the recognition result of the existing recognizer based on the specialized evaluation data provided by the provider of the specialized recognition sensor.
- a first example of the second embodiment is an example of converting non-frame-based specialized evaluation data obtained by line division into existing frame-based evaluation data.
- a first example of the second embodiment will be described with reference to FIGS. 19A, 19B and 19C.
- the process according to each example of the second embodiment corresponds to the process of converting the specialized evaluation data 304 into the existing evaluation data 303 by the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B. .
- FIG. 19A is an example functional block diagram for explaining the function of the conversion unit 301f in the learning system 3 according to the first example of the second embodiment.
- the conversion unit 301f includes an accumulation/update processing unit 322 and an accumulation unit 323.
- Specialized evaluation data 304 by line division is input to the conversion unit 301f.
- As the specialized evaluation data 304, for example, any of the patterns described with reference to FIGS. 13A to 13E in the first embodiment may be applied.
- the accumulation/update processing unit 322 accumulates the input specialized evaluation data 304L#1, 304L#2, 304L#3, ..., 304L#n in the accumulation unit 323.
- the accumulation/update processing unit 322 integrates the accumulated specialized evaluation data 304 to generate frame-based existing evaluation data 303.
- FIG. 19B is a schematic diagram showing a first example of generating the existing evaluation data 303 applicable to the first example of the second embodiment.
- Section (a) of FIG. 19B shows an example of the specialized evaluation data 304L#1, 304L#2, 304L#3, ..., 304L#n obtained by line division for each of the lines L#1, L#2, L#3, ..., L#n. The specialized evaluation data 304L#1, 304L#2, 304L#3, ..., 304L#n are sequentially input to the accumulation/update processing unit 322, for example.
- the accumulation/update processing unit 322 sequentially replaces the corresponding areas of one frame with the areas updated in the specialized evaluation data 304L#1, 304L#2, 304L#3, ..., 304L#n, and accumulates them in the accumulation unit 323.
- For example, when the specialized evaluation data 304L#1 is input, the accumulation/update processing unit 322 replaces the data corresponding to line L#1 in one frame stored in the accumulation unit 323 with the data of line L#1 in the specialized evaluation data 304L#1. Thereafter, in accordance with the input specialized evaluation data 304L#2, 304L#3, ..., 304L#n, the accumulation/update processing unit 322 sequentially replaces the data corresponding to lines L#2, L#3, ..., L#n in one frame stored in the accumulation unit 323 with the data of those lines.
- The accumulation/update processing unit 322 can output the existing evaluation data 303 at the time when the area of one frame in the accumulation unit 323 has been replaced with all the data of lines L#1, L#2, L#3, ..., L#n.
- In the above description, the specialized evaluation data 304L#1, 304L#2, 304L#3, ..., 304L#n are transferred to the accumulation/update processing unit 322 in the order of lines L#1, L#2, L#3, ..., L#n; however, this is not limited to this example. That is, each of the specialized evaluation data 304L#1, 304L#2, 304L#3, ..., 304L#n may be input to the accumulation/update processing unit 322 in any order.
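- As a rough sketch of the accumulation/update processing described above (assuming Python/NumPy; the class name LineAccumulator and its methods are hypothetical), each line-divided input replaces the corresponding line of a frame buffer, and the buffer can be output as frame-based existing evaluation data once every line has been updated, regardless of input order.

```python
import numpy as np

class LineAccumulator:
    """Sketch of the accumulation/update processing: each line-divided piece of
    specialized evaluation data replaces the corresponding line of a frame buffer
    (the accumulation unit); once every line has been updated, the buffer can be
    output as frame-based existing evaluation data."""

    def __init__(self, height, width):
        self.buffer = np.zeros((height, width), dtype=np.uint8)  # accumulation unit
        self.updated = np.zeros(height, dtype=bool)

    def update(self, line_index, line_data):
        self.buffer[line_index] = line_data   # replace the area updated by 304L#x
        self.updated[line_index] = True

    def full_frame_ready(self):
        return bool(self.updated.all())

    def output(self):
        # Existing evaluation data: the integrated one-frame image.
        return self.buffer.copy()

# Usage: lines may arrive in any order.
acc = LineAccumulator(4, 6)
for i in np.random.permutation(4):
    acc.update(i, np.full(6, i, dtype=np.uint8))
if acc.full_frame_ready():
    existing = acc.output()
```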
- a second example of generating the existing evaluation data 303 from the specialized evaluation data 304 applicable to the first example of the second embodiment will be described.
- In the first example described above, the specialized evaluation data 304 consists of line-by-line data obtained by line division, and the existing evaluation data 303 is generated based on the specialized evaluation data 304 for each line. In contrast, in this second example, the specialized evaluation data 304 is assumed to consist of data of lines thinned out by line division.
- FIG. 19C is a schematic diagram showing a second example of generating the existing evaluation data 303 applicable to the first example of the second embodiment.
- one frame includes n lines (n is an odd number).
- Section (a) of FIG. 19C shows the specialized evaluation data 304L#1, 304L#3, 304L#5, ..., 304L#n obtained for the odd-numbered lines L#1, L#3, L#5, ..., L#n.
- the accumulation/update processing unit 322 sequentially replaces the corresponding areas of one frame with the areas updated in the specialized evaluation data 304L#1, 304L#3, 304L#5, ..., 304L#n, and accumulates them in the accumulation unit 323. At this time, the accumulation/update processing unit 322 interpolates the portions that are not updated by the specialized evaluation data 304L#1, 304L#3, 304L#5, ..., 304L#n, that is, the portions corresponding to the thinned-out lines.
- the interpolation method is not particularly limited, but for example, linear interpolation using lines before and after the thinned line can be applied.
- the accumulation/update processing unit 322 generates a thinned line L#2 by interpolation processing based on the specialized evaluation data 304L#1 and 304L#3, for example.
- the accumulation/update processing unit 322 replaces the data between line L#1 based on the specialized evaluation data 304L#1 and line L#3 based on the specialized evaluation data 304L#3 with the line L#2 generated by the interpolation processing.
- The accumulation/update processing unit 322 can output the existing evaluation data 303 at the time when the area of one frame in the accumulation unit 323 has been replaced with all the data of the lines L#1, L#3, ..., L#n based on the specialized evaluation data and with all the data of the interpolated lines L#2, L#4, ..., L#(n-1).
- Note that the specialized evaluation data 304L#1, 304L#3, ..., 304L#n may be input to the accumulation/update processing unit 322 in any order.
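- The following is a minimal sketch of the interpolation of thinned-out lines described above, assuming Python/NumPy and simple linear interpolation between the nearest available lines; interpolate_thinned_lines and known_rows are hypothetical names, and other interpolation methods could equally be used.

```python
import numpy as np

def interpolate_thinned_lines(buffer, known_rows):
    """Sketch: fill lines that were thinned out by line division with a linear
    interpolation of the nearest known lines above and below (edge lines are copied)."""
    h = buffer.shape[0]
    known = sorted(known_rows)
    out = buffer.astype(np.float32).copy()
    for r in range(h):
        if r in known_rows:
            continue
        below = max((k for k in known if k < r), default=None)
        above = min((k for k in known if k > r), default=None)
        if below is None:
            out[r] = out[above]
        elif above is None:
            out[r] = out[below]
        else:
            t = (r - below) / (above - below)
            out[r] = (1 - t) * out[below] + t * out[above]
    return out.astype(buffer.dtype)

# Example: odd lines L#1, L#3, L#5 are available, even lines are interpolated.
frame = np.zeros((5, 4), dtype=np.uint8)
frame[0::2] = np.arange(3, dtype=np.uint8).reshape(3, 1) * 100
filled = interpolate_thinned_lines(frame, known_rows={0, 2, 4})
```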
- In the above description, the specialized evaluation data 304 is composed of line-divided data for each line, and the line-divided specialized evaluation data 304 is sequentially input to the conversion unit 301f; however, this is not limited to this example.
- For example, the first example of the second embodiment can also be applied to an example in which the specialized evaluation data 304 is divided into groups of a plurality of lines or into partial lines, as described in the second or third example of the first example of the first embodiment with reference to FIG. 13B or FIG. 13C.
- The first example of the second embodiment can also be applied to an example in which the specialized evaluation data 304 is divided at predetermined intervals, as described in the fourth example of the first example of the first embodiment with reference to FIG. 13D.
- Furthermore, the first example of the second embodiment can also be applied to the example described in the fifth example of the first example of the first embodiment with reference to FIG. 13E, in which each line is divided at predetermined intervals.
- a second example of the second embodiment is an example of converting sub-sampling non-frame-based specialized evaluation data into frame-based existing evaluation data.
- a second example of the second embodiment will be described with reference to FIGS. 20A, 20B and 20C.
- FIG. 20A is a functional block diagram of an example for explaining functions of the conversion unit 301g in the learning system 3 according to the second example of the second embodiment.
- the conversion unit 301g includes an accumulation/update processing unit 322 and an accumulation unit 323.
- Specialized evaluation data 304 obtained by sub-sampling is input to the conversion unit 301g.
- As the specialized evaluation data 304, for example, any of the patterns described with reference to FIGS. 15A to 15F in the first embodiment may be applied.
- Here, as the specialized evaluation data 304, the specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4, obtained by sub-sampling in this pattern while shifting the phase by one pixel in each of the row and column directions, are applied.
- the accumulation/update processing unit 322 accumulates the input specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4 in the accumulation unit 323.
- the accumulation/update processing unit 322 integrates the accumulated specialized evaluation data 304 to generate frame-based existing evaluation data 303.
- FIG. 20B is a schematic diagram showing a first example of generating the existing evaluation data 303 applicable to the second example of the second embodiment.
- Section (a) of FIG. 20B shows an example of the specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4 for each of the phases Pφ#1, Pφ#2, Pφ#3, and Pφ#4.
- the specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4 are sequentially input to the accumulation/update processing unit 322, for example.
- the accumulation/update processing unit 322 sequentially replaces the corresponding parts of one frame with the parts updated in the specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4, and accumulates them in the accumulation unit 323.
- For example, when the specialized evaluation data 304Pφ#1 is input, the accumulation/update processing unit 322 replaces the data corresponding to phase Pφ#1 in one frame stored in the accumulation unit 323 with the data of phase Pφ#1 in the specialized evaluation data 304Pφ#1. Thereafter, in accordance with the input specialized evaluation data 304Pφ#2, 304Pφ#3, and 304Pφ#4, the accumulation/update processing unit 322 sequentially replaces the data corresponding to phases Pφ#2 to Pφ#4 in one frame stored in the accumulation unit 323 with the data of phases Pφ#2 to Pφ#4 in the specialized evaluation data 304Pφ#2 to 304Pφ#4.
- When the area of one frame in the accumulation unit 323 has been replaced with all the data of phases Pφ#1 to Pφ#4 based on the specialized evaluation data 304Pφ#1 to 304Pφ#4, the accumulation/update processing unit 322 can output the existing evaluation data 303 from the accumulation unit 323.
- In the above description, the specialized evaluation data 304Pφ#1 to 304Pφ#4 are input to the accumulation/update processing unit 322 in the order of the phases Pφ#1 to Pφ#4; however, this is not limited to this example. That is, each of the specialized evaluation data 304Pφ#1 to 304Pφ#4 may be input to the accumulation/update processing unit 322 in any order.
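- A possible sketch of this phase-by-phase accumulation, assuming Python/NumPy and the one-pixel-shift pattern described above; PhaseAccumulator and PHASES are hypothetical names, and the phase-to-offset mapping is an assumption made only for illustration.

```python
import numpy as np

# Assumed phase offsets (row, col) for Pφ#1..Pφ#4 when one pixel is skipped
# in each of the row and column directions.
PHASES = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}

class PhaseAccumulator:
    """Sketch of accumulating sub-sampled specialized evaluation data 304Pφ#k:
    each input replaces the pixels of its own phase in the frame buffer, and the
    buffer can be output once all four phases have been written."""

    def __init__(self, height, width):
        self.buffer = np.zeros((height, width), dtype=np.uint8)
        self.seen = set()

    def update(self, phase, data):
        r0, c0 = PHASES[phase]
        self.buffer[r0::2, c0::2] = data   # data holds only this phase's pixels
        self.seen.add(phase)

    def output(self):
        if self.seen == set(PHASES):
            return self.buffer.copy()      # frame-based existing evaluation data
        return None

acc = PhaseAccumulator(4, 4)
for p in (1, 3, 2, 4):                     # any input order is acceptable
    acc.update(p, np.full((2, 2), p * 60, dtype=np.uint8))
existing = acc.output()
```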
- FIG. 20C is a schematic diagram showing a second example of generating the existing evaluation data 303 applicable to the second example of the second embodiment.
- Section (a) of FIG. 20C is similar to section (a) of FIG. 20B and shows an example of the specialized evaluation data 304Pφ#1 to 304Pφ#4 for the phases Pφ#1 to Pφ#4. The specialized evaluation data 304Pφ#1 to 304Pφ#4 are sequentially input to the accumulation/update processing unit 322, for example.
- the accumulation/update processing unit 322 sequentially replaces the corresponding parts of one frame with the parts updated in the specialized evaluation data 304Pφ#1 to 304Pφ#4, and accumulates them in the accumulation unit 323. At this time, the accumulation/update processing unit 322 interpolates the portions of the specialized evaluation data 304Pφ#1 to 304Pφ#4 that have not been updated, that is, the portions where pixels have been thinned out.
- the interpolation method is not particularly limited, but for example, linear interpolation using pixels in the vicinity of the thinned pixels can be applied.
- the accumulation/update processing unit 322 generates the thinned-out pixels by interpolation processing at the positions of phases Pφ#2, Pφ#3, and Pφ#4 based on the specialized evaluation data 304Pφ#1, for example.
- the accumulation/update processing unit 322 replaces the data between the pixels of the specialized evaluation data 304Pφ#1 with the pixels of the phases Pφ#2 to Pφ#4 generated by the interpolation processing.
- the accumulation/update processing unit 322 can output the existing evaluation data 303 from the accumulation unit 323 when the specialized evaluation data 304Pφ#1 is input.
- Next, when the specialized evaluation data 304Pφ#2 is input, the accumulation/update processing unit 322 replaces the pixels of phases Pφ#2 to Pφ#4 that were generated by interpolation processing in response to the input of the specialized evaluation data 304Pφ#1 with the pixels of the specialized evaluation data 304Pφ#2 and with pixels generated by interpolation processing based on the pixels of the specialized evaluation data 304Pφ#1 and 304Pφ#2.
- the accumulation/update processing unit 322 can output the existing evaluation data 303 from the accumulation unit 323 even when the specialized evaluation data 304Pφ#2 is input after the specialized evaluation data 304Pφ#1.
- When the specialized evaluation data 304Pφ#3 is further input, each pixel at the position of phase Pφ#4 remains thinned out.
- the accumulation/update processing unit 322 can generate the pixel at the position of phase Pφ#4 by interpolation processing based on the pixels of phases Pφ#1 to Pφ#3.
- the accumulation/update processing unit 322 replaces the data between each pixel of the specialized evaluation data 304Pφ#1, 304Pφ#2, and 304Pφ#3 with the pixel of phase Pφ#4 generated by the interpolation processing.
- Alternatively, the accumulation/update processing unit 322 may replace each pixel generated by the interpolation processing in response to the input of the specialized evaluation data 304Pφ#1 and 304Pφ#2 with each pixel generated by interpolation processing based on the pixels of the specialized evaluation data 304Pφ#1 to 304Pφ#3.
- The accumulation/update processing unit 322 can output the existing evaluation data 303 from the accumulation unit 323 even when the specialized evaluation data 304Pφ#3 is input after the specialized evaluation data 304Pφ#1 and 304Pφ#2.
- When the specialized evaluation data 304Pφ#4 is further input, all the pixels of one frame are obtained, and the accumulation/update processing unit 322 can output the existing evaluation data 303 from the accumulation unit 323.
- the accumulation/update processing unit 322 may replace each pixel generated by the interpolation processing in accordance with the input of the specialized evaluation data 304Pφ#1 to 304Pφ#3 with each pixel of the specialized evaluation data 304Pφ#4.
- In the above description, the specialized evaluation data 304Pφ#1 to 304Pφ#4 are input to the accumulation/update processing unit 322 in the order of the phases Pφ#1 to Pφ#4; however, this is not limited to this example. That is, each of the specialized evaluation data 304Pφ#1 to 304Pφ#4 may be input to the accumulation/update processing unit 322 in any order. Further, the point at which the existing evaluation data 303 is output from the accumulation unit 323, that is, after which of the specialized evaluation data 304Pφ#1 to 304Pφ#4 has been input, can be determined according to, for example, the quality required for the existing evaluation data 303.
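- As an illustrative sketch of the interpolation of the thinned-out pixels (assuming Python/NumPy, an even-sized frame, and simple nearest-neighbour copying from the phase Pφ#1 pixels; fill_missing_phases_from_phase1 is a hypothetical name), the missing phases are first estimated and would later be overwritten as the corresponding specialized evaluation data arrive.

```python
import numpy as np

def fill_missing_phases_from_phase1(buffer):
    """Sketch: when only the phase Pφ#1 pixels (even rows/columns) are present,
    approximate the three missing phases by copying the nearest Pφ#1 pixel.
    Later inputs of 304Pφ#2..#4 would simply overwrite these estimates."""
    out = buffer.copy()
    out[0::2, 1::2] = out[0::2, 0::2]   # Pφ#2 positions <- left neighbour
    out[1::2, 0::2] = out[0::2, 0::2]   # Pφ#3 positions <- upper neighbour
    out[1::2, 1::2] = out[0::2, 0::2]   # Pφ#4 positions <- upper-left neighbour
    return out

frame = np.zeros((4, 4), dtype=np.uint8)
frame[0::2, 0::2] = 200                  # only 304Pφ#1 has been accumulated
approx = fill_missing_phases_from_phase1(frame)
```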
- In the above description, the sub-sampled specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4 at the positions corresponding to the respective phases Pφ#1, Pφ#2, Pφ#3, and Pφ#4 are input to the conversion unit 301g; however, this is not limited to this example.
- For example, the second example of the second embodiment can also be applied to an example in which the specialized evaluation data 304 consists of a plurality of pixels px arranged discretely and periodically in each of the line direction and the vertical direction, as described in the first or second example of the second example of the first embodiment with reference to FIG. 15A or FIG. 15B.
- Further, the second example of the second embodiment can also be applied to an example in which the specialized evaluation data 304 is sub-sampled in units of a plurality of pixels that are sequentially adjacent in each of the line direction and the vertical direction, as described in the third or fourth example of the second example of the first embodiment with reference to FIG. 15C or FIG. 15D.
- The second example of the second embodiment can also be applied to an example in which sub-sampling is performed in units of a pattern of a plurality of discretely arranged pixels, for example a pattern in which pixels are arranged according to the shape of an object or the like, as described in the fifth example of the second example of the first embodiment with reference to FIG. 15E. Furthermore, the second example of the second embodiment can also be applied to an example in which sub-sampling is performed according to a pattern of a plurality of discretely and non-periodically arranged pixels, as described in the sixth example of the second example of the first embodiment with reference to FIG. 15F.
- A third example of the second embodiment is an example of converting the format of non-frame-based specialized evaluation data obtained by line division or sub-sampling to generate frame-based existing evaluation data.
- a third example of the second embodiment will be described with reference to FIGS. 21A, 21B and 21C.
- FIG. 21A is a functional block diagram of an example for explaining the functions of the conversion unit 301h in the learning system 3 according to the third example of the second embodiment.
- conversion section 301h includes format conversion section 324 .
- Specialized evaluation data 304 obtained by line division or subsampling is input to the conversion unit 301h.
- the format conversion unit 324 performs format conversion processing on the specialized evaluation data 304 input to the conversion unit 301 h to generate frame-based existing evaluation data 303 . More specifically, the format conversion unit 324 generates the existing evaluation data 303 by combining line-divided or sub-sampled lines or pixels and integrating them into one image.
- the specialization evaluation data 304 is such that the arrangement of each line-divided or sub-sampled data for one frame image can be handled as a frame-based data arrangement.
- As the specialized evaluation data 304 applicable to the third example of the second embodiment, line-divided or sub-sampled data in a periodic pattern throughout an image of one frame can be applied.
- a first example of generating evaluation data by format conversion which is applicable to the third example of the second embodiment, will be described.
- This first example is an example of generating the existing evaluation data 303 from the specialized evaluation data 304 generated by performing line division by line thinning.
- FIG. 21B is a schematic diagram showing a first example of existing evaluation data generation applicable to the third example of the second embodiment.
- the specialized evaluation data 304Lt is generated by dividing an image of one frame into lines and periodically thinning out the divided lines.
- This specialized evaluation data 304Lt is input to the format conversion section 324 .
- the format conversion unit 324 extracts each line included in the input specialized evaluation data 304Lt, that is, each line not thinned in the original one-frame image.
- the format conversion unit 324 combines the extracted lines in the order of the lines in the direction perpendicular to the lines to generate the existing evaluation data 303Lt.
- This existing evaluation data 303Lt can be considered as an image obtained by lowering the resolution of the original one-frame image. For example, the user can use the existing evaluation data 303Lt generated in this way to evaluate the recognition result of the existing recognizer.
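- A minimal sketch of this format conversion for line-thinned data, assuming Python/NumPy; format_convert_kept_lines is a hypothetical name. The lines that survived thinning are simply stacked in line order to form a lower-resolution, frame-based image.

```python
import numpy as np

def format_convert_kept_lines(kept_line_indices, kept_lines):
    """Sketch of the format conversion for line-thinned specialized evaluation data:
    the surviving lines are combined in line order, yielding a lower-resolution
    frame-based image (existing evaluation data)."""
    order = np.argsort(kept_line_indices)
    return np.stack([kept_lines[i] for i in order], axis=0)

# Example: a 6-line frame thinned to its lines L#1, L#3, L#5 (indices 0, 2, 4).
lines = [np.full(8, v, dtype=np.uint8) for v in (10, 30, 50)]
low_res = format_convert_kept_lines([0, 2, 4], lines)   # shape (3, 8)
```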
- Next, a second example of generating existing evaluation data by format conversion, which is applicable to the third example of the second embodiment, will be described. This second example is an example of generating the existing evaluation data 303 from the specialized evaluation data 304 generated by extracting pixels by sub-sampling.
- FIG. 21C is a schematic diagram showing a second example of existing evaluation data generation applicable to the third example of the second embodiment.
- The specialized evaluation data 304Pφ#1 corresponds to the specialized evaluation data 304Pφ#1 described with reference to FIG. 20B. That is, the specialized evaluation data 304Pφ#1 consists of the pixels at the positions of phase Pφ#1 among the phases Pφ#1 to Pφ#4, according to a pattern obtained by thinning out every other pixel in each of the row and column directions from the image of one frame.
- This specialized evaluation data 304Pφ#1 is input to the format conversion unit 324.
- the format conversion unit 324 extracts each pixel included in the input specialized evaluation data 304Pφ#1, that is, each pixel at the position of phase Pφ#1 in the original one-frame image.
- the format conversion unit 324 combines the extracted pixels according to their positional relationship to generate the existing evaluation data 303Pφ#1.
- This existing evaluation data 303Pφ#1 can be regarded as an image obtained by lowering the resolution of the original one-frame image. For example, the user can use the existing evaluation data 303Pφ#1 generated in this way to evaluate the recognition result of the existing recognizer.
- a fourth example of the second embodiment is an example in which the first and second examples of the second embodiment described above and the third example are combined.
- the first and second examples of the second embodiment are collectively referred to as an accumulation method
- the third example is referred to as a non-accumulation method.
- the accumulation method and the non-accumulation method are executed in parallel, and the existing evaluation data generated by the accumulation method and the existing evaluation data generated by the non-accumulation method are selected according to predetermined conditions. Alternatively, the existing evaluation data generated by the accumulation method and the existing evaluation data generated by the non-accumulation method are weighted, and priority is set for these data.
- the evaluation of the storage method and the non-storage method for each item of (1) resolution, (2) reliability, and (3) processing delay will be described.
- the resolution indicates the resolution of the existing evaluation data as an image.
- the reliability indicates the reliability of the result of recognition processing by an existing recognizer evaluated using existing evaluation data.
- the processing delay indicates the delay in the timing at which the existing evaluation data 303 based on the input specialized evaluation data 304 is output from the conversion unit 301 with respect to the timing at which the specialized evaluation data 304 is input to the conversion unit 301 .
- The evaluation of reliability depending on the size of the object is as follows, depending on whether the size of the object is greater than or equal to a predetermined value or less than the predetermined value.
- Object whose size is greater than or equal to the predetermined value: non-accumulation method > accumulation method
- Object whose size is less than the predetermined value: accumulation method > non-accumulation method
- The evaluation of reliability depending on the motion of the object is as follows, depending on whether the motion of the object is greater than or equal to a predetermined value or less than the predetermined value.
- Object whose motion is greater than or equal to the predetermined value: non-accumulation method > accumulation method
- Object whose motion is less than the predetermined value: accumulation method > non-accumulation method
- the non-accumulation method cannot obtain information on the thinned out parts, so it may be difficult to grasp the movement.
- With the accumulation method, since all the information of one frame can be obtained, even a small movement can be grasped easily, and the influence of the difference in acquisition timing of the data of each part in the existing evaluation data 303 is small.
- Regarding (3) processing delay, the evaluation is "non-accumulation method > accumulation method".
- With the non-accumulation method, the existing evaluation data 303 is generated without acquiring all the information in the image of one frame. On the other hand, with the accumulation method, the existing evaluation data 303 is generated after all the information in the image of one frame has been obtained. Therefore, the non-accumulation method can reduce the processing delay compared with the accumulation method.
- In accordance with the size of the object, a weight indicating which of the recognition result of the existing evaluation data 303 of the non-accumulation method and the recognition result of the existing evaluation data 303 of the accumulation method should be prioritized is set for these existing evaluation data 303, and these existing evaluation data 303 are integrated.
- For example, for an object whose size is equal to or greater than the predetermined value, the existing evaluation data 303 of the non-accumulation method and the existing evaluation data 303 of the accumulation method are weighted so that the recognition result of the non-accumulation method is prioritized over the recognition result of the accumulation method.
- On the other hand, for an object whose size is less than the predetermined value, the existing evaluation data 303 of the non-accumulation method and the existing evaluation data 303 of the accumulation method are weighted so that the recognition result of the accumulation method is prioritized over the recognition result of the non-accumulation method.
- Similarly, in accordance with the motion of the object, a weight indicating which of the recognition result of the existing evaluation data 303 of the non-accumulation method and the recognition result of the existing evaluation data 303 of the accumulation method should be prioritized is set for these existing evaluation data 303, and these existing evaluation data 303 are integrated. Note that the motion of the object included in the existing evaluation data 303 here includes both the motion of the object in the existing evaluation data 303 caused by the motion of the camera and the motion of the object itself, which is the subject.
- For example, for an object whose motion is equal to or greater than the predetermined value, the existing evaluation data 303 of the non-accumulation method and the existing evaluation data 303 of the accumulation method are weighted so that the recognition result of the non-accumulation method is prioritized over the recognition result of the accumulation method.
- On the other hand, for an object whose motion is less than the predetermined value, the existing evaluation data 303 of the non-accumulation method and the existing evaluation data 303 of the accumulation method are weighted so that the recognition result of the accumulation method is prioritized over the recognition result of the non-accumulation method.
- As a specific example, for an object with large motion, the existing evaluation data 303 of the non-accumulation method is weighted by 80% and the existing evaluation data 303 of the accumulation method is weighted by 20%.
- the conversion unit 301 blends the non-accumulated existing evaluation data 303 and the accumulated existing evaluation data 303 at a ratio corresponding to the weight, and outputs the final existing evaluation data 303 .
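- The weighted integration can be sketched as a simple blend, assuming Python/NumPy and that both existing evaluation data 303 have already been brought to the same resolution; blend_existing_evaluation_data and the 80%/20% weights are illustrative only.

```python
import numpy as np

def blend_existing_evaluation_data(non_accum, accum, weight_non_accum=0.8):
    """Sketch: blend the non-accumulation and accumulation existing evaluation
    data at a ratio given by the weights (e.g. 80% / 20% for a fast-moving object).
    Both inputs are assumed to already have the same resolution."""
    w = float(weight_non_accum)
    blended = w * non_accum.astype(np.float32) + (1.0 - w) * accum.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

final = blend_existing_evaluation_data(
    np.full((4, 4), 100, dtype=np.uint8),   # data from the non-accumulation method
    np.full((4, 4), 200, dtype=np.uint8),   # data from the accumulation method
)
```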
- As described above, the amount of processing delay differs between the accumulation method and the non-accumulation method. Therefore, in a scene requiring promptness, the existing evaluation data 303 by the non-accumulation method is output first. After that, when the existing evaluation data 303 by the accumulation method can be output, the result of integrating the previously output existing evaluation data 303 by the non-accumulation method and the existing evaluation data 303 by the accumulation method is output.
- a fifth example of the second embodiment relates to output timing at which the conversion unit 301 outputs the existing evaluation data 303 .
- a fifth example of the second embodiment will be described with reference to FIGS. 22A to 22E.
- FIG. 22A is a functional block diagram of an example for explaining the functions of the conversion unit 301i in the learning system 3 according to the fifth example of the second embodiment.
- the conversion unit 301 i includes an accumulation unit 323 , an accumulation processing unit 325 and an accumulation determination unit 326 .
- Specialized evaluation data 304 obtained by line division or subsampling is sequentially input to the conversion unit 301i for each line division process or subsampling.
- the accumulation processing unit 325 sequentially accumulates the specialized evaluation data 304 input to the conversion unit 301 i in the accumulation unit 323 .
- The accumulation determination unit 326 monitors the amount of the specialized evaluation data 304 accumulated in the accumulation unit 323, and when it determines that a predetermined amount of the specialized evaluation data 304 has been accumulated, it integrates the accumulated specialized evaluation data 304 and outputs the result as existing evaluation data 303.
- As the specialized evaluation data 304 input to the conversion unit 301i, for example, any of the specialized evaluation data 304 generated by line division described with reference to FIGS. 13A to 13E in the first example of the first embodiment can be applied.
- Alternatively, any of the specialized evaluation data 304 generated by sub-sampling described with reference to FIGS. 15A to 15F in the second example of the first embodiment, other than the non-periodic pattern sub-sampling shown in FIG. 15F, can be applied.
- a first example of the existing evaluation data 303 output timing according to the fifth example of the second embodiment will be described.
- This first example is an example in which the accumulation determination unit 326 outputs the existing evaluation data 303 when the accumulation unit 323 accumulates the specialized evaluation data 304 of all regions of one frame.
- FIG. 22B is a schematic diagram for explaining a first example of the output timing of the existing evaluation data 303 according to the fifth example of the second embodiment. In FIG. 22B, section (a) shows the specialized evaluation data 304L#1, 304L#2, 304L#3, 304L#4, ..., 304L#n for each of the lines L#1, L#2, L#3, L#4, ..., L#n. The specialized evaluation data 304L#1, 304L#2, ..., 304L#n are sequentially input to the conversion unit 301i, for example.
- Section (b) of FIG. 22B schematically shows how the specialized evaluation data 304L#1, 304L#2, ..., 304L#n are accumulated in the accumulation unit 323.
- the accumulation processing unit 325 sequentially replaces the corresponding portions in the accumulation unit 323 with the updated portions of the input specialized evaluation data 304L#1, 304L#2, ..., 304L#n and accumulates them.
- The accumulation determination unit 326 determines that the specialized evaluation data 304L#1, 304L#2, ..., 304L#n for all the lines of one frame have been accumulated in the accumulation unit 323.
- the accumulation determination unit 326 outputs the data accumulated in the accumulation unit 323 as the existing evaluation data 303 according to this determination.
- a second example of the existing evaluation data 303 output timing according to the fifth example of the second embodiment will be described.
- This second example is an example in which the accumulation determination unit 326 outputs the existing evaluation data 303 when the accumulation unit 323 accumulates the specialized evaluation data 304 in an area equal to or greater than a predetermined ratio of the area of one frame.
- FIG. 22C is a schematic diagram for explaining a second example of the output timing of the existing evaluation data 303 according to the fifth example of the second embodiment.
- one frame includes 9 lines.
- the line at the top end of one frame is line L#1
- the line at the bottom end is line L#9.
- In FIG. 22C, section (a) shows examples of the specialized evaluation data 304L#1, 304L#2, 304L#3, 304L#4, 304L#5, 304L#6, ..., 304L#9 for each of the lines L#1 to L#9.
- In this example, the accumulation determination unit 326 integrates the accumulated specialized evaluation data 304 and outputs it as the existing evaluation data 303 every time the specialized evaluation data 304 is accumulated for 1/3 of the area of one frame. In this example in which one frame includes nine lines, the accumulation determination unit 326 outputs the existing evaluation data 303 each time three lines of specialized evaluation data 304, that is, 1/3 of one frame, are accumulated.
- Section (b) of FIG. 22C schematically shows how the specialized evaluation data 304L#1, 304L#2, ..., 304L#9 are accumulated in the accumulation unit 323.
- the accumulation processing unit 325 sequentially replaces the corresponding portions in the accumulation unit 323 with the updated portions of the input specialized evaluation data 304L#1, 304L#2, ... and accumulates them.
- When the specialized evaluation data 304L#1, 304L#2, and 304L#3 for lines L#1, L#2, and L#3 are accumulated in the accumulation unit 323, the accumulation determination unit 326 integrates these three lines of specialized evaluation data 304L#1, 304L#2, and 304L#3 and outputs them as existing evaluation data 303(1).
- Next, when the specialized evaluation data 304L#4, 304L#5, and 304L#6 for lines L#4, L#5, and L#6 are input, the accumulation processing unit 325 accumulates them in the accumulation unit 323. When these three lines of specialized evaluation data 304L#4, 304L#5, and 304L#6 are accumulated, the accumulation determination unit 326 integrates them and outputs them as existing evaluation data 303(2).
- the accumulation processing unit 325 sequentially accumulates the input specialization evaluation data 304L#x for each line L#x in the accumulation unit 323 .
- When the specialized evaluation data 304L#7, 304L#8, and 304L#9 for the remaining lines are accumulated, the accumulation determination unit 326 determines that the specialized evaluation data 304L#1, 304L#2, ..., 304L#9 for all of lines L#1 to L#9 have been accumulated in the accumulation unit 323. The accumulation determination unit 326 then integrates the specialized evaluation data 304L#7, 304L#8, and 304L#9 and outputs them as existing evaluation data 303(3).
- FIG. 22D is a schematic diagram for explaining a third example of the output timing of the existing evaluation data 303 according to the fifth example of the second embodiment.
- one frame includes 9 lines, as in FIG. 22C described above.
- the description will be made assuming that the existing evaluation data 303 is output each time the specialized evaluation data 304 for four lines is input.
- Section (b) of FIG. 22D schematically shows how the specialized evaluation data 304L#1, 304L#2, ... are accumulated in the accumulation unit 323.
- the accumulation processing unit 325 sequentially replaces the updated portions of the input specialized evaluation data 304L#1, 304L#2, . . . and accumulates them in the accumulation unit 323.
- When the specialized evaluation data 304L#1, 304L#2, 304L#3, and 304L#4 for the four lines L#1, L#2, L#3, and L#4 are accumulated in the accumulation unit 323, the accumulation determination unit 326 integrates these four lines of specialized evaluation data 304L#1 to 304L#4 and outputs them as existing evaluation data 303(10).
- Next, when the specialized evaluation data 304L#5, 304L#6, 304L#7, and 304L#8 for lines L#5, L#6, L#7, and L#8 are input, the accumulation processing unit 325 accumulates the specialized evaluation data 304L#5 to 304L#8 in the accumulation unit 323. When these four lines of specialized evaluation data 304L#5 to 304L#8 are accumulated in the accumulation unit 323, the accumulation determination unit 326 integrates them and outputs them as existing evaluation data 303(11).
- Thereafter, the accumulation processing unit 325 sequentially accumulates the specialized evaluation data 304L#x for each line L#x from line L#9 onward in the accumulation unit 323.
- the accumulation determination unit 326 outputs the existing evaluation data 303(y) each time the accumulation unit 323 accumulates specialized evaluation data 304L#x for four lines.
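- A rough sketch of this output-timing control, assuming Python/NumPy; PeriodicAccumulationOutput and lines_per_output are hypothetical names. Each line-divided input updates the frame buffer, and the buffer is emitted as one piece of existing evaluation data every fixed number of inputs, irrespective of frame boundaries.

```python
import numpy as np

class PeriodicAccumulationOutput:
    """Sketch of the accumulation processing unit plus accumulation determination
    unit: each line-divided input updates the frame buffer, and every
    `lines_per_output` inputs the current buffer is emitted as one piece of
    existing evaluation data 303(y), regardless of frame boundaries."""

    def __init__(self, height, width, lines_per_output=4):
        self.buffer = np.zeros((height, width), dtype=np.uint8)
        self.lines_per_output = lines_per_output
        self.count = 0

    def push(self, line_index, line_data):
        self.buffer[line_index % self.buffer.shape[0]] = line_data
        self.count += 1
        if self.count % self.lines_per_output == 0:
            return self.buffer.copy()   # existing evaluation data 303(y)
        return None

conv = PeriodicAccumulationOutput(height=9, width=5, lines_per_output=4)
outputs = [out for i in range(12)
           if (out := conv.push(i, np.full(5, i, dtype=np.uint8))) is not None]
```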
- FIG. 22E is a schematic diagram for explaining a case where the cycle of outputting existing evaluation data and the cycle of inputting specialized evaluation data for one frame do not have an integral multiple relationship.
- FIG. 22E starts from the timing at which the existing evaluation data 303(11) is output in FIG. 22D described above.
- After the specialized evaluation data 304L#8 of the second line L#8 from the bottom of one frame is input, the specialized evaluation data 304L#9 of the bottom line L#9 of the frame is input to the accumulation processing unit 325 and accumulated in the accumulation unit 323.
- Next, the specialized evaluation data 304L#10 based on the line L#1 at the upper end of the next frame is input to the accumulation processing unit 325 and accumulated in the accumulation unit 323.
- When the specialized evaluation data 304L#11 and 304L#12 are further accumulated, the accumulation determination unit 326 determines that four lines of specialized evaluation data 304L#x have been accumulated, integrates the specialized evaluation data 304L#9, 304L#10, 304L#11, and 304L#12, and outputs them as existing evaluation data 303(12).
- In this case, the existing evaluation data 303(12) is data including the specialized evaluation data 304L#10, 304L#11, and 304L#12, which are line-sequentially continuous, and the specialized evaluation data 304L#9, which is not continuous with these within a frame.
- In the above description, the specialized evaluation data 304 is input to the accumulation processing unit 325 periodically, that is, one line at a time by line division; however, this is not limited to this example.
- the specialized evaluation data 304 may be input every several lines by line division, or may be input in an aperiodic pattern (such as a random pattern). In these cases, it is assumed that the output period of the existing evaluation data 303 is shifted from the frame update period.
- As described above, each of the conversion units 301f to 301i acts as a converter that converts a first data set or a first recognizer for performing recognition processing based on a first signal read out from a first sensor that performs readout in a first readout unit into a second data set or a second recognizer for performing recognition processing based on a second signal read out from a second sensor that performs readout in a second readout unit different from the first readout unit.
- In addition, each of the conversion units 301f to 301i also functions as a generation unit that generates, based on the second signal read out from a second sensor that differs from the first sensor in at least one of readout unit, pixel characteristics, and signal characteristics, a signal corresponding to the first signal read out from the first sensor.
- the third embodiment is an example of training a specialized recognizer so that the network of existing recognizers and the network of specialized recognizers can obtain the same output.
- the explanation is given assuming that the existing recognizer network is a frame-based network, and the specialized recognizer network is a non-frame-based network.
- the network of specialized recognizers may be a network with special signal characteristics for recognition.
- The processing according to each example of the third embodiment corresponds to the processing of converting the existing recognizer 310 into the specialized recognizer 312 by the NW conversion unit 311 in the recognizer generation unit 31 of the learning system 3 shown in FIG. 2B.
- a technique called “distillation” is used to train a specialized recognizer.
- “Distillation” generally refers to the technique of using the output of an existing recognizer to improve the performance of a target recognizer.
- the existing recognizers are assumed to be large-scale, high-performance, and/or recognizers with abundant training data.
- the target recognizer is assumed to be a recognizer with small scale, low performance, and/or insufficient training data. In this way, it is known that the performance can be further improved by using not only the training data but also the outputs of other recognizers for learning the target recognizer.
- FIG. 23 is a schematic diagram for schematically explaining each processing pattern according to the third embodiment.
- In FIG. 23, "NW: frame-based" indicates a frame-based network, and "NW: non-frame-based" indicates a non-frame-based network.
- "Input data: frame-based" indicates frame-based input data (referred to as existing input data), and "Input data: non-frame-based" indicates non-frame-based input data (referred to as specialized input data).
- GT is an abbreviation for “Correct data: Ground Truth”
- GT: Frame-based indicates frame-based correct data (referred to as existing correct data)
- GT: Non-frame-based shows non-frame-based correct answer data (referred to as specialized correct answer data).
- Both learning data and evaluation data can be applied as input data.
- the input data is assumed to be learning data unless otherwise specified. Processing when the input data is the evaluation data is the same as when the input data is the learning data.
- As the specialized input data, data of each pattern described with reference to FIGS. 13A to 13E and FIGS. 15A to 15F in the first embodiment can be applied.
- Case #1 (CASE #1) is an example in which an existing recognizer other than a specialized recognizer, existing input data, specialized input data, existing correct data, and specialized correct data are available. In this case, the specialized recognizer is trained by ordinary distillation.
- Case #2 (CASE #2) is an example in which there are existing recognizers, existing input data, existing correct data, and specialized correct data, but no specialized input data.
- specialized input data is generated from existing input data, and then distilled to train a specialized recognizer.
- Case #3 (CASE #3) is an example in which there are existing recognizers, specialized input data, existing correct data, and specialized correct data, but no existing input data.
- the existing input data is generated from the specialized input data, and the distillation is performed thereon to train the specialized recognizer.
- Case #4 (CASE #4) is an example in which there is an existing recognizer, existing correct data and specialized correct data, but no existing input data and specialized input data.
- existing input data is generated based on the existing recognizer
- specialized input data is generated based on the generated existing input data. After generating existing input data and specialized input data in this way, distillation is performed to train a specialized recognizer.
- Case #5 (CASE #5) is an example in which an existing recognizer, existing correct data, and specialized correct data exist, but there is no existing input data and specialized input data, similar to case #4 described above. .
- specialized input data is generated in some way, and existing input data is generated based on the generated specialized input data.
- a random generation method can be applied for generation of specialized input data. After generating existing input data and specialized input data in this way, distillation is performed to train a specialized recognizer.
- FIG. 24 is a schematic diagram for explaining a distillation process applicable to the third embodiment.
- (B) input data for the existing recognizer (existing input data) is input to the learned (A) existing recognizer.
- (A) The existing recognizer performs recognition processing on (B) the input data for the existing recognizer and outputs (C) an existing recognition output.
- input data for (E) specialized recognizer (specialized input data) is input to the unlearned (D) specialized recognizer.
- (D) The specialized recognizer performs recognition processing on (E) the input data for the specialized recognizer and outputs (F) a specialized recognition output.
- Elements necessary for distillation are (A) existing recognizer, (B) input data for existing recognizer, (C) existing recognition output, (D) specialized recognizer, (E) input data for specialized recognizer, and (F) specialized recognition output.
- A: existing recognizer
- B: input data for existing recognizer
- C: existing recognition output
- D: specialized recognizer
- E: input data for specialized recognizer
- F: specialized recognition output
- FIG. 25 is a schematic diagram showing classified processes according to the third embodiment.
- In the third embodiment, processing related to existing input data and specialized input data can be classified into processing of converting existing input data into specialized input data and processing of converting specialized input data into existing input data. Further, in the third embodiment, such processing can be classified into processing of conversion only and processing of conversion and generation.
- When the conversion is from specialized input data to existing input data and only the conversion is performed, this corresponds to case #3 described above, which is an example in which there is no (B) input data for the existing recognizer. In this case, a process of converting the specialized input data into existing input data is performed. This conversion processing is equivalent to the processing of the above-described second embodiment.
- The above-described case #4 is an example in which neither (B) the input data for the existing recognizer nor (E) the input data for the specialized recognizer exists. In this case, a process of generating existing input data and converting the generated existing input data into specialized input data is performed.
- The above-described case #5 is also an example in which neither (B) the input data for the existing recognizer nor (E) the input data for the specialized recognizer exists. In this case, a process of generating specialized input data and converting the generated specialized input data into existing input data is performed.
- FIG. 26 is a schematic diagram for explaining a general distillation process.
- existing learning data 400 is applied as existing input data.
- existing learning data 400 includes images 401 and correct data 402 .
- the output of the existing recognizer 410 is used to train the target recognizer 422 .
- the existing recognizer 410 and the target recognizer 422 each perform recognition processing.
- An existing recognition output 411 is obtained by the recognition processing of the existing recognizer 410 .
- a target recognition output 423 is obtained by the recognition processing of the target recognizer 422 .
- The inter-recognition-output error calculation unit 430 obtains the error between the existing recognition output 411 and the target recognition output 423, performs a calculation to minimize the distance between the existing recognition output 411 and the target recognition output 423, and obtains the minimization error 431. For the distance minimization calculation, the inter-recognition-output error calculation unit 430 can use, for example, the Euclidean distance based on the L2 norm or the KL divergence.
- the inter-recognition output error calculation unit 430 feeds back the calculated minimization error 431 to the target recognizer 422 to update the target recognizer 422 .
- the inter-recognition-output error calculator 430 optimizes the target recognizer 422 by training the target recognizer 422 so as to reduce the minimization error 431 .
- Error backpropagation can be applied to the process of feeding back the minimization error 431 to the target recognizer 422 to update the target recognizer 422 .
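- The distillation step can be sketched as follows, assuming PyTorch and softmax-type recognition outputs, with placeholder networks standing in for the existing recognizer 410 and the target recognizer 422; none of the layer sizes or hyperparameters come from the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder networks standing in for the existing recognizer (frozen)
# and the target recognizer (being trained).
existing_recognizer = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 10))
target_recognizer = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 10))
optimizer = torch.optim.SGD(target_recognizer.parameters(), lr=0.01)

image = torch.rand(8, 1, 16, 16)                   # stand-in for the input images

with torch.no_grad():
    existing_output = existing_recognizer(image)   # existing recognition output
target_output = target_recognizer(image)           # target recognition output

# Minimization error: KL divergence between the two recognition outputs
# (an L2 / Euclidean distance such as F.mse_loss could be used instead).
loss = F.kl_div(F.log_softmax(target_output, dim=1),
                F.softmax(existing_output, dim=1),
                reduction="batchmean")

optimizer.zero_grad()
loss.backward()    # error backpropagation feeds the error back to the target recognizer
optimizer.step()   # update (optimize) the target recognizer
```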
- In the above, the target recognizer 422 is optimized using the existing recognition output 411 and the target recognition output 423 based on the image 401 included in the existing learning data 400; however, this is not limited to this example.
- normal learning using the correct answer data 402 may be executed at the same time.
- FIG. 27 is a schematic diagram for explaining the distillation process according to the third embodiment.
- existing learning data 400 for input to the existing recognizer 410 and specialized learning data 440 for input to the specialized recognizer 420 are used as input data.
- Specialized learning data 440 includes an image 441 and correct answer data (GT) 442 .
- In the distillation process according to the third embodiment, a deviation correction 450a for the input of the existing learning data 400 to the existing recognizer 410 and a deviation correction 450c for the input of the specialized learning data 440 to the specialized recognizer 420 may be required. Further, there are cases where a deviation correction 450b for the input of the existing recognition output 411 to the inter-recognition-output error calculation unit 430 and a deviation correction 450d for the minimization error 431 output from the inter-recognition-output error calculation unit 430 are required.
- The deviation correction 450d can also be applied to the input of the specialized recognition output 421 to the inter-recognition-output error calculation unit 430. Furthermore, the calculation of the minimization error 431 in the inter-recognition-output error calculation unit 430 may need to take the deviation amount data 451 into account.
- For example, when the image 401 of the existing learning data 400 and the image 441 of the specialized learning data 440 differ in, for example, imaging range, the correct data 402 and 442 need to be coordinate-transformed.
- As another example, a case is exemplified in which the specialized learning data 440 (image 441) has a higher frame rate than the existing learning data 400 (image 401), and only the existing learning data 400 has the correct data 402. In this case, since the correct data 402 of the existing learning data 400 is low-frame-rate data, interpolation in the time direction is required, for example.
- Note that the deviation corrections 450a to 450d and the deviation amount data 451 may be unnecessary. For example, when the existing learning data 400 and the specialized learning data 440 match in advance in imaging range and frame rate, no correction is required.
- the calculated error is weighted according to the amount of deviation correction.
- the weighting of the calculated error is increased as the amount of deviation or the amount of deviation correction is smaller, and is decreased as the amount of deviation or the amount of deviation correction is larger.
- In the above description, the deviation corrections 450a and 450b on the side of the existing recognizer 410, the deviation corrections 450c and 450d on the side of the specialized recognizer 420, and the deviation correction by the inter-recognition-output error calculation unit 430 based on the deviation amount data 451 are executed; however, this is not limited to this example.
- the deviation correction by the recognition output error calculator 430 based on the deviation amount data 451 can be omitted.
- The first example of the third embodiment corresponds to case #1 described with reference to FIG. 23. This is an example of generating a specialized recognizer when an existing recognizer, existing input data, specialized input data, existing correct data, and specialized correct data are all available.
- the general distillation process described above can be applied.
- FIG. 28 is a schematic diagram for explaining processing according to the first example of the third embodiment.
- the inter-recognized-output error calculator 430 is included in the NW converter 311 in the recognizer generator 31 of the learning system 3 shown in FIG. 2B.
- existing learning data 400 including an image 401 and correct data 402 is applied as existing input data.
- specialized learning data 440 including an image 441 and correct answer data 442 is applied as specialized input data.
- the existing recognizer 410 executes recognition processing based on the image 401 included in the existing learning data 400 and outputs an existing recognition output 411 .
- the specialized recognizer 420 executes recognition processing based on the image 441 included in the specialized learning data 440 and outputs a specialized recognition output 421 .
- The inter-recognition-output error calculation unit 430 obtains the error between the existing recognition output 411 and the specialized recognition output 421, performs a calculation to minimize the distance between the existing recognition output 411 and the specialized recognition output 421, and obtains the minimization error 431.
- For the distance minimization calculation, the inter-recognition-output error calculation unit 430 can use, for example, the Euclidean distance based on the L2 norm or the KL divergence.
- the inter-recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 by, for example, the error backpropagation method, and updates the specialized recognizer 420 .
- the inter-recognized-output error calculator 430 optimizes the specialized recognizer 420 by re-learning the specialized recognizer 420 so as to reduce the minimization error 431 .
- the specialized recognizer 420 is optimized using the existing recognition output 411 and the specialized recognition output 421 based on the image 401 included in the existing learning data 400 and the image 441 contained in the specialized learning data 440.
- regular training using correct answer data 402 and 442 may optimize specialized recognizer 420 .
- the optimization based on the images 401 and 441 and the optimization based on the correct data 402 and 442 may be performed at the same time.
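- A hedged sketch of performing both optimizations at the same time, again assuming PyTorch and placeholder networks; the line-thinning slice used to emulate the specialized input 441 and the shared labels are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

existing_recognizer = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 10))   # trained, frozen
specialized_recognizer = nn.Sequential(nn.Flatten(), nn.Linear(4 * 16, 10)) # non-frame-based input
optimizer = torch.optim.Adam(specialized_recognizer.parameters(), lr=1e-3)

image_401 = torch.rand(8, 1, 16, 16)        # frame-based existing learning data
image_441 = image_401[:, :, ::4, :]         # stand-in for line-divided specialized data
labels = torch.randint(0, 10, (8,))         # correct data (assumed shared here)

with torch.no_grad():
    existing_output = existing_recognizer(image_401)       # existing recognition output
specialized_output = specialized_recognizer(image_441)     # specialized recognition output

distill_loss = F.kl_div(F.log_softmax(specialized_output, dim=1),
                        F.softmax(existing_output, dim=1), reduction="batchmean")
supervised_loss = F.cross_entropy(specialized_output, labels)   # normal learning with GT
loss = distill_loss + supervised_loss       # both optimizations performed at the same time

optimizer.zero_grad()
loss.backward()
optimizer.step()
```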
- A second example of the third embodiment corresponds to case #2 described with reference to FIG. 23. This is an example of generating a specialized recognizer when there is no specialized input data. In this case, specialized input data is generated from existing input data, and then distillation is performed.
- FIG. 29 is a schematic diagram for explaining processing according to the second example of the third embodiment.
- the recognition output error calculation unit 430 and the existing/specialization conversion unit 460 are included in the NW conversion unit 311 in the recognizer generation unit 31 of the learning system 3 shown in FIG. 2B.
- the existing/specialized conversion unit 460 has a function of converting the existing learning data 300 in the conversion unit 301 shown in FIG. 2B into the specialized learning data 302 .
- the function of the existing/specialized conversion unit 460 can also use the function of the conversion unit 301 in the data generation unit 30 .
- an image 401 included in existing learning data 400 (not shown) is applied as the existing input data.
- an existing recognizer 410 executes recognition processing based on an image 401 and outputs an existing recognition output 411.
- the existing/specialized converter 460 converts the image 401 corresponding to the existing recognizer 410 into an image 441 a corresponding to the specialized recognizer 420 .
- Existing/specialized converter 460 can perform this conversion using, for example, any of the examples in the first and second examples of the first embodiment.
- the specialized recognizer 420 executes recognition processing based on the image 441 a converted from the image 401 by the existing/specialized converter 460 and outputs a specialized recognition output 421 .
- The inter-recognition-output error calculation unit 430 obtains the error between the existing recognition output 411 and the specialized recognition output 421, performs a calculation to minimize the distance between them, and obtains the minimization error 431.
- the inter-recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 by, for example, the error backpropagation method, and updates the specialized recognizer 420 .
- the inter-recognized-output error calculator 430 optimizes the specialized recognizer 420 by re-learning the specialized recognizer 420 so as to reduce the minimization error 431 .
- A third example of the third embodiment corresponds to case #3 described with reference to FIG. 23. This is an example of generating a specialized recognizer when there is no existing input data. In this case, existing input data is generated from specialized input data, and then distillation is performed.
- FIG. 30 is a schematic diagram for explaining processing according to the third example of the third embodiment.
- the recognition output error calculation unit 430 and the specialization/existing conversion unit 461 are included in the NW conversion unit 311 in the recognizer generation unit 31 of the learning system 3 shown in FIG. 2B.
- the specialization/existing conversion unit 461 has a function of converting the specialization evaluation data 304 in the conversion unit 301 shown in FIG. 2B into the existing evaluation data 303 .
- the function of the specialized/existing conversion unit 461 can also use the function of the conversion unit 301 in the data generation unit 30 .
- An image 441 included in specialized learning data 440 (not shown) is applied as the specialized input data.
- the specialized/existing converter 461 converts an image 441 corresponding to the specialized recognizer 420 into an image 401a corresponding to the existing recognizer 410.
- the specialized/existing conversion unit 461 can perform this conversion using, for example, any one of the examples in the first to fourth examples of the second embodiment.
- the existing recognizer 410 performs recognition processing based on the image 401 a converted from the image 441 by the specialization/existing conversion unit 461 and outputs an existing recognition output 411 .
- the specialized recognizer 420 executes recognition processing based on the image 441 and outputs a specialized recognition output 421.
- The inter-recognition-output error calculation unit 430 obtains the error between the existing recognition output 411 and the specialized recognition output 421, performs a calculation to minimize the distance between them, and obtains the minimization error 431.
- the inter-recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 by, for example, the error backpropagation method, and updates the specialized recognizer 420 .
- the inter-recognized-output error calculator 430 optimizes the specialized recognizer 420 by re-learning the specialized recognizer 420 so as to reduce the minimization error 431 .
- A fourth example of the third embodiment corresponds to case #4 described with reference to FIG. 23. This is an example of generating a specialized recognizer when neither existing input data nor specialized input data exists.
- existing input data is generated based on the existing recognizer, and specialized input data is generated based on the generated existing input data. Distillation is performed after the existing input data and specialized input data are generated in this manner.
- FIG. 31A is a schematic diagram for explaining processing according to the fourth example of the third embodiment.
- the inter-recognized output error calculator 430, the existing/specialized converter 460, and the recognized image extractor 470 are included in the NW converter 311 in the recognizer generator 31 of the learning system 3 shown in FIG. 2B.
- the function of the existing/specialized conversion unit 460 can also use the function of the conversion unit 301 in the data generation unit 30 .
- the recognition image extraction unit 470 extracts and generates an image 401 b corresponding to the existing recognizer 410 from the existing recognizer 410 by using a known recognition image extraction technique for the existing recognizer 410 .
- the existing/specialized converter 460 converts the image 401 b extracted and generated by the recognized image extractor 470 into an image 441 b corresponding to the specialized recognizer 420 .
- Existing/specialized converter 460 can perform this conversion using, for example, any of the examples in the first and second examples of the first embodiment.
- the specialized recognizer 420 executes recognition processing based on the image 441b converted from the image 401b by the existing/specialized converter 460, and outputs a specialized recognition output 421.
- the inter-recognition output error calculator 430 obtains the error between the existing recognition output 411 and the specialized recognition output 421, performs a calculation to minimize the distance between the existing recognition output 411 and the specialized recognition output 421, and finds the minimized error 431.
- the inter-recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 by, for example, the error backpropagation method, and updates the specialized recognizer 420 .
- the inter-recognized-output error calculator 430 optimizes the specialized recognizer 420 by re-learning the specialized recognizer 420 so as to reduce the minimization error 431 .
- the recognizer extracts feature values based on the input image and calculates the error with the target feature values. Based on the result of this error calculation, the recognizer is optimized by changing the recognizer so as to minimize the error. Also known is a technique called Deep Dream, which modifies an image so as to minimize the error based on the result of error calculation.
- FIG. 31B is a schematic diagram for explaining Dream Distillation.
- a feature amount is extracted from an image to be recognized by existing recognition processing, and error calculation is performed based on the extracted feature amount.
- extraction optimization processing is performed to optimize the feature quantity so as to reduce the error, and the image is changed based on the optimized feature quantity. That is, extraction optimization processing generates an image that can be easily recognized by an existing recognizer.
- Dream Distillation uses the statistic (centroid) of the target feature vector, and performs error calculation on the statistic of the feature vector plus noise. This makes it possible to obtain a plurality of images by giving variations to the generated images.
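- as an illustration, the following is a minimal sketch of this kind of image optimization, assuming PyTorch; feature_extractor stands in for the feature extraction part of the existing recognizer, and the centroid-plus-noise target follows the idea of adding noise to the statistic of the target feature vector so as to obtain several varied images. The loss, step counts and image shape are illustrative assumptions.

```python
import torch

def generate_images_from_recognizer(feature_extractor, centroid,
                                    num_images=4, steps=200, lr=0.05,
                                    noise_scale=0.1, image_shape=(1, 3, 64, 64)):
    """Optimize images so that the extracted features approach the target
    feature centroid plus noise, yielding a plurality of varied images."""
    images = []
    for _ in range(num_images):
        target = centroid + noise_scale * torch.randn_like(centroid)
        image = torch.rand(image_shape, requires_grad=True)
        optimizer = torch.optim.Adam([image], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            feature = feature_extractor(image)
            loss = torch.nn.functional.mse_loss(feature, target)
            loss.backward()          # modify the image, not the recognizer
            optimizer.step()
        images.append(image.detach())
    return images
```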
- a first method is a method of directly converting the existing recognizer 410 to the specialized recognizer 420 based on the weight of the existing recognizer 410 for the input data.
- the second method is to generate an image based on the existing recognizer 410 and, based on the generated image, convert the existing recognizer 410 into the specialized recognizer 420 within the framework of general machine learning optimization.
- the fourth example of the third embodiment employs the second of these methods.
- the recognition image extraction unit 470 extracts images from the existing recognizer 410 .
- This method of extracting an image from the existing recognizer 410 without using the original image is proposed by Non-Patent Document 1 and Non-Patent Document 2, for example.
- Non-Patent Document 1 proposes a method of optimizing an image so that a recognizer generates a statistic (centroid) of a feature vector plus noise.
- Non-Patent Document 2 proposes a method of generating an image by creating a class similarity from the weight of input data held by a recognizer.
- in the fourth example of the third embodiment, the specialized recognizer 420 is generated based on the image 441b obtained by converting the image 401b extracted from the existing recognizer 410. That is, the specialized recognizer 420 is generated using image conversion. Therefore, for example, when the difference in sensor output can be clearly defined as frame-based or non-frame-based, generating the specialized recognizer 420 based on the image is easier to handle than the method of directly converting the existing recognizer 410 into the specialized recognizer 420. In other words, the image domain is better suited to reflecting the physical properties of the sensor than the recognizer domain.
- a fifth example of the third embodiment corresponds to case #5 described with reference to FIG. This is an example of generating a specialized recognizer when no input data for conversion is available.
- specialized input data is generated by a predetermined method, existing input data is generated based on the generated specialized input data, and then distillation is performed.
- FIG. 32 is a schematic diagram for explaining processing according to the fifth example of the third embodiment.
- the recognition output error calculation unit 430, the specialization/existing conversion unit 461, and the image generation unit 462 are included in the NW conversion unit 311 in the recognizer generation unit 31 of the learning system 3 shown in FIG. 2B.
- the specialization/existing conversion unit 461 has a function of converting the specialization evaluation data 304 in the conversion unit 301 shown in FIG. 2B into the existing evaluation data 303 .
- the function of the specialized/existing conversion unit 461 can also use the function of the conversion unit 301 in the data generation unit 30 .
- the image generator 462 generates an image 441c corresponding to the specialized recognizer 420 by a predetermined method.
- An image generation method by the image generation unit 462 is not particularly limited.
- the image generator 462 may randomly generate the image 441c.
- the image generator 462 may artificially generate the image 441c using a technique such as CG (Computer Graphics).
- the specialized/existing conversion unit 461 converts the image 441c corresponding to the specialized recognizer 420 into the image 401a corresponding to the existing recognizer 410.
- the specialized/existing conversion unit 461 can perform this conversion using, for example, any one of the examples in the first to fourth examples of the second embodiment.
- the existing recognizer 410 performs recognition processing based on the image 401a converted from the image 441c by the specialization/existing conversion unit 461 and outputs an existing recognition output 411.
- the specialized recognizer 420 executes recognition processing based on the image 441c and outputs a specialized recognition output 421.
- the inter-recognition output error calculator 430 obtains the error between the existing recognition output 411 and the specialized recognition output 421, performs a calculation to minimize the distance between the existing recognition output 411 and the specialized recognition output 421, and finds the minimized error 431.
- the inter-recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 by, for example, the error backpropagation method, and updates the specialized recognizer 420 .
- the inter-recognized-output error calculator 430 optimizes the specialized recognizer 420 by re-learning the specialized recognizer 420 so as to reduce the minimization error 431 .
- in the fourth embodiment, the NW conversion unit 311 functions as a conversion unit that, based on the output of a first recognizer that performs recognition processing based on a first signal read from a first sensor that performs readout in a first readout unit, learns a second recognizer that performs recognition processing based on a second signal read from a second sensor having characteristics different from those of the first sensor.
- a network of existing recognizers is converted into a network of specialized recognizers.
- conversion of a network of existing recognizers into a network of specialized recognizers is realized by converting filters used in at least one layer included in the network.
- the explanation is given assuming that the existing recognizer network is a frame-based network, and the specialized recognizer network is a non-frame-based network.
- the network of specialized recognizers may be a network with special signal characteristics for recognition.
- the processing according to each example of the fourth embodiment corresponds to the processing in which the NW conversion unit 311 in the recognizer generation unit 31 of the learning system 3 shown in FIG. 2B converts the existing recognizer 310 into the specialized recognizer 312.
- a first example of the fourth embodiment is an example in which the non-frame-based NW 501 corresponds to specialized learning data 302 by line division.
- the NW converter 311 creates the non-frame-based NW 501 so that the recognition output by the non-frame-based NW 501 substantially matches the recognition output by the frame-based NW 500 .
- FIG. 33 is an example functional block diagram for explaining the function of the NW conversion unit 311a according to the first example of the fourth embodiment.
- the NW conversion unit 311 a includes a filter conversion layer selection unit 510 , a filter conversion unit 511 a and a NW (network) reconstruction unit 512 .
- a frame-based NW 500 corresponding to the existing recognizer 310 in FIG. 2B is input to the NW conversion unit 311a.
- Filter conversion layer selection section 510 selects a layer to be subjected to filter conversion from each layer included in input frame-based NW 500 .
- the filter conversion unit 511a performs conversion processing on the layer selected by the filter conversion layer selection unit 510 in the frame-based NW 500.
- the filter conversion unit 511a converts, for example, a two-dimensional filter in the layer selected by the filter conversion layer selection unit 510 into a one-dimensional filter.
- the NW reconstruction unit 512 reconstructs the NW based on the filters of each layer converted by the filter conversion unit 511a, and outputs the non-frame-based NW 501 corresponding to the specialized recognizer 312 in FIG. 2B.
- the non-frame-based NW 501 is a NW corresponding to specialized image data by line division.
- FIG. 34 is a schematic diagram for explaining the principle of filter conversion processing in the filter conversion unit 511a. It is known that a two-dimensional filter can be expressed by combining one-dimensional filters. Section (a) of FIG. 34 shows an example of filtering an image using a two-dimensional filter 513 having 3 rows × 3 columns of coefficients.
- the two-dimensional filter 513 shown in this example can be decomposed into a horizontal filter 514 with 1 row × 3 columns of coefficients that performs horizontal (row-wise) convolution and a vertical filter 515 with 3 rows × 1 column of coefficients that performs vertical (column-wise) convolution.
- horizontal filtering is performed on the image using the horizontal filter 514, and vertical filtering is performed on the result using the vertical filter 515, whereby a result equivalent to filtering with the two-dimensional filter 513 shown in section (a) can be obtained.
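- as a concrete illustration of this principle, the following NumPy/SciPy sketch builds a separable 3 rows × 3 columns filter as the outer product of a vertical and a horizontal one-dimensional filter and checks that applying the two one-dimensional filters in sequence gives the same result as the single two-dimensional filter (the filter values and the image are arbitrary examples):

```python
import numpy as np
from scipy.signal import convolve2d

# Separable 3x3 filter built from a 3x1 vertical and a 1x3 horizontal filter.
h = np.array([1.0, 2.0, 1.0])           # horizontal (row-wise) coefficients
v = np.array([1.0, 0.0, -1.0])          # vertical (column-wise) coefficients
k2d = np.outer(v, h)                    # equivalent 3x3 two-dimensional filter

image = np.random.rand(32, 32)

# Frame-based processing: one pass with the 3x3 two-dimensional filter.
out_2d = convolve2d(image, k2d, mode='valid')

# Decomposed processing: horizontal 1x3 pass followed by vertical 3x1 pass.
out_h = convolve2d(image, h[np.newaxis, :], mode='valid')
out_1d = convolve2d(out_h, v[:, np.newaxis], mode='valid')

assert np.allclose(out_2d, out_1d)      # both give the same result
```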
- FIG. 35 is a schematic diagram showing a comparison between processing by an existing NW (frame-based NW 500) and processing by a specialized NW (non-frame-based NW 501).
- section (a) shows processing by the existing NW
- section (b) shows processing by the specialized NW according to the fourth embodiment.
- the specialized NW corresponds to the image 530 divided into lines.
- the frame-based NW 500 performs processing with the two-dimensional filter 513 on the frame-based image 520 in layer #1 to calculate a feature amount, and compresses the feature amount calculated in layer #1 in layer #2 to generate a feature quantity 580.
- the frame-based NW 500 repeatedly executes layer #1 processing and layer #2 processing, and obtains a final output 581a at layer #n.
- the non-frame-based NW 501 decomposes the two-dimensional filter of layer #1 in section (a) into a horizontal filter 514 and a vertical filter 515, which are one-dimensional filters.
- the non-frame-based NW 501 decomposes Layer #1 into Layer #1-1 for processing by horizontal filter 514 and Layer #1-2 for processing by vertical filter 515 .
- the non-frame-based NW 501 performs horizontal filter processing on the non-frame-based image 530 based on line data in layer #1-1, and outputs a feature amount 582a for that one line.
- in layer #1-2, the non-frame-based NW 501 applies vertical filtering using the feature quantity 582a output in layer #1-1 and the feature quantities 582b and 582c output for the past two lines in layer #1-1.
- in layer #2, the non-frame-based NW 501 uses the layer #1-2 output for that line and the layer #1-2 outputs for the past two lines to extract a feature amount 583 for the one line.
- the non-frame-based NW 501 repeatedly executes the processing of layers #1-1 and #1-2 and the processing of layer #2, and obtains the final output for the one line in layer #n.
- the non-frame-based NW 501 can obtain an output 581b equivalent to the final output 581a in section (a) by executing this processing on each of the images 530 of all lines included in one frame.
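- as an illustration of this line-by-line operation, the following NumPy sketch filters each incoming line horizontally, keeps the results of the two previous lines in a small buffer, applies the vertical filter across the buffered lines, and checks that the result matches sliding the equivalent 3 rows × 3 columns filter over the whole frame (correlation is used throughout; the filter values are arbitrary examples):

```python
import numpy as np

def process_frame(image, k2d):
    """Frame-based reference: slide the 3x3 filter over the whole frame."""
    H, W = image.shape
    out = np.empty((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * k2d)
    return out

def process_line_by_line(image, h, v):
    """Non-frame-based processing: each incoming line is filtered horizontally
    (layer #1-1); the vertical filter (layer #1-2) then combines that result
    with the results held for the two previous lines."""
    line_buffer, outputs = [], []
    for line in image:                                   # lines arrive one by one
        filtered = np.correlate(line, h, mode='valid')   # layer #1-1
        line_buffer.append(filtered)
        if len(line_buffer) == 3:
            outputs.append(v @ np.stack(line_buffer))    # layer #1-2
            line_buffer.pop(0)                           # keep only 2 past lines
    return np.stack(outputs)

h = np.array([1.0, 2.0, 1.0])
v = np.array([1.0, 0.0, -1.0])
image = np.random.rand(16, 16)
assert np.allclose(process_frame(image, np.outer(v, h)),
                   process_line_by_line(image, h, v))
```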
- FIG. 36 is a schematic diagram for explaining processing according to the first example of the fourth embodiment.
- the upper stage shows the processing for the frame-based image 520 by the frame-based NW500
- the lower stage shows the processing for the non-frame-based image 530 by line division in the non-frame-based NW501.
- the output of layer #2 is assumed to be the final output.
- the frame-based NW 500 performs filtering with a two-dimensional filter 513 on layer #1 on an image 520 based on two-dimensional data, and extracts feature amounts for one frame.
- the frame-based NW 500 performs filter processing on the feature amount extracted in layer #1 at layer #2, and outputs a compressed feature amount 521 for one frame.
- the filter conversion layer selection unit 510 selects layer #1 as the layer for filter conversion.
- the filter conversion unit 511a decomposes the two-dimensional filter 513 of layer #1 and converts it into a horizontal filter 514 and a vertical filter 515, which are one-dimensional filters.
- layer #1 is decomposed into layer #1-1 and layer #1-2.
- the non-frame-based NW 501 performs filter processing with the horizontal filter 514 in layer #1-1 on an image 530 of one-dimensional line data obtained by line division, and extracts a feature amount for one line.
- in layer #1-2, the non-frame-based NW 501 filters the feature amount for one line extracted in layer #1-1 and the feature amounts for the two lines previously extracted in layer #1-1 with the vertical filter 515 to extract the feature amount for the one line.
- in layer #2, the non-frame-based NW 501 filters the feature amount for one line extracted in layer #1-2 and the feature amounts for the two lines previously extracted in layer #1-2, and outputs a compressed feature amount 531 for the one line.
- the non-frame-based NW 501 executes this layer #1-1, layer #1-2 and layer #2 processing for all lines of the one frame that includes the image 530.
- as a result, the non-frame-based NW 501 can obtain a feature amount 531 for all lines of one frame that is similar to the frame-based feature amount 521.
- the NW reconstruction unit 512 performs distillation processing based on the frame-based feature amount 521 and the feature amount 531 obtained from all the lines of one frame, and reconstructs the non-frame-based NW 501 so that the feature amount 531 approximates the feature amount 521. For example, the NW reconstruction unit 512 adjusts the filter coefficients of the filters of layer #1-1, layer #1-2, and layer #2 to reconstruct the non-frame-based NW 501.
- note that not every two-dimensional filter can be expressed as a sequential application of one-dimensional filters to each line to be processed; that is, there may be cases where a two-dimensional filter cannot be completely decomposed into one-dimensional filters. In such a case, the two-dimensional filter may be converted into one-dimensional filters so that the error between the original two-dimensional filter and the two-dimensional filter synthesized from the one-dimensional filters is minimized.
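- one common way to realize such a minimum-error decomposition is a rank-1 approximation of the two-dimensional filter via singular value decomposition, as in the following NumPy sketch (this is one possible choice, not the only way to minimize the error):

```python
import numpy as np

def separate_filter(k2d):
    """Least-squares 1-D decomposition of a (possibly non-separable) 2-D filter:
    the leading SVD component gives the vertical/horizontal pair whose outer
    product minimizes the Frobenius-norm error to the original filter."""
    u, s, vt = np.linalg.svd(k2d)
    v_filter = u[:, 0] * np.sqrt(s[0])      # vertical (column) filter
    h_filter = vt[0, :] * np.sqrt(s[0])     # horizontal (row) filter
    error = np.linalg.norm(k2d - np.outer(v_filter, h_filter))
    return v_filter, h_filter, error
```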
- the filter conversion layer selection unit 510 selects the first layer, layer #1, as the layer for filter conversion, but this is not limited to this example.
- filter conversion layer selection section 510 can select layer #2 as a layer for filter conversion, or can select layer #1 and layer #2. That is, the filter conversion layer selection unit 510 can select layers to be subjected to filter conversion at any position and number. At this time, the filter conversion layer selection unit 510 can select layers and the number of layers to be subjected to filter conversion so as to optimize recognition accuracy, calculation amount, memory usage, and the like.
- a first modification of the first example of the fourth embodiment is an example in which, in the first example of the fourth embodiment described above, the distillation process is performed so that a partial NW output of the specialized recognizer matches the corresponding output of the existing recognizer. More specifically, in this first modification, the distillation process is performed so that the output of an arbitrary layer matches between the frame-based NW 500 and the non-frame-based NW 501.
- FIG. 37 is a schematic diagram for explaining processing according to the first modification of the first example of the fourth embodiment.
- in the first example described above, the output of layer #2 is used as the final output, and the distillation process is performed so that the final outputs of the frame-based NW 500 and the non-frame-based NW 501 match.
- in this modification, by contrast, the distillation process is performed so that the output of layer #1, which precedes layer #2, matches between the frame-based NW 500 and the non-frame-based NW 501.
- the output of layer #2 is assumed to be the final output, as in the example of FIG. 36 described above.
- the NW reconstruction unit 512 performs a distillation process based on the feature quantity 521 extracted at layer #1 in the frame-based NW 500 and the feature quantity 531 extracted at layer #1-2 of the non-frame-based NW 501, into which layer #1 has been decomposed, and reconstructs the non-frame-based NW 501 so that the feature quantity 531 approximates the feature quantity 521. For example, the NW reconstruction unit 512 adjusts the filter coefficients of the filters of layer #1-1 and layer #1-2 to reconstruct the non-frame-based NW 501.
- which layer's output is to be matched can be selected so as to optimize recognition accuracy, calculation amount, memory usage, and the like.
- the NW reconstruction unit 512 may also execute the distillation process based on one line or several lines of the feature quantity 531 output from layer #2 of the non-frame-based NW 501 and the corresponding portion of the feature quantity 521 for one frame output from layer #2 of the frame-based NW 500. In this case, the NW reconstruction unit 512 adjusts the filters of layer #1-1, layer #1-2 and/or layer #2 to reconstruct the non-frame-based NW 501.
- the first example of the fourth embodiment and its modifications can be combined with the distillation process according to each example of the third embodiment described with reference to FIGS. 28 to 32.
- the processing in the existing recognizer 410 and specialized recognizer 420 described above can be the processing in the frame-based NW 500 and the non-frame-based NW 501, respectively.
- the feature quantities 521 and 531 can be applied as the existing recognition output 411 and the specialized recognition output 421 described above, respectively, and the processing of the NW reconstruction unit 512 can be applied as the processing of the error calculation unit 430 between recognition outputs.
- a second example of the fourth embodiment is an example in which the non-frame-based NW 501 corresponds to specialized learning data 302 by sub-sampling.
- the NW conversion unit 311 creates the non-frame-based NW 501 so that the recognition output by the non-frame-based NW 501 substantially matches the recognition output by the frame-based NW 500.
- FIG. 38 is an example functional block diagram for explaining the function of the NW conversion unit 311b according to the second example of the fourth embodiment.
- the NW conversion unit 311b includes a filter conversion layer selection unit 510, a filter conversion unit 511b, and a NW reconstruction unit 512.
- a frame-based NW 500 corresponding to the existing recognizer 310 in FIG. 2B is input to the NW conversion unit 311b.
- Filter conversion layer selection section 510 selects a layer to be subjected to filter conversion from each layer included in input frame-based NW 500 .
- the filter conversion unit 511b performs conversion processing on the layer selected by the filter conversion layer selection unit 510 in the frame-based NW 500.
- the filter conversion unit 511b converts, for example, the two-dimensional filter in the layer selected by the filter conversion layer selection unit 510 into another two-dimensional filter.
- the NW reconstruction unit 512 reconstructs the NW based on the filters of each layer converted by the filter conversion unit 511b, and outputs the non-frame-based NW 501b corresponding to the specialized recognizer 312 in FIG. 2B.
- the non-frame-based NW 501b is a NW corresponding to specialized image data by sub-sampling.
- FIG. 39 is a schematic diagram for explaining the principle of filter conversion processing by the filter conversion unit 511b.
- filtering is performed on an image 522 of one frame using a two-dimensional filter 516 having coefficients of 4 rows × 4 columns.
- the filtering process is performed by moving the two-dimensional filter 516 horizontally and vertically by two pixels (stride (2, 2)) on the image 522 .
- each pixel of the image 522 is sub-sampled for each of the phases Pφ#1, Pφ#2, Pφ#3 and Pφ#4.
- as shown in section (b) of FIG. 39, this sub-sampling divides the image 522 into images 522Pφ#1, 522Pφ#2, 522Pφ#3 and 522Pφ#4 corresponding to the phases Pφ#1, Pφ#2, Pφ#3 and Pφ#4.
- correspondingly, the two-dimensional filter 516 can be divided into filters 517Pφ#1, 517Pφ#2, 517Pφ#3 and 517Pφ#4 corresponding to the phases Pφ#1, Pφ#2, Pφ#3 and Pφ#4.
- each of the filters 517Pφ#1, 517Pφ#2, 517Pφ#3 and 517Pφ#4 performs filter processing on the corresponding image 522Pφ#1, 522Pφ#2, 522Pφ#3 or 522Pφ#4 while being moved by one pixel horizontally and vertically (stride (1, 1)).
- by this processing, a result equivalent to filtering the image 522 with the two-dimensional filter 516 having 4 rows × 4 columns of coefficients can be obtained.
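- as a concrete illustration, the following NumPy sketch decomposes a 4 rows × 4 columns filter applied with stride (2, 2) into four 2 rows × 2 columns filters, applies each with stride (1, 1) to the corresponding phase-sub-sampled image, integrates (sums) the four partial results, and checks that the result matches the frame-based processing (the filter values and the image are arbitrary examples):

```python
import numpy as np

def conv_stride2(image, k4):
    """Frame-based layer: 4x4 filter moved by 2 pixels (stride (2, 2))."""
    H, W = image.shape
    out = np.zeros((H // 2 - 1, W // 2 - 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[2 * i:2 * i + 4, 2 * j:2 * j + 4] * k4)
    return out

def conv_polyphase(image, k4):
    """Specialized processing: four 2x2 filters, one per sub-sampling phase,
    each moved by 1 pixel (stride (1, 1)) over its sub-sampled image, then
    integrated (summed) to reproduce the frame-based result."""
    out = None
    for p in (0, 1):                 # vertical phase
        for q in (0, 1):             # horizontal phase
            sub_image = image[p::2, q::2]        # one phase-sub-sampled image
            sub_filter = k4[p::2, q::2]          # the filter for that phase
            H, W = sub_image.shape
            partial = np.zeros((H - 1, W - 1))
            for i in range(H - 1):
                for j in range(W - 1):
                    partial[i, j] = np.sum(sub_image[i:i + 2, j:j + 2] * sub_filter)
            out = partial if out is None else out + partial
    return out

image = np.random.rand(16, 16)
k4 = np.random.rand(4, 4)
assert np.allclose(conv_stride2(image, k4), conv_polyphase(image, k4))
```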
- FIG. 40 is a schematic diagram showing a comparison between processing by an existing NW (frame-based NW 500) and processing by a specialized NW (non-frame-based NW 501b).
- section (a) shows processing by the existing NW
- section (b) shows processing by the specialized NW according to the fourth embodiment.
- the specialized NW corresponds to a specialized image by sub-sampling for each of the phases Pφ#1 to Pφ#4.
- the frame-based NW 500 performs processing with the two-dimensional filter 516 having 4 rows × 4 columns of coefficients on the frame-based image 522 in layer #1 to calculate a feature amount, and a feature quantity 584 is generated in layer #2 by compressing the feature quantity calculated in layer #1.
- the frame-based NW 500 then repeats layer #1 processing and layer #2 processing to obtain a final output 585a at layer #n.
- section (b) of FIG. 40 shows, for explanation, the case where the image 522 is sub-sampled at the phase Pφ#1 out of the phases Pφ#1, Pφ#2, Pφ#3 and Pφ#4.
- in section (b), the non-frame-based NW 501 decomposes the two-dimensional filter 516 of layer #1 in section (a) into filters 517Pφ#1, 517Pφ#2, 517Pφ#3 and 517Pφ#4, each having 2 rows × 2 columns of coefficients and corresponding to the phases Pφ#1 to Pφ#4.
- the non-frame-based NW 501b performs filter processing with the filter 517Pφ#1 on the image 522Pφ#1 sub-sampled at the phase Pφ#1 in layer #1, and outputs the feature amount 586Pφ#1 of the phase Pφ#1. Although illustration is omitted, the non-frame-based NW 501b similarly sub-samples the image 522 at the phases Pφ#2 to Pφ#4 in layer #1, and the resulting images 522Pφ#2 to 522Pφ#4 (not shown) are filtered by the filters 517Pφ#2 to 517Pφ#4. By this filtering process, the non-frame-based NW 501b outputs the feature quantities 586Pφ#2, 586Pφ#3 and 586Pφ#4 of the phases Pφ#2, Pφ#3 and Pφ#4.
- the non-frame-based NW 501b integrates and compresses the feature quantities 586Pφ#1 to 586Pφ#4 of the phases Pφ#1 to Pφ#4 to generate a feature quantity 587 in layer #2.
- Non-frame-based NW 501b then repeats layer #1 processing and layer #2 processing to obtain final output 585b at layer #n, which is equivalent to output 585a in section (a).
- in this way, filtering is performed by the filters 517Pφ#1 to 517Pφ#4 obtained by decomposing the two-dimensional filter 516 according to the sub-sampling phases Pφ#1 to Pφ#4, which allows processing of the non-frame-based images 522Pφ#1 to 522Pφ#4 obtained by sub-sampling.
- FIG. 41 is a schematic diagram for explaining processing according to the second example of the fourth embodiment.
- the upper part shows the processing for the frame-based image 520 by the frame-based NW500.
- the lower part shows the processing for each image 540Pφ#1 to 540Pφ#4 obtained by sub-sampling the image 520 at each of the phases Pφ#1 to Pφ#4 in the non-frame-based NW 501.
- the image 540Pφ#1 and its processing are shown, and the images 540Pφ#2 to 540Pφ#4 and their processing are omitted.
- the frame-based NW 500 performs filtering with a two-dimensional filter 516 on layer #1 on an image 520 based on two-dimensional data, and extracts feature amounts for one frame.
- the frame-based NW 500 performs filter processing on the feature amount extracted in layer #1 at layer #2, and outputs a compressed feature amount 521 for one frame.
- the filter conversion layer selection unit 510 selects layer #1 as the layer for filter conversion.
- the filter conversion unit 511b decomposes the two-dimensional filter 516 of layer #1 and converts it into filters 517Pφ#1 to 517Pφ#4, which are two-dimensional filters each having 2 rows × 2 columns of coefficients.
- the non-frame-based NW 501 performs filtering with the filter 517Pφ#1 on the sub-sampled image 540Pφ#1 in layer #1, and extracts a feature amount for one subsample with the phase Pφ#1.
- in layer #2, the feature amount for one subsample by the phase Pφ#1 extracted in layer #1 and the feature amounts for the three subsamples by the other phases Pφ#2 to Pφ#4 extracted in layer #1 are integrated, the integrated feature amount is filtered by, for example, a two-dimensional filter, and a compressed feature amount 541 for one frame is extracted.
- the NW reconstruction unit 512 performs distillation processing based on the feature amount 521 and the feature amount 541, each for one frame, and reconstructs the non-frame-based NW 501 so that the feature amount 541 approximates the feature amount 521.
- for example, the NW reconstruction unit 512 adjusts the filter coefficients of the filters 517Pφ#1 to 517Pφ#4 in layer #1 to reconstruct the non-frame-based NW 501.
- there may be cases where, due to mathematical conditions and the like, the two-dimensional filter 516 cannot be completely converted into the filters 517Pφ#1 to 517Pφ#4 of the respective phases Pφ#1 to Pφ#4.
- in such a case, the conversion may be performed so as to minimize the error between the original two-dimensional filter 516 and the two-dimensional filter obtained by synthesizing the filters 517Pφ#1 to 517Pφ#4.
- the filter conversion layer selection unit 510 selects the layer #1, which is the first layer, as the layer on which filter conversion is to be performed, but this is not limited to this example.
- filter conversion layer selection section 510 can select layer #2 as a layer for filter conversion, or can select layer #1 and layer #2. That is, the filter conversion layer selection unit 510 can select layers to be subjected to filter conversion at any position and number. At this time, the filter conversion layer selection unit 510 can select layers and the number of layers to be subjected to filter conversion so as to optimize recognition accuracy, calculation amount, memory usage, and the like.
- a first modification of the second example of the fourth embodiment is an example in which, in the second example of the fourth embodiment described above, the distillation process is performed so that a partial NW output of the specialized recognizer matches the corresponding output of the existing recognizer.
- in the second example described above, the feature quantity 541 used for NW reconstruction is generated using all of the images 522Pφ#1 to 522Pφ#4 of the respective phases Pφ#1 to Pφ#4 obtained by sub-sampling. In contrast, in this first modification, only some of the images 522Pφ#1 to 522Pφ#4 of the respective phases Pφ#1 to Pφ#4 are used to generate the feature amount 541.
- FIG. 42 is a schematic diagram for explaining processing according to the first modification of the second example of the fourth embodiment.
- the upper part shows the processing for the frame-based image 520 by the frame-based NW500.
- the lower part shows the processing for each image 540Pφ#1 to 540Pφ#4 obtained by sub-sampling the image 520 at each of the phases Pφ#1 to Pφ#4 in the non-frame-based NW 501.
- the images 540Pφ#1 to 540Pφ#4 are omitted for the sake of explanation.
- the processing in the upper stage is the same as the processing according to the second example of the fourth embodiment described using FIG. 41, so the description is omitted here.
- the filter conversion layer selection unit 510 selects layer #1 as the layer for filter conversion.
- the filter conversion unit 511b decomposes the two-dimensional filter 516 of layer #1 and converts it into filters 517Pφ#1 to 517Pφ#4, which are two-dimensional filters each having 2 rows × 2 columns of coefficients.
- the non-frame-based NW 501 uses only one of the sub-sampled images 540Pφ#1 to 540Pφ#4, for example the image 540Pφ#1.
- of the filters 517Pφ#1 to 517Pφ#4 obtained by decomposing the two-dimensional filter 516, only the filter 517Pφ#1, whose phase Pφ#1 corresponds to the image 540Pφ#1, is used.
- the non-frame-based NW 501 applies filtering to the image 540Pφ#1 using the filter 517Pφ#1, and extracts a feature amount for one subsample based on the phase Pφ#1.
- in layer #2, the non-frame-based NW 501 performs filtering, for example with a two-dimensional filter, on the feature amount for one subsample based on the phase Pφ#1 extracted in layer #1, and extracts a compressed feature amount 541Pφ#1 for the one subsample.
- the NW reconstruction unit 512 performs a distillation process based on the feature amount 521 for one frame and the feature amount 541Pφ#1 for one subsample, and reconstructs the non-frame-based NW 501 so that the feature amount 541Pφ#1 approximates the feature amount 521.
- for example, the NW reconstruction unit 512 adjusts the filter coefficients of the filters 517Pφ#1 to 517Pφ#4 in layer #1 to reconstruct the non-frame-based NW 501.
- in the above description, the NW reconstruction unit 512 reconstructs the non-frame-based NW 501 based on the feature amount 541Pφ#1 output from layer #2, but this is not limited to this example.
- the NW reconstruction unit 512 may also reconstruct the non-frame-based NW 501 based on the output of a layer after layer #2.
- FIG. 43 is a schematic diagram for explaining processing according to the second modification of the second example of the fourth embodiment.
- layers up to layer #N after layer #2 are added to the configuration of FIG. 41 described above.
- the upper part shows the processing for the frame-based image 520 by the frame-based NW500.
- the lower part shows the processing for each image 540Pφ#1 to 540Pφ#4 obtained by sub-sampling the image 520 at each of the phases Pφ#1 to Pφ#4 in the non-frame-based NW 501.
- the image 540Pφ#1 and its processing are shown, and the images 540Pφ#2 to 540Pφ#4 and their processing are omitted.
- the frame-based NW 500 performs filtering with a two-dimensional filter 516 on layer #1 on an image 520 based on two-dimensional data, and extracts feature amounts for one frame.
- the frame-based NW 500 filters the feature amount extracted in layer #1 at layer #2, and outputs the compressed feature amount for one frame to the next layer.
- the frame-based NW 500 applies filtering to the feature amount extracted in the immediately preceding layer, and extracts a compressed feature amount 521 for one frame.
- in layer #N, the non-frame-based NW 501 integrates the feature amount for one subsample by the phase Pφ#1 extracted in the immediately preceding layer with the feature amounts for the three subsamples by the other phases Pφ#2 to Pφ#4 extracted in the immediately preceding layer.
- in layer #N, the non-frame-based NW 501 then filters the integrated feature amount using, for example, a two-dimensional filter, and extracts a compressed feature amount 541 for one frame.
- the NW reconstruction unit 512 performs distillation processing based on the feature amount 521 and the feature amount 541, each for one frame, and reconstructs the non-frame-based NW 501 so that the feature amount 541 approximates the feature amount 521.
- for example, the NW reconstruction unit 512 adjusts the filter coefficients of the filters 517Pφ#1 to 517Pφ#4 in layer #1 to reconstruct the non-frame-based NW 501.
- the second example of the fourth embodiment and its first and second modifications can be implemented in combination with the distillation process according to each example of the third embodiment described with reference to FIGS. 28 to 32. In this case, the processing in the existing recognizer 410 and the specialized recognizer 420 described above can be the processing in the frame-based NW 500 and the non-frame-based NW 501, respectively. Further, the feature quantities 521 and 531 can be applied as the existing recognition output 411 and the specialized recognition output 421 described above, respectively, and the processing of the NW reconstruction unit 512 can be applied as the processing of the inter-recognition output error calculation unit 430.
- a third example of the fourth embodiment is an example in which calculations are selectively performed for a region corresponding to the receptive field of an image in the frame-based NW 500, and the frame-based NW 500 is updated accordingly.
- the receptive field refers to the range in the image in which the feature amount is affected when calculating the feature amount based on the image. In other words, it can be said that the receptive field is the range of the original image used when calculating the feature amount. It can also be said that the receptive field indicates which area of the original image the feature amount is based on when a certain feature amount is viewed.
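- as a simple illustration, the size of the receptive field for a stack of convolution layers can be computed from each layer's kernel size and stride, as in the following sketch (a standard formula, shown here only to make the notion concrete):

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of convolution layers,
    each given as (kernel_size, stride); i.e. the range of the original image
    that influences one feature value."""
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
    return r

# Two 3x3 stride-1 layers -> each feature depends on a 5x5 region of the image.
print(receptive_field([(3, 1), (3, 1)]))   # 5
# A 4x4 stride-2 layer followed by a 3x3 stride-1 layer -> an 8x8 region.
print(receptive_field([(4, 2), (3, 1)]))   # 8
```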
- FIG. 44 is an example functional block diagram for explaining the function of the NW conversion unit 311c according to the third example of the fourth embodiment.
- the NW conversion unit 311 c includes a mask processing additional layer selection unit 518 and a mask processing addition unit 519 .
- a frame-based NW 500a corresponding to the existing recognizer 310 in FIG. 2B is input to the NW conversion unit 311c.
- the mask processing addition layer selection unit 518 selects a layer to which mask processing is added from each layer included in the input frame base NW 500a.
- the mask processing addition unit 519 obtains the receptive field of the layer selected by the mask processing additional layer selection unit 518 in the frame-based NW 500a, and adds mask processing for masking areas other than the obtained receptive field to the layer. .
- the frame-based NW 500a to which the mask processing has been added is output from the NW conversion section 311c as the updated frame-based NW 500b.
- FIG. 45A is a schematic diagram for explaining the receptive field.
- a frame-based image 550 is input to frame-based NW 500a.
- the layer #X is selected as the layer to which mask processing is added by the mask processing addition layer selection unit 518.
- a feature amount 551 is extracted based on the image 550 in layer #X.
- receptive fields 561b and 562b are shown in the image 550 with respect to attention areas 561a and 562a, which are regions containing feature quantities of interest, for example. That is, the feature amounts included in the attention areas 561a and 562a are calculated under the influence of the data included in the receptive fields 561b and 562b in the image 550, respectively.
- FIG. 45B is a schematic diagram for explaining processing according to the third example of the fourth embodiment.
- FIG. 45B shows processing for receptive fields 561b and 562b corresponding to the regions of interest 561a and 562a shown in FIG. 45A, respectively.
- the data of the image 550 are sequentially input to the frame-based NW 500a line by line.
- input data is sequentially stored in the memory, for example, by overwriting.
- feature quantities are calculated based on the stored data.
- from the calculation of the feature amount in layer #X, the frame-based NW 500a can know which data affects which part of the calculated feature amount, that is, the receptive field in the image 550 that affects the feature amount.
- the feature amount of the attention area 561a is updated in the layer #X.
- the NW conversion unit 311c can detect that the line 552 overlaps the receptive field 561b corresponding to the attention area 561a.
- the mask processing addition unit 519 adds mask processing to the entire area of the feature amount 551 calculated from the image 550, excluding, for example, the area 553 overlapping the attention area 561a. By omitting the calculation of the feature amount for the region to which the mask processing has been added, it is possible to reduce the amount of calculation of the feature amount.
- the NW conversion unit 311c identifies the attention area 561a of the feature amount 551 in the layer #X by calculation in the frame-based NW 500a.
- the NW conversion unit 311c identifies the receptive field 561b in the image 550 for the identified attention area 561a based on the identified attention area 561a.
- the mask processing addition unit 519 in the NW conversion unit 311c adds mask processing to the processing of layer #X for lines included in regions other than the region of the image 550 that overlaps the receptive field 561b.
- the feature amount should be recalculated in an area 553 that overlaps the attention area 561a.
- the feature amount calculation is started from the upper left corner of the area 553, for example.
- the data in the image 550 used for this calculation is pixel data of 3 rows × 3 columns based on the data of the line 552 and the data of a predetermined area on the left end of, for example, two lines past the line 552.
- the mask processing addition layer selection unit 518 sequentially selects layers on which mask addition processing is to be performed.
- the receptive field in each layer can be calculated.
- a mask processing addition unit 519 adds mask processing to each layer based on the receptive field obtained for each layer, and limits the area for calculation to an area without a mask.
- the mask processing addition layer selection unit 518 can select one or more arbitrary layers included in the frame base NW 500a as layers to which mask processing is added. At this time, the mask processing addition layer selection unit 518 can select layers and the number thereof to which mask processing is added so as to optimize recognition accuracy, calculation amount, memory usage, and the like.
- an attention area 562 a is specified for the feature amount 551 together with the attention area 561 a.
- Mask processing addition section 519 identifies receptive fields 561b and 562b of image 550 corresponding to respective regions of interest 561a and 562a even when a plurality of regions of interest 561a and 562a exist in feature quantity 551. and masking can be added.
- a fourth example of the fourth embodiment will be described with reference to FIGS. 46A to 46C.
- in FIGS. 46A to 46C, the left side of the drawing shows the input side of the NW, and the right side shows the output side.
- FIG. 46A is a schematic diagram schematically showing layer conversion according to the first to third examples of the fourth embodiment described above.
- the first half (eg, layer #1, layer #2) of the frame-based NW (described as the existing NW in the figure) was targeted for conversion.
- in FIG. 46A, the first half of the frame-based NW is replaced with the converted NW (layers), and for the second half, which is not the target of conversion, the frame-based NW before conversion is used as it is.
- the range of layers to be converted in the frame-based NW can be adjusted.
- FIG. 46B is a schematic diagram for explaining the first example of the fourth example of the fourth embodiment.
- a non-frame-based NW prepared in advance (denoted as a specialized NW in the figure) is newly added, and the first half of the frame-based NW is replaced with the newly added non-frame-based NW.
- the portion of the frame-based NW before conversion that has been replaced with the non-frame-based NW is discarded.
- of the remaining portion of the frame-based NW after replacement with the non-frame-based NW, the first half is converted, and the frame-based NW before conversion is used as is for the latter half. Even in this case, the range to be converted in the frame-based NW can be adjusted.
- FIG. 46C is a schematic diagram for explaining the second example of the fourth example of the fourth embodiment.
- the frame-based NW does not perform layer conversion or the like, and a non-frame-based NW prepared in advance is newly added to the input side of the frame-based NW.
- the example of FIG. 46C is not limited to this example.
- a non-frame-based NW prepared in advance can be newly added to the input side of the NW in which the layer of the first half is converted shown in FIG. 46A.
- as described above, the NW conversion unit 311 converts a first data set or a first recognizer for performing recognition processing based on a first signal read from a first sensor that performs readout in a first readout unit into a second data set or a second recognizer for performing recognition processing based on a second signal read from a second sensor that performs readout in a second readout unit different from the first readout unit.
- the NW conversion unit 311 also functions as a conversion unit that, based on the output of the first recognizer that performs recognition processing based on the first signal read from the first sensor, converts processing parameters related to the recognition processing of a second recognizer that performs recognition processing based on a second signal read from a second sensor having characteristics different from those of the first sensor.
- FIG. 47 is a functional block diagram of an example for explaining the function of the conversion unit 301j applicable in common to each example of the fifth embodiment.
- the conversion unit 301j includes a plurality of characteristic conversion units 330 1 , 330 2 , ..., 330 N . The characteristic conversion units 330 1 , 330 2 , ..., 330 N convert a first characteristic, a second characteristic, ..., an N-th characteristic, respectively.
- the image 60 input to the conversion unit 301j is subjected to characteristic conversion by each of the characteristic conversion units 330 1 , 330 2 , ..., 330 N and output as an image 61.
- the conversion unit 301j is shown to include three or more characteristic conversion units 330 1 , 330 2 , . . . , 330 N , but this is not limited to this example.
- the conversion unit 301j may include only one characteristic conversion unit 330 1 or may include two characteristic conversion units 330 1 and 330 2 . In the following description, the characteristic conversion units 330 1 , 330 2 , ..., 330 N are collectively referred to as the characteristic conversion unit 330 when there is no need to distinguish between them.
- the input image 60 is learning data for the existing recognizer 310, and is, for example, a captured image captured by an existing sensor.
- the output image 61 is an image that can be used as learning data for the specialized recognizer 312 and has characteristics assumed for the learning data applied to the specialized recognizer 312.
- the image 61 is an image whose characteristics are approximated to those of a captured image captured by a specialized sensor corresponding to the specialized recognizer 312, for example.
- the transforming unit 301j transforms the image 60 into the image 61 by transforming pixel characteristics or signal characteristics that cannot be directly transformed.
- the following two types of characteristics can be considered as the characteristics to be converted by the conversion unit 301j.
- the conversion unit 301j performs conversion when a characteristic that can be uniquely converted is included (c).
- the characteristics of the image depend on the characteristics of the sensor that acquires (captures) the image, and the characteristics of the signal in the signal processing for the data of the image 60 or 61 .
- the sensor characteristics on which the image characteristics depend are considered to be (A) light linearity and (B) noise characteristics.
- the (B) noise characteristic specifically includes an SNR (Signal-Noise Ratio) curve and a noise histogram.
- the signal characteristics on which the image characteristics depend include (C) bit length, (D) HDR (High Dynamic Range Imaging) synthesis, (E) gradation conversion, and (F) other signal processing.
- HDR synthesis is a method of, for example, synthesizing a plurality of images with different exposures to generate an image with a wider dynamic range.
- (C) bit length is the bit length of pixel data, and has different values before and after HDR synthesis and before and after bit compression processing.
- Gradation conversion includes static conversion and dynamic conversion.
- Static conversion includes piecewise linear conversion, gamma conversion, conversion by logarithmic ratio, and the like.
- Dynamic conversion includes local tone mapping that locally changes gradation in an image.
- (F) other signal processing includes noise reduction processing, shading correction processing, and white balance processing.
- an example of conversion without information deterioration is the static gradation conversion expressed by a single function in the above (E) gradation conversion; such a characteristic can be converted into a characteristic without gradation conversion. Examples of such gradation conversion include gamma conversion, conversion according to a characteristic obtained by discretely extracting gamma curve values and linearly interpolating between them, and logarithmic conversion. Further, in the above pattern (c), the presence or absence of shading correction among the other signal processing of the above (F) can also be converted without information deterioration.
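- as a simple illustration of why such a static conversion does not deteriorate information, the following sketch applies a gamma conversion expressed by a single function and then inverts it exactly (the gamma value of 2.2 is an arbitrary example):

```python
import numpy as np

def gamma_encode(linear, gamma=2.2):
    """Static gradation conversion expressed by a single function."""
    return np.power(np.clip(linear, 0.0, 1.0), 1.0 / gamma)

def gamma_decode(encoded, gamma=2.2):
    """Because the conversion is one-to-one, it can be undone to recover the
    characteristic without gradation conversion."""
    return np.power(np.clip(encoded, 0.0, 1.0), gamma)

x = np.linspace(0.0, 1.0, 5)
assert np.allclose(gamma_decode(gamma_encode(x)), x)
```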
- for the bit length in (C) above, for example, conversion from 8 bits to 24 bits (conversion to a higher bit length) is conceivable.
- for the noise characteristic in (B) above, for example, regarding the SNR curve, conversion from a low SNR characteristic to a high SNR characteristic is conceivable.
- for the gradation conversion in (E) above, a process of converting an image subjected to dynamic conversion into an image without gradation conversion is conceivable.
- for the noise reduction processing among the other signal processing in (F) above, for example, a process of converting an image with noise reduction processing into an image without noise reduction processing is conceivable, as in the case of the SNR curve.
- a typical signal processing pipeline may be prepared as a preset for each application and for each typical database such as learning data. Also, the preset may be selected using a technique such as machine learning.
- the characteristic conversion according to the fifth embodiment specifically includes the following two types of characteristic conversion processing.
- the first characteristic conversion process is a conversion that approximates the RAW image from sensor A to the RAW image from sensor B.
- the characteristic conversion unit 330 converts the RAW image data from the sensor A so that the SNR approximates that of the RAW image data from the sensor B based on the SNR curve of the sensor B.
- differential noise addition or noise reduction processing may be performed.
- the characteristic conversion unit 330 may perform HDR decomposition on the RAW image data from the sensor A, perform characteristic conversion processing on each decomposed image, and HDR-synthesize the images subjected to the characteristic conversion processing.
- the characteristic conversion unit 330 may change the noise distribution of the RAW image data of the sensor A and perform characteristic conversion processing so that the noise characteristic of the RAW image data of the sensor A approximates the noise characteristic of the RAW image data of the sensor B.
- the second characteristic conversion process is a conversion that approximates a general RGB image to a RAW image from sensor B.
- the property conversion section 330 may, for example, pseudo-generate RAW image data from sensor B from RGB image data.
- the characteristic conversion unit 330 may perform addition of differential noise or noise reduction processing on the RGB image data based on the SNR curve of the sensor B so that the SNR approximates that of the RAW image data obtained by the sensor B.
- the characteristic conversion unit 330 may apply, for example, noise reduction processing to pseudo RAW image data generated from RGB image data to approximate the pseudo RAW image data to a noiseless state.
- the characteristic conversion section 330 may replace the noise characteristic of the pseudo RAW image data generated from the RGB image data with a previously prepared noise characteristic.
- the characteristic conversion section 330 may estimate the noise characteristic of the pseudo RAW image data generated from the RGB image data by learning. Further, the characteristic conversion unit 330 may prepare a preset of RGB characteristics in advance and estimate the RGB characteristics of target RGB image data.
- FIG. 48 is a schematic diagram for explaining conversion processing relating to optical linearity that can be applied to the first example of the fifth embodiment.
- the sensor output value may not increase linearly even when the brightness increases linearly; this characteristic of the sensor output value with respect to brightness is referred to as optical linearity.
- section (a) has the output value of sensor A on the vertical axis and the brightness on the horizontal axis, and shows an example of the optical linearity of sensor A with a characteristic line 601 .
- in section (b), the vertical axis represents the output value of sensor B, the horizontal axis represents brightness, and a characteristic line 602 shows an example of the optical linearity of sensor B.
- a characteristic line 600 indicates the characteristic when the output value of sensor A or B changes linearly with respect to brightness.
- it can be seen that sensor A and sensor B show different optical linearity, especially in the region where the brightness increases beyond a certain level.
- in section (c), the vertical axis indicates the output value of sensor B after conversion, and the horizontal axis indicates the output value before conversion; a characteristic line 603 indicates how the converted output value of sensor B corresponds to the output value before conversion.
- Characteristic converter 330 can obtain the characteristic indicated by characteristic line 603 according to the known characteristics of sensors A and B indicated by characteristic lines 601 and 602, respectively.
- a characteristic line 604 indicates the characteristic when the change in the brightness of the output value of the sensor B before and after the conversion is the same.
- the characteristic conversion unit 330 converts the characteristic of the characteristic line 602 of section (b) according to the characteristic line 603 of section (c) of FIG. 48. This conversion yields the characteristic of sensor B after conversion, indicated by the characteristic line 605 in section (d). It can be seen that the characteristic of sensor B indicated by the characteristic line 605 approximates the characteristic of sensor A indicated by the characteristic line 601 in section (a).
- in this way, the characteristic conversion unit 330 can transform the characteristic of sensor B so that the relationship between the brightness and the output value of sensor B approximates the relationship between the brightness and the output value of sensor A.
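- as an illustration, the following NumPy sketch derives such a conversion from two known characteristic curves: it inverts sensor B's brightness-to-output characteristic and then applies sensor A's characteristic, which corresponds to the mapping of the characteristic line 603 (the curves used here are toy examples, not actual sensor characteristics, and both curves are assumed monotonic):

```python
import numpy as np

def build_b_to_a_mapping(brightness, output_a, output_b):
    """Derive the conversion that maps sensor B output values to values
    matching sensor A's optical linearity (both curves must be monotonic)."""
    def convert(value_b):
        # invert sensor B's characteristic: output value -> estimated brightness
        est_brightness = np.interp(value_b, output_b, brightness)
        # apply sensor A's characteristic: brightness -> output value
        return np.interp(est_brightness, brightness, output_a)
    return convert

# Toy characteristics: sensor A is linear, sensor B saturates at high brightness.
brightness = np.linspace(0.0, 1.0, 256)
output_a = brightness                       # stands in for characteristic line 601
output_b = 1.0 - (1.0 - brightness) ** 2    # stands in for characteristic line 602
convert = build_b_to_a_mapping(brightness, output_a, output_b)
converted = convert(output_b)               # approximates sensor A's characteristic
assert np.allclose(converted, output_a, atol=1e-3)
```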
- FIG. 49A is a schematic diagram for explaining an example of conversion processing of an SNR curve that can be applied to the first example of the fifth embodiment;
- section (a) shows examples of changes in SNR with respect to output values of sensors A and B, where the vertical axis represents SNR and the horizontal axis represents sensor output values. On the vertical axis, the noise becomes smaller toward the upper direction.
- the characteristic of SNR change with respect to the output value is called an SNR curve.
- Characteristic line 610 is the SNR curve of sensor A
- characteristic line 611 is the SNR curve of sensor B. This example shows an example where the noise is sensor A>sensor B.
- Section (b) of FIG. 49A shows the difference in the characteristics of sensors A and B indicated by characteristic lines 610 and 611 in section (a).
- the vertical axis indicates the SNR difference ΔSNR
- the horizontal axis indicates the sensor output value.
- the noise is sensor A>sensor B, so the difference ΔSNR changes on the negative side as indicated by the characteristic line 612 in section (b).
- the characteristic conversion unit 330 can convert the SNR of the sensor B to approximate the SNR of the sensor A by adding noise according to the output value of the sensor B.
- the characteristic conversion unit 330 needs to know a noise model such as a noise histogram.
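- as an illustration, the following NumPy sketch assumes a simple Gaussian noise model and SNR curves given in decibels per output level: it computes the noise standard deviation implied by each curve, derives the differential noise variance, and adds signal-dependent noise to the sensor B data accordingly (the Gaussian assumption and the dB convention are assumptions for this sketch, not requirements of the method):

```python
import numpy as np

def add_noise_to_match_snr(image_b, levels, snr_a_db, snr_b_db, rng=None):
    """Add signal-dependent Gaussian noise to sensor B data so that its SNR
    curve approaches the (noisier) SNR curve of sensor A.
    levels (increasing), snr_a_db and snr_b_db describe the two SNR curves."""
    rng = np.random.default_rng() if rng is None else rng
    levels = np.asarray(levels, dtype=float)
    # noise standard deviation implied by an SNR curve: sigma = level / 10^(SNR/20)
    sigma_a = levels / np.power(10.0, np.asarray(snr_a_db) / 20.0)
    sigma_b = levels / np.power(10.0, np.asarray(snr_b_db) / 20.0)
    # variance of the differential noise to be added at each output level
    add_var = np.clip(sigma_a ** 2 - sigma_b ** 2, 0.0, None)
    sigma_add = np.interp(image_b, levels, np.sqrt(add_var))
    return image_b + rng.normal(0.0, 1.0, np.shape(image_b)) * sigma_add
```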
- FIG. 49B is a schematic diagram for explaining another example of conversion processing of the SNR curve applicable to the first example of the fifth embodiment. Since the meaning of each part of section (a) is the same as that of section (a) of FIG. 49A, description thereof is omitted here.
- characteristic line 610 ′ is the SNR curve of sensor A
- characteristic line 611 ′ is the SNR curve of sensor B. This example shows an example where the noise is sensor B>sensor A.
- Section (b) of FIG. 49B shows the difference in the characteristics of sensors A and B indicated by characteristic lines 610' and 611' in section (a). Since the meaning of each part of section (b) is the same as that of section (b) of FIG. 49A, description thereof will be omitted here.
- in this case, the noise is sensor B>sensor A, so the difference ΔSNR changes on the positive side as shown by the characteristic line 613 in section (b).
- the characteristic conversion unit 330 can convert the SNR of the sensor B to approximate the SNR of the sensor A by performing noise reduction processing according to the output value of the sensor B.
- FIG. 50 is a schematic diagram for explaining noise histogram conversion processing applicable to the first example of the fifth embodiment.
- section (a) of FIG. 50 is a graph equivalent to section (a) of FIG. 49A, in which the vertical axis is the SNR and the horizontal axis is the sensor output value; on the vertical axis, the noise becomes smaller toward the upper direction.
- a characteristic line 610 is the SNR curve of the sensor A
- a characteristic line 611 is the SNR curve of the sensor B.
- as in FIG. 49A, this example shows a case where the noise of sensor A is greater than that of sensor B.
- Section (b) of FIG. 50 shows an example of the noise histograms at the sensor output value I0 in section (a) of FIG. 50.
- the vertical axis is the frequency and the horizontal axis is the noise level.
- a characteristic line 606 indicates the noise histogram of sensor A, and a characteristic line 607 indicates the noise histogram of sensor B.
- by adding differential noise to the output of sensor B according to the output value Ix of sensor B, the characteristic conversion unit 330 can convert the noise histogram of sensor B so as to approximate the noise histogram of sensor A.
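- the histogram-based conversion of FIG. 50 can be sketched in a similar way; the sketch below is an illustrative assumption rather than the original method, estimating the noise spread of each sensor from its histogram at a given output value and adding the differential noise under an approximately zero-mean Gaussian assumption.
```python
import numpy as np

def add_differential_noise(signal_b, hist_a, hist_b, bin_centers, rng=None):
    """Approximate sensor A's noise histogram by adding noise to sensor B's output.

    hist_a / hist_b: noise histograms (cf. characteristic lines 606 and 607)
    measured at the relevant output value; bin_centers: noise level of each bin.
    """
    rng = np.random.default_rng() if rng is None else rng
    bin_centers = np.asarray(bin_centers, dtype=np.float64)
    std_a = np.sqrt(np.average(bin_centers ** 2, weights=hist_a))
    std_b = np.sqrt(np.average(bin_centers ** 2, weights=hist_b))
    extra_std = np.sqrt(max(std_a ** 2 - std_b ** 2, 0.0))
    return signal_b + rng.normal(0.0, extra_std, size=np.shape(signal_b))
```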
- next, the bit length conversion process of (C) above will be described.
- the bit length conversion process is a conversion process related to the static conversion among the tone conversions of (E) above.
- FIG. 51 is a schematic diagram for explaining bit length conversion processing applicable to the second example of the fifth embodiment.
- the vertical axis indicates the signal value after quantization
- the horizontal axis indicates the signal value (true value) before quantization.
- the right side of FIG. 51 shows an example of the signal value after quantization of sensor A
- the left side shows an example of the signal value of sensor B after quantization.
- the sensor A outputs the true value indicated by the characteristic line 615 as a signal value quantized to a bit length of 16 bits, that is, 16 gradations.
- the sensor B similarly outputs the true value indicated by the characteristic line 615 as a signal value quantized to a bit length of 4 bits, that is, to 4 gradations.
- here, the characteristic conversion unit 330 can uniquely execute the process of converting the 16-bit output signal value of sensor A into the 4-bit output signal value of sensor B.
- the characteristic conversion unit 330 cannot uniquely execute the process of converting the output signal value of the sensor B into the output signal value of the sensor A.
- therefore, the characteristic conversion unit 330 generates an output signal value with a bit length of 16 bits by interpolating or estimating values between the 4-bit output signal values of sensor B, thereby converting the output signal value of sensor B so as to approximate the output signal value of sensor A.
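- the following sketch is illustrative only, taking the simplified 16-gradation/4-gradation figures of FIG. 51 at face value; it shows how the downward conversion is unique while the upward conversion needs interpolation or estimation.
```python
import numpy as np

def to_sensor_b(x16):
    """Uniquely map a 16-gradation (sensor A) signal to 4 gradations (sensor B)."""
    return np.asarray(x16) // 4

def to_sensor_a(x4):
    """Approximate a 16-gradation signal from a 4-gradation one by interpolation."""
    coarse = np.asarray(x4, dtype=np.float64) * 4.0 + 2.0  # centre of each coarse bin
    kernel = np.ones(3) / 3.0                               # simple smoothing estimate
    smooth = np.convolve(coarse, kernel, mode="same")       # 1-D signal assumed
    return np.clip(np.rint(smooth), 0, 15).astype(np.int32)
```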
- the bit length of the data being handled may vary at various points in the signal processing pipeline for image data.
- for example, the bit length changes before and after HDR synthesis of image data, or before and after bit compression.
- the bit length conversion processing according to the second example of the fifth embodiment can be applied to these locations where the bit length changes.
- FIG. 52 is a schematic diagram for explaining conversion processing for converting image data before HDR synthesis into image data after HDR synthesis, which is applicable to the second example of the fifth embodiment.
- the vertical axis indicates the quantized signal value and the horizontal axis indicates the brightness.
- Section (a) of FIG. 52 is a diagram showing an example of image data before HDR synthesis.
- the signal values are quantized with 4 gradations.
- in HDR synthesis, three image data are acquired according to the range of brightness: a long-time exposure with the longest exposure time, a short-time exposure with the shortest exposure time, and a medium-time exposure with an exposure time between the long-time exposure and the short-time exposure.
- an image obtained by long-time exposure will be referred to as a long-exposure image
- an image obtained by medium-time exposure will be referred to as a medium-exposure image
- an image obtained by short-time exposure will be referred to as a short-exposure image.
- Section (a) shows examples of long-time exposure image data 616L, medium-time exposure image data 616M, and short-time exposure image data 616S.
- the brightness range of the image data 616L is used as a reference
- the brightness range of the image data 616M is twice the range of the image data 616L
- the brightness range of the image data 616S is four times the range of the image data 616L.
- Section (b) of FIG. 52 is an example of performing gain adjustment on the image data 616M, 616L and 616S of section (a) in order to perform HDR synthesis.
- here, the characteristic conversion unit 330 quantizes the signal values with 16 gradations and, according to the range of each exposure image, multiplies the gain of the image data 616L by 1 (image data 617L), the gain of the image data 616M by 2 (image data 617M), and the gain of the image data 616S by 4 (image data 617S).
- Section (c) of FIG. 52 shows an example in which the image data 617L, 617M and 617S that have been gain-adjusted in section (b) are selected and synthesized according to brightness.
- the maximum gradation is the 16th gradation
- the minimum gradation is the 0th gradation.
- for the 0th to 3rd gradations, the characteristic conversion unit 330 selects the image data 617L one gradation at a time, as indicated by the image data 618L.
- for the 4th to 6th gradations, the characteristic conversion unit 330 selects the image data 617M every two gradations, as indicated by the image data 618M.
- for the 8th to 16th gradations, the characteristic conversion unit 330 selects the image data 617S every four gradations, as indicated by the image data 618S.
- the characteristic conversion unit 330 can combine these image data 618L, 618M and 618S to obtain image data after HDR synthesis.
- in this way, the characteristic conversion unit 330 can uniquely convert image data before HDR synthesis into image data after HDR synthesis.
- note that the HDR synthesis algorithm shown in sections (a) to (c) of FIG. 52 is merely an example, and the algorithm is not limited to this example.
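- a minimal sketch of this kind of HDR synthesis follows, using the gains (1, 2, 4) of the example in FIG. 52; the gradation switch points used below are illustrative assumptions.
```python
import numpy as np

def hdr_synthesize(img_long, img_mid, img_short):
    """Combine 4-gradation long/medium/short exposures into one 16-gradation image."""
    l = img_long.astype(np.float32) * 1.0    # image data 617L
    m = img_mid.astype(np.float32) * 2.0     # image data 617M
    s = img_short.astype(np.float32) * 4.0   # image data 617S
    # select the exposure according to the brightness range (cf. 618L / 618M / 618S)
    out = np.where(l < 4.0, l, np.where(m < 8.0, m, s))
    return np.clip(np.rint(out), 0, 16).astype(np.int32)
```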
- FIG. 53 is a schematic diagram for explaining conversion processing for converting image data after HDR synthesis into image data before HDR synthesis, which is applicable to the second example of the fifth embodiment.
- the vertical axis indicates the quantized signal value and the horizontal axis indicates the brightness.
- Section (a) of FIG. 53 is a diagram showing an example of image data after HDR synthesis.
- the data after HDR synthesis is obtained by synthesizing the image data 618L, 618M, and 618S of the long-exposure image, the medium-exposure image, and the short-exposure image described in section (c) of FIG. 52.
- Section (b) of FIG. 53 is an example of performing gain adjustment on each of the image data 618L, 618M, and 618S in order to cancel the HDR synthesis and obtain a signal value quantized with a bit length of 4 bits.
- the image data 618L has a gain of 1 times that of the original image data 617L
- the image data 618M has a gain of 2 times that of the original image data 617M
- the image data 618S has a gain of 4 times that of the original image data 617S. Therefore, the characteristic conversion unit 330 generates image data 619L, 619M and 619S by multiplying the image data 618L, 618M and 618S by gains of 1, 1/2 and 1/4, respectively.
- Section (c) of FIG. 53 shows an example in which each image data before HDR synthesis, represented by signal values quantized with a bit length of 4 bits, is generated based on the gain-adjusted image data 619L, 619M and 619S of section (b).
- here, the image data 619L, 619M, and 619S have missing data portions due to the processing during synthesis. These missing portions cannot be reconstructed uniquely because of characteristics such as noise and optical linearity.
- therefore, the characteristic conversion unit 330 interpolates or estimates the missing data areas, as indicated by the image data 620L, 620M and 620S, and combines them with the image data 619L, 619M and 619S, respectively.
- in this way, by performing decomposition, gain adjustment, and interpolation or estimation of missing portions on the image data after HDR synthesis, the characteristic conversion unit 330 can generate each image data before HDR synthesis.
- note that the HDR decomposition processing shown in sections (a) to (c) of FIG. 53 corresponds to the HDR synthesis algorithm described using sections (a) to (c) of FIG. 52. Therefore, when a different HDR synthesis algorithm is used, the decomposition processing is performed according to that algorithm.
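- conversely, a rough sketch of the decomposition is shown below, again assuming the gains and gradation ranges of the example above; the fill-in of missing areas is a crude stand-in for the interpolation or estimation described for the image data 620L, 620M and 620S.
```python
import numpy as np

def hdr_decompose(hdr):
    """Split a 16-gradation HDR image back into approximate 4-gradation exposures."""
    hdr = hdr.astype(np.float32)
    planes = {}
    for name, gain, lo, hi in (("long", 1.0, 0.0, 4.0),
                               ("mid", 2.0, 4.0, 8.0),
                               ("short", 4.0, 8.0, 16.5)):
        valid = (hdr >= lo) & (hdr < hi)
        plane = hdr / gain                       # undo the synthesis gain (cf. 619L/M/S)
        estimate = np.clip(plane, 0.0, 3.0)      # placeholder estimate (cf. 620L/M/S)
        filled = np.where(valid, plane, estimate)
        planes[name] = np.clip(np.rint(filled), 0, 3).astype(np.int32)
    return planes
```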
- FIG. 54 is a schematic diagram showing an example of static tone conversion applicable to the second example of the fifth embodiment.
- the vertical axis indicates gradation after gradation conversion
- the horizontal axis indicates gradation before gradation conversion.
- a characteristic line 630 indicates the characteristic when the gradation is the same before and after conversion.
- Section (a) of FIG. 54 shows an example of the gradation conversion function 631 for sensor A, and section (b) shows an example of the gradation conversion function 632 for sensor B. It can be seen that the gradation conversion function 631 and the gradation conversion function 632 perform different gradation conversions.
- by using these gradation conversion functions, the characteristic conversion unit 330 can, for example, convert the gradation characteristic of the output signal of sensor B so as to approximate the gradation characteristic of the output signal of sensor A.
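- one way to realize such a static tone conversion is a look-up table that undoes sensor B's gradation conversion and then applies sensor A's; the sketch below assumes both conversion functions are available as monotonically increasing callables on normalized values (an assumption made for illustration).
```python
import numpy as np

def build_tone_lut(tone_a, tone_b, levels=256):
    """LUT mapping sensor B's tone-converted output onto sensor A's tone curve."""
    x = np.linspace(0.0, 1.0, levels)
    inv_b = np.interp(x, tone_b(x), x)   # numerical inverse of function 632
    return tone_a(inv_b)                 # re-apply function 631

# usage sketch: y_a_like = np.interp(image_b / 255.0, np.linspace(0, 1, 256), lut)
```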
- FIG. 55 is a schematic diagram showing an example of shading correction applicable to the second example of the fifth embodiment.
- section (a) shows an example of an image 640 based on the sensor A output signal
- section (b) shows an example of an image 641 based on the sensor B output signal.
- the lower part shows an example of the relationship between the position on the A-A' line of the image 640 shown in the upper part and the level due to gain or offset.
- the sensor A has a shading characteristic such that the peripheral portion of the image 640 has low luminance and the central portion has high luminance.
- the lower part shows an example of the relationship between the position on the line B-B' and the level of the image 641 before conversion by shading correction shown in the upper part.
- the sensor B has a shading characteristic in which the brightness is high at the left end of the drawing and becomes low toward the right end in the image 641 .
- section (c) shows an example of coefficients for converting the shading characteristics of the image 641 captured by sensor B before conversion into the shading characteristics of the image 640 captured by sensor A.
- the characteristic converter 330 can obtain the shading correction value indicated by the characteristic line 652 in section (c) by subtracting the shading characteristic value of the characteristic line 650 from the shading characteristic value of the characteristic line 651 .
- the characteristic conversion unit 330 applies the shading correction value indicated by the characteristic line 652 to the shading characteristic indicated by the characteristic line 651, and can thereby obtain, as indicated by the characteristic line 650' in section (d), a shading characteristic approximating the characteristic line 650 of sensor A.
- in this way, if the shading characteristics of sensor A and the shading characteristics of sensor B are known, the shading characteristic of sensor B can be converted so as to approximate the shading characteristic of sensor A.
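- a minimal sketch of such a shading conversion follows, assuming the two shading profiles are available per pixel and modeled either as offsets or as gains; the choice of model is an assumption, not something specified here.
```python
import numpy as np

def match_shading(image_b, shading_a, shading_b, model="offset"):
    """Convert sensor B's shading characteristic so it approximates sensor A's."""
    if model == "offset":
        correction = shading_a - shading_b   # correction values (cf. line 652)
        return image_b + correction
    # gain model: scale each pixel by the ratio of the two shading profiles
    return image_b * (shading_a / np.maximum(shading_b, 1e-6))
```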
- as described above, the conversion unit 301j functions as a conversion unit that converts a first recognizer that performs recognition processing based on a signal read from a first sensor having a first pixel characteristic or a first signal characteristic, or a first data set, into a second recognizer for performing recognition processing based on a second pixel characteristic different from the first pixel characteristic or a second signal characteristic different from the first signal characteristic, or a second data set.
- the conversion unit 301j also functions as a generation unit that, based on the first signal read out from the first sensor in the first readout unit, generates a signal corresponding to a second signal read out from a second sensor that differs from the first sensor in at least one of the readout unit, the pixel characteristic, and the signal characteristic.
- the processing according to the sixth embodiment is the inverse processing of the processing according to each example of the fifth embodiment described above. That is, the processing according to the sixth embodiment corresponds to the processing of converting the specialized evaluation data 304 into the existing evaluation data 303 by the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B.
- the configuration of the conversion unit 301j described using FIG. 47 can be applied as the conversion unit 301 that performs the conversion.
- the image 60 input to the conversion unit 301j is an image based on the specialized evaluation data 304 acquired by the specialized recognition sensor.
- the image 61 output from the conversion unit 301 j is an image in which the specialized evaluation data 304 is approximated to the existing evaluation data 303 .
- Each example of the fifth embodiment described above can be applied to the sixth embodiment after exchanging the input data and the output data for the conversion unit 301j.
- the existing learning data 300 and the image 60 can be applied to the input data
- the specialized learning data 302 and the image 61 can be applied to the output data.
- bit length conversion processing (see FIG. 51) and conversion processing in HDR synthesis (FIGS. 52 and 53), static tone conversion processing (see FIG. 54), and shading correction processing (see FIG. 55) can be applied.
- as described above, the conversion unit 301j functions as a conversion unit that converts a first recognizer that performs recognition processing based on a signal read from a first sensor having a first pixel characteristic or a first signal characteristic, or a first data set, into a second recognizer for performing recognition processing based on a second pixel characteristic different from the first pixel characteristic or a second signal characteristic different from the first signal characteristic, or a second data set.
- the conversion unit 301j also functions as a generation unit that generates, based on the second signal read from a second sensor that differs from the first sensor in at least one of the readout unit, the pixel characteristic, and the signal characteristic, a signal corresponding to the first signal read from the first sensor.
- a seventh embodiment of the present disclosure will be described.
- in the seventh embodiment, a network of a specialized recognizer is generated based on a network of an existing recognizer. That is, in the seventh embodiment, similarly to the above-described third embodiment, the specialized recognizer is trained so that the frame-based network of the existing recognizer and the non-frame-based network of the specialized recognizer produce the same output.
- the explanation is given assuming that the existing recognizer network is a frame-based network, and the specialized recognizer network is a non-frame-based network.
- the network of specialized recognizers may be a network with special signal characteristics for recognition.
- in the seventh embodiment, the specialized recognizer is generated based on the existing recognizer or on other data.
- the process according to the seventh embodiment corresponds to the process of converting the existing recognizer 310 into the specialized recognizer 312 by the NW converter 311 in the recognizer generator 31 of the learning system 3 shown in FIG. 2B.
- in the seventh embodiment, for case #1, a specialized recognizer is trained by ordinary distillation.
- the process for case #1 can apply the distillation process described with reference to FIG. 28 as the first example of the third embodiment, so the description is omitted here.
- in the seventh embodiment, if an existing recognizer, existing correct data, and specialized correct data of case #4 are available, existing input data is generated, and specialized input data is generated based on the generated existing input data. After the existing input data and the specialized input data are generated in this manner, a distillation process is performed to generate the specialized recognizer.
- as the processing for this case #4, the distillation processing using the existing image generated based on the existing recognizer and the specialized image, described with reference to FIGS. 31A and 31B as the fourth example of the third embodiment, can be applied, so the description is omitted here.
- according to the seventh embodiment, it is possible to easily provide a specialized recognizer to a user who has an existing recognizer network but does not have a specialized recognizer network.
- the NW conversion unit 311 functions as a conversion unit that converts a first recognizer that performs recognition processing based on a signal read from a first sensor having a first pixel characteristic or a first signal characteristic, or a first data set, into a second recognizer for performing recognition processing based on a second pixel characteristic different from the first pixel characteristic or a second signal characteristic different from the first signal characteristic, or a second data set.
- the NW conversion unit 311 also functions as a conversion unit that, based on the output of the first recognizer that performs recognition processing based on the first signal read from the first sensor, trains a second recognizer that performs recognition processing based on a second signal read from a second sensor having characteristics different from those of the first sensor.
- a first example of the eighth embodiment is an example of adding preprocessing so as to approximate the output of the existing recognizer to the output of the specialized recognizer.
- each process according to the sixth embodiment described above can be applied.
- each process according to the sixth embodiment is the reverse process of the corresponding process according to the fifth embodiment. Therefore, as the preprocessing in the first example of the eighth embodiment, the reverse processing of each example of the fifth embodiment described above can be applied.
- the preprocessing converts the specialized evaluation data 304 into the existing evaluation data 303 by the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B, for example.
- as the conversion unit 301 that performs the conversion related to this preprocessing, the configuration of the conversion unit 301j described using FIG. 47 can be applied.
- bit length conversion processing (see FIG. 51) and conversion processing in HDR synthesis (FIGS. 52 and 53), static tone conversion processing (see FIG. 54), and shading correction processing (see FIG. 55) can be applied.
- in the first example of the eighth embodiment, the data corresponding to the specialized recognizer is converted into data corresponding to the existing recognizer by the preprocessing for the existing recognizer, and this converted image data is input to the existing recognizer. Therefore, the output of the existing recognizer can be approximated to the output of the specialized recognizer.
- FIG. 56 is a schematic diagram for schematically explaining the processing according to the second example of the eighth embodiment.
- Sections (a) and (b) of FIG. 56 schematically show some of the existing recognizers.
- the existing recognizer includes layers 570a1, 570a2, and so on.
- layers 570a1 and 570a2 are also indicated as layer #1 and layer #2, respectively.
- these layers 570a1, 570a2, ... are all NW layers for normal characteristics corresponding to frame-based data.
- layer 570a1 includes a filter 571a1, a batch normalization 572a1, an activation function 573a1, and so on.
- similarly, layer 570a2 includes a filter 571a2, a batch normalization 572a2, an activation function 573a2, and so on.
- batch normalization is indicated as BN.
- Section (a) of FIG. 56 shows a case where normal characteristic data is input to layer 570a 1 .
- normal characteristic data is, for example, frame-based image data output from an existing sensor.
- the layer 570a 1 subjects the input normal characteristic data to processing by a filter 571a 1 , a batch normalization 572a 1 and an activation function 573a 1 , and outputs an intermediate output #1-1.
- Intermediate output #1-1 output from layer 570a 1 is input to layer 570a 2 .
- the layer 570a2 performs each process by the filter 571a2, the batch normalization 572a2 and the activation function 573a2 on the input intermediate output #1-1, and outputs an intermediate output #2.
- Section (b) of FIG. 56 shows a case where specialized characteristic data is input to layer 570a 1 .
- Specialized characteristic data is non-frame-based image data output from, for example, recognition specialized sensors.
- Layer 570a 1 performs each processing by filter 571a 1 , batch normalization 572a 1 and activation function 573a 1 on the input specialized characteristic data, and outputs intermediate output #1-2. This intermediate output #1-2 is different from the intermediate output #1-1 in section (a).
- in the second example of the eighth embodiment, the coefficient of at least one of the filter 571a1, the batch normalization 572a1, and the activation function 573a1 is changed.
- Section (c) of FIG. 56 shows an example of layer 570b in which the coefficients of filter 571a 1 , batch normalization 572a 1 and activation function 573a 1 are modified in layer 570a 1 .
- the layer 570b includes a filter 571b, a batch normalization 572b, and an activation function 573b, obtained by modifying the coefficients of the filter 571a1, the batch normalization 572a1, and the activation function 573a1.
- the layer 570b can be considered to be the layer 570a 1 in the NW for normal characteristics converted to the layer in the NW for special characteristics.
- the intermediate output #1-3 output from the layer 570b, in which the coefficient of at least one of the filter 571b, the batch normalization 572b, and the activation function 573b has been changed, approximates the intermediate output #1-2.
- in section (c), the filter 571b, the batch normalization 572b, and the activation function 573b in the layer 570b are all shown as transformed, but this is for explanation purposes only, and the configuration is not limited to this example. That is, in the layer 570b, it is sufficient that the coefficient of at least one of the filter 571b, the batch normalization 572b, and the activation function 573b is changed.
- FIG. 57 is an example functional block diagram for explaining the function of the NW conversion unit 311d applicable to the second example of the eighth embodiment.
- NW conversion section 311d includes coefficient conversion section 575 and characteristic analysis section 576 .
- NW 502 for normal characteristics is input to coefficient conversion section 575 .
- the NW 502 for normal characteristics includes, for example, the layers 570a1, 570a2, and so on.
- the normal characteristic data and the specialized characteristic data are input to characteristic analysis section 576 .
- the characteristic analysis unit 576 analyzes the input normal characteristic data and specialized characteristic data. Based on the analysis result of the characteristic analysis unit 576, the coefficient conversion unit 575 changes the coefficient of at least one of the filter 571a1, the batch normalization 572a1, and the activation function 573a1 included in the layer 570a1 of the input NW 502 for normal characteristics.
- the coefficient conversion unit 575 outputs the NW in which the coefficient in the layer 570a 1 is changed in the NW 502 for normal characteristics as the NW 503 for special characteristics.
- the coefficient conversion unit 575 can change the filter coefficient of the filter 571a 1 to 1/N times.
- for example, suppose the analysis result of the characteristic analysis unit 576 indicates that the normal characteristic data is a 3-channel signal of RGB data and that the specialized characteristic data is 1-channel data of only Y (luminance).
- in this case, the coefficient conversion unit 575 can change the filter coefficient of the filter 571a1 from a coefficient for 3 channels to a coefficient for 1 channel.
- there is also a case where the analysis result of the characteristic analysis unit 576 indicates that the frequency characteristic of the signal based on the normal characteristic data differs from the frequency characteristic of the signal based on the specialized characteristic data. For example, if the analysis result indicates that the signal based on the specialized characteristic data is low-band amplified with respect to the signal based on the normal characteristic data, the coefficient conversion unit 575 can multiply the filter 571a1 by a filter that performs low-frequency reduction.
- conversely, if the signal based on the specialized characteristic data is high-band amplified with respect to the signal based on the normal characteristic data, the coefficient conversion unit 575 can multiply the filter 571a1 by a filter that performs high-frequency reduction.
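- the sketch below illustrates two of the coefficient conversions mentioned above (level scaling by 1/N and collapsing a 3-channel RGB filter to a 1-channel Y filter); the weight layout and the equal-level assumption are illustrative assumptions, not taken from the disclosure.
```python
import numpy as np

def scale_filter(weights, n):
    """Scale convolution weights by 1/N, e.g. when the specialized data level is N times larger."""
    return weights / float(n)

def rgb_filter_to_luma(weights_rgb):
    """Collapse a 3-channel first-layer filter (out_ch, 3, kH, kW) into a 1-channel filter.

    Summing over the channel axis roughly preserves the response when the single
    Y channel has a level comparable to the original RGB channels.
    """
    return weights_rgb.sum(axis=1, keepdims=True)
```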
- in the above description, coefficient conversion is performed so that the intermediate output #1-1 itself matches the intermediate output #1-2, but the conversion is not limited to this example.
- for example, the coefficients of the batch normalization 572a1 may be changed so that the intermediate output #1-2 and the intermediate output #1-1 have matching statistics. More specifically, according to the batch normalization shown in the following equation (1), the coefficient conversion unit 575 can change the coefficients of the batch normalization 572a1 so that the "average value/variance value" of the feature amount of the intermediate output #1-1 matches the "average value/variance value" of the feature amount of the intermediate output #1-2.
- Fout = Gain × (Fin − AVG(Fin)) / σ(Fin) + Offset … (1)
- in equation (1), Fout indicates the feature amount after batch normalization, Fin indicates the feature amount before batch normalization, AVG(Fin) indicates the average value of the feature amount in the database, and σ(Fin) indicates the variance value of the feature amount in the database.
- Gain indicates a gain term, and Offset indicates an offset term.
- the database is a database of normal characteristic data or special characteristic data.
- the characteristic analysis unit 576 performs the calculation of equation (1) for each of the normal characteristic data and the specialized characteristic data, and obtains the feature amount Fout based on the normal characteristic data and the feature amount Fout based on the specialized characteristic data.
- based on these results, the coefficient conversion unit 575 changes the coefficients of the batch normalization 572a1 using AVG(Fin) and σ(Fin) obtained for each set of data.
- normalization processing in layers is not limited to batch normalization.
- normalization processes such as group normalization, layer normalization, instance normalization, etc. can be applied.
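- as a minimal sketch of the statistic-matching idea around equation (1) (assuming per-channel features collected from the specialized characteristic data; this is an illustration, not the claimed procedure): re-estimating AVG(Fin) and σ(Fin) on the specialized features makes the normalized output statistics match those obtained with the normal characteristic data, without touching Gain and Offset.
```python
import numpy as np

def match_bn_statistics(features_specialized, gain, offset, eps=1e-6):
    """Re-derive the batch-normalization statistics for specialized characteristic data.

    features_specialized: intermediate feature samples, shape (N, C).
    gain, offset: existing Gain and Offset terms of equation (1), shape (C,).
    """
    new_avg = features_specialized.mean(axis=0)          # AVG(Fin) on specialized data
    new_sigma = features_specialized.std(axis=0) + eps   # sigma(Fin) on specialized data
    # With these statistics, Fout has mean Offset and spread ~Gain for the
    # specialized data as well, so Gain and Offset themselves are kept.
    return new_avg, new_sigma, gain, offset
```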
- the coefficients in the layers included in the existing recognizer network are changed based on the analysis results of the normal characteristic data and the specialized characteristic data. Therefore, the output of the existing recognizer can be approximated to the output of the specialized recognizer.
- in a third example of the eighth embodiment, the existing recognizer network is converted into a specialized recognizer network by changing the layers or filters included in the existing recognizer network.
- FIG. 58 is a schematic diagram for schematically explaining the processing according to the third example of the eighth embodiment. Sections (a) and (b) of FIG. 58 are the same as sections (a) and (b) of FIG. 56 described above, and will not be described in detail here.
- in the third example of the eighth embodiment, at least one of the filter 571a1, the batch normalization 572a1, and the activation function 573a1 itself is changed.
- Section (c) of FIG. 58 shows an example of layer 570c in which filter 571a 1 , batch normalization 572a 1 and activation function 573a 1 are modified in layer 570a 1 .
- the layer 570c includes a filter 571c, a batch normalization 572c, and an activation function 573c, obtained by modifying the filter 571a1, the batch normalization 572a1, and the activation function 573a1.
- the layer 570c can be considered to be the layer 570a 1 in the NW for normal characteristics converted to the layer in the NW for special characteristics.
- the intermediate output #1-4 output from the layer 570c, in which at least one of the filter 571c, the batch normalization 572c, and the activation function 573c has been modified, approximates the intermediate output #1-2.
- in section (c), the filter 571c, the batch normalization 572c, and the activation function 573c in the layer 570c are all shown as modified from the layer 570a1, but this is for illustration purposes only, and the configuration is not limited to this example. That is, in the layer 570c, it is sufficient that at least one of the filter 571c, the batch normalization 572c, and the activation function 573c is changed from the layer 570a1.
- FIG. 59 is an example functional block diagram for explaining the function of the NW conversion unit 311e applicable to the third example of the eighth embodiment.
- NW converter 311 e includes layer converter 577 and characteristic analyzer 576 .
- NW 502 for normal characteristics is input to layer conversion section 577 .
- the NW 502 for normal characteristics includes, for example, the layers 570a1, 570a2, and so on.
- the normal characteristic data and the specialized characteristic data are input to characteristic analysis section 576 .
- the characteristic analysis unit 576 analyzes the input normal characteristic data and specialized characteristic data. Based on the analysis result of the characteristic analysis unit 576, the layer conversion unit 577 changes the elements included in the layer 570a1 of the input NW 502 for normal characteristics, that is, the filter 571a1, the batch normalization 572a1, and the activation function 573a1.
- the layer conversion unit 577 outputs the NW in which the element in the layer 570a 1 is changed in the normal characteristic NW 502 as the special characteristic NW 503 .
- the layer conversion unit 577 can change the activation function 573a 1 of the layer 570a 1 to the exponential response activation function 573c.
- the layer conversion unit 577 may add an exponential response activation function to the first stage.
- the layer conversion unit 577 may change the activation function 573a 1 to an approximation function that approximates an exponential response.
- the layer conversion unit 577 can change the activation function 573a 1 of the layer 570a 1 to a logarithmic response activation function 573c.
- the layer conversion section 577 may add a logarithmic response activation function to the first stage.
- the layer conversion unit 577 may change the activation function 573a 1 to an approximation function that approximates a logarithmic response.
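- simple stand-ins for exponential- and logarithmic-response activation functions are sketched below; these specific formulas are assumptions made for illustration, and the functions 573c of the disclosure are not defined here.
```python
import numpy as np

def log_response(x, eps=1e-6):
    """Logarithmic-response activation, e.g. for log-compressed sensor signals."""
    return np.log1p(np.maximum(x, 0.0) + eps)

def exp_response(x, clip_max=10.0):
    """Exponential-response activation, clipped for numerical stability."""
    return np.expm1(np.clip(x, None, clip_max))
```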
- the layer elements included in the existing recognizer network are changed based on the analysis results of the normal characteristic data and the specialized characteristic data. Therefore, the output of the existing recognizer can be approximated to the output of the specialized recognizer.
- the conversion unit 301j and the NW conversion units 311d and 311e according to the eighth embodiment function as conversion units that convert a first recognizer that performs recognition processing based on a signal read from a first sensor having a first pixel characteristic or a first signal characteristic, or a first data set, into a second recognizer for performing recognition processing based on a second pixel characteristic different from the first pixel characteristic or a second signal characteristic different from the first signal characteristic, or a second data set.
- the conversion unit 301j and the NW conversion units 311d and 311e according to the eighth embodiment also function as conversion units that, based on the output of the first recognizer that performs recognition processing based on the first signal read from the first sensor, convert a processing parameter related to the recognition processing of a second recognizer that performs recognition processing based on a second signal read from a second sensor having characteristics different from those of the first sensor.
- a ninth embodiment of the present disclosure will be described.
- a control rule for executing recognition processing by a specialized recognizer is generated based on existing learning data for the existing recognizer.
- the processing according to the first example of the ninth embodiment corresponds to processing in which the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B generates a specialized control rule 313 based on the existing learning data 300. More specifically, in the first example of the ninth embodiment, the conversion unit 301 obtains a statistic based on the existing learning data 300.
- FIG. 60 is a schematic diagram for schematically explaining the processing according to the first example of the ninth embodiment.
- transforming section 301 k includes statistic estimation section 700 .
- Existing learning data 400 is input to the conversion unit 301k. It should be noted that hereinafter, unless otherwise specified, the existing learning data 400 includes a plurality of existing learning data each composed of a combination of the image 401 and the correct data 402 . For example, the existing learning data 400 here refers to all of the plurality of existing learning data stored in the database.
- according to a control range 710 for the specialized recognizer that is the target of the control rule, the statistic estimation unit 700 estimates a statistic 711 based on the information within the range indicated by the control range 710 in the existing learning data 400. Although the details will be described later, the data generation unit 30 generates a control rule for controlling the specialized recognizer based on this statistic 711.
- the type of statistics estimated by the statistics estimation unit 700 is not particularly limited as long as it is general.
- the statistic estimation unit 700 calculates a statistic 711 suitable for controlling the specialized recognizer based on the existing learning data 400 and the control range 710 .
- a first example of the first example of the ninth embodiment is an example of obtaining the statistic 711 based on information for each line.
- FIG. 61 is a schematic diagram for explaining processing according to the first example of the first example of the ninth embodiment.
- the transformation unit 301k-1 includes a statistic estimation unit 700a.
- the sub-sampling line control range 712 indicates, for example, the range in which sub-sampling (line division) is performed for each line within one frame in units of lines.
- the statistic estimation unit 700a obtains a statistic 711a within the range indicated by the sub-sample line control range 712. For example, when the position of the target object in each image 401 of the existing learning data 400a is described in each corresponding correct data 402, the statistic estimation unit 700a can estimate at which position in each image 401 the target object is included.
- Section (b) of FIG. 61 shows an example of the statistic 711a obtained by the statistic estimator 700a.
- the vertical axis is the line
- the horizontal axis is the frequency
- the statistic 711a indicates the appearance frequency of the target object for each line.
- the target object appears frequently in the upper and lower portions of the image 401, and appears less frequently in the central portion.
- based on this statistic 711a, the recognizer can control which part of the captured image of one frame is to be focused on for recognition processing.
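- a minimal sketch of estimating such a per-line statistic follows, assuming the correct data 402 provides the vertical extent of each target object as (top_line, bottom_line) pairs; this representation is an assumption for illustration.
```python
import numpy as np

def line_frequency(boxes_per_image, num_lines):
    """Appearance frequency of the target object per line (cf. statistic 711a)."""
    freq = np.zeros(num_lines, dtype=np.int64)
    for boxes in boxes_per_image:
        for top, bottom in boxes:
            top = max(int(top), 0)
            bottom = min(int(bottom), num_lines - 1)
            if bottom >= top:
                freq[top:bottom + 1] += 1
    return freq
```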
- a second example of the first example of the ninth embodiment is an example of obtaining a brightness change model as a statistic according to the brightness of each image 401 included in the existing learning data 400.
- FIG. 62 is a schematic diagram for explaining processing according to the second example of the first example of the ninth embodiment.
- transforming section 301k-2 includes statistic estimating section 700b and brightness estimating section 714.
- the existing learning data 400b includes each image 401 and each correct data 402 arranged in chronological order.
- the brightness estimation unit 714 estimates the brightness of each image 401 based on each image 401 and each correct data 402 .
- Each image 401 may include a mixture of brightness information and non-brightness information.
- the brightness estimation unit 714 estimates the change in brightness of each image 401 in time series, and obtains the adjustment range of brightness based on the estimated change in brightness.
- the brightness estimator 714 passes the obtained brightness adjustment range as a gain control range 713 to the statistic estimator 700b.
- the statistic estimation unit 700b obtains the statistic from the existing learning data 400b, for example, as described with reference to FIG. 61 for the first example, and generates a brightness change model 715 for estimating the change in brightness. That is, the statistic estimation unit 700b generates the brightness change model 715 based on the time-series information on the brightness distribution within one frame. The recognizer can use this brightness change model 715 to control brightness (for example, sensor gain) online.
- a second example of the ninth embodiment is an example of performing scheduling control using the statistics 711 generated in the first example of the ninth embodiment described above.
- FIG. 63 is a schematic diagram for schematically explaining the control processing according to the second example of the ninth embodiment.
- the conversion section 301l includes a scheduling section 740.
- the scheduling unit 740 generates a control command 741 for controlling the specialized recognizer or the recognition specialized sensor, based on, for example, the statistic 711 generated by the conversion unit 301k according to the first example of the ninth embodiment described above.
- the imaging control unit 13 may control the imaging operation by the imaging unit 11 according to the control command 741.
- the recognition unit 20 may control recognition processing according to the control command 741 .
- FIG. 64 is a schematic diagram for explaining processing according to the first example of the second example of the ninth embodiment.
- the scheduling unit 740a performs line control based on the statistic 711a obtained from the information for each line described using FIG. 61.
- the scheduling unit 740a schedules line control according to the appearance frequency distribution indicated by the statistic 711a, and generates a control command 741a for commanding control of, for example, a recognition specialized sensor and a specialized recognizer.
- the scheduling unit 740a generates a control command 741a for controlling, for example, the interval between lines to be read according to the appearance frequency of the target object based on the statistic 711a.
- This control command 741a is applied to, for example, the imaging control unit 13 and the recognition unit 20 in FIG. 2A.
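- a sketch of such frequency-driven line scheduling follows, using an illustrative inverse-CDF sampling and assuming a fixed number of readable lines per frame; the actual scheduling algorithm is not specified in the disclosure.
```python
import numpy as np

def schedule_lines(freq, num_reads):
    """Choose readout lines so that spacing is finer where the statistic 711a is high."""
    p = freq.astype(np.float64) + 1e-6
    cdf = np.cumsum(p / p.sum())
    targets = (np.arange(num_reads) + 0.5) / num_reads   # evenly spaced quantiles
    lines = np.searchsorted(cdf, targets)
    return np.unique(lines)
```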
- FIG. 65 is a schematic diagram for explaining processing according to the second example of the second example of the ninth embodiment.
- the scheduling unit 740b adds a random element to the input statistic 711 according to the randomness information 742 to generate the control command 741b.
- the recognition process controlled by the control command 741 generated solely according to the statistics based on the learning data can be vulnerable to changes in the input data, for example. Therefore, by including a random element in the control command 741 and controlling, for example, the readout of randomly specified lines in the frame, robustness against changes in the input data and the like can be improved.
- FIG. 66 is a schematic diagram for explaining processing according to the third example of the second example of the ninth embodiment.
- the scheduling unit 740c generates a control command 741c based on the statistic 711 and the sub-sample line control constraint information 743.
- the sub-sample line control constraint information 743 is a constraint condition that cannot be expressed by the statistic 711.
- for example, in the statistic 711a shown in FIG. 67, there is a case where it is desired to perform duplicate readout of the same line in a line range in which the appearance frequency of the target object is high. In this case, different exposures cannot be performed on the same line so as to overlap in time, so hardware readout control must be taken into account.
- the scheduling unit 740 c can reflect such constraints related to hardware control in the control based on the statistic 711 using the sub-sample line control constraint information 743 .
- FIG. 68 is a sequence diagram for explaining read control applicable to the third example of the second example of the ninth embodiment.
- Section (a) of FIG. 68 shows the first read control
- section (b) shows the second read control.
- the vertical axis indicates lines and the horizontal axis indicates time.
- the imaging control unit 13 controls the imaging operation of the imaging unit 11 through the first control or the second control according to the control command 741c generated by the scheduling unit 740c.
- the first read control according to section (a) of FIG. 68 will be described.
- in the first readout control, the scheduling unit 740c generates a control command 741c that instructs the imaging control unit 13 to start the second exposure of the target line after the first exposure of the target line is completed.
- readout control by the control command 741c that performs the first readout control is as follows. Referring to section (a) of FIG. 68, exposure of the target line is started at time t0 and finished at time t1. From time t1, when the exposure ends, the pixel signal can be read from each pixel of the target line. From time t2, when the readout from the target line is finished, the second exposure of the target line can be started. The exposure of the line next to the target line can be started from time t3, when the second exposure and its readout are completed.
- the second read control according to section (b) of FIG. 68 will be described.
- in the second readout control, the scheduling unit 740c generates a control command 741c that instructs the imaging control unit 13 to start exposure of the target line, then sequentially start exposure of the following lines, and start re-exposure of the target line.
- the re-exposure of the target line is inserted into the sequential exposure of the other lines.
- readout control by the control command 741c that performs the second readout control is as follows. Referring to section (b) of FIG. 68, exposure of line L#1, which is the target line, is started at time t0. Exposure of each line L#2, L#3, L#4, L#5, L#6, ... is then started sequentially at times t20, t21, t22, t23, t24, ..., at predetermined intervals from time t0. The exposure interval of the lines L#2, L#3, ... corresponds to, for example, the frame rate and the number of lines in one frame.
- the exposure of line L#1 ends at time t11, and readout starts.
- from time t12, after the readout of line L#1 is finished, line L#1 can be re-exposed.
- here, the time t12 is after the time t24 at which the exposure of line L#6 is started, and before the time at which the exposure of line L#7 (not shown) would originally be started. Therefore, the re-exposure of line L#1 is inserted between the exposure of line L#6 and the exposure of line L#7.
- in the first readout control, each time one line is re-exposed, a delay corresponding to the time from the exposure start time t0 to the readout end time t2 occurs.
- the second readout control exposes other lines during the exposure waiting time required for re-exposure, so the overall delay can be shortened compared to the first readout control.
- a third example of the ninth embodiment is an example of generating control learning data for learning control of a recognizer based on existing learning data.
- FIG. 69 is a schematic diagram for explaining the principle of processing according to the third example of the ninth embodiment.
- conversion section 301p includes control learning data generation section 720 .
- the image 401 and the correct data 402 included in the existing learning data 400c are generally data that have already been observed.
- the control learning data generation unit 720 generates control learning data 721 for the recognizer to learn control, for example, based on the existing learning data 400c. At this time, the control learning data generator 720 needs to generate the control learning data 721 so that it can be observed during learning.
- FIG. 70 is a schematic diagram for more specifically explaining the processing according to the third example of the ninth embodiment.
- the conversion unit 301q includes an image transformation unit 730, a sampling unit 731, a control learning unit 733, a control generation unit 734, and a time series generation unit 735.
- data can be generated interactively in response to control learning requests.
- the time-series generation unit 735 generates information for reflecting the time-series on the image based on the time-series information 737 and the control information passed from the control generation unit 734 .
- the time-series generation unit 735 generates movement information in an image, for example, as the information.
- to generate the movement information, the time-series generation unit 735 can apply, for example, the movement information generation method using the camera movement information 41 described in the fourth example of the first embodiment with reference to FIGS. 17A and 17B.
- alternatively, the time-series generation unit 735 can apply the movement information generation method using the subject movement information 75 described in the fifth example of the first embodiment with reference to FIGS. 18A and 18B.
- the image transformation unit 730 transforms the image 401 in the existing learning data 400c and the correct data 402 using interpolation or the like based on the movement information in the images generated by the time-series generation unit 735.
- the image transformation unit 730 passes the existing learning data 400 c that has undergone transformation processing to the sampling unit 731 .
- the sampling unit 731 samples the existing learning data 400c passed from the image transforming unit 730 according to the control information generated by the control generating unit 734 . As a result, the sampling unit 731 acquires data (images) to be learned by the control learning unit 733 in the existing learning data 400c.
- the control learning unit 733 learns control by the controller (control generation unit 734) based on the control result image 732 in a predetermined control range 736.
- the control generation unit 734 generates control information for controlling sampling by the sampling unit 731 according to control learning by the control learning unit 733 based on the control result image 732 .
- the control generation unit 734 passes the generated control information to the time series generation unit 735 and the sampling unit 731 .
- FIG. 71 is a schematic diagram for explaining control information generated by the control generation unit 734 in the third example of the ninth embodiment.
- the control information includes, as an information type, information indicating the position (line) and time (timing) at which the sampling unit 731 performs sub-sampling. At this time, the range of positions for sub-sampling is predetermined by the control range 736 .
- for example, the control generation unit 734 generates control information indicating control for performing sub-sampling of the first, second, third and fourth lines at times #1, #2, #3 and #4 arranged in chronological order, respectively.
- the control information further includes information indicating the position and timing of sub-sampling in learning based on the existing learning data 400c.
- the control generation unit 734 determines the sub-sampling position and timing during this recognition process through control learning by the control learning unit 733. For example, the control generation unit 734 generates control information for executing the sub-sampling of the first to fourth lines during the recognition process at the respective timings of the x1-th, x2-th, x3-th, and x4-th lines, which are determined by control learning.
- note that the x1-th, x2-th, x3-th, and x4-th lines can be assigned in any order to the first to fourth lines at the time of sub-sampling.
- FIG. 72 is a schematic diagram for explaining learning processing in the third example of the ninth embodiment.
- the control learning section 733 causes the control generation section 734 to learn based on the control result image 732 .
- the control generation unit 734 designates lines within the range indicated by the control range 736 as lines to be sampled according to this learning, and the sampling unit 731 performs sub-sampling of the lines according to this designation to obtain the control result image 732.
- the control learning unit 733 causes the control generation unit 734 to learn based on this control result image 732 .
- control by the conversion unit 301q may be generated in advance or freely generated online.
- the existing learning data 400c is sampled based on the results of learning using the sampled control result image. Therefore, the control generation unit 734 can generate control learning data based on the results of interactive learning.
- a fourth example of the ninth embodiment is an example in which control learning data is collected using a dummy control rule for executing recognition processing by the specialized recognizer, and the learning using the control learning data is then performed independently of the data collection based on the dummy control rule.
- FIG. 73 is a schematic diagram for explaining processing according to the fourth example of the ninth embodiment.
- the conversion unit 301r according to the fourth example of the ninth embodiment includes conversion units 301r-1 and 301r-2 that are executed independently of each other.
- the conversion unit 301r-1 includes an environment generation unit 790.
- the environment generator 790 generates an environment for the target specialized recognizer.
- here, the environment means the setting in which, for an input (image 401) to the specialized recognizer, the corresponding output (correct data 402) is produced.
- the environment generation unit 790 generates control learning data 792 using dummy control data 791, which is dummy control data, based on the existing learning data 400c.
- the dummy control data 791 may be fixed control data or random control data for performing random control.
- the dummy control data 791 can be prepared for each pattern of the existing learning data 400c, for example.
- environment generator 790 selects dummy control data 791 according to the pattern of existing learning data 400c to generate control learning data 792.
- the conversion unit 301r-2 includes a control learning unit 793.
- the control learning unit 793 generates a control rule 795 for executing recognition processing by the specialized recognizer based on the control learning data 792 generated by the environment generation unit 790 in the conversion unit 301r-1.
- the control learning unit 793 can use the control constraint information 794 in generating the control law 795 .
- the control constraint information 794 is, for example, information indicating constraint conditions that cannot be expressed based on the existing learning data 400c.
- as the control constraint information 794, for example, constraints related to hardware control, such as the sub-sample line control constraint information 743 described in the third example of the second example of the ninth embodiment, can be applied.
- the conversion units 301k (conversion units 301k-1 and 301k-2) to 301r according to the ninth embodiment function as generation units that generate, based on a data set for a first recognizer that performs recognition processing based on the first signal read from the first sensor, or based on a second recognizer different from the first recognizer, control information for controlling the execution of recognition processing by the second recognizer.
- the conversion units 301k (conversion units 301k-1 and 301k-2) to 301r according to the ninth embodiment also function as generation units that generate, based on the first learning data for training the first recognizer that performs recognition processing based on the first signal read out from the first sensor in the first readout unit, second learning data for training a second recognizer that performs recognition processing based on the second signal read out from a second sensor that differs from the first sensor in at least one of the readout unit, the signal characteristic, and the pixel characteristic.
- in the tenth embodiment, a control rule for executing recognition processing by a specialized recognizer is generated using the output of a module incorporated into the existing recognizer during learning of the existing recognizer.
- the processing according to the first example of the tenth embodiment corresponds to processing in which the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B generates the specialized control rule 313 based on the specialized learning data 302.
- a first example of the tenth embodiment will be schematically described using FIGS. 74A and 74B.
- FIG. 74A is a schematic diagram schematically showing learning processing by an existing recognizer according to the first example of the tenth embodiment.
- a recognizer 750 performs recognition processing corresponding to a frame-based image, and corresponds to the existing recognizer.
- Recognizer 750 is included, for example, in recognizer 20 in FIG. 2A.
- the recognizer 750 includes a common section 751 , a reference information output section 752 and a recognition processing section 753 .
- the recognizer 750 has a layer that constitutes a reference information output unit 752 inserted at a predetermined position in a plurality of layers for extracting feature amounts in the recognizer 750 .
- each layer before the reference information output unit 752 in the recognizer 750 constitutes the common unit 751
- each layer after the reference information output unit 752 constitutes the recognition processing unit 753 .
- the recognition processing unit 753 can be a portion that is updated by learning
- the common unit 751 can be a portion that is not updated by learning.
- the recognition processing unit 753 further executes recognition processing based on the feature amount extracted from each layer.
- the reference information output unit 752 is a configuration added to a general existing recognizer in the first example of the tenth embodiment.
- the reference information output unit 752 outputs reference information for reference when generating the specialized control rule 313 based on the feature amount extracted by the common unit 751 .
- existing learning data 400, including pre-prepared images 401, 401, ... and correct data 402, 402, ..., is input to the recognizer 750.
- the common unit 751 extracts feature amounts from each layer from the input existing learning data and outputs them as intermediate feature amounts.
- the intermediate feature amount is input to the recognition processing section 753 via the reference information output section 752 .
- the learning unit 760 causes the reference information output unit 752 and the recognition processing unit 753 to learn based on the existing learning data 400 .
- the reference information output unit 752 can, for example, learn about an attention area to be recognized in the feature amount extracted by the common unit 751 .
- the learning unit 760 may be configured outside the recognition unit 20 .
- FIG. 74B is a schematic diagram schematically showing the processing of evaluation data by the recognizer 750 according to the first example of the tenth embodiment.
- recognizer 750 has been trained by learning section 760 described in FIG. 74A.
- the control information generation unit 761 and the image generation unit 766 may be included in the recognition unit 20 in FIG. 2A, for example.
- the control information generation unit 761 generates control information for instructing the imaging control unit 13 to control the imaging unit 11, based on the reference information output from the reference information output unit 752, a control range 762 indicating the range in which imaging control is performed on the imaging unit 11, and an observed image 765 that is an image of the subject captured by the imaging unit 11. Note that an image prepared in advance may be applied as an initial image for the observed image 765.
- the imaging control unit 13 controls the imaging operation by the imaging unit 11 according to the control range 762 and the control information generated by the control information generation unit 761.
- the imaging control unit 13 may control, for example, the designation of the line to be exposed among the lines in the imaging unit 11, the exposure time of each line, the order of exposure, the reading method, and the like.
- the imaging control unit 13 can control the imaging operation of the imaging unit 11 so as to perform the line division and sub-sampling described above according to the control information.
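- As an informal illustration only (the embodiment does not define this structure), control information of the kind handled by the imaging control unit 13 could be represented as a small record like the Python dictionary below; every field name is a hypothetical placeholder.
```python
# Hypothetical container for line-division / sub-sampling control information.
# Field names are illustrative only; they are not defined by the embodiment.
control_info = {
    "mode": "line_division",           # or "sub_sampling"
    "lines_to_expose": [0, 4, 8, 12],  # which lines of the imaging unit to expose
    "exposure_time_us": 500,           # per-line exposure time
    "readout_order": "top_to_bottom",  # order of exposure and readout
}
```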
- the observed image 765 is, for example, data of one line obtained when the imaging unit 11 performs imaging according to control information indicating line division.
- the observed image 765 is an image exposed and read out by the imaging unit 11 according to control information generated using reference information output from the reference information output unit 752 .
- the reference information is, for example, information learned about the attention area to be recognized in the feature quantity extracted by the common unit 751 .
- observed image 765 can be viewed as non-frame-based data read from recognition-specific sensors.
- the observed image 765 is input to the image generator 766 and the control information generator 761 .
- the image generator 766 performs, for example, accumulation and interpolation processing of the observed image 765 to generate a recognized image 767 as a frame-based image. Recognized image 767 is provided to recognizer 750 and input to common section 751 . The recognized image 767 is used, for example, as evaluation data for the recognizer 750 as an existing recognizer.
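- Purely as a sketch of the idea (not the method defined by the embodiment), the accumulation and interpolation performed by the image generator 766 could look like the following Python snippet; the function name accumulate_lines and the nearest-line interpolation are assumptions made for illustration.
```python
import numpy as np

def accumulate_lines(frame_shape, observed_lines):
    """Accumulate line-based observations into a frame-based image.
    observed_lines maps a line index to a 1-D array of pixel values.
    Unread lines are filled by copying the nearest observed line,
    standing in for the interpolation of the image generator."""
    height, width = frame_shape
    frame = np.zeros((height, width), dtype=np.float32)
    read_rows = sorted(observed_lines.keys())
    for y in range(height):
        if y in observed_lines:
            frame[y] = observed_lines[y]
        elif read_rows:
            nearest = min(read_rows, key=lambda r: abs(r - y))
            frame[y] = observed_lines[nearest]
    return frame

# usage: lines 0, 4, 8, ... were exposed and read out (line division)
obs = {y: np.random.rand(64).astype(np.float32) for y in range(0, 32, 4)}
recognition_image = accumulate_lines((32, 64), obs)
```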
- a first example of the tenth embodiment will be described using a more specific example.
- an attention technique that spatially clarifies an attention area is applied, and an attention map indicating the attention area is used as the reference information output by the reference information output unit 752.
- FIG. 75 is a schematic diagram for explaining learning processing by an existing recognizer according to the first example of the tenth embodiment. Note that in sections (a) and (b) of FIG. 75, the existing learning data 400 and the learning unit 760 shown in FIG. 74A are omitted.
- Section (a) of FIG. 75 schematically shows the configuration of a recognizer 750a according to the first example of the tenth embodiment.
- the recognizer 750a includes a common section 751, a reference information output section 752a, and a recognition processing section 753, similar to the recognizer 750 shown in FIG. 74A.
- Section (b) of FIG. 75 shows the configuration of the recognizer 750a in more detail.
- Existing learning data 400 (not shown) is input to a common unit 751 in the recognizer 750a.
- the common part 751 extracts feature amounts based on the existing learning data 400 by each layer of the common part 751 .
- An intermediate feature quantity 774 output from the final layer 773 (layer #i) in the common section 751 is input to the reference information output section 752a.
- the reference information output unit 752a includes an attention generation layer 771 and a multiplier 770. The intermediate feature amount 774 is fed to the multiplicand input terminal of the multiplier 770 and to the attention generation layer 771.
- the attention generation layer 771 generates an attention map 772 as reference information based on the intermediate feature amount 774 .
- as the attention map 772, information can be applied in which the value of the area corresponding to the feature amount to be recognized is "1" and the value of the area not to be recognized is "0".
- the attention map 772 generated by the attention generation layer 771 is input to the multiplication input terminal of the multiplier 770 .
- the multiplier 770 multiplies the attention map 772 by the intermediate feature amount 774 input to the multiplicand input terminal.
- thereby, the feature amounts of areas not targeted for recognition processing among the intermediate feature amounts 774 are set to "0", and the amount of calculation in the subsequent stages can be reduced.
- the output of the multiplier 770 is input to the first layer 775 (layer #i+1) of the recognition processing section 753 .
- a learning unit 760 trains the recognition processing unit 753 based on, for example, the output of the multiplier 770. Also, the learning unit 760 may train the attention generation layer 771 based on the intermediate feature amount 774.
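- As a non-authoritative sketch of the attention gating described above (assuming a near-binary attention map; the sigmoid and the hard threshold are illustrative choices, not those of the embodiment):
```python
import numpy as np

def attention_gate(intermediate_features, attention_logits):
    """Multiply an intermediate feature map by a (near-binary) attention map,
    zeroing out features outside the attention area."""
    attention_map = 1.0 / (1.0 + np.exp(-attention_logits))   # squash scores to (0, 1)
    attention_map = (attention_map > 0.5).astype(np.float32)  # "1" inside, "0" outside
    return intermediate_features * attention_map[None, :, :]

features = np.random.rand(8, 16, 16).astype(np.float32)  # (channels, H, W) intermediate features
logits = np.random.randn(16, 16).astype(np.float32)      # raw scores from an attention layer
gated = attention_gate(features, logits)                  # features outside the area become 0
```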
- FIG. 76A is a schematic diagram more specifically showing the processing regarding the evaluation data by the recognizer 750a according to the first example of the tenth embodiment.
- recognizer 750a corresponds to recognizer 750 in FIG. 74B described above.
- the reference information output section 752 in the recognizer 750 shown in FIG. 74B is replaced with an attention generation layer 771.
- the control information generation unit 761 shown in FIG. 74B is replaced with an attention area selection unit 776.
- the attention area selection unit 776 generates control information for instructing the imaging control unit 13 to control the imaging unit 11, based on the attention map 772 generated by the attention generation layer 771, the control range 762, and the observed image 765. At this time, the attention area selection unit 776 selects the attention area indicated by the attention map 772 from the image range indicated by the control range 762, and generates control information for controlling the imaging unit 11 so as to read out the selected attention area. Note that an image prepared in advance may be applied as an initial image for the observed image 765.
- the imaging control unit 13 controls imaging operations including pixel signal readout processing by the imaging unit 11 according to the control range 762 and the control information generated by the attention area selection unit 776 .
- the imaging control unit 13 controls the imaging operation of the imaging unit 11 so that the pixel signals of the attention area selected by the attention area selection unit 776 based on the attention map 772 are read from the imaging unit 11 .
- the imaging unit 11 performs imaging and readout of pixel signals under the control of the imaging control unit 13 , and outputs a captured image based on the readout pixel signals as an observed image 765 .
- Observation image 765 is input to image generation section 766 and attention area selection section 776 .
- the image generation unit 766 generates a recognition image 767 for the recognizer 750 to perform recognition processing based on the observed image 765 .
- the recognized image 767 is supplied to the recognizer 750 a and input to the common section 751 .
- the attention generating layer 771 generates an attention map 772 based on the intermediate feature quantity 774 (not shown) extracted by the common part 751 based on the input recognition image 767 .
- the intermediate feature quantity 774 output from the common unit 751 is input to the recognition processing unit 753 via the attention generation layer 771 .
- the recognition processing unit 753 executes recognition processing based on the intermediate feature amount 774.
- the recognized image 767 is used, for example, as evaluation data for the recognizer 750a.
- FIG. 76B is a schematic diagram for more specifically explaining the processing by the attention area selection unit 776 according to the first example of the tenth embodiment.
- section (a) shows an example of processing by the attention area selection unit 776 .
- the region-of-interest selection unit 776 calculates a region of interest based on the cross-sectional information for which imaging control is possible in the input attention map 772 (step S40).
- the controllable cross section is a cross section in the vertical direction in the captured image when the imaging unit 11 performs readout on a line-by-line basis.
- Section (b) of FIG. 76B shows a specific example of an attention map 772 .
- the attention map 772 indicates the value "1" portion to be recognized in the captured image in white, and the value "0" portion not to be recognized in black.
- the attention map 772 includes target areas 772a1, 772a2, and 772a3 to be recognized.
- the attention area selection unit 776 integrates the attention map 772 in the line direction to generate attention area information 772b indicating the attention area.
- the attention area information 772b indicates, for each line position in the vertical direction, the integrated value of the target-area values in the horizontal direction. According to the attention area information 772b, it can be seen that portions with large integrated values exist at the vertical position of the target area 772a1 and at the vertical positions of the target areas 772a2 and 772a3.
- the attention area selection unit 776 determines the line to be read and the readout order of the lines based on the attention area information 772b (step S41).
- the region-of-interest selection unit 776 may determine the lines to be read according to the integrated values of the target-area values. For example, the region-of-interest selection unit 776 can generate control information so that lines are read out at denser intervals where the integrated value is larger, and at sparser intervals where the integrated value is smaller.
- the region-of-interest selection unit 776 may generate control information such that exposure and readout are performed multiple times on the same line at positions where the integrated value is equal to or greater than a predetermined value.
- the readout control described with reference to FIG. 68 in the third example of the second example of the ninth embodiment can be applied to control in which the same line is exposed and read out multiple times.
- the attention area selection unit 776 passes the control information thus generated to the imaging control unit 13 .
- the imaging control unit 13 controls exposure and reading of pixel signals in the imaging unit 11 according to control information.
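- A minimal sketch of the line-direction integration and readout planning described above (the top-k selection is an illustrative stand-in for the dense/sparse interval control; nothing here is prescribed by the embodiment):
```python
import numpy as np

def plan_line_readout(attention_map, max_lines):
    """Integrate an attention map along the line (horizontal) direction and
    select the lines to read out, favoring lines with large integrated values."""
    line_scores = attention_map.sum(axis=1)    # one integrated value per line
    order = np.argsort(line_scores)[::-1]      # lines with high scores first
    return line_scores, np.sort(order[:max_lines])

attention_map = np.zeros((32, 64), dtype=np.float32)
attention_map[10:14, 20:40] = 1.0              # a target region to be recognized
scores, lines_to_read = plan_line_readout(attention_map, max_lines=8)
```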
- in the above description, the attention area selection unit 776 generates the control information for the imaging control unit 13 to control the imaging unit 11 based on the attention map 772; however, this is not limited to this example.
- the region-of-interest selection unit 776 can generate the control information based on a saliency map that indicates saliency in the image.
- in the first example of the tenth embodiment, the reference information output unit 752 is incorporated in the recognizer 750 in this way, and the reference information output unit 752 is trained using the existing learning data 400. Control information for controlling imaging is then generated based on the reference information output from the reference information output unit 752. Therefore, it becomes possible to execute the processing related to the evaluation data more efficiently.
- a second example of the tenth embodiment uses an existing recognizer as it is to generate a control rule for executing recognition processing by a specialized recognizer. More specifically, in the second example of the tenth embodiment, imaging control is performed without incorporating the above-described reference information output unit 752 to generate evaluation data.
- FIG. 77 is a schematic diagram schematically showing the processing regarding the evaluation data by the existing recognizer according to the second example of the tenth embodiment.
- recognizer 750b corresponds to recognizer 750 in FIG. 74B described above.
- the recognizer 750b includes the common section 751 and the recognition processing section 753 and does not include the reference information output section 752 described above.
- the control information generator 761a acquires the information indicating the attention area from the recognizer 750b (for example, via path 768a).
- the control information generation unit 761a uses the acquired information indicating the attention area as reference information, and controls the imaging unit 11 to the imaging control unit 13 based on the reference information, the control range 762, and the observed image 765. It is possible to generate control information for instructing.
- control information generation unit 761 a can generate control information for instructing the imaging control unit 13 to control the imaging unit 11 based on the observed image 765 or the recognition image 767 .
- the control information generator 761a acquires an observed image 765 or a recognized image 767 (path 768b or 768c), and converts the acquired observed image 765 or recognized image 767 into spatial frequency information.
- the control information generation unit 761a uses this spatial frequency information as reference information, and instructs the imaging control unit 13 to control the imaging unit 11 based on the reference information, the control range 762, and the observed image 765. It is possible to generate control information for
- the control information generator 761a may, for example, thin out data (for example, lines) whose spatial frequency is equal to or less than a predetermined value.
- reference information based on the observed image 765 or the recognized image 767 is not limited to spatial frequency information.
- the control information generator 761a can use, for example, the color information in the observed image 765 or the recognized image 767 as reference information.
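- For illustration only, per-line spatial-frequency content of the kind mentioned above could be computed as in the following sketch; the quarter-spectrum "high-frequency" band and the threshold are arbitrary assumptions.
```python
import numpy as np

def lines_to_keep(image, freq_threshold=0.05):
    """Treat per-line spatial-frequency content as reference information:
    lines whose high-frequency energy ratio is at or below the threshold
    become candidates for thinning out."""
    keep = []
    for y, row in enumerate(image):
        spectrum = np.abs(np.fft.rfft(row))
        high = spectrum[len(spectrum) // 4:].sum()   # crude high-frequency band
        ratio = high / (spectrum.sum() + 1e-8)
        if ratio > freq_threshold:
            keep.append(y)
    return keep

image = np.random.rand(32, 64).astype(np.float32)
image[8:12] = 0.5                                    # flat lines -> candidates for thinning
selected_lines = lines_to_keep(image)
```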
- in the second example of the tenth embodiment, information that can be obtained from the existing recognizer or from the captured image is thus used as reference information to generate the control information for controlling imaging. Therefore, without changing the configuration of the existing recognizer, it becomes possible to execute the processing related to the evaluation data more efficiently.
- the conversion unit 301 also functions as a generation unit that generates, based on a data set or on the first recognizer, control information for controlling a second recognizer that performs recognition processing based on a second signal read from a second sensor that differs from the first sensor in at least one of the readout unit, the pixel characteristics, and the signal characteristics.
- the recognition unit 20 also functions as a generation unit that generates control information for controlling the first recognizer, which performs recognition processing based on the first signal read from the first sensor, based on a data set with which a second recognizer different from the first recognizer performs recognition processing, or based on the second recognizer.
- the conversion unit 301 also functions as a generation unit that, based on a second signal read from a second sensor that differs from the first sensor in at least one of the readout unit, the pixel characteristics, and the signal characteristics, generates a signal corresponding to the first signal read from the first sensor.
- in the eleventh embodiment, a control rule is generated for each of cases #1 to #5.
- the distillation process described in the third embodiment is applied to the generation of the control law.
- the processing according to each example of the eleventh embodiment corresponds to the processing of generating the specialized control rule 313 by the NW conversion unit 311 in the recognizer generation unit 31 of the learning system 3 shown in FIG. 2B.
- FIG. 78 is a diagram corresponding to FIG. 25 described above, and is a schematic diagram showing the classification of the processes according to the eleventh embodiment.
- the processing related to the existing input data and the specialized input data can be classified into processing for converting existing input data into specialized input data and processing for converting specialized input data into existing input data.
- processing such as conversion can be classified into processing for conversion only and processing for conversion and generation, as in the third embodiment.
- control constraints are added to the distillation process for learning the specialized recognizer in each of cases #2 to #5.
- control constraints are added to the distillation process for training the specialized recognizer.
- the first example of the eleventh embodiment corresponds to case #1 described with reference to FIGS. 23 and 78, and corresponds to the configuration shown in FIG. 28 as a processing configuration. That is, the first example of the eleventh embodiment is an example of generating a specialized recognizer and a control rule for controlling the specialized recognizer when an existing recognizer other than the specialized recognizer, existing input data, specialized input data, existing correct data, and specialized correct data are available. In the first example of the eleventh embodiment, the general distillation process described above can be applied.
- FIG. 79 is a schematic diagram for explaining processing according to the first example of the eleventh embodiment.
- FIG. 79 corresponds to the configuration of FIG. 28 described in the first example of the third embodiment, with a sampling unit 780, a control rule generation unit 781, and a control constraint estimation unit 782 added.
- the existing recognizer 410 executes recognition processing based on the image 401 included in the existing learning data 400 and outputs an existing recognition output 411.
- the sampling unit 780 samples the image 441 included in the specialized learning data 440 in accordance with the control information generated by the control rule generation unit 781, and outputs data obtained by sampling the image 441 to the specialized recognizer 420.
- the specialized recognizer 420 executes recognition processing based on the data output from the sampling section 780 and outputs a specialized recognition output 421 .
- the inter-recognition-output error calculator 430 obtains the error between the existing recognition output 411 and the specialized recognition output 421, performs a calculation to minimize the distance between the existing recognition output 411 and the specialized recognition output 421, and obtains the minimized error 431.
- the recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 and the control rule generation unit 781 by, for example, the error backpropagation method, and the specialized recognizer 420 and the control rule generation unit 781 are updated.
- the inter-recognized-output error calculator 430 optimizes the specialized recognizer 420 by re-learning the specialized recognizer 420 so as to reduce the minimization error 431 .
- control constraint estimation unit 782 estimates control constraints based on a control range 783 that indicates the range in which imaging control is performed on the imaging unit 11 .
- a control constraint is, for example, a constraint condition that cannot be expressed by information based on the output of the existing recognizer 410 or the specialized recognizer 420 .
- the control constraint estimator 782 infers constraints in hardware readout control in the imaging unit 11 as control constraints.
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 based on the control constraint estimated by the control constraint estimation unit 782, the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441, and the data obtained by sampling the image 441 by the sampling unit 780.
- control rule generation unit 781 can generate sampling control information for controlling the sampling of the image 441 by the specialized recognizer 420 .
- the control rule generator 781 includes the generated sampling control information in control information for controlling the specialized recognizer 420 .
- the specialized recognizer 420 is optimized using the existing recognition output 411 and the specialized recognition output 421 based on the image 401 included in the existing learning data 400 and the image 441 contained in the specialized learning data 440.
- alternatively, the specialized recognizer 420 may be optimized by ordinary training using the correct data 402 and 442.
- the optimization based on the images 401 and 441 and the optimization based on the correct data 402 and 442 may be executed at the same time.
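- A rough, non-authoritative sketch of the two loss terms involved (a distance between the existing and specialized recognition outputs, and an ordinary supervised term against the correct data); the squared distance and the softmax are illustrative choices only.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_losses(existing_logits, specialized_logits, correct_label):
    """Return (a) the distance between the existing and specialized recognition
    outputs, analogous to the minimized error, and (b) a cross-entropy term
    against the correct data for ordinary training."""
    p_existing = softmax(existing_logits)
    p_specialized = softmax(specialized_logits)
    distance = np.sum((p_existing - p_specialized) ** 2)
    cross_entropy = -np.log(p_specialized[correct_label] + 1e-8)
    return distance, cross_entropy

d, ce = distillation_losses(np.array([2.0, 0.5, -1.0]),
                            np.array([1.5, 0.8, -0.7]), correct_label=0)
loss = d + ce  # both optimizations may also be combined and run at the same time
```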
- FIG. 80 is a schematic diagram for explaining processing according to the second example of the eleventh embodiment.
- the second example of the eleventh embodiment, like the first example of the eleventh embodiment described above, corresponds to case #1 described with reference to FIGS. 23 and 78.
- as a processing configuration, it corresponds to the configuration shown in FIG. 28.
- the processing according to the second example of the eleventh embodiment is the same as that of the first example of the eleventh embodiment described above, except that an existing recognizer 410′ used as a substitute for the specialized recognizer 420 is not trained. That is, the second example of the eleventh embodiment is an example of generating a control rule for controlling the specialized recognizer when an existing recognizer other than the specialized recognizer, existing input data, specialized input data, existing correct data, and specialized correct data are available. In the second example of the eleventh embodiment, the general distillation process described above can be applied.
- the sampling unit 780 samples the image 441 included in the specialized learning data 440 according to the control information generated by the control rule generation unit 781, and outputs data obtained by sampling the image 441 to the existing recognizer 410'.
- the existing recognizer 410' performs recognition processing based on the data output from the sampling unit 780, and outputs an existing recognition output 411'.
- the existing recognition output 411′ is the recognition output obtained when the existing recognizer 410′ performs recognition processing on the image 441 included in the specialized learning data 440, and thus corresponds to the recognition-specialized sensor.
- the inter-recognition output error calculator 430 obtains the error between the existing recognition output 411 and the existing recognition output 411′, performs a calculation to minimize the distance between the existing recognition output 411 and the existing recognition output 411′, and obtains the minimized error 431.
- the inter-recognition-output error calculator 430 feeds back the calculated minimization error 431 to the control rule generator 781 by, for example, error backpropagation, and updates the control rule generator 781 .
- the retraining of the existing recognizer 410' due to the minimization error 431 can be omitted.
- the minimization error 431 may be used to make the existing recognizer 410' learn, or the existing recognizer 410' may be adjusted (fine-tuned). Each parameter of the existing recognizer 410 ′ that has been learned or adjusted is reflected in the existing recognizer 410 .
- a control constraint estimation unit 782 estimates control constraints based on the control range 783 .
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 (not shown) based on the control constraint estimated by the control constraint estimation unit 782, the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441, and the data obtained by sampling the image 441 by the sampling unit 780.
- control rule generation unit 781 can generate sampling control information for controlling the sampling of the image 441 by the specialized recognizer 420 .
- the control law generator 781 includes the generated sampling control information in control information for controlling the specialized recognizer 420 .
- the error backpropagation method described above can be applied if the operation is described in a way that each component can be differentiated. For example, when the control law is "change of gain", the processing is multiplication, so differentiation is possible. In this case, it is possible to learn the control law by the distillation process. On the other hand, it is difficult to differentiate, for example, line readout processing in line division and pixel-by-pixel readout processing in sub-sampling.
- the first implementation method of the distillation process related to the control law is an example when the operation on the sample is described by a differentiable method.
- a first implementation provides a differentiable description of sample manipulation and control.
- the derivative is calculated by the usual backpropagation method. In this case, it is conceivable to update the weights in the specialized recognizer 420 according to the differentiation.
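- A tiny illustration of why a gain-change control law is differentiable (the parameterization through exp is just one convenient assumption, not part of the embodiment):
```python
import numpy as np

def apply_gain(samples, log_gain):
    """A 'change of gain' control law: the operation is a multiplication,
    so its derivative with respect to the control parameter is simple."""
    gain = np.exp(log_gain)        # keep the gain positive
    out = samples * gain
    d_out_d_log_gain = out         # d(samples * exp(g)) / dg = samples * exp(g)
    return out, d_out_d_log_gain

out, grad = apply_gain(np.array([0.2, 0.5, 0.9]), log_gain=0.1)
# an error backpropagated from the recognizer can use this gradient to update the control law
```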
- the second implementation method of the distillation process related to the control law is an example when the operation on the sample is difficult to differentiate.
- in this case, a method can be considered in which the operation is described by an approximate expression, made differentiable (softened) using the approximate expression, and the distillation process is then carried out.
- a softmax function can be applied as an approximation formula.
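- One possible way to soften a hard line-selection operation with a softmax, shown only as an illustrative sketch (the temperature and the weighted-sum formulation are assumptions, not the embodiment's definition):
```python
import numpy as np

def soft_line_selection(image, line_scores, temperature=0.1):
    """Differentiable relaxation of 'read out the single best line':
    a softmax-weighted sum of lines replaces the non-differentiable argmax,
    so gradients can flow back to the scores during the distillation process."""
    weights = np.exp(line_scores / temperature)
    weights = weights / weights.sum()
    return (weights[:, None] * image).sum(axis=0)  # approaches the argmax line as T -> 0

image = np.random.rand(32, 64).astype(np.float32)
scores = np.random.randn(32).astype(np.float32)
soft_line = soft_line_selection(image, scores)
```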
- the third implementation method of the distillation process related to the control law is an example of the case where the operation on the sample is difficult to differentiate and approximation is also difficult. For example, there are cases in which the softening is not appropriate, or the softening does not produce the desired performance. In this case, the control law is learned using reinforcement learning.
- the third example of the eleventh embodiment corresponds to case #2 described with reference to FIGS. 23 and 78, and corresponds to the configuration shown in FIG. 29 as a processing configuration. That is, the third example of the eleventh embodiment is an example of generating a specialized recognizer and a control rule for controlling the specialized recognizer when an existing recognizer, existing input data, existing correct data, and specialized correct data exist and there is no specialized input data. In the third example of the eleventh embodiment, similar to the second example of the third embodiment, specialized input data is generated from existing input data, and then distillation is performed.
- FIG. 81 is a schematic diagram for explaining processing according to the third example of the eleventh embodiment.
- FIG. 81 corresponds to the configuration of FIG. 29 described in the second example of the third embodiment, with a sampling unit 780, a control rule generation unit 781, and a control constraint estimation unit 782 added. In the following description, the same content as the descriptions of FIGS. 29 and 79 will be omitted as appropriate.
- the existing recognizer 410 executes recognition processing based on the image 401 included in the existing learning data 400 and outputs an existing recognition output 411.
- the existing/specialized conversion unit 460 converts the image 401 corresponding to the existing recognizer 410 to the specialized recognizer 420 in the same manner as the method described using FIG. 29 in the second example of the third embodiment. Convert to the corresponding image 441a.
- the sampling unit 780 samples the image 441a, which has been converted from the image 401 by the existing/specialized conversion unit 460, according to the control information generated by the control rule generation unit 781, and outputs the data obtained by sampling the image 441a to the specialized recognizer 420. The specialized recognizer 420 executes recognition processing based on the data output from the sampling unit 780 and outputs a specialized recognition output 421.
- the inter-recognition output error calculator 430 obtains a minimized error 431 based on the existing recognition output 411 and the specialized recognition output 421 .
- the recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 and the control rule generation unit 781 by, for example, the error backpropagation method, and the specialized recognizer 420 and the control rule generation unit 781 are updated.
- control constraint estimation unit 782 estimates control constraints based on a control range 783 that indicates the range in which imaging control is performed on the imaging unit 11 .
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 based on the control constraint estimated by the control constraint estimation unit 782, the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441a, and the data obtained by sampling the image 441a by the sampling unit 780.
- the fourth example of the eleventh embodiment corresponds to case #3 described with reference to FIGS. 23 and 78, and corresponds to the configuration shown in FIG. 30 as a processing configuration. That is, the fourth example of the eleventh embodiment is an example of generating a specialized recognizer and a control rule for controlling the specialized recognizer when an existing recognizer, specialized input data, existing correct data, and specialized correct data exist and there is no existing input data. In the fourth example of the eleventh embodiment, similar to the third example of the third embodiment, existing input data is generated from specialized input data, and then distillation is performed.
- FIG. 82 is a schematic diagram for explaining processing according to the fourth example of the eleventh embodiment.
- FIG. 82 corresponds to the configuration of FIG. 30 described in the third example of the third embodiment, with a sampling unit 780 and a control rule generation unit 781 added to the configuration of FIG. 30. In FIG. 82, the control constraint estimation unit 782 for estimating the control constraint based on the control range 783, shown in FIG. 81 and the like, is omitted. In the following description, the same content as the descriptions of FIGS. 30 and 79 will be omitted as appropriate.
- the specialized/existing converter 461 converts an image 441 corresponding to the specialized recognizer 420 into an image 401a corresponding to the existing recognizer 410.
- the existing recognizer 410 performs recognition processing based on the image 401 a and outputs an existing recognition output 411 .
- the sampling unit 780 samples the image 441 in accordance with the control information generated by the control rule generation unit 781 and outputs the sampled data of the image 441 to the specialized recognizer 420 .
- the specialized recognizer 420 executes recognition processing based on the data output from the sampling section 780 and outputs a specialized recognition output 421 .
- the inter-recognition output error calculator 430 obtains a minimized error 431 based on the existing recognition output 411 and the specialized recognition output 421 .
- the recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 and the control rule generation unit 781 by, for example, the error backpropagation method, and the specialized recognizer 420 and the control rule generation unit 781 are updated.
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 based on the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441, and the data obtained by sampling the image 441 by the sampling unit 780.
- the fifth example of the eleventh embodiment corresponds to case #4 described with reference to FIGS. 23 and 78, and corresponds to the configuration shown in FIG. 31A as a processing configuration. That is, the fifth example of the eleventh embodiment is an example of generating a specialized recognizer and a control rule for controlling the specialized recognizer when an existing recognizer, existing correct data, and specialized correct data exist and there are no existing input data and no specialized input data.
- the existing input data is generated based on the existing recognizer, and the specialized input data is generated based on the generated existing input data. Distillation is performed after the existing input data and the specialized input data are generated in this manner.
- FIG. 83 is a schematic diagram for explaining processing according to the fifth example of the eleventh embodiment.
- FIG. 83 corresponds to the configuration of FIG. 31A described in the fourth example of the third embodiment, with a sampling unit 780 and a control rule generation unit 781 added to the configuration of FIG. 31A. In FIG. 83, the control constraint estimation unit 782 for estimating the control constraint based on the control range 783, shown in FIG. 81 and the like, is omitted. In the following description, the same content as the descriptions of FIGS. 31A and 82 will be omitted as appropriate.
- the recognition image extraction unit 470 extracts and generates an image 401 b corresponding to the existing recognizer 410 from the existing recognizer 410 .
- the existing/specialized converter 460 converts the image 401 b into an image 441 b corresponding to the specialized recognizer 420 .
- the sampling unit 780 samples the image 441 b in accordance with the control information generated by the control rule generation unit 781 and outputs the sampled data of the image 441 b to the specialized recognizer 420 .
- the specialized recognizer 420 executes recognition processing based on the data output from the sampling section 780 and outputs a specialized recognition output 421 .
- the inter-recognition output error calculator 430 obtains a minimized error 431 based on the existing recognition output 411 and the specialized recognition output 421 .
- the recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 and the control rule generation unit 781 by, for example, the error backpropagation method, and the specialized recognizer 420 and the control rule generation unit 781 are updated.
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 based on the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441b, and the data obtained by sampling the image 441b by the sampling unit 780.
- the sixth example of the eleventh embodiment corresponds to case #5 described with reference to FIGS. 23 and 78, and corresponds to the configuration shown in FIG. 32 as a processing configuration. That is, the sixth example of the eleventh embodiment is an example of generating a specialized recognizer and a control rule for controlling the specialized recognizer when an existing recognizer, existing correct data, and specialized correct data exist and there are no existing input data and no specialized input data.
- in the sixth example of the eleventh embodiment, the specialized input data is generated by a predetermined method, and the existing input data is generated based on the generated specialized input data. Distillation is performed after the existing input data is generated in this manner.
- FIG. 84 is a schematic diagram for explaining processing according to the sixth example of the eleventh embodiment.
- FIG. 84 corresponds to the configuration of FIG. 32 described in the fifth example of the third embodiment, with a sampling unit 780 and a control rule generation unit 781 added to the configuration of FIG. 32. In FIG. 84, the control constraint estimation unit 782 for estimating the control constraint based on the control range 783, shown in FIG. 81 and the like, is omitted. In the following description, the same content as the descriptions of FIGS. 32 and 82 will be omitted as appropriate.
- the image generator 462 generates an image 441c corresponding to the specialized recognizer 420 by a predetermined method such as random or CG.
- the specialized/existing conversion unit 461 converts the image 441c into the image 401a corresponding to the existing recognizer 410.
- the existing recognizer 410 performs recognition processing based on the image 401 a converted from the image 441 c by the specialization/existing converter 461 and outputs an existing recognition output 411 .
- the sampling unit 780 samples the image 441 c in accordance with the control information generated by the control rule generation unit 781 and outputs data obtained by sampling the image 441 c to the specialized recognizer 420 .
- the specialized recognizer 420 executes recognition processing based on the data output from the sampling section 780 and outputs a specialized recognition output 421 .
- the inter-recognition output error calculator 430 obtains a minimized error 431 based on the existing recognition output 411 and the specialized recognition output 421 .
- the recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 and the control rule generation unit 781 by, for example, the error backpropagation method, and the specialized recognizer 420 and the control rule generation unit 781 are updated.
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 based on the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441c, and the data obtained by sampling the image 441c by the sampling unit 780.
- FIG. 85 is a schematic diagram for explaining processing according to a modification of the sixth example of the eleventh embodiment.
- in the sixth example described above, the image generation unit 462 generates the image 441c corresponding to the specialized recognizer 420. In this modification, in contrast, the image generation unit 462 generates an image 401c corresponding to the existing recognizer 410.
- the method of generating the image 401c by the image generation unit 462 is not limited to a specific method; random generation or CG generation can be applied as described above.
- the existing recognizer 410 executes recognition processing based on the image 401c generated by the image generator 462 and outputs an existing recognition output 411.
- the existing/specialized converter 460 converts the image 401c into an image 441d corresponding to the specialized recognizer 420 in the same manner as the method described using FIG. 29 in the second example of the third embodiment. .
- the sampling unit 780 samples the image 441d, which has been obtained by converting the image 401c by the existing/specialized conversion unit 460, according to the control information generated by the control rule generation unit 781, and outputs the sampled data of the image 441d to the specialized recognizer 420. The specialized recognizer 420 executes recognition processing based on the data output from the sampling unit 780 and outputs a specialized recognition output 421.
- the inter-recognition output error calculator 430 obtains a minimized error 431 based on the existing recognition output 411 and the specialized recognition output 421 .
- the recognition output error calculation unit 430 feeds back the calculated minimized error 431 to the specialized recognizer 420 and the control rule generation unit 781 by, for example, the error backpropagation method, and the specialized recognizer 420 and the control rule generation unit 781 are updated.
- the control rule generation unit 781 generates control information for controlling the specialized recognizer 420 based on the minimized error 431 fed back from the recognition output error calculation unit 430, the image 441d, and the data obtained by sampling the image 441d by the sampling unit 780.
- according to the eleventh embodiment, it is possible to easily provide a specialized recognizer to a user who has an existing frame-based recognizer but does not have a non-frame-based specialized recognizer.
- in addition, since the specialized recognizer is trained together with the control rule for controlling the specialized recognizer, the accuracy of the recognition processing in the specialized recognizer can be improved.
- the NW conversion unit 311 also functions as a generation unit that generates control information for controlling the first recognizer, which performs recognition processing based on the first signal read from the first sensor, based on a data set with which a second recognizer different from the first recognizer performs recognition processing, or based on the second recognizer.
- the NW conversion unit 311 also functions as a conversion unit that, based on the output of the first recognizer that performs recognition processing based on the first signal read from the first sensor, trains a second recognizer that performs recognition processing based on a second signal read from a second sensor having characteristics different from those of the first sensor.
- in the twelfth embodiment, at least one processing unit (layer, filter, etc.) of the network of the existing recognizer is converted by, for example, the NW conversion unit 311 so that the output of the recognizer when using a recognition-specialized sensor matches or approximates the output when using an existing sensor, thereby generating a specialized recognizer.
- FIG. 86 is a schematic diagram schematically showing processing according to the twelfth embodiment.
- Section (a) of FIG. 86 schematically shows the configuration of an existing recognizer 810 according to existing technology.
- the existing recognizer 810 includes a pre-processing unit 811, a middle-processing unit 812, and a post-processing unit 813, which are processing units.
- Each of the pre-processing unit 811, middle-processing unit 812, and post-processing unit 813 includes one or more layers.
- An existing sensor output 800 output from a frame-based existing sensor is input to an existing recognizer 810 .
- the existing recognizer 810 performs predetermined processing (for example, feature amount extraction processing) on the input existing sensor output 800 in the pre-processing unit 811, the middle-stage processing unit 812, and the post-processing unit 813, respectively, and outputs the existing recognition output 801.
- Section (b) of FIG. 86 schematically shows the configuration of the specialized recognizer 820 according to the twelfth embodiment.
- the specialized recognizer 820 includes a pre-processing unit 811, a conversion mid-stage processing unit 821, and a post-processing unit 813, which are processing units.
- pre-processing section 811 and post-processing section 813 included in specialized recognizer 820 are assumed to be equivalent to pre-processing section 811 and post-processing section 813 included in existing recognizer 810 .
- a non-frame-based recognition specialized sensor (not shown) has its imaging operation controlled according to the control information 822 generated by the conversion middle-stage processing unit 821 .
- a specialized sensor output 802 output from the recognition specialized sensor is input to a specialized recognizer 820 .
- the specialized recognizer 820 performs predetermined processing on the input specialized sensor output 802 in a pre-processing unit 811, a conversion middle-stage processing unit 821, and a post-processing unit 813, and outputs an existing recognition output 803. .
- the processing for the output of the pre-processing unit 811 by the conversion middle-stage processing unit 821 is equivalent to the processing by the middle-stage processing unit 812 shown in section (a).
- the existing recognition output 803 is based on the specialized sensor output 802 obtained when the recognition-specialized sensor controls its imaging operation according to the control information 822, and corresponds to the existing recognition output 801 in section (a).
- the conversion middle-stage processing unit 821 generates control information 822 for controlling the recognition-specialization sensor according to the conversion processing for the specialized sensor output 802 input from the pre-processing unit 811 .
- the control information 822 is control information for controlling the recognition specialized sensor so that the output of the specialized recognizer 820 based on the specialized sensor output 802 approximates the existing recognition output 801 based on the existing sensor output 800 by the existing recognizer 810. including.
- the error of the existing recognition output 803 shown in section (b) with respect to the existing recognition output 801 shown in section (a) is obtained.
- the transform middle-stage processing unit 821 generates control information 822 such that this error is minimized using, for example, the error backpropagation method.
- in the twelfth embodiment, the existing recognizer 810 is converted into the specialized recognizer 820 in units of processing (for example, a layer or a layer group), and control information 822 for controlling the recognition-specialized sensor is generated in the converted unit of processing. Thereby, it is possible to generate the specialized recognizer 820 that can output, based on the output of the recognition-specialized sensor, the existing recognition output 803 corresponding to the existing recognition output 801 by the existing recognizer 810.
- in the above description, the middle-stage processing unit 812 is focused on and converted; however, this is not limited to this example. For example, among the pre-processing unit 811, the middle-stage processing unit 812, and the post-processing unit 813 included in the existing recognizer 810, the pre-processing unit 811 or the post-processing unit 813 may be focused on for conversion. Further, for example, conversion may be performed by paying attention to a plurality of processing units among the pre-processing unit 811, the middle-stage processing unit 812, and the post-processing unit 813 included in the existing recognizer 810. Further, for example, finer processing units may be defined as the processing units of the existing recognizer 810, and one or more of those processing units may be converted.
- FIG. 87 is a schematic diagram for explaining processing according to the first example of the twelfth embodiment.
- attention is focused on the conversion middle stage processing section 821 shown in section (b) of FIG.
- the transformation middle-stage processing unit 821a includes a middle-stage processing unit 812, a control feature quantity generation unit 823, and a control information generation unit 824a.
- the middle-stage processing unit 812 is equivalent to the middle-stage processing unit 812 included in the existing recognizer 810 shown in section (a) of FIG.
- it is assumed that the pre-processing unit 811 and the post-processing unit 813 included in the existing recognizer 810 shown in section (a) of FIG. 86 are arranged before and after the conversion middle-stage processing unit 821a, respectively (not shown in the figure).
- the image output from the recognition specialized sensor 830 is subjected to predetermined processing including feature amount extraction processing by a pre-processing unit 811 (not shown), output as a pre-stage output, and input to a transformation middle-stage processing unit 821a.
- the middle-stage processing unit 812 extracts a feature amount from the input pre-stage output and outputs it as a middle-stage output.
- the middle-stage output is input to, for example, a post-processing section 813 (not shown).
- the middle-stage processing unit 812 passes the feature amount extracted from the previous-stage output to the control feature amount generation unit 823 .
- the control feature amount generation unit 823 estimates a region of interest in the image output from the recognition specialized sensor 830 based on the feature amount passed from the intermediate processing unit 812 .
- the control feature amount generation unit 823 sets the estimated attention area as a control target, and extracts a feature amount based on the attention area.
- the control feature amount generation unit 823 outputs the extracted feature amount as a control feature amount.
- the control information generation unit 824a generates control information 822a for controlling the imaging operation of the recognition specialized sensor 830 based on the control feature amount output from the control feature amount generation unit 823.
- FIG. 88 is a schematic diagram for explaining processing according to the second example of the twelfth embodiment.
- attention is focused on the conversion middle stage processing section 821 shown in section (b) of FIG.
- the transform middle-stage processing unit 821b includes a middle-stage processing unit 812, a control feature amount generation unit 823, a required characteristic estimation unit 825, and a control information generation unit 824b.
- the middle-stage processing unit 812 is equivalent to the middle-stage processing unit 812 included in the existing recognizer 810 shown in section (a) of FIG.
- it is assumed that the pre-processing unit 811 and the post-processing unit 813 included in the existing recognizer 810 shown in section (a) of FIG. 86 are arranged before and after the conversion middle-stage processing unit 821b, respectively (not shown in the figure).
- the pre-stage output based on the image output from the recognition specialized sensor 830 is input to the conversion middle-stage processing section 821b.
- the middle-stage processing unit 812 extracts a feature amount from the input pre-stage output and outputs it as a middle-stage output.
- the middle-stage output is input to, for example, a post-processing section 813 (not shown).
- the required characteristic estimation unit 825 acquires pixel characteristics and/or signal characteristics from the recognition specialized sensor 830 .
- the required characteristic estimation unit 825 estimates the characteristics required to obtain the existing recognition output 803 based on the pixel characteristics and/or signal characteristics acquired from the recognition specialized sensor 830 . For example, when the output of the existing sensor has linear characteristics and the output of the recognition specialized sensor 830 has logarithmic characteristics, the necessary characteristic estimator 825 presumes that an exponential characteristic signal is required.
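- As a simple illustration of the linear-versus-logarithmic example above (assuming a log1p-style sensor characteristic; the exact transfer curve of a real sensor would differ):
```python
import numpy as np

def restore_linear(log_output):
    """If the recognition-specialized sensor outputs logarithmic values while the
    existing recognizer expects linear values, applying an exponential
    characteristic approximately recovers the linear signal."""
    return np.exp(log_output) - 1.0

linear_signal = np.linspace(0.0, 10.0, 5)
log_signal = np.log1p(linear_signal)     # what a logarithmic specialized sensor might output
recovered = restore_linear(log_signal)   # approximately equal to linear_signal
```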
- the required characteristic estimation unit 825 passes required characteristic information indicating the estimated characteristics to the control information generation unit 824b.
- the control information generation unit 824b generates control information 822b for controlling the imaging operation of the recognition specialized sensor 830 based on the necessary characteristic information passed from the necessary characteristic estimation unit 825.
- the control information generation unit 824b selects one or more of various characteristics such as gain, exposure, characteristic selection, saturation level switching, and spectral characteristics related to the imaging operation of the recognition specialized sensor 830. Information for control can be generated.
- the NW conversion unit 311 also functions as a generation unit that generates control information for controlling the first recognizer, which performs recognition processing based on the first signal read from the first sensor, based on a data set with which a second recognizer different from the first recognizer performs recognition processing, or based on the second recognizer.
- the NW conversion unit 311 also functions as a conversion unit that, based on the output of the first recognizer that performs recognition processing based on the first signal read from the first sensor, converts processing parameters related to recognition processing of a second recognizer that performs recognition processing based on a second signal read from a second sensor having characteristics different from those of the first sensor.
- the present technology can also take the following configuration.
- (1) An information processing apparatus comprising a generation unit that, based on first learning data for training a first recognizer that performs recognition processing based on a first signal read from a first sensor in a first readout unit, generates second learning data for training a second recognizer that performs recognition processing based on a second signal read from a second sensor that differs from the first sensor in at least one of the readout unit, the signal characteristics, and the pixel characteristics.
- (2) the second sensor differs from the first sensor in at least the readout unit among the readout unit, the signal characteristic, and the pixel characteristic; the first readout unit is one frame, and the second readout unit of the second sensor is smaller than the one frame;
- the information processing device according to (1) above.
- (3) the generating unit generating the second learning data by converting the first learning data according to the second readout unit; The information processing device according to (2) above.
- (4) the generating unit generating a plurality of second images each having a different time from each of a plurality of first images, based on the plurality of first images from a plurality of the first signals having different times, and generating the second learning data based on the plurality of second images; The information processing apparatus according to (2) or (3).
- (5) the generating unit generating, based on one image by the first signal, a plurality of second images each having a different time from the one image, and generating the second learning data based on the plurality of second images; The information processing apparatus according to (2) or (3).
- (6) the generating unit generating the plurality of second images based on information indicative of movement of the first sensor; The information processing device according to (5) above.
- (7) the generating unit generating the plurality of second images based on information indicating movement of a subject included in the one image; The information processing apparatus according to (5) or (6).
- (8) the generating unit estimating a statistic corresponding to the control range for the first signal based on the first learning data and a control range indicating a control range for the first sensor, and generating control information based on the estimated statistic; The information processing apparatus according to any one of (2) to (7) above.
- (9) the generating unit generating the control information for controlling the timing of reading from the second sensor based on the statistic; The information processing device according to (8) above.
- (10) the generating unit generating the control information for controlling the timing according to the appearance frequency; The information processing device according to (9) above.
- (11) the generating unit adding a random element to the timing to generate the control information; The information processing apparatus according to (9) or (10).
- (12) the generating unit generating the control information further using a control constraint that is a constraint when controlling the first sensor; The information processing apparatus according to any one of (8) to (11).
- (13) the generating unit performing sampling at sampling positions according to the control information on information obtained by adding time-series information to the first learning data, the control information being acquired and updated in accordance with learning using the sampled information, and the time-series information and the sampling position being updated according to the updated control information; The information processing device according to (8) above.
- (14) the generating unit generating control learning data for learning control of the second recognizer using dummy control information for the first learning data, and generating the control information according to learning using the generated control learning data; The information processing device according to (8) above.
- the generating unit If there is a lack of information in the first pixel characteristic or the first signal characteristic of the first sensor with respect to the second pixel characteristic or the second signal characteristic of the second sensor, the first converting the first learning data into the second learning data by approximating the pixel characteristic or the first signal characteristic to the second pixel characteristic or the second signal characteristic;
- the information processing apparatus according to any one of (2) to (14).
- the conversion unit Using linear interpolation to interpolate missing information of the first pixel characteristics or the first signal characteristics due to the missing information with respect to the second pixel characteristics or the second signal characteristics performing said approximation; The information processing device according to (15) above.
- the conversion unit If the missing information due to the missing information is noise information, the approximation is performed by adding noise to the first pixel characteristic or the first signal characteristic.
- the information processing device according to (15) above.
- the conversion unit If the missing information due to the missing information is SNR (Signal-Noise Ratio), performing the approximation by performing noise reduction processing on the first pixel characteristic or the first signal characteristic; The information processing device according to (15) above.
- the conversion unit If there is a lack of information in the first pixel characteristic or the first signal characteristic of the first sensor with respect to the second pixel characteristic or the second signal characteristic of the second sensor, the lack of information Converting the first learning data to the second learning data by estimating missing information by The information processing apparatus according to any one of (2) to (14). (20) The conversion unit When the correspondence relationship between the first pixel characteristic or the first signal characteristic of the first sensor and the second pixel characteristic or the second signal characteristic of the second sensor is unknown, based on the preset information transforming the first pixel characteristic or the first signal characteristic into the second pixel characteristic or the second signal characteristic; The information processing apparatus according to any one of (2) to (14).
- the conversion unit uses noise characteristics as the preset information;
- the conversion unit uses a signal processing pipeline as the preset information.
- the conversion unit When the correspondence relationship between the first pixel characteristic or the first signal characteristic of the first sensor and the second pixel characteristic or the second signal characteristic of the second sensor is unknown, the first pixel inferring the second pixel characteristic or the second signal characteristic to which the characteristic or the first signal characteristic is transformed; The information processing apparatus according to any one of (2) to (4).
- the conversion unit estimates noise characteristics and converts the first pixel characteristics or the first signal characteristics into the second pixel characteristics or the second signal characteristics using the estimated noise characteristics. do, The information processing device according to (23) above.
- the conversion unit estimates a signal processing pipeline, and converts the first pixel characteristic or the first signal characteristic to the second pixel characteristic or the second pixel characteristic using the estimated signal processing pipeline. Convert to signal characteristics, The information processing device according to (23) above.
- the first pixel characteristic and the second pixel characteristic are photolinearities of the first signal and the second signal;
- the information processing apparatus according to any one of (15) to (25).
- the first pixel characteristic and the second pixel characteristic are noise characteristics of the first signal and the second signal;
- the information processing apparatus according to any one of (15) to (26).
- the first signal characteristic and the second signal characteristic are is the bit length of the first signal and the second signal;
- the information processing apparatus according to any one of (15) to (27).
- the first signal characteristic and the second signal characteristic are presence or absence of high dynamic range synthesis in the first signal and a second signal corresponding to the second recognizer or the second data set;
- the information processing apparatus according to any one of (15) to (28).
- the first signal characteristic and the second signal characteristic are static gradation characteristics of the first signal and the second signal;
- the information processing apparatus according to any one of (15) to (29).
- the first signal characteristic and the second signal characteristic are shading characteristics in the first signal and the second signal;
- (32) executed by a processor, Based on the first learning data for learning the first recognizer that performs recognition processing based on the first signal read from the first sensor in the first read unit, for the first sensor Second learning data for training a second recognizer that performs recognition processing based on a second signal read from a second sensor that has at least one different readout unit, signal characteristics, and pixel characteristics. a generation step that generates having Information processing methods.
- a generation step that generates Information processing program for executing (34) Based on the first learning data for learning the first recognizer that performs recognition processing based on the first signal read from the first sensor in the first read unit, for the first sensor Second learning data for training a second recognizer that performs recognition processing based on a second signal read from a second sensor that has at least one different readout unit, signal characteristics, and pixel characteristics.
- a learning device having a generator that generates a recognition device including the second recognizer; including, Information processing system.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
Description
1. Overview of the embodiments
1-1. Configuration of the embodiments
1-2. Configurations commonly applicable to the embodiments
2. Techniques applicable to the embodiments
3. DNN
3-1. Overview of CNN
3-2. Overview of RNN
3-3. Processing applicable to the embodiments
4. First embodiment
4-1. First example of the first embodiment
4-1-1. First example of training data generation by line division
4-1-2. Second example of training data generation by line division
4-1-3. Third example of training data generation by line division
4-1-4. Fourth example of training data generation by line division
4-1-5. Fifth example of training data generation by line division
4-2. Second example of the first embodiment
4-2-1. First example of training data generation by subsampling
4-2-2. Second example of training data generation by subsampling
4-2-3. Third example of training data generation by subsampling
4-2-4. Fourth example of training data generation by subsampling
4-2-5. Fifth example of training data generation by subsampling
4-2-6. Sixth example of training data generation by subsampling
4-3. Third example of the first embodiment
4-4. Fourth example of the first embodiment
4-5. Fifth example of the first embodiment
5. Second embodiment
5-1. First example of the second embodiment
5-1-1. First example of generation from evaluation data by line division
5-1-2. Second example of generation from evaluation data by line division
5-1-3. Other examples of generation from evaluation data by line division
5-2. Second example of the second embodiment
5-2-1. First example of generation from evaluation data by subsampling
5-2-2. Second example of generation from evaluation data by subsampling
5-2-3. Other examples of generation from evaluation data by subsampling
5-3. Third example of the second embodiment
5-3-1. First example of generating evaluation data by format conversion
5-3-2. Second example of generating evaluation data by format conversion
5-4. Fourth example of the second embodiment
5-5. Fifth example of the second embodiment
5-5-1. First example of the output timing of existing evaluation data
5-5-2. Second example of the output timing of existing evaluation data
5-5-3. Third example of the output timing of existing evaluation data
6. Third embodiment
6-1. Distillation processing applicable to the third embodiment
6-2. First example of the third embodiment
6-3. Second example of the third embodiment
6-4. Third example of the third embodiment
6-5. Fourth example of the third embodiment
6-6. Fifth example of the third embodiment
7. Fourth embodiment
7-1. First example of the fourth embodiment
7-1-1. First modification of the first example
7-1-2. Second modification of the first example
7-2. Second example of the fourth embodiment
7-2-1. First modification of the second example
7-2-2. Second modification of the second example
7-3. Third example of the fourth embodiment
7-4. Fourth example of the fourth embodiment
8. Fifth embodiment
8-1. Outline of conversion processing by the conversion unit
8-2. First example of the fifth embodiment
8-3. Second example of the fifth embodiment
9. Sixth embodiment
10. Seventh embodiment
11. Eighth embodiment
11-1. First example of the eighth embodiment
11-2. Second example of the eighth embodiment
11-3. Third example of the eighth embodiment
12. Ninth embodiment
12-1. First example of the ninth embodiment
12-1-1. First example of the first example of the ninth embodiment
12-1-2. Second example of the first example of the ninth embodiment
12-2. Second example of the ninth embodiment
12-2-1. First example of the second example of the ninth embodiment
12-2-2. Second example of the second example of the ninth embodiment
12-2-3. Third example of the second example of the ninth embodiment
12-3. Third example of the ninth embodiment
12-4. Fourth example of the ninth embodiment
13. Tenth embodiment
13-1. First example of the tenth embodiment
13-2. Second example of the tenth embodiment
14. Eleventh embodiment
14-1. First example of the eleventh embodiment
14-2. Second example of the eleventh embodiment
14-3. Third example of the eleventh embodiment
14-4. Fourth example of the eleventh embodiment
14-5. Fifth example of the eleventh embodiment
14-6. Sixth example of the eleventh embodiment
14-6-1. Modification of the sixth example
15. Twelfth embodiment
15-1. First example of the twelfth embodiment
15-2. Second example of the twelfth embodiment
(1-1. Configuration of the embodiments)
First, an overview of the embodiments of the present disclosure will be given. The present disclosure relates to techniques for ensuring compatibility between image recognition processing by a sensor that incorporates a configuration for realizing an image recognition function (referred to as a recognition-specialized sensor) and image recognition processing by a sensor of existing technology that does not have such a configuration (referred to as an existing sensor).
FIG. 1 is a schematic diagram showing an example configuration of an information processing system commonly applicable to the embodiments. In FIG. 1, an information processing system 1 includes a recognition system 2 and a learning system 3. The recognition system 2 includes a sensor unit 10 and a recognition unit 20.
Next, techniques applicable to the embodiments will be described. The configuration of the information processing system 1 according to the embodiments will be described in more detail with reference to FIGS. 2A and 2B.
Next, recognition processing using a DNN (Deep Neural Network), one technique of machine learning applicable to the embodiments of the present disclosure, will be outlined. In the embodiments, recognition processing on image data is performed using a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) among DNNs. Hereinafter, "recognition processing on image data" is also referred to as "image recognition processing" or the like as appropriate.
First, the CNN will be outlined. Image recognition processing by a CNN is generally performed based on image information consisting of pixels arranged, for example, in a matrix. FIG. 6 is a diagram for schematically explaining image recognition processing by a CNN. Pixel information 51 of an entire image 50, in which the digit "8" (the object to be recognized) is drawn, is processed by a CNN 52 trained in a predetermined manner. As a result, the digit "8" is recognized as a recognition result 53.
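As a minimal illustration of this frame-based processing (not part of the disclosed configuration), the following Python sketch classifies one whole frame with a small CNN. The layer sizes, the 28x28 input, and the use of PyTorch are assumptions made only for the example.

```python
import torch
import torch.nn as nn

# A small frame-based classifier: the whole frame is given to the network at once.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),   # 10 classes, e.g. the digits 0-9
)

frame = torch.randn(1, 1, 28, 28)       # one complete frame as input
scores = model(frame)                    # class scores for the frame
predicted_digit = scores.argmax(dim=1)   # recognition result for the whole frame
```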
Next, the RNN will be outlined. FIGS. 8A and 8B are diagrams schematically showing an example of identification processing (recognition processing) by a DNN when time-series information is not used. In this case, as shown in FIG. 8A, one image is input to the DNN. The DNN performs identification processing on the input image and outputs an identification result.
Next, processing applicable to the embodiments of the present disclosure will be outlined. FIG. 11 is a schematic diagram for explaining recognition processing applicable to the embodiments of the present disclosure. In FIG. 11, in step S1, the imaging unit 11 (see FIG. 2A) starts imaging a target image to be recognized.
Next, the first embodiment of the present disclosure will be described. In the first embodiment, as described above, frame-based image data for an existing recognizer is converted into non-frame-based image data, obtained by subsampling or line division, corresponding to a specialized recognizer.
First, the first example of the first embodiment will be described. The first example of the first embodiment converts existing image data into specialized image data by line division.
A first example of generating specialized training data 302 from existing training data 300, applicable to the first example of the first embodiment, will be described. FIG. 13A is a schematic diagram showing this first example. In this first example, the specialized recognizer to which the specialized image data based on the existing image data is applied performs recognition processing based on specialized image data obtained by dividing one frame of image data in units of one line.
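A minimal sketch of this line division follows, assuming a frame held as a NumPy array; the function and variable names are illustrative only and are not taken from the disclosure.

```python
import numpy as np

def split_frame_into_lines(frame: np.ndarray, lines_per_unit: int = 1):
    """Divide one frame (H x W [x C]) into readout units of `lines_per_unit` rows.

    Each returned element models one piece of line-division specialized
    training data derived from a frame-based existing training image.
    """
    height = frame.shape[0]
    units = []
    for top in range(0, height, lines_per_unit):
        units.append(frame[top:top + lines_per_unit])
    return units

# Usage: a 480x640 grayscale frame split into single-line units.
frame = np.zeros((480, 640), dtype=np.uint8)
line_units = split_frame_into_lines(frame, lines_per_unit=1)
assert len(line_units) == 480
```

Passing lines_per_unit greater than 1 corresponds to the adjacent-multiple-line division described in the next example.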
A second example of generating the specialized training data 302 from the existing training data 300, applicable to the first example of the first embodiment, will be described. FIG. 13B is a schematic diagram showing this second example. In this second example, the target specialized recognizer performs recognition processing based on specialized image data obtained by dividing one frame of image data in units of a plurality of adjacent lines.
A third example of generating the specialized training data 302 from the existing training data 300, applicable to the first example of the first embodiment, will be described. FIG. 13C is a schematic diagram showing this third example. In this third example, the target specialized recognizer performs recognition processing based on specialized image data obtained by dividing one frame of image data in units of a part of each of the lines L#1, L#2, L#3, and so on.
A fourth example of generating the specialized training data 302 from the existing training data 300, applicable to the first example of the first embodiment, will be described. FIG. 13D is a schematic diagram showing this fourth example. In this fourth example, the target specialized recognizer performs recognition processing based on specialized image data obtained by dividing one frame of image data line by line at predetermined intervals over the lines L#1, L#2, L#3, and so on.
A fifth example of generating the specialized training data 302 from the existing training data 300, applicable to the first example of the first embodiment, will be described. FIG. 13E is a schematic diagram showing this fifth example. In this fifth example, the target specialized recognizer performs recognition processing based on specialized image data that contains two lines obtained by dividing the lines L#1, L#2, L#3, and so on at predetermined intervals.
Next, the second example of the first embodiment will be described. The second example of the first embodiment converts existing image data into specialized image data by subsampling.
A first example of generating the specialized training data 302 from specialized image data, applicable to the second example of the first embodiment, will be described. FIG. 15A is a schematic diagram showing this first example. The existing training data 300, based on existing image data corresponding to recognition processing using an existing sensor, consists of one frame in which a plurality of pixels px are arranged in a matrix, as schematically shown in section (a) of the figure.
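A minimal sketch of phase-based subsampling over such a pixel matrix follows; the 2x2 sampling period and the NumPy representation are assumptions for illustration, not parameters taken from the figures.

```python
import numpy as np

def subsample_phase(frame: np.ndarray, period: int, phase_y: int, phase_x: int):
    """Extract one subsample 'phase' from a frame of pixels arranged in a matrix.

    Pixels are taken every `period` rows and columns starting at the offsets
    (phase_y, phase_x); iterating over all offsets covers the whole frame.
    """
    return frame[phase_y::period, phase_x::period]

# Usage: the four phases Pφ#1..Pφ#4 of a 2x2 subsampling pattern.
frame = np.arange(480 * 640, dtype=np.uint16).reshape(480, 640)
phases = [subsample_phase(frame, 2, py, px) for py in (0, 1) for px in (0, 1)]
assert sum(p.size for p in phases) == frame.size   # the phases partition the frame
```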
A second example of generating the specialized training data 302 from specialized image data, applicable to the second example of the first embodiment, will be described. FIG. 15B is a schematic diagram showing this second example. As in the first example, the existing training data 300, based on existing image data corresponding to recognition processing using an existing sensor, consists of one frame in which a plurality of pixels px are arranged in a matrix, as schematically shown in section (a) of the figure.
A third example of generating the specialized training data 302 from specialized image data, applicable to the second example of the first embodiment, will be described. FIG. 15C is a schematic diagram showing this third example. As in the first example, the existing training data 300 consists of one frame in which a plurality of pixels px are arranged in a matrix, as schematically shown in section (a) of the figure.
A fourth example of generating the specialized training data 302 from specialized image data, applicable to the second example of the first embodiment, will be described. FIG. 15D is a schematic diagram showing this fourth example. As in the first example, the existing training data 300 consists of one frame in which a plurality of pixels px are arranged in a matrix, as schematically shown in section (a) of the figure.
A fifth example of generating the specialized training data 302 from specialized image data, applicable to the second example of the first embodiment, will be described. FIG. 15E is a schematic diagram showing this fifth example. As in the first example, the existing training data 300 consists of one frame in which a plurality of pixels px are arranged in a matrix, as schematically shown in section (a) of the figure.
A sixth example of generating the specialized training data 302 from the existing training data 300, applicable to the second example of the first embodiment, will be described. FIG. 15F is a schematic diagram showing this sixth example. As in the first example, the existing training data 300 consists of one frame in which a plurality of pixels px are arranged in a matrix, as schematically shown in section (a) of the figure.
Next, the third example of the first embodiment will be described. In the third example of the first embodiment, an interpolated image is generated from two frame images (existing training data 300) captured at different times, and line division or subsampling is applied to the generated interpolated image. The training data conversion processing according to the third example of the first embodiment will be described with reference to FIGS. 16A and 16B.
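As one possible illustration, the sketch below forms an intermediate image by linear blending of the two frames and then splits it into line units. Linear blending is an assumption made for the example; the embodiment may use other interpolation (for example, motion-compensated interpolation), and all names here are illustrative.

```python
import numpy as np

def interpolate_frames(frame_t0: np.ndarray, frame_t1: np.ndarray, alpha: float):
    """Form an intermediate image between two frames captured at different times.

    alpha = 0 returns frame_t0, alpha = 1 returns frame_t1.
    """
    mixed = (1.0 - alpha) * frame_t0.astype(np.float32) \
            + alpha * frame_t1.astype(np.float32)
    return mixed.astype(frame_t0.dtype)

# Usage: an image at the temporal midpoint, then divided into single-line units.
f0 = np.zeros((480, 640), dtype=np.uint8)
f1 = np.full((480, 640), 64, dtype=np.uint8)
mid = interpolate_frames(f0, f1, alpha=0.5)
line_units = [mid[y:y + 1] for y in range(mid.shape[0])]
```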
Next, the fourth example of the first embodiment will be described. In the fourth example of the first embodiment, a plurality of interpolated images at mutually different times are generated from a single frame image (existing training data 300), and line division or subsampling is applied to the generated interpolated images. In this case, in the fourth example of the first embodiment, the plurality of interpolated images are generated based on the movement of the camera at the time the frame image was captured.
Next, the fifth example of the first embodiment will be described. In the fifth example of the first embodiment, a plurality of interpolated images at mutually different times are generated from a single frame image (existing training data 300), and line division or subsampling is applied to the generated interpolated images. In this case, in the fifth example of the first embodiment, the plurality of interpolated images are generated by estimating the movement of the subject in the frame image.
Next, the second embodiment of the present disclosure will be described. In the second embodiment, as described above, evaluation data consisting of non-frame-based image data for the recognition-specialized sensor is converted into evaluation data consisting of frame-based image data for the existing recognizer.
First, the first example of the second embodiment will be described. The first example of the second embodiment converts non-frame-based specialized evaluation data obtained by line division into frame-based existing evaluation data. The first example of the second embodiment will be described with reference to FIGS. 19A, 19B, and 19C.
A first example of generating existing evaluation data 303 from specialized evaluation data 304, applicable to the first example of the second embodiment, will be described. In this first example, the specialized evaluation data 304 consists of line-by-line data obtained by line division, and the existing evaluation data 303 is generated based on this line-by-line specialized evaluation data 304.
A second example of generating the existing evaluation data 303 from the specialized evaluation data 304, applicable to the first example of the second embodiment, will be described. In this second example as well, the specialized evaluation data 304 consists of line-by-line data obtained by line division, and the existing evaluation data 303 is generated based on this line-by-line specialized evaluation data 304. In this second example, however, the specialized evaluation data 304 consists of thinned-out lines obtained by line division.
In the first and second examples described above, the specialized evaluation data 304 consists of line-by-line data obtained by line division, and the specialized evaluation data 304 of each divided line is input sequentially to the conversion unit 301e; however, the configuration is not limited to this example.
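A minimal sketch of accumulating such line-unit specialized evaluation data into one frame buffer follows; the class name, buffer sizes, and coverage measure are assumptions for illustration only.

```python
import numpy as np

class LineAccumulator:
    """Accumulate line-unit specialized evaluation data into one frame buffer.

    Once enough lines have been accumulated, the buffer can be handed to a
    frame-based existing recognizer as existing evaluation data.
    """

    def __init__(self, height: int, width: int):
        self.frame = np.zeros((height, width), dtype=np.uint8)
        self.filled = np.zeros(height, dtype=bool)

    def add_line(self, line_index: int, line_data: np.ndarray):
        self.frame[line_index] = line_data
        self.filled[line_index] = True

    def coverage(self) -> float:
        return float(self.filled.mean())

# Usage: feed thinned-out lines as they arrive and check how much is filled.
acc = LineAccumulator(480, 640)
for y in range(0, 480, 2):
    acc.add_line(y, np.zeros(640, dtype=np.uint8))
print(acc.coverage())   # 0.5 of the frame accumulated
```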
Next, the second example of the second embodiment will be described. The second example of the second embodiment converts non-frame-based specialized evaluation data obtained by subsampling into frame-based existing evaluation data. The second example of the second embodiment will be described with reference to FIGS. 20A, 20B, and 20C.
A first example of generating the existing evaluation data 303 from the specialized evaluation data 304, applicable to the second example of the second embodiment, will be described.
A second example of generating the existing evaluation data 303 from the specialized evaluation data 304, applicable to the second example of the second embodiment, will be described.
In the first and second examples described above, the pieces of specialized evaluation data 304Pφ#1, 304Pφ#2, 304Pφ#3, and 304Pφ#4, subsampled at positions corresponding to the phases Pφ#1, Pφ#2, Pφ#3, and Pφ#4, are input to the conversion unit 301f; however, the configuration is not limited to this example.
Next, the third example of the second embodiment will be described. The third example of the second embodiment converts the format of non-frame-based specialized evaluation data obtained by subsampling to generate frame-based existing evaluation data. The third example of the second embodiment will be described with reference to FIGS. 21A, 21B, and 21C.
A first example of generating evaluation data by format conversion, applicable to the third example of the second embodiment, will be described. This first example generates the existing evaluation data 303 from specialized evaluation data 304 generated by line division with line thinning.
A second example of generating evaluation data by format conversion, applicable to the third example of the second embodiment, will be described. This second example generates the existing evaluation data 303 from specialized evaluation data 304 generated by extracting pixels through subsampling.
Next, the fourth example of the second embodiment will be described. The fourth example of the second embodiment combines the first and second examples of the second embodiment described above with the third example. For convenience, the first and second examples of the second embodiment are here collectively referred to as the accumulation method, and the third example as the non-accumulation method.
Here, the evaluation of the accumulation method and the non-accumulation method with respect to (1) resolution, (2) reliability, and (3) processing delay will be described. Resolution denotes the resolution of the existing evaluation data as an image. Reliability denotes the reliability of the result of recognition processing by the existing recognizer evaluated using the existing evaluation data. Processing delay denotes the delay between the timing at which specialized evaluation data 304 is input to the conversion unit 301 and the timing at which the existing evaluation data 303 based on that input is output from the conversion unit 301. The two methods compare, for example, as follows.
- Objects of at least a predetermined size: non-accumulation method > accumulation method
- Objects smaller than the predetermined size: accumulation method > non-accumulation method
- Objects with at least a predetermined amount of motion: non-accumulation method > accumulation method
- Objects with less than the predetermined amount of motion: accumulation method > non-accumulation method
Next, a method of integrating the two sets of data when generation of the existing evaluation data 303 by the accumulation method and generation of the existing evaluation data 303 by the non-accumulation method are executed in parallel will be described.
Next, the fifth example of the second embodiment will be described. The fifth example of the second embodiment concerns the output timing at which the conversion unit 301 outputs the existing evaluation data 303. The fifth example of the second embodiment will be described with reference to FIGS. 22A to 22E.
A first example of the output timing of the existing evaluation data 303 according to the fifth example of the second embodiment will be described. In this first example, the accumulation determination unit 326 outputs the existing evaluation data 303 when the specialized evaluation data 304 for all regions of one frame has been accumulated in the accumulation unit 323.
A second example of the output timing of the existing evaluation data 303 according to the fifth example of the second embodiment will be described. In this second example, the accumulation determination unit 326 outputs the existing evaluation data 303 when specialized evaluation data 304 has been accumulated in at least a predetermined proportion of the regions of one frame in the accumulation unit 323.
A third example of the output timing of the existing evaluation data 303 according to the fifth example of the second embodiment will be described. In this third example, the accumulation determination unit 326 outputs the existing evaluation data 303 at fixed time intervals.
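The three output-timing policies above can be summarized in a small sketch; the mode names, the 0.8 ratio, and the 100 ms interval are illustrative assumptions, not values from the embodiment.

```python
def should_output(coverage: float, last_output_time: float, now: float,
                  mode: str = "full", ratio: float = 0.8,
                  interval_s: float = 0.1) -> bool:
    """Decide whether accumulated specialized evaluation data should be output
    as one frame of existing evaluation data.

    mode == "full"     : output only when the whole frame is accumulated
    mode == "ratio"    : output when at least `ratio` of the frame is accumulated
    mode == "interval" : output at fixed time intervals, regardless of coverage
    """
    if mode == "full":
        return coverage >= 1.0
    if mode == "ratio":
        return coverage >= ratio
    if mode == "interval":
        return (now - last_output_time) >= interval_s
    raise ValueError(mode)

# Usage: 60% of the frame accumulated, 150 ms since the last output.
print(should_output(0.6, last_output_time=0.0, now=0.15, mode="interval"))  # True
```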
Next, the third embodiment of the present disclosure will be described. The third embodiment is, as described above, an example of training the specialized recognizer so that the network of the existing recognizer and the network of the specialized recognizer produce equivalent outputs.
Here, distillation processing applicable to the third embodiment will be outlined. FIG. 24 is a schematic diagram for explaining the distillation processing applicable to the third embodiment. Input data for the existing recognizer (existing input data) (B) is input to the trained existing recognizer (A). The existing recognizer (A) performs recognition processing on the existing input data (B) and outputs an existing recognition output (C). Meanwhile, input data for the specialized recognizer (specialized input data) (E) is input to the untrained specialized recognizer (D). The specialized recognizer (D) performs recognition processing on the specialized input data (E) and outputs a specialized recognition output (F).
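A minimal sketch of this teacher-student arrangement follows, assuming PyTorch and a mean-squared-error loss between the two recognition outputs; the actual error computation between the outputs in the embodiment may differ, and all names here are illustrative.

```python
import torch
import torch.nn as nn

def distillation_step(existing_recognizer: nn.Module,
                      specialized_recognizer: nn.Module,
                      existing_input: torch.Tensor,
                      specialized_input: torch.Tensor,
                      optimizer: torch.optim.Optimizer) -> float:
    """One update of the (untrained) specialized recognizer so that its output
    approaches the output of the trained existing recognizer."""
    with torch.no_grad():                        # the existing recognizer is frozen
        existing_output = existing_recognizer(existing_input)
    specialized_output = specialized_recognizer(specialized_input)
    loss = nn.functional.mse_loss(specialized_output, existing_output)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```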
First, the first example of the third embodiment will be described. The first example of the third embodiment corresponds to case #1 described with reference to FIG. 23, and is an example in which the specialized recognizer is generated when everything other than the specialized recognizer itself is available: the existing recognizer, the existing input data, the specialized input data, the existing ground-truth data, and the specialized ground-truth data. In the first example of the third embodiment, the general distillation processing described above can be applied.
Next, the second example of the third embodiment will be described. The second example of the third embodiment corresponds to case #2 described with reference to FIG. 23, in which the existing recognizer, the existing input data, the existing ground-truth data, and the specialized ground-truth data exist but there is no specialized input data. In this case, specialized input data is generated from the existing input data, and distillation is then performed.
Next, the third example of the third embodiment will be described. The third example of the third embodiment corresponds to case #3 described with reference to FIG. 23, in which the existing recognizer, the specialized input data, the existing ground-truth data, and the specialized ground-truth data exist but there is no existing input data. In this case, existing input data is generated from the specialized input data, and distillation is then performed.
Next, the fourth example of the third embodiment will be described. The fourth example of the third embodiment corresponds to case #4 described with reference to FIG. 23, in which the existing recognizer, the existing ground-truth data, and the specialized ground-truth data exist but there is neither existing input data nor specialized input data. In the fourth example of the third embodiment, existing input data is generated based on the existing recognizer, and specialized input data is generated based on the generated existing input data. After the existing input data and the specialized input data have been generated in this way, distillation is performed.
Here, a method by which the recognition-image extraction unit 470 extracts and generates the image 401b from the existing recognizer 410 will be described.
Next, the fifth example of the third embodiment will be described. The fifth example of the third embodiment corresponds to case #5 described with reference to FIG. 23, in which the existing recognizer, the existing ground-truth data, and the specialized ground-truth data exist but there is neither existing input data nor specialized input data. In the fifth example of the third embodiment, specialized input data is generated by a predetermined method, existing input data is generated based on the generated specialized input data, and distillation is then performed.
Next, the fourth embodiment of the present disclosure will be described. In the fourth embodiment, as described above, the network of the existing recognizer is converted into the network of the specialized recognizer. In the fourth embodiment, this conversion is realized, for example, by converting the filters used in at least one layer included in the network.
First, the first example of the fourth embodiment will be described. In the first example of the fourth embodiment, the non-frame-based NW 501 corresponds to specialized training data 302 obtained by line division. In the first example of the fourth embodiment, the NW conversion unit 311 creates the non-frame-based NW 501 such that the recognition output of the non-frame-based NW 501 substantially matches the recognition output of the frame-based NW 500.
Next, the first modification of the first example of the fourth embodiment will be described. In this first modification, distillation processing is performed in the first example of the fourth embodiment described above so that a partial NW output of the specialized recognizer matches the output of the existing recognizer. More specifically, in this first modification of the first example, distillation processing is performed so that the outputs of arbitrary layers among the plurality of layers of the frame-based NW 500 and the non-frame-based NW 501 match.
Next, the second modification of the first example of the fourth embodiment will be described. In the first example of the fourth embodiment described above, distillation processing is performed based on one frame's worth of features 531 of the non-frame-based NW 501 and one frame's worth of features 521 of the frame-based NW 500; however, this is not limiting. In the second modification of the first example of the fourth embodiment, distillation processing is performed based on line-unit features 531 of the non-frame-based NW and a part of the frame-unit features 521 of the frame-based NW 500.
Next, the second example of the fourth embodiment will be described. In the second example of the fourth embodiment, the non-frame-based NW 501 corresponds to specialized training data 302 obtained by subsampling. In the second example of the fourth embodiment as well, as in the first example described above, the NW conversion unit 311 creates the non-frame-based NW 501 such that its recognition output substantially matches the recognition output of the frame-based NW 500.
Next, the first modification of the second example of the fourth embodiment will be described. In this first modification, distillation processing is performed in the second example of the fourth embodiment described above so that a partial NW output of the specialized recognizer matches the output of the existing recognizer.
Next, the second modification of the second example of the fourth embodiment will be described. In the description above, the NW reconstruction unit 512 reconstructs the non-frame-based NW 501 based on the feature 541Pφ#1 output from layer #2; however, this is not limiting. In the second modification of the second example of the fourth embodiment, the NW reconstruction unit 512 reconstructs the non-frame-based NW 501 based on the output of a layer later than layer #2.
Next, the third example of the fourth embodiment will be described. The third example of the fourth embodiment is an example in which, in the frame-based NW 500, computation is performed selectively for the region corresponding to the receptive field of the image, and the frame-based NW 500 is updated and accumulated accordingly. By limiting the processing in the frame-based NW 500 to the receptive field in this way, the processing in the non-frame-based NW 501 can be made more efficient.
Next, the fourth example of the fourth embodiment will be described. In the first to third examples of the fourth embodiment described above, layer conversion is performed in the first half of the NW; however, this is not limiting. The fourth example of the fourth embodiment adds a non-frame-based NW to the frame-based NW.
Next, the fifth embodiment of the present disclosure will be described. In the fifth embodiment, as described above, the characteristics of the training data for the existing recognizer 310 are converted into the characteristics assumed by the network of the specialized recognizer 312.
The conversion processing by the conversion unit 301j according to the fifth embodiment will be outlined. In converting the image 60 into the image 61, the conversion unit 301j converts pixel characteristics or signal characteristics that cannot be converted directly into each other. Two types of characteristics can be considered as targets of this conversion.
(a) Characteristics that are difficult to convert uniquely because information is missing.
(b) Characteristics that are difficult to convert uniquely because, although no information is missing, the correspondence is unknown.
Specific examples of the characteristic conversion according to the fifth embodiment will be outlined. The characteristic conversion according to the fifth embodiment specifically includes the following two types of characteristic conversion processing.
Next, the first example of the fifth embodiment will be described. The first example of the fifth embodiment describes more specifically a case in which the characteristics of the image to be converted depend on the characteristics of the sensor that acquires (captures) the image.
First, the conversion processing relating to photolinearity in (A) above will be described. FIG. 48 is a schematic diagram for explaining the conversion processing relating to photolinearity, applicable to the first example of the fifth embodiment. When the brightness (luminance) of the subject increases linearly, the sensor output value may not increase linearly. Here, this non-linear increase of the sensor output value in response to a linear increase in brightness is referred to as photolinearity.
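As a minimal illustration of converting between two response curves, the following sketch assumes a simple power-law (gamma) model of the sensor response; a real sensor's photolinearity may require a measured lookup table rather than this assumed curve, and all names are illustrative.

```python
import numpy as np

def convert_photolinearity(signal: np.ndarray,
                           src_gamma: float, dst_gamma: float,
                           max_value: float = 255.0) -> np.ndarray:
    """Map a signal recorded with one (assumed power-law) response curve onto
    another: first linearize with the source curve, then apply the target curve."""
    normalized = signal.astype(np.float64) / max_value
    linear = normalized ** src_gamma           # undo the source non-linearity
    converted = linear ** (1.0 / dst_gamma)    # apply the target non-linearity
    return (converted * max_value).astype(signal.dtype)

# Usage: re-map an 8-bit image from a gamma-2.2 response to a linear response.
img = np.linspace(0, 255, 256).astype(np.uint8)
out = convert_photolinearity(img, src_gamma=2.2, dst_gamma=1.0)
```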
Next, the conversion processing of the noise characteristics in (B) above will be described.
Next, the second example of the fifth embodiment will be described. The second example of the fifth embodiment describes more specifically a case in which the characteristics of the image to be converted depend on signal characteristics arising from signal processing applied to the image data.
First, the bit-length conversion processing in (C) above will be described. The bit-length conversion processing is related to the static conversion within the gradation conversion of (E) below.
Next, the conversion processing relating to HDR synthesis in (D) above will be described.
Next, the static conversion processing within the gradation conversion of (E) above will be described. Gradation conversion such as gamma correction may be applied uniformly to an entire one-frame image. Here, this uniform gradation conversion applied to the entire one-frame image is referred to as static gradation conversion.
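The sketch below combines a bit-length change with one uniform (static) tone curve applied to the whole frame; the 12-to-8-bit conversion and the plain power-law curve are assumptions standing in for whatever static gradation conversion the target pipeline actually expects.

```python
import numpy as np

def convert_bit_length_and_tone(signal: np.ndarray,
                                src_bits: int, dst_bits: int,
                                gamma: float = 1.0) -> np.ndarray:
    """Rescale a signal from `src_bits` to `dst_bits` and apply one uniform
    (static) tone curve to the whole frame."""
    src_max = (1 << src_bits) - 1
    dst_max = (1 << dst_bits) - 1
    normalized = signal.astype(np.float64) / src_max
    toned = normalized ** (1.0 / gamma)
    return np.clip(np.round(toned * dst_max), 0, dst_max).astype(np.uint16)

# Usage: 12-bit sensor data converted to 8-bit data with gamma 2.2 applied.
raw12 = np.random.randint(0, 4096, size=(480, 640), dtype=np.uint16)
img8 = convert_bit_length_and_tone(raw12, src_bits=12, dst_bits=8, gamma=2.2)
```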
Next, the dynamic conversion processing within the gradation conversion of (E) above will be described. In dynamic gradation conversion such as local tone mapping, a different gradation conversion is applied to each region of a one-frame image. Here, this gradation conversion that differs for each region of a one-frame image is referred to as dynamic gradation conversion. Since dynamic gradation conversion is generally complex processing, it is difficult to restore the state before conversion uniquely.
Next, the shading correction processing within the other signal processing of (F) above will be described. In one frame of image data, a gain or an offset may be applied depending on the spatial position. Here, this gain or offset applied depending on the spatial position is referred to as shading.
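A minimal sketch of this position-dependent gain/offset model and its inverse follows; the radial gain map is an illustrative assumption, not the shading model of the embodiment.

```python
import numpy as np

def apply_shading(frame, gain_map, offset_map):
    """Apply a position-dependent gain and offset (shading) to one frame."""
    return frame.astype(np.float64) * gain_map + offset_map

def remove_shading(frame, gain_map, offset_map):
    """Invert the shading model above, i.e. a simple shading correction."""
    return (frame.astype(np.float64) - offset_map) / gain_map

# Usage: a radial gain falling off toward the corners (illustrative model only).
h, w = 480, 640
yy, xx = np.mgrid[0:h, 0:w]
r = np.hypot(yy - h / 2, xx - w / 2) / np.hypot(h / 2, w / 2)
gain_map = 1.0 - 0.3 * r ** 2
offset_map = np.zeros((h, w))
shaded = apply_shading(np.full((h, w), 100.0), gain_map, offset_map)
restored = remove_shading(shaded, gain_map, offset_map)
```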
Next, the sixth embodiment of the present disclosure will be described. In the sixth embodiment, as described above, the characteristics of the evaluation data input to the network of the existing recognizer 310 are converted into the characteristics assumed by that network.
Next, the seventh embodiment of the present disclosure will be described. In the seventh embodiment, as described above, the network of the specialized recognizer is generated based on the network of the existing recognizer. That is, in the seventh embodiment, as in the third embodiment described above, the specialized recognizer is trained so that equivalent outputs are obtained from the network of the existing recognizer (the frame-based network) and the network of the specialized recognizer (the non-frame-based network).
Next, the eighth embodiment of the present disclosure will be described. In the eighth embodiment, as described above, the network of the existing recognizer is converted into the network of the specialized recognizer.
First, the first example of the eighth embodiment will be described. The first example of the eighth embodiment adds preprocessing to the specialized recognizer so that the output of the specialized recognizer approximates the output of the existing recognizer.
Next, the second example of the eighth embodiment will be described. In the second example of the eighth embodiment, the conversion of the network of the existing recognizer into the network of the specialized recognizer is realized by changing coefficients in a layer included in the network of the existing recognizer.
The conversion processing of the filter coefficients in the filter 571a1 by the coefficient conversion unit 575 will be described more specifically.
The conversion processing of the coefficients in the batch normalization 572a1 by the coefficient conversion unit 575 will be described more specifically.
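As one assumed illustration of such coefficient conversion, the sketch below folds a simple input-gain change (the specialized signal being the existing signal times a constant) into either the convolution filter weights or the batch-normalization running statistics; the actual conversion rules used by the coefficient conversion unit are given in the embodiment and may be more elaborate.

```python
import numpy as np

def rescale_filter_for_input_gain(weights: np.ndarray, gain: float) -> np.ndarray:
    """Option 1: if the input signal is scaled by `gain`, dividing the convolution
    weights by `gain` keeps the layer output unchanged (no BN change needed)."""
    return weights / gain

def rescale_batchnorm_for_input_gain(mean: np.ndarray, var: np.ndarray, gain: float):
    """Option 2: leave the filter untouched; then a bias-free linear convolution's
    output scales by `gain`, so the BN running mean scales by `gain` and the
    running variance by `gain` squared."""
    return mean * gain, var * (gain ** 2)

# Usage: absorb a 2x input gain either into a 3x3 filter bank or into BN statistics.
w = np.random.randn(16, 3, 3, 3)
mean, var = np.zeros(16), np.ones(16)
w_adjusted = rescale_filter_for_input_gain(w, 2.0)
mean_adj, var_adj = rescale_batchnorm_for_input_gain(mean, var, 2.0)
```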
Next, the third example of the eighth embodiment will be described. In the third example of the eighth embodiment, the conversion of the network of the existing recognizer into the network of the specialized recognizer is realized by changing a layer or a filter included in the network of the existing recognizer.
The processing of changing the elements of the layer 570a1 by the layer conversion unit 577 will be described more specifically.
Next, the ninth embodiment of the present disclosure will be described. In the ninth embodiment, as described above, a control rule for executing recognition processing by the specialized recognizer is generated based on the existing training data for the existing recognizer.
First, the first example of the ninth embodiment will be described. The first example of the ninth embodiment concerns the generation of information for generating the control rule. The processing according to the first example of the ninth embodiment is processing in which the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B generates the specialized control rule 313 based on the existing training data 300. More specifically, in the first example of the ninth embodiment, the conversion unit 301 obtains a statistic based on the existing training data 300.
Next, the first example of the first example of the ninth embodiment will be described. This first example obtains the statistic 711 based on line-by-line information.
Next, the second example of the first example of the ninth embodiment will be described. This second example obtains a brightness change model as the statistic, according to the brightness of each image 70 included in the existing training data 400.
Next, the second example of the ninth embodiment will be described. The second example of the ninth embodiment performs scheduling control using the statistic 711 generated in the first example of the ninth embodiment described above.
The first example of the second example of the ninth embodiment will be described. FIG. 64 is a schematic diagram for explaining processing according to this first example. In the conversion unit 301m shown in FIG. 64, the scheduling unit 740a performs line control based on the statistic 711a obtained from line-by-line information as described with reference to FIG. 61.
Next, the second example of the second example of the ninth embodiment will be described. FIG. 65 is a schematic diagram for explaining processing according to this second example. In the conversion unit 301n shown in FIG. 65, the scheduling unit 740b adds a random element to the input statistic 711 according to randomness information 742 and generates a control command 741b.
Next, the third example of the second example of the ninth embodiment will be described. FIG. 66 is a schematic diagram for explaining processing according to this third example. In the conversion unit 301o shown in FIG. 66, the scheduling unit 740c generates a control command 741c based on the statistic 711 and subsample-line control constraint information 743.
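A minimal sketch combining these three ingredients (a per-line statistic, a random element, and a constraint on how many lines may be read) follows; the scoring formula, the line budget, and all names are assumptions made for illustration, not the scheduling actually used by the scheduling units 740a to 740c.

```python
import numpy as np

def schedule_lines(appearance_freq: np.ndarray, max_lines: int,
                   randomness: float = 0.0, seed: int = 0) -> np.ndarray:
    """Choose which lines to read out, favouring lines where recognition targets
    appear frequently, perturbed by a random element and limited by a constraint
    on the number of readable lines."""
    rng = np.random.default_rng(seed)
    score = appearance_freq.astype(np.float64)
    score = score + randomness * rng.random(score.shape) * score.max()
    top = np.argsort(score)[::-1][:max_lines]     # highest-scoring line indices
    return np.sort(top)

# Usage: per-line frequencies for a 480-line frame, at most 120 lines allowed.
freq = np.random.poisson(3.0, size=480)
lines_to_read = schedule_lines(freq, max_lines=120, randomness=0.2)
```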
Next, the third example of the ninth embodiment will be described. The third example of the ninth embodiment generates control training data for learning control of the recognizer, based on the existing training data.
Next, the fourth example of the ninth embodiment will be described. The fourth example of the ninth embodiment collects control training data using a dummy control rule for executing recognition processing by the specialized recognizer, and then executes the learning based on the control training data independently of the learning based on the dummy control rule.
Next, the tenth embodiment of the present disclosure will be described. In the tenth embodiment, as described above, a control rule for executing recognition processing by the specialized recognizer is generated based on the output data of the recognition-specialized sensor.
First, the first example of the tenth embodiment will be described. In the first example of the tenth embodiment, a control rule for executing recognition processing by the specialized recognizer is generated using the output of a module incorporated into the existing recognizer when the existing recognizer is trained. The processing according to the first example of the tenth embodiment is processing in which the conversion unit 301 in the data generation unit 30 of the learning system 3 shown in FIG. 2B generates the specialized control rule 313 based on the specialized training data 302.
The first example of the tenth embodiment will be described using a more specific example. In this example, an attention technique that spatially makes a region of interest explicit is applied, and an attention map indicating that region of interest is used as the reference information output by the reference-information output unit 752.
Next, the second example of the tenth embodiment will be described. The second example of the tenth embodiment uses the existing recognizer as it is to generate a control rule for executing recognition processing by the specialized recognizer. More specifically, in the second example of the tenth embodiment, imaging control is performed without incorporating the reference-information output unit 752 described above, and evaluation data is generated.
Next, the eleventh embodiment of the present disclosure will be described. In the eleventh embodiment, as described above, a control rule for executing recognition processing by the specialized recognizer is generated based on the output of the existing recognizer.
First, the first example of the eleventh embodiment will be described. The first example of the eleventh embodiment corresponds to case #1 described with reference to FIGS. 23 and 78, and its processing configuration corresponds to the configuration shown in FIG. 28. That is, the first example of the eleventh embodiment generates the specialized recognizer and a control rule for controlling it when everything other than the specialized recognizer is available: the existing recognizer, the existing input data, the specialized input data, the existing ground-truth data, and the specialized ground-truth data. In the first example of the eleventh embodiment, the general distillation processing described above can be applied.
Next, the second example of the eleventh embodiment will be described. FIG. 80 is a schematic diagram for explaining processing according to the second example of the eleventh embodiment. Like the first example of the eleventh embodiment described above, the second example corresponds to case #1 described with reference to FIGS. 23 and 78, and its processing configuration corresponds to the configuration shown in FIG. 79.
Here, distillation processing relating to the control rule will be described. Distillation processing can be performed on the control rule generated by the control-rule generation unit 781. In other words, the distillation processing for the specialized recognizer 420 can be executed including the control rule applied to the specialized recognizer 420. First, second, and third ways of implementing this distillation of the control rule will be described, taking the configuration described with reference to FIG. 79 as an example.
Next, the third example of the eleventh embodiment will be described. The third example of the eleventh embodiment corresponds to case #2 described with reference to FIGS. 23 and 78, and its processing configuration corresponds to the configuration shown in FIG. 29. That is, the third example of the eleventh embodiment generates the specialized recognizer and a control rule for controlling it when the existing recognizer, the existing input data, the existing ground-truth data, and the specialized ground-truth data exist but there is no specialized input data. In the third example of the eleventh embodiment, as in the second example of the third embodiment, specialized input data is generated from the existing input data, and distillation is then performed.
Next, the fourth example of the eleventh embodiment will be described. The fourth example of the eleventh embodiment corresponds to case #3 described with reference to FIGS. 23 and 78, and its processing configuration corresponds to the configuration shown in FIG. 30. That is, the fourth example of the eleventh embodiment generates the specialized recognizer and a control rule for controlling it when the existing recognizer, the specialized input data, the existing ground-truth data, and the specialized ground-truth data exist but there is no existing input data. In the fourth example of the eleventh embodiment, as in the third example of the third embodiment, existing input data is generated from the specialized input data, and distillation is then performed.
Next, the fifth example of the eleventh embodiment will be described. The fifth example of the eleventh embodiment corresponds to case #4 described with reference to FIGS. 23 and 78, and its processing configuration corresponds to the configuration shown in FIG. 31A. That is, the fifth example of the eleventh embodiment generates the specialized recognizer and a control rule for controlling it when the existing recognizer, the existing ground-truth data, and the specialized ground-truth data exist but there is neither existing input data nor specialized input data.
Next, the sixth example of the eleventh embodiment will be described. The sixth example of the eleventh embodiment corresponds to case #5 described with reference to FIGS. 23 and 78, and its processing configuration corresponds to the configuration shown in FIG. 32. That is, the sixth example of the eleventh embodiment generates the specialized recognizer and a control rule for controlling it when the existing recognizer, the existing ground-truth data, and the specialized ground-truth data exist but there is neither existing input data nor specialized input data.
Next, a modification of the sixth example of the eleventh embodiment will be described. FIG. 85 is a schematic diagram for explaining processing according to the modification of the sixth example of the eleventh embodiment.
Next, the twelfth embodiment of the present disclosure will be described. In the twelfth embodiment, as described above, the specialized recognizer is generated by converting at least one processing unit (a layer, a filter, or the like) of the network of the existing recognizer, for example by the NW conversion unit 311, so that the output of the recognizer matches or approximates between the case where the existing sensor is used and the case where the recognition-specialized sensor is used.
Next, the first example of the twelfth embodiment will be described. FIG. 87 is a schematic diagram for explaining processing according to the first example of the twelfth embodiment. FIG. 87 focuses on the conversion middle-stage processing unit 821 shown in section (b) of FIG. 86.
Next, the second example of the twelfth embodiment will be described. FIG. 88 is a schematic diagram for explaining processing according to the second example of the twelfth embodiment. FIG. 88 likewise focuses on the conversion middle-stage processing unit 821 shown in section (b) of FIG. 86.
(1)
第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成部、
を備える、
情報処理装置。
(2)
前記第2のセンサは、前記第1のセンサに対して、前記読み出し単位と前記信号特性と前記画素特性とのうち少なくとも前記読み出し単位が異なり、
前記第1の読み出し単位は1フレームであり、前記第2のセンサの第2の読み出し単位は前記1フレームより小さい、
前記(1)に記載の情報処理装置。
(3)
前記生成部は、
前記第1の学習データを、前記第2の読み出し単位に応じて変換することで、前記第2の学習データを生成する、
前記(2)に記載の情報処理装置。
(4)
前記生成部は、
互いに時刻が異なる複数の前記第1の信号による複数の第1の画像に基づき、前記複数の第1の画像それぞれと時刻が異なる複数の第2の画像を生成し、
前記複数の第2の画像に基づき前記第2の学習データを生成する、
前記(2)または(3)に記載の情報処理装置。
(5)
前記生成部は、
前記第1の信号による1つの画像に基づき、前記1つの画像とそれぞれ時刻が異なる複数の第2の画像を生成し、前記複数の第2の画像に基づき前記第2の学習データを生成する、
前記(2)または(3)に記載の情報処理装置。
(6)
前記生成部は、
前記第1のセンサの動きを示す情報に基づき前記複数の第2の画像を生成する、
前記(5)に記載の情報処理装置。
(7)
前記生成部は、
前記1つの画像に含まれる被写体の動きを示す情報に基づき前記複数の第2の画像を生成する、
前記(5)または(6)に記載の情報処理装置。
(8)
前記生成部は、
前記第1の学習データと、前記第1のセンサに対する制御の範囲を示す制御範囲とに基づき、前記第1の信号に関する前記制御範囲に対応した統計量を推定し、推定した前記統計量に基づき前記制御情報を生成する、
前記(2)乃至(7)の何れかに記載の情報処理装置。
(9)
前記生成部は、
前記統計量に基づき、前記第2のセンサからの読み出しタイミングを制御するための前記制御情報を生成する、
前記(8)に記載の情報処理装置。
(10)
前記統計量は、前記第1の信号に含まれる認識対象のラインごとの出現頻度を示し、
前記生成部は、
前記出現頻度に応じて前記タイミングを制御するための前記制御情報を生成する、
前記(9)に記載の情報処理装置。
(11)
前記生成部は、
前記タイミングに対してランダム要素を加えて前記制御情報を生成する、
前記(9)または(10)に記載の情報処理装置。
(12)
前記生成部は、
前記第1のセンサを制御する際の制約である制御制約をさらに用いて、前記制御情報を生成する、
前記(8)乃至(11)の何れかに記載の情報処理装置。
(13)
前記生成部は、
前記第1の学習データに対して時系列情報を付加した情報に対して前記制御情報に従ったサンプリング位置でのサンプリングを行い、サンプリングを行った前記情報を用いた学習に応じて前記制御情報を更新し、
更新した前記制御情報に応じて前記時系列情報と前記サンプリング位置とを更新する、
前記(8)に記載の情報処理装置。
(14)
前記生成部は、
前記第1の学習データに対してダミー制御情報を用いて前記第2の認識器の制御を学習するための制御学習データを生成し、生成した前記制御学習データを用いた学習に応じて前記制御情報を生成する、
前記(8)に記載の情報処理装置。
(15)
前記生成部は、
前記第2のセンサの第2の画素特性または第2の信号特性に対して、前記第1のセンサの第1の画素特性または第1の信号特性に情報の欠落がある場合、前記第1の画素特性または前記第1の信号特性の前記第2の画素特性または前記第2の信号特性への近似を行うことで、前記第1の学習データを前記第2の学習データに変換する、
前記(2)乃至(14)の何れかに記載の情報処理装置。
(16)
前記変換部は、
前記第1の画素特性または前記第1の信号特性の、前記第2の画素特性または前記第2の信号特性に対して前記情報の欠落により欠落した情報を、線形補間を用いて補間することで前記近似を行う、
前記(15)に記載の情報処理装置。
(17)
前記変換部は、
前記情報の欠落により欠落した情報がノイズ情報である場合、前記第1の画素特性または前記第1の信号特性にノイズを付加することで前記近似を行う、
前記(15)に記載の情報処理装置。
(18)
前記変換部は、
前記情報の欠落により欠落した情報がSNR(Signal-Noise Ratio)である場合、前記第1の画素特性または前記第1の信号特性にノイズリダクション処理を施すことで、前記近似を行う、
前記(15)に記載の情報処理装置。
(19)
前記変換部は、
前記第2のセンサの第2の画素特性または第2の信号特性に対して、前記第1のセンサの第1の画素特性または第1の信号特性に情報の欠落がある場合、前記情報の欠落により欠落した情報を推測することで、前記第1の学習データを前記第2の学習データに変換する、
前記(2)乃至(14)の何れかに記載の情報処理装置。
(20)
前記変換部は、
前記第1のセンサの第1の画素特性または第1の信号特性と、前記第2のセンサの第2の画素特性または第2の信号特性との対応関係が不明の場合、プリセット情報に基づき前記第1の画素特性または前記第1の信号特性を前記第2の画素特性または前記第2の信号特性に変換する、
前記(2)乃至(14)の何れかに記載の情報処理装置。
(21)
前記変換部は、前記プリセット情報としてノイズ特性を用いる、
前記(20)に記載の情報処理装置。
(22)
前記変換部は、前記プリセット情報として信号処理パイプラインを用いる、
前記(20)に記載の情報処理装置。
(23)
前記変換部は、
前記第1のセンサの第1の画素特性または第1の信号特性と、前記第2のセンサの第2の画素特性または第2の信号特性との対応関係が不明の場合、前記第1の画素特性または前記第1の信号特性が変換される前記第2の画素特性または前記第2の信号特性を推測する、
前記(2)乃至(4)の何れかに記載の情報処理装置。
(24)
前記変換部は、ノイズ特性を推測し、推測された該ノイズ特性を用いて前記第1の画素特性または前記第1の信号特性を、前記第2の画素特性または前記第2の信号特性に変換する、
前記(23)に記載の情報処理装置。
(25)
前記変換部は、信号処理パイプラインを推測し、推測された該信号処理パイプラインを用いて前記第1の画素特性または前記第1の信号特性を、前記第2の画素特性または前記第2の信号特性に変換する、
前記(23)に記載の情報処理装置。
(26)
前記第1の画素特性および前記第2の画素特性は、前記第1の信号および前記第2の信号の光線形性である、
前記(15)乃至(25)の何れかに記載の情報処理装置。
(27)
前記第1の画素特性および前記第2の画素特性は、
前記第1の信号および前記第2の信号のノイズ特性である、
前記(15)乃至(26)の何れかに記載の情報処理装置。
(28)
前記第1の信号特性および前記第2の信号特性は、
前記第1の信号および前記第2の信号のビット長である、
前記(15)乃至(27)の何れかに記載の情報処理装置。
(29)
前記第1の信号特性および前記第2の信号特性は、
前記第1の信号、および、前記第2の認識器または前記第2のデータセットに対応する第2の信号におけるハイダイナミックレンジ合成の有無である、
前記(15)乃至(28)の何れかに記載の情報処理装置。
(30)
前記第1の信号特性および前記第2の信号特性は、
前記第1の信号および前記第2の信号の静的な階調特性である、
前記(15)乃至(29)の何れかに記載の情報処理装置。
(31)
前記第1の信号特性および前記第2の信号特性は、
前記第1の信号および前記第2の信号におけるシェーディング特性である、
前記(15)乃至(30)の何れかに記載の情報処理装置。
(32)
プロセッサにより実行される、
第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成ステップ、
を有する、
情報処理方法。
(33)
プロセッサに、
第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成ステップ、
を実行させるための情報処理プログラム。
(34)
第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成部を有する学習装置と、
前記第2の認識器を含む認識装置と、
を含む、
情報処理システム。
2 認識システム
3 学習システム
10,10a,10b,10c,10d,10e センサ部
11 撮像部
12,301,301a,301b,301c,301d,301e,301f,301g,301h,301i,301j,301k,301k-1,301k-2,301l,301m,301n,301o,301p,301q,301r,301r-1,301r-2 変換部
13 撮像制御部
20 認識部
30 データ生成部
31 認識器生成部
40 カメラ情報
41 カメラ動き情報
60,61,401,401a,401b,441a,441b,441c,520,522,522Pφ#1,522Pφ#2,522Pφ#3,522Pφ#4,530,540Pφ#1,550 画像
611,612,613,631,632,641,642,661,662,663,671,672,673 補間画像
74 他センサ情報
75 被写体動き情報
300,300a,300b,400,400a,400b,400c 既存学習データ
302,302L#1,302L#2,302L#3,302L#4,302L#(3+n/2),302L#(1+n/2),302L#(2+n/2),302L#(4+n/2),302Ls#1,302Ls#2,302Ls#3,302Lp#1,302Lp#2,302Lp#3,302Lpr#1,302Lpr#2,302Lpr#3,302Pφ#1-1,302Pφ#2-1,302Pφ#1-2,302Pφ#1,302Pφ#2,302Pφ#3,302Pφ#4,302Ar#1-1,302Ar#1-2,302Ar#2-2,302Ar#4-2,302Ar#2-4,302Ar#4-4,302Pt#1-1,302Pt#2-1,302Pt#1-2,302Pt#2-2,302Rd#m_1,302Rd#m_2,302Rd#m_n,302Rd#(m+1)_1,440 特化学習データ
303,303Lt,303(1),303(2),303(10),303(11),303(12),303(ALL) 既存評価データ
304,304L#1,304L#2,304L#3,304L#4,304L#5,304L#6,304L#7,304L#8,304L#9,304L#10,304L#11,304L#12,304Pφ#1,304Pφ#2,304Pφ#3,304Pφ#4,304Lt 特化評価データ
310,410,410’,810 既存認識器
311,311a,311b,311c,311d,311e NW変換部
312,420,820 特化認識器
313 特化制御則
320,320a,320b フレームデータ分割部
321a,321b,321c 補間画像生成部
322 蓄積・更新処理部
323 蓄積部
324 フォーマット変換部
325 蓄積処理部
326 蓄積判定部
3301,3302,330N 特性変換部
402,442 正解データ
411,411’,801,803 既存認識出力
421 特化認識出力
430 認識出力間誤差計算部
431 最小化誤差
460 既存/特化変換部
461 特化/既存変換部
462,766 画像生成部
470 認識画像抽出部
500,500a,500b フレームベースNW
501,501b 非フレームベースNW
502 通常特性用NW
503 特化特性用NW
510 フィルタ変換レイヤ選択部
511a,511b フィルタ変換部
512 NW再構成部
513,516 2次元フィルタ
514 水平フィルタ
515 垂直フィルタ
517Pφ#1,517Pφ#2,517Pφ#3,517Pφ#4,571a1,571a2,571b,571c フィルタ
518 マスク処理追加レイヤ選択部
519 マスク処理追加部
521,531,541,541Pφ#1,551,580,582a,582b,582c,583,584,586Pφ#1,586Pφ#2,586Pφ#3,586Pφ#4,587 特徴量
561a,562a 注目領域
561b,562b 受容野
570a1,570a2,570b,570c レイヤ
572a1,572a2,572b,572c バッチ正規化
573a1,573a2,573b,573c 活性化関数
575 係数変換部
576 特性解析部
577 レイヤ変換部
700,700a,700b 統計量推定部
710,736,783 制御範囲
711,711a 統計量
712 サブサンプルライン制御範囲
713 ゲイン制御範囲
714 明るさ推定部
720 制御学習データ生成部
721,792 制御学習データ
730 画像変形部
731,780 サンプリング部
732 制御結果画像
733,793 制御学習部
734 制御生成部
735 時系列生成部
737 時系列情報
740,740a,740b,740c スケジューリング部
741,741a,741b,741c 制御指令
742 ランダムネス情報
743 サブサンプルライン制御制約情報
750,750a,750b 認識器
751 共通部
752,752a 参考情報出力部
753 認識処理部
760 学習部
761,761a,824a,824b 制御情報生成部
762 制御範囲
765 観測画像
767 認識画像
768a,768b,768c 経路
770 乗算器
771 アテンション生成レイヤ
772 アテンションマップ
772a1,772a2,772a3 対象領域
772b 注目領域情報
774 中間特徴量
776 注目領域選択部
781 制御則生成部
782 制御制約推定部
790 環境生成部
791 ダミー制御データ
794 制御制約情報
795 制御則
800 既存センサ出力
811 前段処理部
812 中段処理部
813 後段処理部
821,821a 変換中段処理部
822,822a,822b 制御情報
823 制御特徴量生成部
825 必要特性推定部
830 認識特化センサ
Claims (20)
- 第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成部、
を備える、
情報処理装置。 - 前記第2のセンサは、前記第1のセンサに対して、前記読み出し単位と前記信号特性と前記画素特性とのうち少なくとも前記読み出し単位が異なり、
前記第1の読み出し単位は1フレームであり、前記第2のセンサの第2の読み出し単位は前記1フレームより小さい、
請求項1に記載の情報処理装置。 - 前記生成部は、
前記第1の学習データを、前記第2の読み出し単位に応じて変換することで、前記第2の学習データを生成する、
請求項2に記載の情報処理装置。 - 前記生成部は、
互いに時刻が異なる複数の前記第1の信号による複数の第1の画像に基づき、前記複数の第1の画像それぞれと時刻が異なる複数の第2の画像を生成し、
前記複数の第2の画像に基づき前記第2の学習データを生成する、
請求項2に記載の情報処理装置。 - 前記生成部は、
前記第1の信号による1つの画像に基づき、前記1つの画像とそれぞれ時刻が異なる複数の第2の画像を生成し、前記複数の第2の画像に基づき前記第2の学習データを生成する、
請求項2に記載の情報処理装置。 - 前記生成部は、
前記第1のセンサの動きを示す情報に基づき前記複数の第2の画像を生成する、
請求項5に記載の情報処理装置。 - 前記生成部は、
前記1つの画像に含まれる被写体の動きを示す情報に基づき前記複数の第2の画像を生成する、
請求項5に記載の情報処理装置。 - 前記生成部は、
前記第1の学習データと、前記第1のセンサに対する制御の範囲を示す制御範囲とに基づき、前記第1の信号に関する前記制御範囲に対応した統計量を推定し、推定した前記統計量に基づき前記第2の認識器による認識処理を制御するための制御情報を生成する、
請求項2に記載の情報処理装置。 - 前記生成部は、
前記統計量に基づき、前記第2のセンサからの読み出しのタイミングを制御するための前記制御情報を生成する、
請求項8に記載の情報処理装置。 - 前記生成部は、
前記第1のセンサを制御する際の制約である制御制約をさらに用いて、前記制御情報を生成する、
請求項8に記載の情報処理装置。 - 前記生成部は、
前記第1の学習データに対して時系列情報を付加した情報に対して前記制御情報に従ったサンプリング位置でのサンプリングを行い、サンプリングを行った前記情報を用いた学習に応じて前記制御情報を更新し、
更新した前記制御情報に応じて前記時系列情報と前記サンプリング位置とを更新する、
請求項8に記載の情報処理装置。 - 前記生成部は、
前記第1の学習データに対してダミー制御情報を用いて前記第2の認識器の制御を学習するための制御学習データを生成し、生成した前記制御学習データを用いた学習に応じて前記制御情報を生成する、
請求項8に記載の情報処理装置。 - 前記生成部は、
前記第2のセンサの第2の画素特性または第2の信号特性に対して、前記第1のセンサの第1の画素特性または第1の信号特性に情報の欠落がある場合、前記第1の画素特性または前記第1の信号特性の前記第2の画素特性または前記第2の信号特性への近似を行うことで、前記第1の学習データを前記第2の学習データに変換する、
請求項2に記載の情報処理装置。 - 前記生成部は、
前記第2のセンサの第2の画素特性または第2の信号特性に対して、前記第1のセンサの第1の画素特性または第1の信号特性に情報の欠落がある場合、前記情報の欠落により欠落した情報を推測することで、前記第1の学習データを前記第2の学習データに変換する、
請求項2に記載の情報処理装置。 - 前記生成部は、
前記第1のセンサの第1の画素特性または第1の信号特性と、前記第2のセンサの第2の画素特性または第2の信号特性との対応関係が不明の場合、プリセット情報に基づき前記第1の画素特性または前記第1の信号特性を前記第2の画素特性または前記第2の信号特性に変換する、
請求項2に記載の情報処理装置。 - 前記生成部は、
前記第1のセンサの第1の画素特性または第1の信号特性と、前記第2のセンサの第2の画素特性または第2の信号特性との対応関係が不明の場合、前記第1の画素特性または前記第1の信号特性が変換される前記第2の画素特性または前記第2の信号特性を推測する、
請求項2に記載の情報処理装置。 - 前記第1の画素特性および前記第2の画素特性は、前記第1の信号および前記第2の信号の光線形性、ノイズ特性と、ビット長と、前記第1の信号、および、前記第2の認識器または前記第2の学習データに対応する第2の信号におけるハイダイナミックレンジ合成の有無と、静的な階調特性と、シェーディング特性と、のうち少なくとも1つである、
請求項13に記載の情報処理装置。 - プロセッサにより実行される、
第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成ステップ、
を有する、
情報処理方法。 - プロセッサに、
第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成ステップ、
を実行させるための情報処理プログラム。 - 第1のセンサから第1の読み出し単位で読み出された第1の信号に基づき認識処理を行う第1の認識器を学習させるための第1の学習データに基づき、前記第1のセンサに対して読み出し単位と信号特性と画素特性とのうち少なくとも1つが異なる第2のセンサから読み出された第2の信号に基づき認識処理を行う第2の認識器を学習させるための第2の学習データを生成する生成部を有する学習装置と、
前記第2の認識器を含む認識装置と、
を含む、
情報処理システム。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023570890A JPWO2023127614A1 (ja) | 2021-12-28 | 2022-12-21 | |
CN202280084927.0A CN118489253A (zh) | 2021-12-28 | 2022-12-21 | 信息处理设备、信息处理方法、信息处理程序和信息处理系统 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021-213709 | 2021-12-28 | ||
JP2021213709 | 2021-12-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023127614A1 true WO2023127614A1 (ja) | 2023-07-06 |
Family
ID=86998907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/047000 WO2023127614A1 (ja) | 2021-12-28 | 2022-12-21 | 情報処理装置、情報処理方法、情報処理プログラムおよび情報処理システム |
Country Status (3)
Country | Link |
---|---|
JP (1) | JPWO2023127614A1 (ja) |
CN (1) | CN118489253A (ja) |
WO (1) | WO2023127614A1 (ja) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017102838A (ja) * | 2015-12-04 | 2017-06-08 | トヨタ自動車株式会社 | 物体認識アルゴリズムの機械学習のためのデータベース構築システム |
JP2019032821A (ja) * | 2017-06-26 | 2019-02-28 | コニカ ミノルタ ラボラトリー ユー.エス.エー.,インコーポレイテッド | ニューラルネットワークによる画風変換を用いたデータオーグメンテーション技術 |
JP2020030681A (ja) * | 2018-08-23 | 2020-02-27 | ファナック株式会社 | 画像処理装置 |
WO2020045685A1 (ja) * | 2018-08-31 | 2020-03-05 | ソニー株式会社 | 撮像装置、撮像システム、撮像方法および撮像プログラム |
JP2020039123A (ja) | 2018-08-31 | 2020-03-12 | ソニー株式会社 | 撮像装置、撮像システム、撮像方法および撮像プログラム |
Non-Patent Citations (2)
Title |
---|
Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty, "Zero-Shot Knowledge Distillation in Deep Networks", 20 May 2019 (2019-05-20) |
Kartikeya Bhardwaj, Naveen Suda, Radu Marculescu, "Dream Distillation: A Data-Independent Model Compression Framework", 17 May 2019 (2019-05-17) |
Also Published As
Publication number | Publication date |
---|---|
CN118489253A (zh) | 2024-08-13 |
JPWO2023127614A1 (ja) | 2023-07-06 |