WO2023140026A1 - Information processing device - Google Patents

Information processing device

Info

Publication number
WO2023140026A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
pixels
data
learning
pixel data
Prior art date
Application number
PCT/JP2022/047125
Other languages
English (en)
Japanese (ja)
Inventor
Toshiya Yura
Original Assignee
Sony Semiconductor Solutions Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corporation
Publication of WO2023140026A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/10Circuitry of solid-state image sensors [SSIS]; Control thereof for transforming different wavelengths into image signals

Definitions

  • This technology relates to an information processing device that generates a learning model for inference using image data as input data, and to an information processing device that performs inference processing using such a learning model.
  • Inference processing within an image sensor generally obtains an inference result by providing a learning model, as input data, with image data obtained by applying interpolation processing such as demosaic processing to captured images (for example, Patent Document 1 below).
  • The learning model used in this way is obtained by learning with general RGB images. Therefore, in order to fully exploit the performance of the learning model, it is preferable to input the same kind of data as the input data used for learning. That is, an inference result is obtained by applying interpolation processing such as demosaic processing to pixel data in the image sensor to generate an RGB image or the like, and inputting it to the learning model.
  • This technology has been developed in view of such problems, and aims to provide a technology for appropriately handling learning models within an image sensor.
  • An information processing apparatus according to the present technology includes a learning model generation unit that generates a learning model by performing machine learning using, as learning input data, color pixel data generated by extracting pixel data of the same color from among pixel data obtained from a pixel array unit in which unit pixel groups, each having a plurality of types of pixels with mutually different color filters arranged in a predetermined pattern, are arranged two-dimensionally.
  • That is, color pixel data obtained without performing interpolation processing such as demosaic processing or YUV conversion is used for machine learning. This makes it possible to improve the spatial resolution of the learning input data compared with general RGB images.
  • Further, the information processing apparatus according to the present technology includes an inference processing unit that performs inference processing by inputting the color pixel data of the image data obtained from the imaging unit as input data to a learning model obtained by performing machine learning using, as learning input data, color pixel data separated for each color and obtained without interpolation processing from pixel data obtained from a pixel array unit in which unit pixel groups, each composed of a plurality of pixels whose color filters at least partially differ, are arranged two-dimensionally. A minimal sketch of this per-color separation follows.
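  • As a concrete illustration of color pixel data obtained without interpolation, the following sketch splits a Bayer mosaic into four per-color planes with plain array slicing. numpy and an RGGB layout are assumptions for illustration, not details from the publication.

```python
import numpy as np

def extract_color_planes(raw: np.ndarray) -> dict:
    """Split a Bayer (RGGB) mosaic into per-color planes.

    Every value in each plane is a real sample from the sensor;
    nothing is interpolated, so no detail of the subject is invented.
    """
    return {
        "9R":  raw[0::2, 0::2],  # red pixels in the red pixel rows 6R
        "9Gr": raw[0::2, 1::2],  # green pixels in the red pixel rows 6R
        "9Gb": raw[1::2, 0::2],  # green pixels in the blue pixel rows 6B
        "9B":  raw[1::2, 1::2],  # blue pixels in the blue pixel rows 6B
    }

raw = np.random.randint(0, 1024, (480, 640), dtype=np.uint16)  # stand-in 10-bit Raw
planes = extract_color_planes(raw)
print({k: v.shape for k, v in planes.items()})  # each plane is (240, 320)
```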
  • Brief description of the drawings: the accompanying drawings include a diagram for explaining pixel data output from a pixel; an explanatory diagram of the R image, G image, and B image obtained by format conversion in a conventional method; an explanatory diagram of the R image, G image, and B image after resizing in a conventional method; diagrams for explaining the color pixel data generated using the technique of the present embodiment; a diagram showing a configuration example of a learning model generation device; a flowchart showing an example of learning model generation processing executed by the learning model generation device; a flowchart showing an example of inference processing executed by an image sensor; a diagram showing a configuration example of a pixel array section including IR pixels; and a diagram for explaining the outline of the present embodiment.
  • A learning model generation device 1 as an information processing device in the present embodiment is a device that generates a learning model (hereinafter referred to as an "AI model") using, as learning input data, image data obtained based on imaging by an image sensor IS.
  • The learning input data used to generate the AI model is Bayer-format color pixel data obtained according to the light-receiving operation of the image sensor IS, not a general R (red), G (green), B (blue) image or a YUV image.
  • That is, the learning input data is obtained without performing interpolation processing such as the demosaic processing used to obtain an RGB image or the YUV conversion processing used to obtain a YUV image.
  • In the following, demosaic processing is taken as an example of interpolation processing.
  • The pixel array section 2 of the image sensor IS is formed by two-dimensionally arraying unit pixel groups 3 in the row and column directions.
  • The unit pixel group 3 is composed of two pixels 4 arranged in each of the row direction and the column direction, i.e., four pixels in a 2x2 arrangement. Each pixel 4 has a color filter 5.
  • The unit pixel group 3 includes a red pixel 4R that has sensitivity to red light R by including a color filter 5R that transmits only red light R, green pixels 4G that have sensitivity to green light G by including a color filter 5G that transmits only green light G, and a blue pixel 4B that has sensitivity to blue light B by including a color filter 5B that transmits only blue light B.
  • Each pixel 4 in the pixel array section 2 belongs to either a red pixel row 6R, in which only red pixels 4R and green pixels 4G are arranged, or a blue pixel row 6B, in which only green pixels 4G and blue pixels 4B are arranged.
  • Each pixel 4 outputs a signal (pixel data 7) corresponding to the amount of charge obtained by photoelectrically converting incident light.
  • The pixel data 7 output from the red pixel 4R is referred to as pixel data 7R (red pixel data 7R), the pixel data 7 output from the green pixel 4G as pixel data 7G (green pixel data 7G), and the pixel data 7 output from the blue pixel 4B as pixel data 7B (blue pixel data 7B).
  • The pixel data 7R, 7G, and 7B are Raw data output from the pixel array section 2.
  • In a conventional method, an R image 8R, a G image 8G, and a B image 8B are generated by performing format conversion on the Raw data (see FIG. 3).
  • Through demosaicing, the R image 8R, the G image 8G, and the B image 8B each have a size of W pixels in the row direction and H pixels in the column direction. At least 3/4 of the pixel data 7 in the R image 8R, G image 8G, and B image 8B is obtained by interpolation using surrounding pixel data 7.
  • Each of the R image 8R, the G image 8G, and the B image 8B is then resized so that the size in the row direction is W2 (< W) pixels and the size in the column direction is H2 (< H) pixels, yielding an R image 8R', a G image 8G', and a B image 8B' (see FIG. 4).
  • Here, let W2 be 1/2 of W and H2 be 1/2 of H. That is, the image size of the R image 8R', G image 8G', and B image 8B' after resizing is 1/4 of that of the R image 8R, G image 8G, and B image 8B before resizing.
  • Accordingly, the spatial resolution of the R image 8R', the G image 8G', and the B image 8B' is 1/4 of that of the image data GD as Raw data; that is, detailed features of the subject are lost. A sketch of this conventional pipeline follows.
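  • For contrast, here is a minimal numpy sketch of the conventional pipeline just described: a simplified bilinear demosaic (an illustrative stand-in for the format conversion) followed by the 1/2 resize. Real sensor pipelines use more elaborate demosaicing; the RGGB layout is also an assumption.

```python
import numpy as np

# 3x3 bilinear interpolation kernel (weights for neighboring samples).
BILINEAR = np.array([[1., 2., 1.],
                     [2., 4., 2.],
                     [1., 2., 1.]])

def _conv3(img: np.ndarray) -> np.ndarray:
    """3x3 weighted sum with zero padding (pure numpy, no SciPy)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += BILINEAR[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def demosaic_bilinear(raw: np.ndarray) -> np.ndarray:
    """Fill the two missing colors at every site of an RGGB mosaic by
    normalized convolution; 3/4 of the output values are interpolated."""
    h, w = raw.shape
    masks = np.zeros((h, w, 3))
    masks[0::2, 0::2, 0] = 1          # R sites (red pixel rows 6R)
    masks[0::2, 1::2, 1] = 1          # Gr sites
    masks[1::2, 0::2, 1] = 1          # Gb sites (blue pixel rows 6B)
    masks[1::2, 1::2, 2] = 1          # B sites
    rgb = np.zeros((h, w, 3))
    for c in range(3):
        num = _conv3(raw * masks[..., c])
        den = _conv3(masks[..., c])
        rgb[..., c] = np.where(masks[..., c] == 1, raw, num / np.maximum(den, 1e-9))
    return rgb

def resize_half(img: np.ndarray) -> np.ndarray:
    """2x2 average pooling: W x H becomes W/2 x H/2, as in the resizing step."""
    h, w = img.shape[:2]
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

raw = np.random.randint(0, 1024, (480, 640)).astype(np.float64)
small_rgb = resize_half(demosaic_bilinear(raw))   # (240, 320, 3): 1/4 the pixels
```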
  • In the method of the present embodiment, the image data GD having a row size of W pixels and a column size of H pixels may also be resized.
  • In that case, the image size of the learning input data is resized to twice the size in each of the row direction and the column direction (the image size is quadrupled).
  • In the present embodiment, the color pixel data 9 separated for each type of pixel is generated from the image data GD (the image data after resizing, if resizing was performed). Specifically, as shown in FIG. 5, color pixel data 9R is generated by extracting only the red pixel data 7R from the image data GD as Raw data.
  • Here, a row outputting the pixel data 7R corresponding to the red pixels 4R is defined as a red pixel data row 10R (see FIG. 5), and a row outputting the pixel data 7B corresponding to the blue pixels 4B is defined as a blue pixel data row 10B. The green pixel data 7G positioned in the red pixel data row 10R is defined as green pixel data 7Gr, and the green pixel data 7G positioned in the blue pixel data row 10B is defined as green pixel data 7Gb.
  • Color pixel data 9Gr (see FIG. 6) obtained by extracting only the green pixel data 7Gr from the image data GD, color pixel data 9Gb obtained by extracting only the green pixel data 7Gb, and color pixel data 9B obtained by extracting only the blue pixel data 7B are likewise generated.
  • The color pixel data 9R, 9Gr, 9Gb, and 9B are used as learning input data for learning model generation.
  • The spatial resolution of each of the color pixel data 9R, 9Gr, 9Gb, and 9B is equal to that of the image data GD as Raw data. That is, the spatial resolution of the learning input data generated by the method of the present embodiment is four times that of the conventional method.
  • The four types of color pixel data 9R, 9Gr, 9Gb, and 9B may be used as learning input data for learning of the learning model, but one set of color pixel data 9G (composite color pixel data) may instead be generated by synthesizing the green color pixel data 9Gr and 9Gb. In this case, the three types of color pixel data 9R, 9G, and 9B are used as learning input data.
  • This reduces the amount of learning input data used for learning the learning model, so the amount of computation and the learning time related to learning can be reduced. A sketch of this synthesis follows.
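  • A minimal sketch of the green-plane synthesis. Simple averaging is an assumed rule; the publication states that 9Gr and 9Gb are synthesized but does not fix the exact operation.

```python
import numpy as np

def synthesize_green(gr: np.ndarray, gb: np.ndarray) -> np.ndarray:
    """Merge the two green planes 9Gr and 9Gb into one composite plane 9G.

    Averaging is an illustrative assumption; the publication does not
    specify the synthesis operation.
    """
    return (gr.astype(np.float32) + gb.astype(np.float32)) / 2.0
```

  • Training then consumes three planes (9R, 9G, 9B) instead of four, which is where the reduction in data volume, computation, and learning time comes from.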
  • Note that the subject positions are matched in the R image 8R', the G image 8G', and the B image 8B' as learning input data in the conventional method.
  • That is, the pixels located at the top left of each of the R image 8R', the G image 8G', and the B image 8B' correspond to the same portion of the subject.
  • In the color pixel data 9R, 9G, and 9B of the present embodiment, by contrast, the positions of the subject differ slightly for each color.
  • In the learning process of the learning model, learning progresses while treating this positional deviation of the subject for each color pixel data 9 as a feature of the learning input data. Therefore, even if color pixel data 9R, 9G, and 9B in which the positions of the subject are displaced are input as input data to the generated learning model, it is possible to make inferences about the subject in consideration of the positional displacement.
  • In other words, the learning progresses so as to appropriately identify the subject characterized by the positional deviation, so the recognition rate can be improved.
  • Moreover, since the color pixel data 9R, 9G, and 9B have the same image size as the R image 8R', the G image 8G', and the B image 8B', it is possible to input them to a learning model trained in consideration of the positional deviation and perform inference.
  • <Learning model generation device> A configuration example of the learning model generation device 1 that performs the learning model generation processing described above will now be described with reference to FIG. 9. Note that the learning model generation device 1 need not include all of the configurations described below, and may include only some of them.
  • A CPU (Central Processing Unit) 70 and a GPU (Graphics Processing Unit) 71 of the learning model generation device 1 execute various processes according to a program stored in a ROM (Read Only Memory) 72 or a non-volatile memory unit 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or a program loaded from a storage unit 79 into a RAM (Random Access Memory) 73.
  • The RAM 73 also stores data necessary for the CPU 70 to execute various processes.
  • The CPU 70, GPU 71, ROM 72, RAM 73, and non-volatile memory unit 74 are interconnected via a bus 83.
  • An input/output interface 75 is also connected to this bus 83 .
  • The input/output interface 75 is connected to an input unit 76 comprising operators and operating devices.
  • As the input unit 76, various operators and operating devices such as a keyboard, mouse, keys, dials, a touch panel, a touch pad, and a remote controller are assumed.
  • User operations are detected by the input unit 76, and signals corresponding to the input operations are interpreted by the CPU 70 and GPU 71.
  • The input/output interface 75 is also connected, integrally or separately, to a display unit 77 such as an LCD (Liquid Crystal Display) or an organic EL panel, and to an audio output unit 78 such as a speaker.
  • The display unit 77 performs various displays and includes, for example, a display device provided in the learning model generation device 1 or a separate monitor device connected to the learning model generation device 1.
  • The display unit 77 displays images for various types of image processing, moving images to be processed, and the like on its display screen based on instructions from the CPU 70 and the GPU 71. Further, based on instructions from the CPU 70 and the GPU 71, the display unit 77 displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface).
  • The input/output interface 75 may be connected to a storage unit 79 made up of a hard disk, solid-state memory, or the like, and to a communication unit 80 made up of a modem or the like.
  • The communication unit 80 performs communication processing via a transmission line such as the Internet, wired/wireless communication with various devices, bus communication, and the like.
  • A drive 81 is also connected to the input/output interface 75 as required, and a removable storage medium 82 such as a magnetic disk, optical disc, magneto-optical disc, or semiconductor memory is mounted as appropriate.
  • Data files such as image files and various computer programs can be read from the removable storage medium 82 by the drive 81.
  • The read data files are stored in the storage unit 79, and images and sounds contained in the data files are output by the display unit 77 and the audio output unit 78.
  • Computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as required.
  • In this learning model generation device 1, software for the processing of the present embodiment can be installed via network communication by the communication unit 80 or via the removable storage medium 82.
  • Alternatively, the software may be stored in advance in the ROM 72, the storage unit 79, or the like.
  • When the CPU 70 and GPU 71 perform processing operations based on various programs, image file reading processing, learning processing, and communication processing are executed as learning model generation processing in the learning model generation device 1.
  • Specifically, the CPU 70 or GPU 71 has a learning model generation unit F1.
  • The learning model generation unit F1 generates a learning model using the color pixel data 9 described above as learning input data.
  • The learning model generated at this time is assumed to have learned, as a feature, the positional displacement of the subject for each color pixel data 9, as described above.
  • Note that the learning model generation device 1 is not limited to a single information processing device as shown in FIG. 9, and may be configured by systematizing a plurality of information processing devices.
  • The plurality of information processing devices may be systematized by a LAN or the like, or may be remotely located and connected by a VPN or the like using the Internet.
  • The plurality of information processing devices may include information processing devices as a group of servers (cloud) usable by a cloud computing service.
  • The camera device 11 has an image sensor IS.
  • A learning model for performing inference processing and its program are stored, for example, in the image sensor IS provided in the camera device 11. Inference processing is thereby executed within the image sensor IS.
  • The image sensor IS also outputs image data to which metadata is added as an inference result.
  • Alternatively, by enabling the image sensor IS to output only the metadata as the inference result, the image data need not be output outside the image sensor IS. As a result, security can be protected and the amount of communication into and out of the image sensor IS can be reduced.
  • The camera device 11 may be a smartphone as an information processing device having a camera function, in addition to devices such as a compact camera or a surveillance camera.
  • The image sensor IS includes an imaging unit 12, a control unit 13, a signal processing unit 14, a DSP (Digital Signal Processor) 15, and a storage unit 16.
  • The imaging unit 12 has the pixel array section 2. Since the configuration of the pixel array section 2 is the same as that described above, a detailed description thereof is omitted.
  • The imaging unit 12 includes, in addition to the pixel array section 2, a vertical driving unit (not shown) for driving row by row or driving all pixels simultaneously, a column processing unit (not shown) for performing noise removal processing and A/D (Analog to Digital) conversion processing, and a horizontal driving unit (not shown) for sequentially selecting unit circuits corresponding to the pixel columns of the column processing unit.
  • The control unit 13 controls the light-receiving operation of the imaging unit 12 by supplying various signals to the imaging unit 12.
  • The image data GD as the Raw data described above is output from the pixel array section 2 to the signal processing unit 14.
  • The signal processing unit 14 performs various types of signal processing on the image data GD; in the present embodiment, it particularly performs the resizing processing described above.
  • The signal processing unit 14 also generates the color pixel data 9 (9R, 9Gr, 9Gb, 9B) from the resized image data. Further, the signal processing unit 14 may combine the color pixel data 9Gr and 9Gb to generate the color pixel data 9G.
  • The color pixel data 9 are stored in the storage unit 16.
  • The DSP 15 performs inference processing by inputting the generated color pixel data 9 into the learning model, and obtains subject recognition information as a result. Therefore, the DSP 15 is configured to have an inference processing unit F2.
  • The DSP 15 outputs the subject recognition information to the outside of the image sensor IS as metadata or the like.
  • Note that the color pixel data 9 input to the learning model, an RGB image generated from the image data GD as Raw data, and the like may also be output to the outside of the image sensor IS. If the image sensor IS is configured to output only metadata, the amount of data output from the image sensor IS can be reduced, which lightens the processing of subsequent applications.
  • The storage unit 16 comprehensively represents ROM, RAM, and the like, and in the present embodiment stores the resized image data, learning models, programs for performing inference processing using the learning models, and the like.
  • Although the image sensor IS and the pixel array section 2 of the camera device 11 are denoted by the same reference numerals as the image sensor IS and the pixel array section 2 used for generating the learning input data for the learning model described above, this does not mean that they are identical. That is, the image sensor IS and pixel array section 2 used for acquiring image data as learning input data may be different from the image sensor IS and pixel array section 2 of the camera device 11 used for acquiring the image data subjected to inference processing.
  • In this way, the image sensor IS can be regarded as an information processing device that performs inference processing. Inference processing is performed by inputting the color pixel data 9 of the image data GD as Raw data obtained from the imaging unit 12 into the learning model as input data. As a result, input data with improved spatial resolution compared to general RGB images is input to the learning model, so the accuracy of the inference result can be improved.
  • In the learning model generation processing, the CPU 70 (or GPU 71) of the learning model generation device 1 acquires image data GD as Raw data in step S101.
  • The CPU 70 (or GPU 71) of the learning model generation device 1 then generates color pixel data 9 (9R, 9Gr, 9Gb, 9B) from the image data GD in step S102. If necessary, the color pixel data 9G may also be synthesized from the color pixel data 9Gr and 9Gb in step S102.
  • The CPU 70 (or GPU 71) of the learning model generation device 1 performs learning processing for the learning model in step S103.
  • In the learning processing, for example, parameters used in a DNN (Deep Neural Network) are adjusted.
  • The CPU 70 (or GPU 71) of the learning model generation device 1 determines in step S104 whether or not learning has ended. For example, learning is determined to be complete when a predetermined number of color pixel data 9 have been used for learning, when the performance of the learning model satisfies a predetermined level, or when a predetermined number of iterations has been completed.
  • If learning has not ended, the CPU 70 (or GPU 71) of the learning model generation device 1 returns to step S101 and acquires new image data GD as Raw data. A sketch of this loop follows.
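  • The loop of steps S101 to S104 can be sketched as a hypothetical PyTorch training loop. The 4-channel CNN, the batch shapes, the loss, and the stopping threshold are illustrative assumptions, not details from the publication.

```python
import torch
import torch.nn as nn

# Hypothetical 4-channel classifier: one input channel per color plane
# (9R, 9Gr, 9Gb, 9B).
model = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def learning_step(raw_batch: torch.Tensor, labels: torch.Tensor) -> float:
    """Steps S101-S103: Raw data in, color planes split, DNN parameters adjusted."""
    planes = torch.stack([raw_batch[:, 0::2, 0::2],   # 9R
                          raw_batch[:, 0::2, 1::2],   # 9Gr
                          raw_batch[:, 1::2, 0::2],   # 9Gb
                          raw_batch[:, 1::2, 1::2]],  # 9B
                         dim=1).float()
    optimizer.zero_grad()
    loss = criterion(model(planes), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Step S104: loop until an iteration budget or a target loss is met,
# otherwise return to acquisition (stand-in random frames used here).
for step in range(1000):
    raw = torch.randint(0, 1024, (8, 64, 64))
    labels = torch.randint(0, 10, (8,))
    if learning_step(raw, labels) < 0.05:
        break
```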
  • The weight reduction processing is processing that reduces the size of the learning model while maintaining its performance as much as possible. It is performed, for example, by removing nodes that have little impact on the inference result (pruning), by quantizing parameters (quantization), or by changing to a smaller model with fewer nodes and layers (distillation), as sketched below.
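  • A minimal sketch of two of the named weight-reduction steps on a stand-in model. The publication names the techniques but not a toolchain, so the PyTorch utilities here are an illustrative assumption.

```python
import torch
import torch.nn.utils.prune as prune

# Stand-in model; any trained learning model would take its place.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))

# Pruning: zero out the 30% of weights with the smallest L1 magnitude.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the tensor

# Quantization: store Linear weights as int8 to shrink the model size.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
```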
  • In the inference processing, the processing unit (signal processing unit 14 or DSP 15) of the image sensor IS acquires image data GD as Raw data in step S201.
  • The processing unit of the image sensor IS generates color pixel data 9 (9R, 9Gr, 9Gb, 9B) from the image data GD in step S202. If necessary, the color pixel data 9G may also be synthesized from the color pixel data 9Gr and 9Gb in step S202.
  • The processing unit of the image sensor IS performs inference processing using the learning model in step S203.
  • In the inference processing, a subject is identified, and label information and the like are generated (selected).
  • In step S204, the processing unit of the image sensor IS outputs the label information and the like as metadata to the outside of the image sensor IS. A sketch of this flow follows.
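  • Steps S201 to S204 in miniature. The `model` callable stands for the learning model held in the storage unit 16; its call signature, its output format (class scores), and the metadata fields are illustrative assumptions.

```python
import numpy as np

def infer_on_sensor(raw: np.ndarray, model) -> dict:
    """Raw frame in, metadata out (steps S201-S204)."""
    planes = np.stack([raw[0::2, 0::2], raw[0::2, 1::2],
                       raw[1::2, 0::2], raw[1::2, 1::2]]).astype(np.float32)  # S202
    scores = model(planes)                                                    # S203
    label = int(np.argmax(scores))
    return {"label": label, "score": float(np.max(scores))}                   # S204

# Only this small dictionary has to leave the image sensor IS; the image
# itself can stay inside, which is the metadata-only output described above.
```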
  • As described above, the learning input data generated by the method of the present embodiment, and the color pixel data 9 input to the learning model in the inference processing, have four times the spatial resolution of those in the conventional method.
  • For example, suppose that an R image 8R' as learning input data in the conventional method has a size of 4 pixels in both the row direction and the column direction.
  • As shown in FIG. 13, such an R image 8R' consists of four pixel data 7R and pixel data 7R' interpolated from them; that is, it has a spatial resolution of four pixels.
  • In contrast, the color pixel data 9R as learning input data generated by the method according to the present embodiment has a size of 2 pixels in both the row direction and the column direction. Since this color pixel data 9R is not obtained by interpolation, it likewise has a spatial resolution of four pixels.
  • Accordingly, the amount of learning input data required to ensure a recognition rate equivalent to that of a learning model generated by the conventional method is reduced to 1/4 in the present embodiment.
  • In the above description, the pixel array section 2 has a configuration in which unit pixel groups 3, each consisting of a red pixel 4R, two green pixels 4G, and a blue pixel 4B, are arranged in a matrix.
  • Alternatively, a pixel array section 2A may be used in which one of the two green pixels 4G in the unit pixel group 3A is replaced with an IR pixel 4IR that receives infrared light (IR: Infrared) (see FIG. 15).
  • Although FIG. 15 shows an example in which the IR pixel 4IR is arranged in the blue pixel row 6B, the IR pixel 4IR may instead be arranged in the red pixel row 6R.
  • By generating color pixel data 9 (including IR pixel data) based on image data GD as Raw data obtained from a pixel array section 2A in which unit pixel groups 3A including IR pixels 4IR are arranged in a matrix, and by training the learning model and performing inference processing using this color pixel data 9, it is possible to improve the recognition rate and the detection rate for a subject such as a suspicious person when the camera device 11 operates as a surveillance camera at night. A sketch of the corresponding plane separation follows.
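  • A sketch of the plane separation for such an RGB-IR array, assuming the IR pixel replaces the green pixel of the blue pixel row as in the FIG. 15 example; actual mosaic layouts are device-specific.

```python
import numpy as np

def extract_rgbir_planes(raw: np.ndarray) -> dict:
    """Split an RGB-IR mosaic into per-color planes, with the IR pixel
    assumed to sit in place of the green pixel of the blue pixel row 6B."""
    return {
        "9R":  raw[0::2, 0::2],   # red pixel rows 6R: R
        "9Gr": raw[0::2, 1::2],   # red pixel rows 6R: G
        "9IR": raw[1::2, 0::2],   # blue pixel rows 6B: IR in place of Gb
        "9B":  raw[1::2, 1::2],   # blue pixel rows 6B: B
    }
```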
  • In the learning processing of the learning model, image data GDL as long-exposure images and image data GDS as short-exposure images may be used.
  • Specifically, the color pixel data 9RL, 9GrL, 9GbL, and 9BL obtained from the image data GDL and the color pixel data 9RS, 9GrS, 9GbS, and 9BS obtained from the image data GDS may be used as learning input data.
  • By using the image data GDS as short-exposure images in the learning processing of the learning model, it is possible to improve the accuracy of the inference processing for image data GD captured during the daytime.
  • Likewise, by using the image data GDL as long-exposure images in the learning processing of the learning model, it is possible to improve the accuracy of the inference processing for image data GD captured at night.
  • That is, when generating a learning model using the image data GDS as short-exposure images, captured images taken during the daytime may be used as learning input data, and when generating a learning model using the image data GDL as long-exposure images, captured images taken at night may be used as learning input data.
  • Alternatively, an HDR (High Dynamic Range) image may be generated from the image data GDL and the image data GDS and used as learning input data; a sketch of such a merge follows.
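  • One simple way to realize such an HDR merge is sketched below. The saturation-based substitution rule and the exposure_ratio parameter are illustrative assumptions, since the publication does not specify a merging method.

```python
import numpy as np

def fuse_hdr(raw_long: np.ndarray, raw_short: np.ndarray,
             exposure_ratio: float, saturation: float = 1023.0) -> np.ndarray:
    """Naive HDR merge of long- and short-exposure Raw frames (GDL, GDS).

    Where the long exposure clips, substitute the gain-matched short
    exposure; elsewhere keep the cleaner long exposure.
    """
    long_f = raw_long.astype(np.float64)
    short_f = raw_short.astype(np.float64) * exposure_ratio  # match brightness
    return np.where(long_f >= saturation, short_f, long_f)

gdl = np.random.randint(0, 1024, (480, 640))  # stand-in long-exposure Raw (GDL)
gds = np.random.randint(0, 1024, (480, 640))  # stand-in short-exposure Raw (GDS)
hdr = fuse_hdr(gdl, gds, exposure_ratio=16.0)
```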
  • As described above, the learning model generation device 1 of the present embodiment includes the learning model generation unit F1, which generates a learning model by performing machine learning using, as learning input data, color pixel data 9 (9R, 9G, 9B, 9Gr, 9Gb) generated by extracting pixel data of the same color from among the pixel data 7 (7R, 7G, 7B, 7Gr, 7Gb) obtained from the pixel array section 2 (2A), in which unit pixel groups 3 (3A), each having a plurality of types of pixels 4 (4R, 4G, 4B) with mutually different color filters 5 (5R, 5G, 5B) arranged in a predetermined pattern, are arranged two-dimensionally.
  • FIG. 16 shows a summary of each example described above.
  • Here, a pixel array section 2 having a size of 8 pixels in both the row direction and the column direction is taken as an example.
  • The color pixel data 9B and 9G (9Gr, 9Gb) are omitted from the illustration.
  • An image obtained by resizing (or cutting out) the area in which the subject is imaged from the pixel array section 2 is Raw data (cropped image of the image data GD) composed of pixel data 7 of 4 pixels in both row and column directions.
  • An image obtained by performing demosaic processing on this Raw data is the R image 8R' according to the conventional method. Alternatively, the R image 8R may first be obtained by demosaic processing, and resizing (clipping) may then be performed to obtain the R image 8R'.
  • In either case, the R image 8R' is data in which 3/4 of the pixels are obtained by interpolation.
  • The spatial resolution at this time is used as the reference spatial resolution.
  • In contrast, the color pixel data 9R generated from the cropped Raw data has a spatial resolution equal to the reference spatial resolution, a color resolution of 1/4, and an image data size of 1/4. According to this method, the amount of learning input data input to the learning model can be reduced compared with the conventional method, so the amount of calculation required for machine learning can be reduced and the learning time shortened.
  • Alternatively, if the color pixel data 9R is generated without resizing (cutting out) the Raw data (image data GD) obtained from the pixel array section 2, color pixel data 9R with four times the spatial resolution is obtained at the same image data size. That is, the color pixel data 9 obtained without performing interpolation processing such as demosaic processing or YUV conversion is used for learning of the machine-learning model. This makes it possible to improve the spatial resolution of the learning input data compared with general RGB images (R, G, and B images), so learning input data in which the details of the subject are not lost can be prepared and the performance of the learning model can be improved.
  • In the unit pixel group 3 (3A), the red pixel 4R, the green pixels 4G, and the blue pixel 4B are located at mutually different positions.
  • In the conventional method, the green light component and the blue light component that should originally be detected at the position of the red pixel 4R are obtained by interpolation processing (demosaic processing).
  • Similarly, the red and blue light components at the positions of the green pixels 4G, and the red and green light components at the position of the blue pixel 4B, are obtained by interpolation processing.
  • As a result, the subject position in the R image 8R, the subject position in the G image 8G, and the subject position in the B image 8B after demosaic processing are matched.
  • In contrast, the color pixel data 9R, 9G (or 9Gr, 9Gb), and 9B obtained without such interpolation processing have different positional relationships between the reference position (for example, the upper-left position of each color pixel data 9) and the subject.
  • By allowing the learning model to learn including such differences in positional relationship, it becomes possible to perform inference processing without separately considering the difference in positional relationship between the subject and the reference position for each color pixel data 9, thereby improving convenience.
  • In addition, since the R image 8R' and the color pixel data 9R have the same image size, they can be used interchangeably for the learning of the learning model and for the inference processing.
  • As described above, the unit pixel group 3A included in the pixel array section 2A may include a pixel (IR pixel 4IR) that receives infrared light.
  • For example, the unit pixel group 3A includes a red pixel 4R, a green pixel 4G, a blue pixel 4B, and an IR pixel 4IR. Even when the IR pixel 4IR is included in this way, the above effects can be obtained by using, as learning input data, the color pixel data 9 (including the IR pixel data) obtained based on the pixel data 7 output from the unit pixel group 3A.
  • The unit pixel group 3 included in the pixel array section 2 may be formed by arranging, in a Bayer array, pixels 4 (4R, 4G, 4B) having, as the color filters 5, a filter (color filter 5R) that transmits red light R, a filter (color filter 5G) that transmits green light G, and a filter (color filter 5B) that transmits blue light B.
  • The above effects can also be obtained when the unit pixel group 3 adopts the Bayer array.
  • In the Bayer array, a pixel (IR pixel 4IR) that receives infrared light may be arranged in place of one of the pixels 4 (green pixels 4G) having a filter that transmits green light G. For example, by replacing the green pixel 4G in the red pixel row 6R or the green pixel 4G in the blue pixel row 6B with the IR pixel 4IR, the recognition rate of a learning model used in applications such as nighttime surveillance cameras can be improved.
  • The unit pixel group 3 included in the pixel array section 2 includes the red pixel row 6R, in which red pixels 4R that receive red light R and green pixels 4G that receive green light G are arranged, and the blue pixel row 6B, in which green pixels 4G that receive green light G and blue pixels 4B that receive blue light B are arranged. The learning input data used for machine learning may be red pixel data 7R obtained from the red pixels 4R, green pixel data 7G obtained from the green pixels 4G, and blue pixel data 7B obtained from the blue pixels 4B, and the green pixel data 7G may be synthesized color pixel data obtained by synthesizing the outputs of the green pixels 4G arranged in the red pixel row 6R and the green pixels 4G arranged in the blue pixel row 6B.
  • The information processing device (image sensor IS) of the present embodiment includes an inference processing unit F2 that performs inference processing by inputting, as input data, the color pixel data 9 of the image data obtained from the imaging unit 12 into a learning model obtained by performing machine learning using, as learning input data, color pixel data 9 (9R, 9G (or 9Gr, 9Gb), 9B) generated by extracting pixel data 7 of the same color from among the pixel data 7 (7R, 7G (or 7Gr, 7Gb), 7B) obtained from the pixel array section 2 (2A), in which unit pixel groups 3 (3A), each having a plurality of types of pixels 4 with mutually different color filters 5 (5R, 5G, 5B) arranged in a predetermined pattern, are arranged two-dimensionally.
  • This allows input data with improved spatial resolution compared to general RGB images to be input to the learning model, so the accuracy of the inference result can be improved.
  • The information processing device may output metadata as a result of the inference processing. For example, by outputting metadata without outputting an image, the amount of data output from the image sensor IS can be reduced, which lightens the processing of subsequent applications.
  • The information processing device may be an image sensor IS. That is, the inference processing may be performed within the image sensor IS and its result output outside the image sensor IS. This makes it possible to simplify the configuration of the downstream equipment that receives and processes the output from the image sensor IS, and to reduce the processing load of that equipment.
  • The image sensor IS may include the storage unit 16 that stores the learning model. By storing the learning model within the image sensor IS, data communication between the image sensor IS and external devices can be reduced.
  • (1) An information processing apparatus comprising a learning model generation unit that generates a learning model by performing machine learning using, as learning input data, color pixel data generated by extracting pixel data of the same color from among pixel data obtained from a pixel array unit in which unit pixel groups, each having a plurality of types of pixels with mutually different color filters arranged in a predetermined pattern, are arranged two-dimensionally.
  • (2) The information processing apparatus according to (1), wherein the unit pixel group includes a pixel that receives infrared light.
  • (3) The information processing apparatus according to (1), wherein the unit pixel group includes, as the color filters, pixels having a filter that transmits red light, a filter that transmits green light, and a filter that transmits blue light in a Bayer array.
  • (4) The information processing apparatus according to (3), wherein the unit pixel group includes a pixel that receives infrared light in place of one of the pixels having a filter that transmits the green light in the Bayer array.
  • (5) The information processing apparatus according to (3), wherein the unit pixel group includes a red pixel row in which red pixels that receive red light and green pixels that receive green light are arranged, and a blue pixel row in which green pixels that receive green light and blue pixels that receive blue light are arranged; the learning input data used for the machine learning is red pixel data obtained from the red pixels, green pixel data obtained from the green pixels, and blue pixel data obtained from the blue pixels; and the green pixel data is synthesized color pixel data obtained by synthesizing outputs of the green pixels arranged in the red pixel row and outputs of the green pixels arranged in the blue pixel row.
  • (6) An information processing apparatus comprising an inference processing unit that performs inference processing by inputting, as input data, color pixel data of image data obtained from an imaging unit into a learning model obtained by performing machine learning using, as learning input data, color pixel data generated by extracting pixel data of the same color from among pixel data obtained from a pixel array unit in which unit pixel groups, each having a plurality of types of pixels with mutually different color filters arranged in a predetermined pattern, are arranged two-dimensionally.
  • (7) The information processing apparatus according to (6), which outputs metadata as a result of the inference processing.
  • 1 Learning model generation device (information processing device), 2, 2A Pixel array section, 3, 3A Unit pixel group, 4 Pixel, 4R Red pixel, 4G Green pixel, 4B Blue pixel, 5, 5R, 5G, 5B Color filter, 6R Red pixel row, 6B Blue pixel row, 7, 7R, 7G, 7B, 7Gr, 7Gb Pixel data, 9, 9R, 9Gr, 9Gb, 9B Color pixel data, 9RL, 9GrL, 9GbL, 9BL Color pixel data, 9RS, 9GrS, 9GbS, 9BS Color pixel data, 12 Imaging unit, 16 Storage unit, IS Image sensor (information processing device), R Red light, G Green light, B Blue light, F1 Learning model generation unit, F2 Inference processing unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Color Television Image Signal Generators (AREA)

Abstract

An information processing device comprising a learning model generation unit that generates a learning model by performing machine learning in which color pixel data is used as learning input data, the color pixel data being separated by color and obtained, without performing interpolation, on the basis of pixel data obtained from a pixel array unit in which unit pixel groups, each formed from a plurality of pixels at least some of whose color filters differ, are arranged two-dimensionally.
PCT/JP2022/047125 2022-01-18 2022-12-21 Information processing device WO2023140026A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022005560 2022-01-18
JP2022-005560 2022-01-18

Publications (1)

Publication Number Publication Date
WO2023140026A1 (fr)

Family

Family ID: 87348197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/047125 WO2023140026A1 (fr) 2022-01-18 2022-12-21 Information processing device

Country Status (1)

Country Link
WO (1) WO2023140026A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020188310A (ja) * 2019-05-10 2020-11-19 Sony Semiconductor Solutions Corporation Image recognition device and image recognition method
JP2020198470A (ja) * 2019-05-30 2020-12-10 Sony Semiconductor Solutions Corporation Image recognition device and image recognition method
WO2020246401A1 (fr) * 2019-06-05 2020-12-10 Sony Semiconductor Solutions Corporation Image recognition device and image recognition method
WO2021187121A1 (fr) * 2020-03-19 2021-09-23 Sony Semiconductor Solutions Corporation Solid-state imaging device


Similar Documents

Publication Publication Date Title
AU2018346909B2 (en) Image signal processor for processing images
US8203633B2 (en) Four-channel color filter array pattern
US8237831B2 (en) Four-channel color filter array interpolation
JP2020123172A (ja) Imaging system, development system, imaging method, and program
US20140015954A1 (en) Image processing apparatus, image processing system, image processing method, and image processing program
CN103416067B (zh) Imaging device
JP4796871B2 (ja) Imaging device
WO2012153489A1 (fr) Image processing system
WO2008032545A1 (fr) Image processing device, image processing method, and program
CN102713972A (zh) Pixel information reproduction using neural networks
CN113014804A (zh) Image processing method and apparatus, electronic device, and readable storage medium
CN113674685B (zh) Pixel array control method and apparatus, electronic device, and readable storage medium
JP2004274724A (ja) Method and apparatus for reconstructing a high-resolution image
WO2023140026A1 (fr) Information processing device
US20150281506A1 (en) Arrangement for image processing
US20100045831A1 (en) Imaging Apparatus and Image Processing Program
CN115147297A (zh) Image processing method and device
CN114125319A (zh) Image sensor, camera module, image processing method and device, and electronic device
JP5825681B2 (ja) Image processing device, image processing method, and program
Blasinski et al. Real-time, color image barrel distortion removal
JP2020061129A (ja) Image processing method, image processing apparatus, imaging apparatus, image processing system, program, and storage medium
JP4805294B2 (ja) Color binning for reducing the image resolution of digital images
US11651479B2 (en) Unified ISP pipeline for image sensors with various color filter array and high dynamic range schemes
US11748862B2 (en) Image processing apparatus including neural network processor and method of operation
US20240087086A1 (en) Image processing method, image processing apparatus, program, trained machine learning model production method, processing apparatus, and image processing system

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22922176

Country of ref document: EP

Kind code of ref document: A1