WO2020027074A1

WO2020027074A1 - Solid-state imaging device and electronic apparatus

Info

Publication number: WO2020027074A1
Application number: PCT/JP2019/029715
Authority: WO
Inventors: 良仁浴
Original assignee: Sony Semiconductor Solutions Corporation (ソニーセミコンダクタソリューションズ株式会社)
Priority date: 2018-07-31
Filing date: 2019-07-29
Publication date: 2020-02-06

Abstract

This solid-state imaging device includes: an imaging unit (11) that acquires image data; a processing unit (14) that executes processing for extracting a specified region on the basis of a neural network calculation model, such processing executed on the image data or data based on the image data; and an output unit (16) that outputs image data which was manipulated on the basis of the specified region, or image data which was read from the imaging unit on the basis of the specified region.

Description

Solid-state imaging device and electronic equipment

The present disclosure relates to a solid-state imaging device and an electronic device. More specifically, the present invention relates to processing of image data in a chip.

Devices such as digital cameras are equipped with an image sensor that includes a CMOS (Complementary Metal Oxide Semiconductor) sensor and a DSP (Digital Signal Processor). In the image sensor, a captured image is supplied to the DSP, which performs various kinds of processing on it and outputs the result to an external device such as an application processor.

International Publication No. WO 2018/051809

However, in the above-described conventional technology, simple image processing such as noise removal is performed by the DSP in the image sensor, whereas complicated processing such as face authentication using image data is generally performed by an application processor or the like. For this reason, the image captured by the image sensor is output to the application processor as it is. From the viewpoints of security and privacy, it is desirable to execute such processing within the chip of the image sensor.

Therefore, the present disclosure proposes a solid-state imaging device and an electronic device that can execute processing in a chip of an image sensor.

In order to solve the above-described problem, a solid-state imaging device according to an embodiment of the present disclosure includes: an imaging unit that acquires image data; a processing unit that executes, on the image data or on data based on the image data, a process of extracting a specific region based on a neural network calculation model; and an output unit that outputs image data processed based on the specific region, or image data read from the imaging unit based on the specific region.

By mounting on the solid-state imaging device a processing unit that extracts a specific region to be processed from the image data acquired by the imaging unit, both the extraction of the region and the processing of it can be executed within the chip. As a result, privacy information and the like included in the image data can be prevented from being output outside the chip as it is, and a secure solid-state imaging device can be realized. A further advantage is that the amount of data output from the solid-state imaging device to the outside can be reduced.

According to the present disclosure, it is possible to execute the processing in the chip of the image sensor. Note that the effects are not necessarily limited to those described here, and may be any of the effects described in the present disclosure.

FIG. 1 is a block diagram illustrating a schematic configuration example of an imaging device as an electronic apparatus according to the first embodiment.
FIG. 2 is a diagram for describing image processing according to the first embodiment.
FIG. 3 is a flowchart showing the flow of processing according to the first embodiment.
FIG. 4 is a diagram illustrating a modification of the first embodiment.
FIG. 5 is a diagram illustrating an imaging device according to the second embodiment.
FIG. 6 is a diagram illustrating a modification of the second embodiment.
FIG. 7 is a diagram illustrating an imaging device according to the third embodiment.
FIG. 8 is a sequence diagram showing the flow of processing according to the third embodiment.
FIG. 9 is a schematic diagram illustrating a chip configuration example of the image sensor according to the embodiment.
FIG. 10 is a diagram for explaining a layout example according to the embodiment.
FIG. 11 is a diagram for explaining a layout example according to the embodiment.
FIG. 12 is a block diagram showing an example of a schematic configuration of a vehicle control system.
FIG. 13 is an explanatory diagram showing an example of the installation positions of a vehicle exterior information detection unit and an imaging unit.
FIG. 14 is a diagram showing an example of a schematic configuration of an endoscopic surgery system.
FIG. 15 is a block diagram illustrating an example of a functional configuration of a camera head and a CCU.
FIG. 16 is a block diagram showing an example of a schematic configuration of a diagnosis support system.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In the following embodiments, the same portions will be denoted by the same reference numerals, without redundant description.

In addition, the present disclosure will be described according to the following item order.
1. First embodiment
2. Modification of first embodiment
3. Second embodiment
4. Third embodiment
5. Chip configuration of image sensor
6. Layout example
7. Other embodiments
8. Application example to mobile object
9. Application example to endoscopic surgery system
10. Application example to WSI (Whole Slide Imaging) system

(1. First Embodiment)
[1-1. Configuration of Image Processing System According to First Embodiment]
FIG. 1 is a block diagram illustrating a schematic configuration example of an imaging device as an electronic device according to the first embodiment. As shown in FIG. 1, the imaging device 1 is communicably connected to a cloud server 30. The connection between the imaging device 1 and the cloud server 30 may be wired or wireless, for example via a network or a USB (Universal Serial Bus) cable.

The cloud server 30 is an example of a server device that stores image data such as still images and moving images transmitted from the imaging device 1. For example, the cloud server 30 can store image data in arbitrary units, such as for each user, for each date, and for each imaging location, and can provide various services such as creating an album using the image data.

The imaging device 1 is an example of an electronic device having the image sensor 10 and the application processor 20, and is, for example, a digital camera, a digital video camera, a tablet terminal, a smartphone, or the like. In the following embodiments, an example in which a still image is captured will be described. However, the present invention is not limited to this, and the same processing can be performed on a moving image and the like.

The image sensor 10 is, for example, a CMOS (Complementary Metal Oxide Semiconductor) image sensor composed of one chip. It receives incident light, performs photoelectric conversion, and outputs image data corresponding to the amount of incident light received to the application processor 20.

The application processor 20 is an example of a processor such as a CPU (Central Processing Unit) that executes various applications. The application processor 20 executes various processes corresponding to the application, such as a display process of displaying image data input from the image sensor 10 on a display, a biometric authentication process using the image data, and a transmission process of transmitting the image data to the cloud server 30.

[1-2. Configuration of Imaging Device According to First Embodiment]
As illustrated in FIG. 1, the imaging device 1 includes an image sensor 10 that is a solid-state imaging device, and an application processor 20. The image sensor 10 includes an imaging unit 11, a control unit 12, a signal processing unit 13, a DSP (also called a processing unit) 14, a memory 15, and a selector 16 (also called an output unit).

The imaging unit 11 includes, for example, an optical system 104 including a zoom lens, a focus lens, an aperture, and the like, and a pixel array unit 101 having a configuration in which unit pixels including light receiving elements such as photodiodes are arranged in a two-dimensional matrix. Light incident from the outside passes through the optical system 104 to form an image on a light receiving surface of the pixel array unit 101 on which the light receiving elements are arranged. Each unit pixel of the pixel array unit 101 converts the light incident on its light receiving element into an electric charge, and accumulates a charge corresponding to the amount of incident light in a readable manner.

The imaging unit 11 includes a converter (Analog to Digital Converter: hereinafter referred to as an ADC) 17 (for example, see FIG. 2). The ADC 17 generates digital image data by converting an analog pixel signal for each unit pixel read from the imaging unit 11 into a digital value, and outputs the generated image data to the signal processing unit 13. The ADC 17 may include a voltage generation circuit that generates a drive voltage for driving the imaging unit 11 from a power supply voltage or the like.

The size of the image data output by the imaging unit 11 can be selected from a plurality of sizes such as 12M (3968 × 2976 pixels) and VGA (Video Graphics Array) size (640 × 480 pixels). The image data output by the imaging unit 11 can be, for example, a color image of RGB (red, green, and blue) or a monochrome image of only luminance. These selections can be made as a kind of setting of the shooting mode.
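As a quick arithmetic check on the two sizes named above, the following sketch counts the pixels in each mode. The 3-bytes-per-pixel figure is an assumption made only for illustration (8-bit RGB); the disclosure does not state a bit depth.

```python
# Pixel counts for the two readout sizes named in the text.
FULL_W, FULL_H = 3968, 2976   # "12M" mode
VGA_W, VGA_H = 640, 480       # VGA mode

full_pixels = FULL_W * FULL_H
vga_pixels = VGA_W * VGA_H

# Assumption for illustration only: 8-bit RGB, i.e. 3 bytes per pixel.
BYTES_PER_PIXEL = 3
full_bytes = full_pixels * BYTES_PER_PIXEL
vga_bytes = vga_pixels * BYTES_PER_PIXEL

# How many times more pixels the 12M mode carries than VGA.
ratio = full_pixels / vga_pixels
print(full_pixels, vga_pixels, ratio)
```

The 12M mode carries roughly 11.8 million pixels, which is where the "12M" label comes from.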

The control unit 12 controls each unit in the image sensor 10 according to, for example, a user operation or a set operation mode.

The signal processing unit 13 performs various kinds of signal processing on digital image data read from the imaging unit 11 or digital image data read from the memory 15 (hereinafter referred to as processing target image data). For example, when the processing target image data is a color image, the signal processing unit 13 converts the format of the image data into YUV image data, RGB image data, or the like. In addition, the signal processing unit 13 performs processing such as noise removal and white balance adjustment on the processing target image data as necessary. The signal processing unit 13 also performs various kinds of signal processing (also referred to as pre-processing) that the DSP 14 needs in order to process the processing target image data.

The DSP 14, for example, executes a program stored in the memory 15 and thereby functions as a processing unit that executes various processes using a learned model created by machine learning using a deep neural network (DNN). For example, the DSP 14 executes a calculation process based on the learned model stored in the memory 15, in which the dictionary coefficients stored in the memory 15 are multiplied by the image data. The result (calculation result) obtained by such calculation processing is output to the memory 15 and/or the selector 16. The calculation result may include image data obtained by executing the calculation process using the learned model, and various information (metadata) obtained from the image data. Further, the DSP 14 may include a memory controller for controlling access to the memory 15.
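The multiplication of stored coefficients by image data can be pictured as one fully connected layer of a neural network. The sketch below is only an illustration under assumptions: the names `dense_layer`, `weights`, and `bias`, the ReLU activation, and the tiny sizes are hypothetical and do not come from the disclosure.

```python
def dense_layer(pixels, weights, bias):
    """Multiply a flattened image by stored coefficients (one row per output)."""
    outputs = []
    for row, b in zip(weights, bias):
        acc = sum(w * p for w, p in zip(row, pixels)) + b
        outputs.append(max(0.0, acc))  # ReLU activation (assumed, not from the source)
    return outputs

# Hypothetical 2x2 image flattened to 4 values, and 2 output units.
pixels = [0.0, 0.5, 1.0, 0.5]
weights = [[1.0, 0.0, -1.0, 0.0],
           [0.5, 0.5, 0.5, 0.5]]
bias = [0.0, -0.5]
result = dense_layer(pixels, weights, bias)
```

In this picture, `weights` and `bias` play the role of the dictionary coefficients held in the memory 15, and `pixels` is the image data supplied to the DSP 14.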

The arithmetic processing includes, for example, processing that uses a learned learning model, which is an example of a neural network calculation model. For example, the DSP 14 can execute various kinds of processing (DSP processing) using a learned learning model. For example, the DSP 14 reads out image data from the memory 15, inputs the image data into the learned learning model, and acquires a face position, such as a face outline or a face image area, as the output of the learned model. Then, the DSP 14 performs processing such as masking, mosaic, or avatar processing on the extracted face position in the image data to generate processed image data. After that, the DSP 14 stores the generated processed image data in the memory 15.

The learned learning model includes a DNN, a support vector machine, or the like that has learned to detect the face position of a person using learning data. When image data, which is the data to be discriminated, is input, the learned learning model outputs a discrimination result, that is, area information such as an address specifying the face position. The DSP 14 can execute the above-described arithmetic processing after updating the learning model by changing the weights of various parameters in the model using learning data, after preparing a plurality of learning models and switching the model used according to the content of the arithmetic processing, or after acquiring or updating a learned model from an external device.

Note that the image data to be processed by the DSP 14 may be image data normally read from the pixel array unit 101, or may be image data whose data size has been reduced by thinning out pixels of the normally read image data. Alternatively, it may be image data read out with a smaller data size than usual by thinning out pixels when reading out the pixel array unit 101. Normal reading here may mean reading without thinning out pixels.
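Thinning out pixels can be sketched as simple decimation, keeping every n-th pixel in both directions. The function name `thin` and the fixed step are assumptions for illustration; the disclosure does not specify the thinning pattern.

```python
def thin(image, step):
    """Keep every `step`-th pixel in both directions (simple decimation)."""
    return [row[::step] for row in image[::step]]

# 8x8 test frame whose pixel value encodes its position.
frame = [[y * 8 + x for x in range(8)] for y in range(8)]
small = thin(frame, 2)
```

With a step of 2, the 8 × 8 frame becomes 4 × 4, so the data handed to the DSP 14 shrinks to a quarter of the original.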

Face position extraction and processing by such a learning model make it possible to generate processed image data in which the face position is masked, processed image data in which the face position is mosaic-processed, avatar-processed image data in which the face position is replaced with a character, and the like.
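Masking and mosaic processing of a detected region can be sketched as follows. This is a minimal illustration on a grayscale image held as a list of rows; the function names, the half-open `(x0, y0, x1, y1)` rectangle format, and the block-averaging mosaic are assumptions, not the method of the disclosure.

```python
def mask_roi(image, roi, fill=0):
    """Return a copy of `image` with every pixel inside the ROI set to `fill`."""
    x0, y0, x1, y1 = roi
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out

def mosaic_roi(image, roi, block=2):
    """Return a copy of `image` with the ROI replaced by block averages."""
    x0, y0, x1, y1 = roi
    out = [row[:] for row in image]
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            total = sum(image[y][x] for y in ys for x in xs)
            mean = total // (len(ys) * len(xs))  # integer average of the block
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

frame = [[y * 4 + x for x in range(4)] for y in range(4)]
masked = mask_roi(frame, (0, 0, 2, 2))
mosaicked = mosaic_roi(frame, (0, 0, 2, 2), block=2)
```

Both functions copy the frame first, so the unprocessed image remains available for the mode in which no processing is applied.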

The memory 15 stores the image data output from the imaging unit 11, the image data processed by the signal processing unit 13, the calculation result obtained by the DSP 14, and the like as necessary. Further, the memory 15 stores an algorithm of a learned learning model executed by the DSP 14 as a program and a dictionary coefficient.

In addition to the image data output from the signal processing unit 13 and the processed image data output from the DSP 14, the memory 15 stores ISO (International Organization for Standardization) sensitivity, exposure time, frame rate, focus, shooting mode, cutout range, and the like. That is, the memory 15 can store various types of imaging information set by the user.

The selector 16 selectively outputs the processed image data output from the DSP 14 or the image data stored in the memory 15 according to, for example, a selection control signal from the control unit 12. For example, the selector 16 selects either the processed image data or a calculation result such as metadata stored in the memory 15, according to a user setting or the like, and outputs the selected data to the application processor 20.

For example, when the processing mode for outputting the processed image data is selected, the selector 16 reads the processed image data generated by the DSP 14 from the memory 15 and outputs it to the application processor 20. On the other hand, when the normal processing mode in which the processed image data is not output is selected, the selector 16 outputs the image data input from the signal processing unit 13 to the application processor 20. When the processing mode is selected, the selector 16 may instead directly output the calculation result output from the DSP 14 to the application processor 20.

The image data and the processed image data output from the selector 16 as described above are input to the application processor 20, which handles display and the user interface. The application processor 20 is configured using, for example, a CPU, and executes an operating system, various application software, and the like. The application processor 20 may have functions such as a GPU (Graphics Processing Unit) and a baseband processor. The application processor 20 performs various processes as needed on the input image data and calculation results, displays them to the user, or transmits the image data and the calculation results to the external cloud server 30 via a predetermined network 40.

Various networks such as the Internet, a wired LAN (Local Area Network) or a wireless LAN, a mobile communication network, and Bluetooth (registered trademark) can be applied to the predetermined network 40. In addition, the transmission destination of the image data and the calculation results is not limited to the cloud server 30, and may be any of various information processing devices (systems) having a communication function, such as a stand-alone server, a file server that stores various data, or a communication terminal such as a mobile phone.

[1-3. Description of Image Processing According to First Embodiment]
FIG. 2 is a diagram illustrating processing of an image according to the first embodiment. As illustrated in FIG. 2, the signal processing unit 13 performs signal processing on the image data read from the imaging unit 11 and stores the processed data in the memory 15. The DSP 14 reads the image data from the memory 15, executes face detection using the learned learning model, and detects a face position from the image data (Process 1).

Next, the DSP 14 executes processing such as masking or mosaicing on the detected face position (Process 2), generates processed image data, and stores the processed image data in the memory 15. Thereafter, the selector 16 outputs the processed image data in which the face area has been processed to the application processor 20 according to the user's selection.

[1-4. Process flow according to first embodiment]
FIG. 3 is a flowchart showing the flow of the processing according to the first embodiment. As shown in FIG. 3, the image data captured by the imaging unit 11 is stored in the memory 15 (S101).

Then, the DSP 14 reads out the image data from the memory 15 (S102), and detects the face position using the learned learning model (S103). Subsequently, the DSP 14 generates processed image data obtained by processing the face position of the image data, and stores the processed image data in the memory 15 (S104).

Thereafter, when the processing mode, which is the mode with processing, is selected (S105: Yes), the selector 16 reads the processed image data from the memory 15 and outputs it to an external device such as the application processor 20 (S106).

On the other hand, when the normal processing mode, which is the mode without processing, is selected (S105: No), the selector 16 reads from the memory 15 the image data on which the processing has not been performed and outputs it to an external device such as the application processor 20 (S107).
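The flow of steps S101 to S107 can be sketched as a single function. This is a minimal sketch under assumptions: the dictionary standing in for the memory 15, the fixed-rectangle detector standing in for the learned model, and masking as the only kind of processing are all hypothetical simplifications.

```python
def mask(image, roi, fill=0):
    """Return a copy of `image` with the ROI overwritten by `fill`."""
    x0, y0, x1, y1 = roi
    return [[fill if (x0 <= x < x1 and y0 <= y < y1) else v
             for x, v in enumerate(row)]
            for y, row in enumerate(image)]

def run_frame(captured, detect_face, processing_mode):
    memory = {"image": captured}                 # S101: store captured image
    image = memory["image"]                      # S102: DSP reads the image
    roi = detect_face(image)                     # S103: detect face position
    memory["processed"] = mask(image, roi)       # S104: store processed image
    if processing_mode:                          # S105: Yes
        return memory["processed"]               # S106: output processed image
    return memory["image"]                       # S107: output unprocessed image

def fixed_detector(image):
    """Stand-in for the learned model: always reports the same rectangle."""
    return (1, 0, 3, 2)

frame = [[1, 2, 3, 4], [5, 6, 7, 8]]
processed = run_frame(frame, fixed_detector, processing_mode=True)
normal = run_frame(frame, fixed_detector, processing_mode=False)
```

Note that, as in the flowchart, the processed image is generated in both cases and the mode only decides which copy leaves the chip.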

[1-5. Action / Effect]
As described above, the image sensor 10 can execute the processing in a closed area within one chip even when the processing is required, so that the captured image data can be prevented from being output to the outside as it is, and security and privacy can be improved. Further, since the image sensor 10 allows the user to select whether or not to perform the processing, the processing mode can be selected according to the application, and the convenience for the user can be improved.

(2. Modification of First Embodiment)
In the first embodiment, an example in which masking or the like is performed on a face position has been described, but the processing is not limited to this. For example, a partial image in which a face position is extracted can be generated.

FIG. 4 is a diagram illustrating a modification of the first embodiment. As illustrated in FIG. 4, the signal processing unit 13 performs signal processing on the image data read from the imaging unit 11 and stores the processed data in the memory 15. The DSP 14 reads the image data from the memory 15, executes face detection using the learned learning model, and detects a face position from the image data (Process 1).

Subsequently, the DSP 14 generates partial image data in which the detected face position is extracted (Process 2), and stores the partial image data in the memory 15. After that, the selector 16 outputs the partial image data of the face to the application processor 20 according to the user's selection.

As described above, the image sensor 10 can extract partial image data in a closed area within one chip even when processing is required, so that it can output to the application processor 20 an image suited to the processing performed there, such as identification of a person, face authentication, or image collection for each person. As a result, transmission of unnecessary images can be suppressed, security can be improved, privacy can be protected, and data volume can be reduced.
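Extraction of a partial image, and the resulting reduction in output data, can be sketched as a crop. The function name `crop`, the half-open `(x0, y0, x1, y1)` rectangle, and the frame size are assumptions for illustration.

```python
def crop(image, roi):
    """Return only the pixels inside the ROI (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = roi
    return [row[x0:x1] for row in image[y0:y1]]

# 6x4 test frame whose pixel value encodes its position.
frame = [[y * 6 + x for x in range(6)] for y in range(4)]
face = crop(frame, (2, 1, 5, 3))

full_count = len(frame) * len(frame[0])
face_count = len(face) * len(face[0])
```

Here only 6 of the 24 pixels are output, which illustrates how cropping to the face region reduces the amount of data sent off-chip.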

(3. Second Embodiment)
[3-1. Description of imaging apparatus according to second embodiment]
In the first embodiment, an example in which the DSP 14 executes the processing has been described. However, the present invention is not limited to this, and the selector 16 can also perform the processing. Therefore, in the second embodiment, an example in which the selector 16 performs the processing will be described.

FIG. 5 is a diagram illustrating an imaging device according to the second embodiment. As shown in FIG. 5, since the configuration of the image sensor 10 according to the second embodiment is the same as that of the image sensor 10 according to the first embodiment, a detailed description is omitted. The difference from the first embodiment is that the DSP 14 of the image sensor 10 notifies the selector 16 of the position information of the face position extracted using the learning model.

For example, as shown in FIG. 5, the signal processing unit 13 performs signal processing on the image data read from the imaging unit 11 and stores the processed data in the memory 15. The DSP 14 reads the image data from the memory 15, executes face detection using the learned learning model, and detects a face position from the image data (Process 1). Then, the DSP 14 notifies the selector 16 of position information such as an address for specifying the face position.

The selector 16 reads the image data from the memory 15 when the processing is selected by the user, and specifies the ROI (Region of interest) to be processed by using the position information acquired from the DSP 14. Then, the selector 16 performs processing such as masking on the specified ROI to generate processed image data (Process 2), and outputs the processed image data to the application processor 20. Note that the selector 16 can also store the processed image data in the memory 15.

[3-2. First Modification of Second Embodiment]
As in the modification of the first embodiment, in the second embodiment the selector 16 can also generate a partial image in which the face position is extracted.

FIG. 6 is a diagram illustrating a first modification of the second embodiment. As shown in FIG. 6, the signal processing unit 13 performs signal processing on the image data read from the imaging unit 11 and stores the processed data in the memory 15. The DSP 14 reads the image data from the memory 15, executes face detection using the learned learning model, and detects a face position from the image data (Process 1). Then, the DSP 14 notifies the selector 16 of position information such as an address for specifying the face position.

Subsequently, when the processing is selected by the user, the selector 16 reads out the image data from the memory 15 and specifies the ROI (Region of Interest) to be processed using the position information acquired from the DSP 14. Thereafter, the selector 16 generates partial image data in which the portion corresponding to the ROI is extracted from the image data (Process 2), and outputs the partial image data to the application processor 20.

[3-3. Second Modification of Second Embodiment]
In the above-described second embodiment and its first modification, the selector 16 performs Process 2, such as extraction of the ROI (also called cutout or trimming) or processing of the ROI (masking or the like), on the image data stored in the memory 15. However, the present invention is not limited to this. For example, the selector 16 may be configured to execute Process 2, such as ROI cutout or processing (masking or the like), directly on the image data output from the signal processing unit 13.
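Processing the data directly as it arrives, without first storing the frame, can be sketched with a generator that masks each row on the fly. This is a sketch under assumptions: row-by-row delivery, the names `stream_with_mask` and `row_source`, and an ROI that lies within the row width are all hypothetical.

```python
def stream_with_mask(rows, roi, fill=0):
    """Yield rows one at a time, masking ROI pixels as they pass through."""
    x0, y0, x1, y1 = roi
    for y, row in enumerate(rows):
        if y0 <= y < y1:
            yield row[:x0] + [fill] * (x1 - x0) + row[x1:]
        else:
            yield row

def row_source(width, height):
    """Stand-in for the signal processing unit emitting rows in order."""
    for y in range(height):
        yield [y * width + x + 1 for x in range(width)]

out = list(stream_with_mask(row_source(4, 3), (1, 1, 3, 2)))
```

Only one row is held at a time, which reflects the idea that the processed image need not be buffered in the memory 15 before output.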

[3-4. Third Modification of Second Embodiment]
Further, the image data itself read from the imaging unit 11 may be partial image data of only the ROI, or image data not including the ROI. In that case, the face position extracted by the DSP 14 from a first frame is notified to the control unit 12, and in a second frame, which is the frame following the first frame, the control unit 12 instructs the imaging unit 11 to read partial image data from the pixel area corresponding to the ROI, or to read image data from the pixel area corresponding to the area other than the ROI.
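One way to picture the readout instruction for the second frame is as a per-row list of pixel runs to read. The sketch below is hypothetical: the function name `readout_plan`, the `{row: [(start, end), ...]}` format, and the half-open rectangle are assumptions and do not describe the actual control interface of the imaging unit 11.

```python
def readout_plan(width, height, roi, roi_only):
    """Return {row: [(col_start, col_end), ...]} of pixel runs to read."""
    x0, y0, x1, y1 = roi
    plan = {}
    for y in range(height):
        inside = y0 <= y < y1
        if roi_only:
            runs = [(x0, x1)] if inside else []      # read only the ROI
        elif inside:
            runs = [(0, x0), (x1, width)]            # skip the ROI columns
        else:
            runs = [(0, width)]                      # row untouched by the ROI
        runs = [(a, b) for a, b in runs if a < b]    # drop empty runs
        if runs:
            plan[y] = runs
    return plan

# ROI detected in the first frame, applied to the readout of the second frame.
roi = (2, 1, 5, 3)
only_roi = readout_plan(6, 4, roi, roi_only=True)
without_roi = readout_plan(6, 4, roi, roi_only=False)
```

Because the plan is built from the ROI found in the previous frame, this approach implicitly assumes the face moves little between consecutive frames.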

In the second embodiment and its modifications, the processing by the selector 16 is not limited to masking and the like. The selector 16 can also rewrite only the area corresponding to the ROI in the image data with another image and output the result, or can output the image data without reading only the area corresponding to the ROI from the memory 15. Note that this processing can also be executed by the DSP 14 in the first embodiment.

As described above, since the image sensor 10 can execute the processing in the selector 16, the processing load on the DSP 14 when the processing is unnecessary can be reduced. Further, since the image sensor 10 can output the image processed by the selector 16 without storing it in the memory 15, the used capacity of the memory 15 can be reduced, and the cost and size of the memory 15 can be reduced. As a result, the size of the entire image sensor 10 can be reduced.

(4. Third Embodiment)
[4-1. Description of imaging apparatus according to third embodiment]
The image sensor 10 can speed up the processing by first reading out a small amount of image data and detecting the face position before the entire image data has been read from the imaging unit 11. In the third embodiment, an example of increasing the processing speed in this way will be described.

FIG. 7 is a diagram illustrating an imaging device according to the third embodiment. As shown in FIG. 7, the configuration of the image sensor 10 according to the third embodiment is the same as that of the image sensor 10 according to the first embodiment, and a detailed description thereof will be omitted. Here, differences from the first embodiment will be described.

For example, as shown in FIG. 7, when reading image data from all the unit pixels, the imaging unit 11 first reads out small-capacity image data from selected (thinned-out) unit pixels rather than from all the unit pixels, and stores the thinned-out image data in the memory 15. In parallel with this, the imaging unit 11 executes normal reading of the image data.

Then, the DSP 14 reads out a small amount of image data from the memory 15, executes face detection using the learned learning model, and detects a face position from the image data (Process 1). Then, the DSP 14 notifies the selector 16 of position information such as an address for specifying the face position.

After that, when the normal image data read by the imaging unit 11 is input, the selector 16 uses the position information acquired from the DSP 14 to specify the ROI (Region of Interest) to be processed in the normal image data. Then, the selector 16 performs processing such as masking on the area corresponding to the ROI to generate processed image data (Process 2), and outputs the processed image data to the application processor 20.

[4-2. Process flow according to third embodiment]
Next, the flow of the processing described in FIG. 7 will be described. FIG. 8 is a sequence diagram illustrating a flow of the processing according to the third embodiment. As shown in FIG. 8, the imaging unit 11 reads out the image by thinning it out (S201), and stores the thinned-out image data in the memory 15 (S202). After that, the imaging unit 11 continues reading the normal image data.

In parallel with this, the DSP 14 performs face detection from the small-capacity image data using DNN or the like, and detects the face position (S203). Then, the DSP 14 notifies the selector 16 of the position information of the detected face position (S205 and S206).

Then, the selector 16 holds the position information of the face position notified from the DSP 14 (S207). Thereafter, when the reading of the normal image data is completed, the imaging unit 11 outputs the data to the selector 16 (S209 and S210), and the selector 16 specifies the face position in the normal image data using the position information of the face position (S211).

Then, the selector 16 generates processed image data obtained by processing the face position (S212), and outputs the processed image data to an external device (S213). For example, the selector 16 cuts out and outputs only the face position detected by the DNN. As described above, since the image sensor 10 can detect the face position before the normal reading of the image data is completed, it can execute the processing without delay once the image data has been read, and processing can be sped up compared with the preceding embodiments.
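When the face position is found on thinned-out data but applied to the normally read image, the coordinates have to be mapped between the two resolutions. The disclosure does not spell out this mapping; the sketch below assumes uniform decimation by a fixed `step`, so that index i in the thinned image corresponds to index i × step in the full image. The function name `scale_roi` and all values are hypothetical.

```python
def scale_roi(roi, step, full_width, full_height):
    """Map an ROI found on a thinned image back to full-resolution coordinates."""
    x0, y0, x1, y1 = roi
    return (min(x0 * step, full_width), min(y0 * step, full_height),
            min(x1 * step, full_width), min(y1 * step, full_height))

# Hypothetical detection on a 160x120 thinned image of a 640x480 frame.
thinned_roi = (10, 5, 20, 15)
full_roi = scale_roi(thinned_roi, 4, 640, 480)
```

The `min` calls clamp the rectangle to the frame so that an ROI touching the edge of the thinned image never extends past the full image.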

(5. Image sensor chip configuration)
Next, an example of a chip configuration of the image sensor 10 shown in FIG. 1 will be described in detail below with reference to the drawings.

FIG. 9 is a schematic diagram illustrating an example of a chip configuration of the image sensor according to the present embodiment. As shown in FIG. 9, the image sensor 10 has a laminated structure in which a rectangular flat plate-shaped first substrate (die) 100 and a rectangular flat plate-shaped second substrate (die) 120 are bonded together.

The size of the first substrate 100 and the size of the second substrate 120 may be the same, for example. Further, the first substrate 100 and the second substrate 120 may each be a semiconductor substrate such as a silicon substrate.

On the first substrate 100, the pixel array unit 101 of the imaging unit 11, in the configuration of the image sensor 10 shown in FIG. 1, is arranged. Further, a part or all of the optical system 104 may be provided on the first substrate 100 on-chip.

On the second substrate 120, in the configuration of the image sensor 10 shown in FIG. 1, the ADC 17, the control unit 12, the signal processing unit 13, the DSP 14, the memory 15, and the selector 16 are arranged. Note that an interface circuit, a driver circuit, and the like (not shown) may also be arranged on the second substrate 120.

The first substrate 100 and the second substrate 120 may be bonded by a so-called CoC (Chip-on-Chip) method, in which the first substrate 100 and the second substrate 120 are each singulated into chips and the singulated chips are then bonded together. Alternatively, a so-called CoW (Chip-on-Wafer) method may be used, in which one of the first substrate 100 and the second substrate 120 (for example, the first substrate 100) is singulated into chips and the singulated first substrate 100 is then bonded to the second substrate 120 before singulation (that is, in a wafer state). A so-called WoW (Wafer-on-Wafer) method may also be used, in which the first substrate 100 and the second substrate 120 are bonded together while both are in a wafer state.

As a method for bonding the first substrate 100 and the second substrate 120, for example, plasma bonding or the like can be used. However, the present invention is not limited to this, and various bonding methods may be used.

(6. Layout example)
FIGS. 10 and 11 are diagrams for explaining a layout example according to the present embodiment. FIG. 10 shows a layout example of the first substrate 100, and FIG. 11 shows a layout example of the second substrate 120.

[6-1. Layout example of first substrate]
As shown in FIG. 10, on the first substrate 100, the pixel array unit 101 of the imaging unit 11 in the configuration of the image sensor 10 shown in FIG. 1 is arranged. When a part or all of the optical system 104 is mounted on the first substrate 100, the optical system 104 is provided at a position corresponding to the pixel array unit 101.

The pixel array unit 101 is arranged to be shifted toward one side L101 among the four sides L101 to L104 of the first substrate 100. In other words, the pixel array unit 101 is arranged such that the center O101 is closer to the side L101 than the center O100 of the first substrate 100. When the surface of the first substrate 100 on which the pixel array unit 101 is provided is rectangular, the side L101 may be, for example, the shorter side. However, the present invention is not limited to this, and the pixel array unit 101 may be arranged to be offset on the longer side.

In a region close to the side L101 among the four sides of the pixel array unit 101, in other words, in a region between the side L101 and the pixel array unit 101, a TSV array 102 is provided. In the TSV array 102, a plurality of through wirings (Through Silicon Vias, hereinafter referred to as TSVs) penetrating the first substrate 100 are arranged as wiring for electrically connecting each unit pixel 101a in the pixel array unit 101 to the ADC 17 arranged on the second substrate 120. By bringing the TSV array 102 close to the side L101 toward which the pixel array unit 101 is offset in this way, it becomes easier to secure an arrangement space for each unit such as the ADC 17 on the second substrate 120.

Note that the TSV array 102 may also be provided in a region close to one of the two sides L103 and L104 intersecting the side L101, for example the side L104 (the side L103 may be used instead), in other words, in a region between the side L104 (or the side L103) and the pixel array unit 101.

Of the four sides L101 to L104 of the first substrate 100, each of the sides L102 to L103 toward which the pixel array unit 101 is not offset is provided with a pad array 103 including a plurality of pads arranged linearly. The pads included in the pad array 103 include, for example, pads (also referred to as power supply pins) to which a power supply voltage for analog circuits such as the pixel array unit 101 and the ADC 17 is applied; pads (also referred to as power supply pins) to which a power supply voltage for digital circuits such as the signal processing unit 13, the DSP 14, the memory 15, the selector 16, and the control unit 12 is applied; pads (also referred to as signal pins) for interfaces such as MIPI (Mobile Industry Processor Interface) and SPI (Serial Peripheral Interface); and pads (also referred to as signal pins) for inputting and outputting clocks and data. Each pad is electrically connected to, for example, an external power supply circuit or interface circuit via a wire. It is preferable that each pad array 103 and the TSV array 102 be sufficiently separated from each other so that the influence of signal reflection from the wires connected to the pads in the pad array 103 can be ignored.

[6-2. Layout example of second substrate]
On the other hand, as shown in FIG. 11, the ADC 17, the control unit 12, the signal processing unit 13, the DSP 14, and the memory 15 in the configuration of the image sensor 10 shown in FIG. 1 are arranged on the second substrate 120. In the first layout example, the memory 15 is divided into two areas, a memory 15A and a memory 15B. Similarly, the ADC 17 is divided into two areas: an ADC 17A and a DAC (Digital-to-Analog Converter) 17B. The DAC 17B supplies a reference voltage for AD conversion to the ADC 17A and is, in a broad sense, part of the ADC 17. Although not shown in FIG. 11, the selector 16 is also arranged on the second substrate 120.

Further, the second substrate 120 is provided with wiring 122 electrically connected to each TSV in the TSV array 102 penetrating the first substrate 100 (hereinafter simply referred to as the TSV array 102), and a pad array 123 in which a plurality of pads electrically connected to the respective pads in the pad array 103 of the first substrate 100 are linearly arranged.

For the connection between the TSV array 102 and the wiring 122, it is possible to employ, for example, a so-called twin TSV method, in which two TSVs, namely a TSV provided in the first substrate 100 and a TSV provided from the first substrate 100 to the second substrate 120, are connected on the outer surface of the chip, or a so-called shared TSV method, in which the connection is made by a common TSV provided from the first substrate 100 to the second substrate 120. However, the present invention is not limited to these, and various connection forms can be adopted, such as a so-called Cu-Cu bonding method in which copper (Cu) exposed on the bonding surface of the first substrate 100 and copper (Cu) exposed on the bonding surface of the second substrate 120 are bonded to each other.

The connection form between each pad in the pad array 103 of the first substrate 100 and each pad in the pad array 123 of the second substrate 120 is, for example, wire bonding. However, the present invention is not limited to this, and connection forms such as through holes and castellations may be used.

In the layout example of the second substrate 120, for example, the vicinity of the wiring 122 connected to the TSV array 102 is taken as the upstream side, and the ADC 17A, the signal processing unit 13, and the DSP 14 are arranged in order from the upstream side along the flow of the signal read from the pixel array unit 101. That is, the ADC 17A, to which the pixel signal read from the pixel array unit 101 is first input, is arranged near the wiring 122 on the most upstream side, the signal processing unit 13 is arranged next, and the DSP 14 is arranged in the region farthest from the wiring 122. By arranging the components from the ADC 17 to the DSP 14 from the upstream side along the signal flow in this manner, the wiring connecting the parts can be shortened. This makes it possible to reduce signal delay, reduce signal propagation loss, improve the S/N ratio, and reduce power consumption.
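The effect of ordering the blocks along the signal flow can be illustrated with a small sketch. It is not taken from the embodiment: it assumes a hypothetical one-dimensional floor plan in which each block occupies one slot of equal width, and the names `SIGNAL_FLOW` and `total_wire_length` are introduced only for illustration.

```python
from itertools import permutations

# Signal flow described in the text:
# wiring 122 -> ADC 17A -> signal processing unit 13 -> DSP 14.
SIGNAL_FLOW = ["wiring_122", "adc_17a", "signal_processing_13", "dsp_14"]


def total_wire_length(order, pitch=1.0):
    """Sum of the distances between consecutive stages of SIGNAL_FLOW.

    `order` lists the blocks starting from the TSV side. Each block is
    assumed (hypothetically) to occupy one slot of width `pitch` on one axis.
    """
    position = {name: index * pitch for index, name in enumerate(order)}
    return sum(
        abs(position[a] - position[b])
        for a, b in zip(SIGNAL_FLOW, SIGNAL_FLOW[1:])
    )


# The wiring 122 stays at the upstream edge; every ordering of the
# remaining three blocks is tried and the shortest one is kept.
candidates = [("wiring_122",) + p for p in permutations(SIGNAL_FLOW[1:])]
best = min(candidates, key=total_wire_length)
```

Under these simplifying assumptions, `best` is the ordering that follows the signal flow, and any ordering that places the DSP 14 or the signal processing unit 13 ahead of the ADC 17A yields a longer total wiring length.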

The control unit 12 is arranged, for example, near the wiring 122 on the upstream side. In FIG. 11, the control unit 12 is disposed between the ADC 17A and the signal processing unit 13. With such a layout, it is possible to reduce signal delay, reduce signal propagation loss, improve the S/N ratio, and reduce power consumption when the control unit 12 controls the pixel array unit 101. In addition, the signal pins and power supply pins for analog circuits can be arranged together near the analog circuits (for example, on the lower side in FIG. 11), the remaining signal pins and power supply pins for digital circuits can be arranged together near the digital circuits (for example, on the upper side in FIG. 11), and the power supply pins for analog circuits and the power supply pins for digital circuits can be sufficiently separated.

In the layout shown in FIG. 11, the DSP 14 is disposed on the most downstream side, on the side opposite to the ADC 17A. In other words, with such a layout, the DSP 14 can be disposed in a region that does not overlap the pixel array unit 101 in the stacking direction of the first substrate 100 and the second substrate 120 (hereinafter simply referred to as the vertical direction).

As described above, by configuring the pixel array unit 101 and the DSP 14 so that they do not overlap in the vertical direction, it is possible to reduce the entry of noise, generated when the DSP 14 performs signal processing, into the pixel array unit 101. As a result, even when the DSP 14 is operated as a processing unit that executes operations based on the learned model, the entry of noise caused by the signal processing of the DSP 14 into the pixel array unit 101 can be reduced, and an image with reduced quality deterioration can be obtained.

The DSP 14 and the signal processing unit 13 are connected by a connection portion 14a formed by a part of the DSP 14 or by a signal line. The selector 16 is arranged, for example, near the DSP 14. When the connection portion 14a is a part of the DSP 14, a part of the DSP 14 overlaps the pixel array unit 101 in the vertical direction. Even in such a case, the entry of noise into the pixel array unit 101 can be reduced compared with the case where the entire DSP 14 overlaps the pixel array unit 101 in the vertical direction.
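Whether a block on the second substrate 120 overlaps the pixel array unit 101 in the vertical direction can be checked as a rectangle intersection in plan view. The following is a minimal sketch under the assumption that both regions are axis-aligned rectangles. The `Rect` type, the function names, and all coordinates are hypothetical and are not taken from the figures.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in plan view (arbitrary length units)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles when viewed along the stacking direction."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return width * height if width > 0 and height > 0 else 0.0


def overlap_fraction(block: Rect, pixel_array: Rect) -> float:
    """Fraction of `block` that lies under (or over) the pixel array."""
    return overlap_area(block, pixel_array) / block.area


# Hypothetical coordinates: the pixel array is offset toward y = 0 and the
# DSP is placed at the far (downstream) end of the second substrate.
pixel_array_101 = Rect(0.0, 0.0, 10.0, 6.0)
dsp_14 = Rect(0.0, 7.0, 10.0, 10.0)
# Variant in which the connection portion 14a is counted as part of the DSP.
dsp_14_with_14a = Rect(0.0, 5.0, 10.0, 10.0)
```

With these example values, `overlap_fraction(dsp_14, pixel_array_101)` is zero, while the variant that includes the connection portion overlaps only partially, which corresponds to the partially overlapping case described above.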

The memories 15A and 15B are arranged, for example, so as to surround the DSP 14 from three directions. By arranging the memories 15A and 15B so as to surround the DSP 14 in this way, the wiring distances between the individual memory elements in the memory 15 and the DSP 14 can be averaged while the overall distance is shortened. This makes it possible to reduce signal delay, signal propagation loss, and power consumption when the DSP 14 accesses the memory 15.

The pad array 123 is disposed, for example, at a position on the second substrate 120 corresponding to the pad array 103 of the first substrate 100 in the vertical direction. Of the pads included in the pad array 123, the pads located near the ADC 17A are used to carry the power supply voltage and analog signals for analog circuits (mainly the ADC 17A). On the other hand, the pads located near the control unit 12, the signal processing unit 13, the DSP 14, and the memories 15A and 15B are used to carry the power supply voltage and digital signals for digital circuits (mainly the control unit 12, the signal processing unit 13, the DSP 14, and the memories 15A and 15B). With such a pad layout, the wiring distance between each pad and each part can be shortened. This makes it possible to reduce signal delay, reduce propagation loss of signals and power supply voltage, improve the S/N ratio, and reduce power consumption.

(7. Other embodiments)
The processing according to each of the above-described embodiments may be performed in various different forms other than the above-described embodiments.

For example, various processes other than those described in the above embodiments can be executed according to the content learned by the learning model. For example, it is possible to extract not only the whole face but also the outline of the face, only parts such as the eyes and nose, the owner of the imaging device 1 or a specific person, or a part such as a nameplate or a window from an image of a house. It is also possible to extract an outdoor portion captured in image data of a room, to extract people and animals separately, and to extract a window portion from image data. Examples of the processing also include reading out only an extracted specific region such as a face, reading out everything except a specific region, painting a specific region black, and reading out an image in which only a specific region has been cut out. The extracted region is not limited to a rectangle and may have an arbitrary shape such as a triangle. Processing such as masking and mosaic processing is not limited to a single process, and a plurality of processes can be combined. The extraction of the face position and the like can be executed not only by the DSP 14 but also by the signal processing unit 13.
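The region-based operations listed above (cutting out, painting black, mosaic processing, and combinations of these) can be sketched as follows. This is an illustrative sketch only: it assumes a grayscale image stored as a list of rows and a rectangular region given as `(top, left, bottom, right)` with exclusive bottom and right bounds. The function names are hypothetical, and the region is assumed to have already been extracted by the processing unit.

```python
def crop(image, region):
    """Return only the specified region (image with the region cut out)."""
    top, left, bottom, right = region
    return [row[left:right] for row in image[top:bottom]]


def black_out(image, region, fill=0):
    """Return a copy of the image with the specified region painted black."""
    top, left, bottom, right = region
    return [
        [fill if top <= r < bottom and left <= c < right else value
         for c, value in enumerate(row)]
        for r, row in enumerate(image)
    ]


def mosaic(image, region, block=2):
    """Return a copy in which each block x block tile inside the region
    is replaced by the integer mean of its pixels."""
    top, left, bottom, right = region
    out = [row[:] for row in image]
    for r0 in range(top, bottom, block):
        for c0 in range(left, right, block):
            cells = [
                (r, c)
                for r in range(r0, min(r0 + block, bottom))
                for c in range(c0, min(c0 + block, right))
            ]
            mean = sum(image[r][c] for r, c in cells) // len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


# 4 x 4 test image with pixel values 0..15 and a hypothetical 2 x 2 region.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
face_region = (1, 1, 3, 3)
```

Because each function returns a new image, the operations can be chained, for example `black_out(mosaic(image, face_region), other_region)`, which mirrors the statement that a plurality of processes can be combined.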

In the above embodiment, a learning model trained using a DNN has been exemplified. However, in addition to the DNN, various neural networks such as an RNN (Recurrent Neural Network) and a CNN (Convolutional Neural Network) can be used. The learning model is not limited to one using a DNN or the like, and may be a learning model trained by various other machine learning methods such as a decision tree or a support vector machine.

The processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be arbitrarily changed unless otherwise specified. The specific examples, distributions, numerical values, and the like described in the embodiments are merely examples and can be arbitrarily changed.

The components of each device shown in the drawings are functionally conceptual and do not necessarily need to be physically configured as shown. That is, the specific form of distribution and integration of each device is not limited to the one shown in the drawings, and all or a part thereof may be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions. For example, the control unit 12 and the signal processing unit 13 shown in FIG. 1 may be integrated.

(8. Application example to mobile object)
The technology (the present technology) according to the present disclosure can be applied to various products. For example, the technology according to the present disclosure may be realized as a device mounted on any type of moving object such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, or a robot.

FIG. 12 is a block diagram illustrating a schematic configuration example of a vehicle control system that is an example of a mobile object control system to which the technology according to the present disclosure may be applied.

The vehicle control system 12000 includes a plurality of electronic control units connected via a communication network 12001. In the example shown in FIG. 12, the vehicle control system 12000 includes a drive system control unit 12010, a body system control unit 12020, an outside information detection unit 12030, an inside information detection unit 12040, and an integrated control unit 12050. As a functional configuration of the integrated control unit 12050, a microcomputer 12051, an audio/image output unit 12052, and a vehicle-mounted network I/F (Interface) 12053 are illustrated.

The drive system control unit 12010 controls the operation of devices related to the drive system of the vehicle according to various programs. For example, the drive system control unit 12010 functions as a control device for a driving force generation device that generates the driving force of the vehicle, such as an internal combustion engine or a drive motor, a driving force transmission mechanism that transmits the driving force to the wheels, a steering mechanism that adjusts the steering angle of the vehicle, a braking device that generates the braking force of the vehicle, and the like.

The body system control unit 12020 controls the operation of various devices mounted on the vehicle body according to various programs. For example, the body system control unit 12020 functions as a control device for a keyless entry system, a smart key system, a power window device, or various lamps such as headlamps, back lamps, brake lamps, turn signals, and fog lamps. In this case, radio waves transmitted from a portable device that substitutes for a key, or signals from various switches, can be input to the body system control unit 12020. The body system control unit 12020 receives the input of these radio waves or signals and controls the door lock device, the power window device, the lamps, and the like of the vehicle.

The outside information detection unit 12030 detects information on the outside of the vehicle on which the vehicle control system 12000 is mounted. For example, an imaging unit 12031 is connected to the outside information detection unit 12030. The outside information detection unit 12030 causes the imaging unit 12031 to capture an image of the outside of the vehicle and receives the captured image. Based on the received image, the outside information detection unit 12030 may perform object detection processing or distance detection processing for a person, a vehicle, an obstacle, a sign, characters on a road surface, or the like.

The imaging unit 12031 is an optical sensor that receives light and outputs an electric signal corresponding to the amount of received light. The imaging unit 12031 can output the electric signal as an image or as distance measurement information. The light received by the imaging unit 12031 may be visible light or non-visible light such as infrared light.

The inside information detection unit 12040 detects information on the inside of the vehicle. For example, a driver state detection unit 12041 that detects the state of the driver is connected to the inside information detection unit 12040. The driver state detection unit 12041 includes, for example, a camera that captures an image of the driver. Based on the detection information input from the driver state detection unit 12041, the inside information detection unit 12040 may calculate the degree of fatigue or concentration of the driver, or may determine whether the driver has fallen asleep.

The microcomputer 12051 can calculate a control target value for the driving force generation device, the steering mechanism, or the braking device based on the information on the inside and outside of the vehicle acquired by the outside information detection unit 12030 or the inside information detection unit 12040, and can output a control command to the drive system control unit 12010. For example, the microcomputer 12051 can perform cooperative control for the purpose of realizing the functions of an ADAS (Advanced Driver Assistance System), including vehicle collision avoidance or impact mitigation, following travel based on the inter-vehicle distance, vehicle speed maintenance, vehicle collision warning, and vehicle lane departure warning.

Further, by controlling the driving force generation device, the steering mechanism, the braking device, and the like based on the information on the surroundings of the vehicle acquired by the outside information detection unit 12030 or the inside information detection unit 12040, the microcomputer 12051 can perform cooperative control for the purpose of automatic driving or the like, in which the vehicle travels autonomously without depending on the operation of the driver.

The microcomputer 12051 can also output a control command to the body system control unit 12020 based on the information on the outside of the vehicle acquired by the outside information detection unit 12030. For example, the microcomputer 12051 can control the headlamps according to the position of a preceding vehicle or an oncoming vehicle detected by the outside information detection unit 12030, and can perform cooperative control for the purpose of preventing glare, such as switching from high beam to low beam.

The audio/image output unit 12052 transmits at least one of an audio signal and an image signal to an output device capable of visually or audibly notifying an occupant of the vehicle or the outside of the vehicle of information. In the example of FIG. 12, an audio speaker 12061, a display unit 12062, and an instrument panel 12063 are illustrated as output devices. The display unit 12062 may include, for example, at least one of an on-board display and a head-up display.

FIG. 13 is a diagram illustrating an example of an installation position of the imaging unit 12031.

In FIG. 13, the imaging unit 12031 includes imaging units 12101, 12102, 12103, 12104, and 12105.

The imaging units 12101, 12102, 12103, 12104, and 12105 are provided, for example, at positions such as the front nose, the side mirrors, the rear bumper, the back door, and the upper part of the windshield in the passenger compartment of the vehicle 12100. The imaging unit 12101 provided on the front nose and the imaging unit 12105 provided at the upper part of the windshield in the passenger compartment mainly acquire images of the area in front of the vehicle 12100. The imaging units 12102 and 12103 provided on the side mirrors mainly acquire images of the sides of the vehicle 12100. The imaging unit 12104 provided on the rear bumper or the back door mainly acquires images of the area behind the vehicle 12100. The imaging unit 12105 provided at the upper part of the windshield in the passenger compartment is mainly used for detecting a preceding vehicle, a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, and the like.

FIG. 13 also shows an example of the imaging ranges of the imaging units 12101 to 12104. The imaging range 12111 indicates the imaging range of the imaging unit 12101 provided on the front nose, the imaging ranges 12112 and 12113 indicate the imaging ranges of the imaging units 12102 and 12103 provided on the side mirrors, respectively, and the imaging range 12114 indicates the imaging range of the imaging unit 12104 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 12101 to 12104, an overhead image of the vehicle 12100 viewed from above can be obtained.

At least one of the imaging units 12101 to 12104 may have a function of acquiring distance information. For example, at least one of the imaging units 12101 to 12104 may be a stereo camera including a plurality of imaging elements or an imaging element having pixels for detecting a phase difference.

For example, based on the distance information obtained from the imaging units 12101 to 12104, the microcomputer 12051 obtains the distance to each three-dimensional object within the imaging ranges 12111 to 12114 and the temporal change of this distance (the relative speed with respect to the vehicle 12100). In this way, the microcomputer 12051 can extract, as a preceding vehicle, in particular the closest three-dimensional object on the traveling path of the vehicle 12100 that is traveling at a predetermined speed (for example, 0 km/h or more) in the same direction as the vehicle 12100. Further, the microcomputer 12051 can set an inter-vehicle distance to be maintained to the preceding vehicle and perform automatic brake control (including follow-up stop control), automatic acceleration control (including follow-up start control), and the like. In this way, it is possible to perform cooperative control for the purpose of automatic driving or the like, in which the vehicle travels autonomously without depending on the operation of the driver.
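The selection of a preceding vehicle described above can be sketched as follows. The sketch assumes that each detected three-dimensional object already carries two consecutive distance samples plus flags for "on the traveling path" and "same direction". The `TrackedObject` type, its fields, and the function names are hypothetical and do not describe how the microcomputer 12051 is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TrackedObject:
    distance_m: float       # current distance from the own vehicle
    prev_distance_m: float  # distance one sampling interval earlier
    on_path: bool           # lies on the traveling path of the own vehicle
    same_direction: bool    # moves in the same direction as the own vehicle


def relative_speed_kmh(obj: TrackedObject, dt_s: float) -> float:
    """Temporal change of the distance, converted from m/s to km/h."""
    return (obj.distance_m - obj.prev_distance_m) / dt_s * 3.6


def find_preceding_vehicle(
    objects: Sequence[TrackedObject],
    own_speed_kmh: float,
    dt_s: float,
    min_speed_kmh: float = 0.0,
) -> Optional[TrackedObject]:
    """Closest on-path object moving in the same direction at or above
    `min_speed_kmh`; None when no object qualifies."""
    candidates = [
        obj for obj in objects
        if obj.on_path
        and obj.same_direction
        and own_speed_kmh + relative_speed_kmh(obj, dt_s) >= min_speed_kmh
    ]
    return min(candidates, key=lambda obj: obj.distance_m, default=None)


car_ahead = TrackedObject(30.0, 30.5, on_path=True, same_direction=True)
parked_beside = TrackedObject(20.0, 20.0, on_path=False, same_direction=True)
far_truck = TrackedObject(50.0, 49.0, on_path=True, same_direction=True)
oncoming = TrackedObject(25.0, 29.0, on_path=True, same_direction=False)
objects = [car_ahead, parked_beside, far_truck, oncoming]
```

With an own speed of 60 km/h and a sampling interval of 0.1 s, `find_preceding_vehicle(objects, 60.0, 0.1)` selects `car_ahead`: the nearer `parked_beside` is off the path and `oncoming` travels in the opposite direction.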

For example, based on the distance information obtained from the imaging units 12101 to 12104, the microcomputer 12051 can classify three-dimensional object data relating to three-dimensional objects into motorcycles, ordinary vehicles, large vehicles, pedestrians, and other three-dimensional objects such as utility poles, extract the data, and use it for automatic avoidance of obstacles. For example, the microcomputer 12051 distinguishes obstacles around the vehicle 12100 into obstacles that are visible to the driver of the vehicle 12100 and obstacles that are difficult for the driver to see. The microcomputer 12051 then determines a collision risk indicating the degree of danger of collision with each obstacle. When the collision risk is equal to or higher than a set value and there is a possibility of collision, the microcomputer 12051 can provide driving assistance for collision avoidance by outputting an alarm to the driver via the audio speaker 12061 or the display unit 12062, or by performing forced deceleration or avoidance steering via the drive system control unit 12010.
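The threshold decision described above can be sketched as follows. The text does not specify how the collision risk is computed, so this sketch substitutes a simple time-to-collision (TTC) measure purely as an assumption. The function names, the two thresholds, and the 4-second horizon are all hypothetical.

```python
def collision_risk(distance_m: float, closing_speed_mps: float,
                   horizon_s: float = 4.0) -> float:
    """Hypothetical risk score in [0, 1] derived from time-to-collision.

    A closing speed of zero or less means the obstacle is not approaching,
    which is treated here as no risk.
    """
    if closing_speed_mps <= 0:
        return 0.0
    ttc_s = distance_m / closing_speed_mps
    return max(0.0, min(1.0, 1.0 - ttc_s / horizon_s))


def assist_action(risk: float, warn_threshold: float = 0.5,
                  intervene_threshold: float = 0.8) -> str:
    """Map a risk score to an assistance action.

    "warn_driver" stands for an alarm via the audio speaker 12061 or the
    display unit 12062; "intervene_via_drive_system" stands for an
    intervention through the drive system control unit 12010.
    """
    if risk >= intervene_threshold:
        return "intervene_via_drive_system"
    if risk >= warn_threshold:
        return "warn_driver"
    return "none"
```

For example, an obstacle 10 m ahead closing at 10 m/s gives a TTC of 1 s and a risk of 0.75 in this model, which triggers the warning but not the intervention. A real system would tune such thresholds and would likely use a richer risk model.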

At least one of the imaging units 12101 to 12104 may be an infrared camera that detects infrared light. For example, the microcomputer 12051 can recognize a pedestrian by determining whether or not a pedestrian is present in the images captured by the imaging units 12101 to 12104. Such pedestrian recognition is performed, for example, by a procedure of extracting feature points in the images captured by the imaging units 12101 to 12104 serving as infrared cameras, and a procedure of performing pattern matching on a series of feature points indicating the outline of an object to determine whether or not the object is a pedestrian. When the microcomputer 12051 determines that a pedestrian is present in the images captured by the imaging units 12101 to 12104 and recognizes the pedestrian, the audio/image output unit 12052 controls the display unit 12062 so that a rectangular contour line for emphasis is superimposed on the recognized pedestrian. The audio/image output unit 12052 may also control the display unit 12062 so that an icon or the like indicating the pedestrian is displayed at a desired position.

As described above, an example of the vehicle control system to which the technology according to the present disclosure can be applied has been described. The technology according to the present disclosure can be applied to the imaging unit 12031 or the like among the configurations described above. By applying the technology according to the present disclosure to the imaging unit 12031 and the like, it is possible to reduce the size of the imaging unit 12031 and the like, so that the interior and exterior of the vehicle 12100 can be easily designed. In addition, by applying the technology according to the present disclosure to the imaging unit 12031 and the like, a clear image with reduced noise can be obtained, so that a more easily viewable captured image can be provided to the driver. This makes it possible to reduce driver fatigue.

(9. Example of application to endoscopic surgery system)
The technology (the present technology) according to the present disclosure can be applied to various products. For example, the technology according to the present disclosure may be applied to an endoscopic surgery system.

FIG. 14 is a diagram illustrating an example of a schematic configuration of an endoscopic surgery system to which the technology (the present technology) according to the present disclosure may be applied.

FIG. 14 illustrates a situation in which an operator (doctor) 11131 is performing an operation on a patient 11132 on a patient bed 11133 using the endoscopic surgery system 11000. As shown, the endoscopic surgery system 11000 includes an endoscope 11100, other surgical tools 11110 such as an insufflation tube 11111 and an energy treatment tool 11112, a support arm device 11120 that supports the endoscope 11100, and a cart 11200 on which various devices for endoscopic surgery are mounted.

The endoscope 11100 includes a lens barrel 11101, a region of which having a predetermined length from the distal end is inserted into the body cavity of the patient 11132, and a camera head 11102 connected to the proximal end of the lens barrel 11101. In the illustrated example, the endoscope 11100 is configured as a so-called rigid endoscope having a rigid lens barrel 11101. However, the endoscope 11100 may be configured as a so-called flexible endoscope having a flexible lens barrel.

An opening into which an objective lens is fitted is provided at the distal end of the lens barrel 11101. A light source device 11203 is connected to the endoscope 11100. Light generated by the light source device 11203 is guided to the distal end of the lens barrel by a light guide extending inside the lens barrel 11101, and is emitted through the objective lens toward the observation target in the body cavity of the patient 11132. The endoscope 11100 may be a forward-viewing endoscope, an oblique-viewing endoscope, or a side-viewing endoscope.

An optical system and an imaging element are provided inside the camera head 11102, and the reflected light (observation light) from the observation target is focused on the imaging element by the optical system. The observation light is photoelectrically converted by the imaging element, and an electric signal corresponding to the observation light, that is, an image signal corresponding to the observation image, is generated. The image signal is transmitted to a camera control unit (CCU: Camera Control Unit) 11201 as RAW data.

The CCU 11201 is configured by a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like, and comprehensively controls the operations of the endoscope 11100 and the display device 11202. Further, the CCU 11201 receives an image signal from the camera head 11102 and performs, on the image signal, various kinds of image processing for displaying an image based on the image signal, such as development processing (demosaicing processing).

The display device 11202 displays an image based on an image signal on which image processing has been performed by the CCU 11201 under the control of the CCU 11201.

The light source device 11203 is configured by a light source such as an LED (light emitting diode), for example, and supplies the endoscope 11100 with irradiation light when imaging an operation part or the like.

The input device 11204 is an input interface for the endoscopic surgery system 11000. The user can input various information and input instructions to the endoscopic surgery system 11000 via the input device 11204. For example, the user inputs an instruction or the like to change imaging conditions (type of irradiation light, magnification, focal length, and the like) by the endoscope 11100.

The treatment instrument control device 11205 controls the driving of the energy treatment tool 11112 for cauterizing or incising tissue, sealing blood vessels, and the like. The insufflation device 11206 sends gas into the body cavity of the patient 11132 through the insufflation tube 11111 in order to inflate the body cavity for the purpose of securing the field of view of the endoscope 11100 and securing a working space for the operator. The recorder 11207 is a device that can record various types of information related to the surgery. The printer 11208 is a device that can print various types of information related to the surgery in various formats such as text, images, and graphs.

The light source device 11203 that supplies the endoscope 11100 with irradiation light for imaging the operation site can be configured by, for example, a white light source including an LED, a laser light source, or a combination thereof. When the white light source is configured by a combination of RGB laser light sources, the output intensity and output timing of each color (each wavelength) can be controlled with high accuracy, so that the white balance of the captured image can be adjusted in the light source device 11203. In this case, it is also possible to irradiate the observation target with laser light from each of the RGB laser light sources in a time-division manner and to control the driving of the imaging element of the camera head 11102 in synchronization with the irradiation timing, thereby capturing images corresponding to each of R, G, and B in a time-division manner. According to this method, a color image can be obtained without providing a color filter in the imaging element.
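The time-division capture described above can be sketched as follows. The sketch assumes three monochrome frames of identical size, each captured while only one of the R, G, and B laser light sources is emitting, and simply stacks them per pixel. The function name and the list-of-rows frame representation are hypothetical.

```python
def merge_time_division_frames(frame_r, frame_g, frame_b):
    """Combine three monochrome frames, captured sequentially under R, G,
    and B illumination, into one frame of (r, g, b) tuples per pixel.

    No color filter is modeled: each frame holds the raw intensity seen by
    the imaging element under the corresponding illumination.
    """
    return [
        [(r, g, b) for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(frame_r, frame_g, frame_b)
    ]


# Hypothetical 1 x 2 frames captured in three consecutive time slots.
frame_r = [[10, 20]]
frame_g = [[30, 40]]
frame_b = [[50, 60]]
color = merge_time_division_frames(frame_r, frame_g, frame_b)
```

This sketch ignores motion between the three time slots. In practice the observation target may move between exposures, which is a known trade-off of sequential color capture.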

The driving of the light source device 11203 may be controlled so as to change the intensity of the output light at predetermined time intervals. By controlling the driving of the imaging element of the camera head 11102 in synchronization with the timing of the change in light intensity, acquiring images in a time-division manner, and synthesizing the images, an image with a high dynamic range, free of so-called blocked-up shadows and blown-out highlights, can be generated.
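One simple way to synthesize frames captured at different light intensities is to normalize each frame by its intensity and average only the well-exposed samples. The sketch below is an assumption for illustration and is not the synthesis method of the embodiment. The function name, the 8-bit value range, and the `low` and `high` exposure limits are hypothetical.

```python
def merge_hdr(frames, intensities, low=5, high=250):
    """Merge frames captured at different illumination intensities.

    frames      : list of 2D lists of 8-bit pixel values (same size)
    intensities : relative light intensity used for each frame
    Pixels outside [low, high] are treated as too dark or saturated and are
    skipped; if every sample of a pixel is rejected, all samples are used.
    """
    height, width = len(frames[0]), len(frames[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = [
                frame[y][x] / k
                for frame, k in zip(frames, intensities)
                if low <= frame[y][x] <= high
            ]
            if not samples:  # fall back to all samples
                samples = [frame[y][x] / k
                           for frame, k in zip(frames, intensities)]
            row.append(sum(samples) / len(samples))
        merged.append(row)
    return merged


# Hypothetical scene captured at relative intensities 1.0 and 4.0.
dim_frame = [[50, 100, 2]]       # third pixel is too dark
bright_frame = [[200, 255, 8]]   # second pixel is saturated
hdr = merge_hdr([dim_frame, bright_frame], [1.0, 4.0])
```

In this example the saturated sample and the too-dark sample are each replaced by the usable sample from the other frame, so the merged values remain proportional to the scene brightness across the whole range.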

The light source device 11203 may be configured to be able to supply light in a predetermined wavelength band corresponding to special light observation. In special light observation, for example, so-called narrow band imaging is performed, in which the wavelength dependence of light absorption in body tissue is used and light in a narrower band than the irradiation light used in normal observation (that is, white light) is emitted, so that a predetermined tissue such as a blood vessel in the surface layer of the mucous membrane is imaged with high contrast. Alternatively, in special light observation, fluorescence observation may be performed, in which an image is obtained from fluorescence generated by irradiation with excitation light. In fluorescence observation, it is possible to irradiate body tissue with excitation light and observe the fluorescence from the body tissue (autofluorescence observation), or to locally inject a reagent such as indocyanine green (ICG) into body tissue and irradiate the body tissue with excitation light corresponding to the fluorescence wavelength of the reagent to obtain a fluorescence image. The light source device 11203 can be configured to be able to supply narrow-band light and/or excitation light corresponding to such special light observation.

FIG. 15 is a block diagram showing an example of a functional configuration of the camera head 11102 and the CCU 11201 shown in FIG.

The camera head 11102 includes a lens unit 11401, an imaging unit 11402, a driving unit 11403, a communication unit 11404, and a camera head control unit 11405. The CCU 11201 includes a communication unit 11411, an image processing unit 11412, and a control unit 11413. The camera head 11102 and the CCU 11201 are communicably connected to each other by a transmission cable 11400.

The lens unit 11401 is an optical system provided at a connection with the lens barrel 11101. Observation light taken in from the tip of the lens barrel 11101 is guided to the camera head 11102, and enters the lens unit 11401. The lens unit 11401 is configured by combining a plurality of lenses including a zoom lens and a focus lens.

The number of imaging elements constituting the imaging unit 11402 may be one (so-called single-panel type) or more than one (so-called multi-panel type). When the imaging unit 11402 is configured as a multi-panel type, for example, image signals corresponding to R, G, and B may be generated by the respective imaging elements, and a color image may be obtained by combining them. Alternatively, the imaging unit 11402 may be configured to include a pair of imaging elements for acquiring right-eye and left-eye image signals for 3D (three-dimensional) display. With 3D display, the operator 11131 can grasp the depth of living tissue in the operative site more accurately. Note that when the imaging unit 11402 is configured as a multi-panel type, a plurality of lens units 11401 may be provided corresponding to the respective imaging elements.

In addition, the imaging unit 11402 does not necessarily have to be provided in the camera head 11102. For example, the imaging unit 11402 may be provided inside the lens barrel 11101 immediately behind the objective lens.

The driving unit 11403 is configured by an actuator, and moves the zoom lens and the focus lens of the lens unit 11401 by a predetermined distance along the optical axis under the control of the camera head control unit 11405. Thus, the magnification and the focus of the image captured by the imaging unit 11402 can be adjusted as appropriate.

The communication unit 11404 is configured by a communication device for transmitting and receiving various information to and from the CCU 11201. The communication unit 11404 transmits the image signal obtained from the imaging unit 11402 as RAW data to the CCU 11201 via the transmission cable 11400.

The communication unit 11404 also receives a control signal for controlling the driving of the camera head 11102 from the CCU 11201 and supplies the control signal to the camera head control unit 11405. The control signal includes information about imaging conditions, for example, information specifying the frame rate of the captured image, information specifying the exposure value at the time of imaging, and/or information specifying the magnification and focus of the captured image.

Note that the above-described imaging conditions such as the frame rate, the exposure value, the magnification, and the focus may be specified by the user as appropriate, or may be set automatically by the control unit 11413 of the CCU 11201 based on the acquired image signal. In the latter case, a so-called AE (Auto Exposure) function, an AF (Auto Focus) function, and an AWB (Auto White Balance) function are mounted on the endoscope 11100.

The camera head control unit 11405 controls the driving of the camera head 11102 based on the control signal from the CCU 11201 received via the communication unit 11404.

The communication unit 11411 is configured by a communication device for transmitting and receiving various information to and from the camera head 11102. The communication unit 11411 receives an image signal transmitted from the camera head 11102 via the transmission cable 11400.

The communication unit 11411 also transmits a control signal for controlling the driving of the camera head 11102 to the camera head 11102. The image signal and the control signal can be transmitted by electric communication, optical communication, or the like.

The image processing unit 11412 performs various types of image processing on an image signal that is RAW data transmitted from the camera head 11102.

The control unit 11413 performs various kinds of control related to imaging of the operative site and the like by the endoscope 11100 and to display of the captured image obtained by that imaging. For example, the control unit 11413 generates a control signal for controlling the driving of the camera head 11102.

In addition, the control unit 11413 causes the display device 11202 to display a captured image showing the operative site or the like based on the image signal processed by the image processing unit 11412. At this time, the control unit 11413 may recognize various objects in the captured image using various image recognition techniques. For example, by detecting the shape, color, and the like of the edges of objects included in the captured image, the control unit 11413 can recognize a surgical tool such as forceps, a specific body site, bleeding, mist generated when the energy treatment tool 11112 is used, and the like. When displaying the captured image on the display device 11202, the control unit 11413 may use the recognition result to superimpose various types of surgery support information on the image of the operative site. Superimposing the surgery support information and presenting it to the operator 11131 reduces the burden on the operator 11131 and allows the operator 11131 to proceed with the operation reliably.

The transmission cable 11400 connecting the camera head 11102 and the CCU 11201 is an electric signal cable corresponding to electric signal communication, an optical fiber corresponding to optical communication, or a composite cable thereof.

Here, in the illustrated example, the communication is performed by wire using the transmission cable 11400, but the communication between the camera head 11102 and the CCU 11201 may be performed wirelessly.

As described above, an example of the endoscopic surgery system to which the technology according to the present disclosure can be applied has been described. The technology according to the present disclosure can be applied to, for example, the imaging unit 11402 of the camera head 11102 among the configurations described above. By applying the technology according to the present disclosure to the camera head 11102, the camera head 11102 and the like can be reduced in size, so that the endoscopic surgery system 11000 can be reduced in size. In addition, by applying the technology according to the present disclosure to the camera head 11102 and the like, a clear image with reduced noise can be obtained, and thus a more easily viewable captured image can be provided to the operator. Thereby, it becomes possible to reduce the fatigue of the operator.

Here, the endoscopic surgery system has been described as an example, but the technology according to the present disclosure may be applied to, for example, a microscopic surgery system or the like.

(10. Application to WSI (Whole Slide Imaging) system)
The technology according to the present disclosure can be applied to various products. For example, the technology according to the present disclosure may be applied to a pathological diagnosis system in which a doctor or the like observes cells or tissues collected from a patient to diagnose a lesion, or to a support system for such diagnosis (hereinafter referred to as a diagnosis support system). This diagnosis support system may be a WSI (Whole Slide Imaging) system that diagnoses a lesion, or supports such diagnosis, based on an image acquired using digital pathology technology.

FIG. 16 is a diagram illustrating an example of a schematic configuration of a diagnosis support system 5500 to which the technology according to the present disclosure is applied. As shown in FIG. 16, the diagnosis support system 5500 includes one or more pathology systems 5510. Further, a medical information system 5530 and a derivation device 5540 may be included.

Each of the one or more pathology systems 5510 is a system used mainly by pathologists and is installed in, for example, a research laboratory or a hospital. The pathology systems 5510 may be installed in different hospitals, and each is connected to the medical information system 5530 and the derivation device 5540 via various networks such as a WAN (Wide Area Network) (including the Internet), a LAN (Local Area Network), a public line network, and a mobile communication network.

Each pathological system 5510 includes a microscope 5511, a server 5512, a display control device 5513, and a display device 5514.

The microscope 5511 has the function of an optical microscope, and images an observation target mounted on a glass slide to acquire a pathological image as a digital image. The observation target is, for example, tissue or cells collected from a patient, and may be a piece of an organ, saliva, blood, or the like.

The server 5512 stores the pathological image acquired by the microscope 5511 in a storage unit (not shown). In addition, when the server 5512 receives a browsing request from the display control device 5513, the server 5512 searches the storage unit (not shown) for the pathological image and sends the retrieved pathological image to the display control device 5513.

The display control device 5513 sends a browsing request for a pathological image received from the user to the server 5512. Then, the display control device 5513 causes the pathological image received from the server 5512 to be displayed on the display device 5514, which uses liquid crystal, EL (Electro-Luminescence), a CRT (Cathode Ray Tube), or the like. The display device 5514 may support 4K or 8K, and the number of display devices 5514 is not limited to one and may be plural.

Here, when the observation target is a solid such as a piece of an organ, the observation target may be, for example, a stained thin section. The thin section may be produced by, for example, thinly slicing a block piece cut out from a specimen such as an organ. When slicing, the block piece may be fixed with paraffin or the like.

Various types of staining may be applied to the thin sections, such as general staining that shows the morphology of the tissue, for example HE (Hematoxylin-Eosin) staining, and immunostaining that shows the immune state of the tissue, for example IHC (Immunohistochemistry) staining. In that case, one thin section may be stained using a plurality of different reagents, or two or more thin sections cut out consecutively from the same block piece (also referred to as adjacent thin sections) may be stained using mutually different reagents.

The microscope 5511 may include a low-resolution imaging unit for imaging at low resolution and a high-resolution imaging unit for imaging at high resolution. The low-resolution imaging unit and the high-resolution imaging unit may be different optical systems or may be the same optical system. When the optical systems are the same, the resolution of the microscope 5511 may be changed according to the imaging target.

The glass slide containing the observation target is placed on a stage located within the angle of view of the microscope 5511. The microscope 5511 first acquires an entire image within the angle of view using the low-resolution imaging unit, and specifies the region of the observation target from the acquired entire image. Subsequently, the microscope 5511 divides the region where the observation target is present into a plurality of divided regions of a predetermined size, and sequentially images each divided region with the high-resolution imaging unit, thereby acquiring a high-resolution image of each divided region. To switch the target divided region, the stage may be moved, the imaging optical system may be moved, or both may be moved. Each divided region may overlap with adjacent divided regions in order to prevent regions from being missed due to unintended slipping of the glass slide. Further, the entire image may include identification information for associating the entire image with the patient. This identification information may be, for example, a character string or a QR code (registered trademark).
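The division into overlapping regions can be sketched as follows. This is only an illustration under stated assumptions: the region of the observation target is given as a bounding box in pixel coordinates of the entire image, the divided regions are square, and the function name plan_divided_regions and all numeric values are hypothetical.

```python
def plan_divided_regions(region, tile_size, overlap):
    """List square divided regions that cover a bounding box with overlap.

    region: (x, y, width, height) of the observation target.
    tile_size: side length of each divided region.
    overlap: number of pixels shared by neighboring divided regions.
    Returns a list of (x, y, tile_size, tile_size) in raster order.
    """
    x0, y0, width, height = region
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")
    step = tile_size - overlap

    def starts(origin, extent):
        # Advance by `step` until the last region reaches the far edge.
        positions = []
        pos = origin
        while True:
            positions.append(pos)
            if pos + tile_size >= origin + extent:
                break
            pos += step
        return positions

    return [
        (x, y, tile_size, tile_size)
        for y in starts(y0, height)
        for x in starts(x0, width)
    ]


regions = plan_divided_regions((0, 0, 250, 100), tile_size=100, overlap=10)
```

The returned list gives the order in which the stage or the imaging optical system would be stepped; the last region in each row may extend past the bounding box so that no part of the target is left uncovered.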

The high-resolution images acquired by the microscope 5511 are input to the server 5512. The server 5512 divides each high-resolution image into smaller partial images (hereinafter referred to as tile images). For example, the server 5512 divides one high-resolution image into a total of 100 tile images, 10 × 10 vertically and horizontally. At this time, if adjacent divided regions overlap, the server 5512 may perform a stitching process on the mutually adjacent high-resolution images by using a technique such as template matching. In that case, the server 5512 may generate the tile images by dividing the entire high-resolution image joined by the stitching process. However, the tile images may also be generated from the high-resolution images before the stitching process.
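As a minimal sketch of the 10 × 10 division mentioned above, the following assumes that a high-resolution image is a nested list whose height and width are exact multiples of the grid size. The function name split_into_tiles and the (row, col) keys are hypothetical.

```python
def split_into_tiles(image, rows=10, cols=10):
    """Divide a 2D image into rows x cols tile images keyed by (row, col)."""
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid")
    tile_h, tile_w = height // rows, width // cols
    tiles = {}
    for r in range(rows):
        for c in range(cols):
            # Slice the block of rows first, then the block of columns.
            tiles[(r, c)] = [
                row[c * tile_w:(c + 1) * tile_w]
                for row in image[r * tile_h:(r + 1) * tile_h]
            ]
    return tiles


# A 20x20 test image whose pixel value encodes its position.
image = [[y * 20 + x for x in range(20)] for y in range(20)]
tiles = split_into_tiles(image)
```

With the default grid, a 20 × 20 image yields 100 tile images of 2 × 2 pixels each.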

The server 5512 may generate tile images of a smaller size by further dividing the tile images. Such generation of tile images may be repeated until tile images of the size set as the minimum unit are generated.

When the minimum-unit tile images have been generated in this way, the server 5512 executes, for all tile images, a tile synthesis process that generates one tile image by synthesizing a predetermined number of adjacent tile images. This tile synthesis process can be repeated until a single tile image is finally generated. Through such processing, a tile image group having a pyramid structure, in which each layer is composed of one or more tile images, is generated. In this pyramid structure, a tile image of one layer and a tile image of a different layer have the same number of pixels but different resolutions. For example, when one upper-layer tile image is generated by synthesizing a total of four tile images, 2 × 2, the resolution of the upper-layer tile image is 1/2 that of the lower-layer tile images used for the synthesis.
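The tile synthesis process can be pictured with the sketch below. It makes several assumptions that the disclosure does not state: tiles are square nested lists, the base layer is a square grid whose side is a power of two, and four tiles are merged by joining them and averaging each 2 × 2 block of pixels. The names combine_four and build_pyramid are hypothetical.

```python
def combine_four(tl, tr, bl, br):
    """Merge a 2x2 group of tiles into one tile with the same pixel count."""
    top = [a + b for a, b in zip(tl, tr)]
    bottom = [a + b for a, b in zip(bl, br)]
    joined = top + bottom          # twice the side length of one tile
    size = len(joined) // 2
    # Average each 2x2 block, which halves the resolution.
    return [
        [
            (joined[2 * y][2 * x] + joined[2 * y][2 * x + 1]
             + joined[2 * y + 1][2 * x] + joined[2 * y + 1][2 * x + 1]) / 4
            for x in range(size)
        ]
        for y in range(size)
    ]


def build_pyramid(base_tiles, grid):
    """Build layers from the lowermost layer up to a single tile.

    base_tiles: {(row, col): tile} for a grid x grid lowermost layer.
    Returns a list of layers; layers[0] is the lowermost layer.
    """
    layers = [base_tiles]
    while grid > 1:
        grid //= 2
        lower = layers[-1]
        upper = {
            (r, c): combine_four(
                lower[(2 * r, 2 * c)], lower[(2 * r, 2 * c + 1)],
                lower[(2 * r + 1, 2 * c)], lower[(2 * r + 1, 2 * c + 1)],
            )
            for r in range(grid)
            for c in range(grid)
        }
        layers.append(upper)
    return layers


# Four 2x2 tiles, each filled with a constant value 0..3.
base = {(r, c): [[r * 2 + c] * 2 for _ in range(2)] for r in range(2) for c in range(2)}
pyramid = build_pyramid(base, grid=2)
```

Every tile in every layer has the same number of pixels, while each step up the pyramid covers four times the area at half the resolution, matching the relationship described above.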

By constructing such a pyramid-structured tile image group, the level of detail of the observation target displayed on the display device can be switched depending on the layer to which the displayed tile image belongs. For example, when a tile image of the lowermost layer is used, a narrow area of the observation target can be displayed in detail, and the higher the layer of the tile image used, the wider the area of the observation target that can be displayed coarsely.

The generated pyramid-structured tile image group is stored in a storage unit (not shown) together with, for example, identification information that can uniquely identify each tile image (referred to as tile identification information). When the server 5512 receives a request to acquire a tile image, including tile identification information, from another device (for example, the display control device 5513 or the derivation device 5540), the server 5512 transmits the tile image corresponding to the tile identification information to that device.
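A minimal in-memory sketch of storing and retrieving tile images by tile identification information follows. The identifier format, built from layer, row, and column, is purely an assumption for illustration, as are the class name TileStore and its methods; the disclosure does not specify how the identification information is formed.

```python
class TileStore:
    """Keeps tile images keyed by a unique tile identifier (hypothetical format)."""

    def __init__(self):
        self._tiles = {}

    def put(self, layer, row, col, tile):
        # Assumed identifier scheme: layer, row, and column joined into a string.
        tile_id = f"L{layer}-R{row}-C{col}"
        self._tiles[tile_id] = tile
        return tile_id

    def get(self, tile_id):
        # Serve an acquisition request that carries tile identification information.
        try:
            return self._tiles[tile_id]
        except KeyError:
            raise KeyError(f"unknown tile id: {tile_id}") from None


store = TileStore()
tile_id = store.put(layer=0, row=3, col=7, tile=[[1, 2], [3, 4]])
requested = store.get(tile_id)
```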

Note that tile images serving as pathological images may be generated for each imaging condition, such as focal length and staining condition. When tile images are generated for each imaging condition, a specific pathological image may be displayed side by side with another pathological image of the same region that corresponds to an imaging condition different from the specific imaging condition. The specific imaging condition may be specified by the viewer. When the viewer specifies a plurality of imaging conditions, pathological images of the same region corresponding to the respective imaging conditions may be displayed side by side.

The server 5512 may store the pyramid-structured tile image group in a storage device other than the server 5512, for example, a cloud server. Further, a part or all of the tile image generation processing as described above may be executed by a cloud server or the like.

The display control device 5513 extracts a desired tile image from the pyramid-structured tile image group in response to an input operation from the user, and outputs this to the display device 5514. Through such processing, the user can obtain a feeling as if the user is observing the observation target object while changing the observation magnification. That is, the display control device 5513 functions as a virtual microscope. The virtual observation magnification here actually corresponds to the resolution.
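How a requested observation magnification might be mapped to a pyramid layer is sketched below. The mapping is an assumption: layer 0 is taken to be the lowermost layer captured at a known base magnification, and each higher layer is taken to halve the magnification. The function name select_layer and the magnification values are hypothetical.

```python
def select_layer(requested_magnification, base_magnification, num_layers):
    """Pick the coarsest layer that still meets the requested magnification.

    Layer 0 is the lowermost (most detailed) layer; each higher layer is
    assumed to halve the effective magnification.
    """
    if requested_magnification <= 0:
        raise ValueError("magnification must be positive")
    layer = 0
    magnification = base_magnification
    # Move up while the next coarser layer still satisfies the request.
    while layer + 1 < num_layers and magnification / 2 >= requested_magnification:
        magnification /= 2
        layer += 1
    return layer


layer_for_40x = select_layer(40, base_magnification=40, num_layers=4)
layer_for_10x = select_layer(10, base_magnification=40, num_layers=4)
layer_for_1x = select_layer(1, base_magnification=40, num_layers=4)
```

Choosing the coarsest sufficient layer keeps the amount of tile data transferred to the display device small while the user zooms out.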

Note that any method may be used to capture the high-resolution images. The divided regions may be imaged while the stage is repeatedly stopped and moved to acquire the high-resolution images, or the divided regions may be imaged while the stage is moved at a predetermined speed to acquire strip-shaped high-resolution images. Also, generating tile images from the high-resolution images is not essential. Images whose resolution changes stepwise may instead be generated by changing, in a stepwise manner, the resolution of the entire high-resolution image joined by the stitching process. Even in this case, images ranging from a low-resolution image of a wide area to a high-resolution image of a narrow area can be presented to the user in stages.

The medical information system 5530 is a so-called electronic medical record system, and stores information for identifying a patient, information on a patient's disease, examination information and image information used for diagnosis, diagnosis results, and information on diagnosis such as prescription drugs. For example, a pathological image obtained by imaging an observation target of a patient may be temporarily stored via the server 5512, and then displayed on the display device 5514 by the display control device 5513. A pathologist using the pathological system 5510 makes a pathological diagnosis based on the pathological image displayed on the display device 5514. The result of the pathological diagnosis performed by the pathologist is stored in the medical information system 5530.

The derivation device 5540 can execute analysis on a pathological image. For this analysis, a learning model created by machine learning can be used. The derivation device 5540 may derive a classification result of a specific area, a tissue identification result, or the like as the analysis result. Furthermore, the deriving device 5540 may derive identification results of cell information, number, position, luminance information, and the like, and scoring information for them. These pieces of information derived by the derivation device 5540 may be displayed on the display device 5514 of the pathology system 5510 as diagnosis support information.

Note that the derivation device 5540 may be a server system including one or more servers (including a cloud server). The derivation device 5540 may also be incorporated in, for example, the display control device 5513 or the server 5512 in the pathology system 5510. That is, various analyses of the pathological image may be executed within the pathology system 5510.

The technology according to the present disclosure can be suitably applied to, for example, the microscope 5511 among the configurations described above. Specifically, the technology according to the present disclosure can be applied to the low-resolution imaging unit and/or the high-resolution imaging unit of the microscope 5511. By applying the technology according to the present disclosure to the low-resolution imaging unit, the region of the observation target in the entire image can be specified within the low-resolution imaging unit. In addition, by applying the technology according to the present disclosure to the high-resolution imaging unit, part or all of the tile image generation processing and of the analysis processing for the pathological image can be performed within the high-resolution imaging unit. As a result, part or all of the processing from acquisition of the pathological image to its analysis can be executed on the fly within the microscope 5511, so that diagnosis support information can be output more quickly and accurately. For example, partial extraction of a specific tissue, partial output of an image in consideration of personal information, and the like can be executed within the microscope 5511, which makes it possible to shorten the imaging time, reduce the data amount, shorten the time required for the pathologist's workflow, and the like.

The configuration described above can be applied not only to the diagnosis support system but also to all biological microscopes such as a confocal microscope, a fluorescence microscope, and a video microscope. Here, the observation target may be a biological sample such as a cultured cell, a fertilized egg, or a sperm, a biological material such as a cell sheet or a three-dimensional cell tissue, or a living body such as a zebrafish or a mouse. Further, the observation target object is not limited to a glass slide, and can be observed in a state stored in a well plate, a petri dish, or the like.

Furthermore, a moving image may be generated from still images of the observation target acquired using a microscope. For example, a moving image may be generated from still images captured continuously for a predetermined period, or an image sequence may be generated from still images captured at predetermined intervals. By generating a moving image from still images in this way, dynamic characteristics of the observation target, such as the movement of cancer cells, nerve cells, myocardial tissue, or sperm, can be analyzed using machine learning.

The embodiments and the modifications described above can be combined as appropriate within a range that does not contradict processing contents.

In addition, the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.

Note that the present technology can also have the following configurations.
(1)
A solid-state imaging device including:
an imaging unit that acquires image data;
a processing unit that executes, on the image data or data based on the image data, a process of extracting a specific region based on a neural network calculation model; and
an output unit that outputs image data processed based on the specific region, or image data read from the imaging unit based on the specific region.
(2)
The solid-state imaging device according to (1), wherein the processing unit extracts the specific region to be processed from the image data by an arithmetic process using a learned learning model.
(3)
The solid-state imaging device according to (1) or (2), wherein the processing unit performs a masking process, a mosaic process, or an avatar process on the specific region to generate the processed image data.
(4)
The solid-state imaging device according to (1), wherein the processing unit extracts partial image data corresponding to the specific region from the image data.
(5)
The solid-state imaging device according to (4), wherein the output unit outputs the partial image data to the outside.
(6)
The output unit outputs the partial image data of the specific area extracted by the processing unit to the outside as the processed image data or the image data read from the imaging unit. ).
(7)
The output unit outputs image data other than the specific area extracted by the processing unit to the outside as the processed image data or the image data read from the imaging unit. ).
(8)
The solid-state imaging device according to any one of (1) to (7), further including a control unit that controls reading of image data from the imaging unit, wherein the control unit reads partial image data corresponding to the specific region from the imaging unit.
(9)
The solid-state imaging device according to any one of (1) to (7), further including a control unit that controls reading of image data from the imaging unit, wherein the control unit reads image data that does not include the specific region from the imaging unit.
(10)
The solid-state imaging device according to any one of (1) to (9), wherein the specific region is a region including at least one of a person's face, eyes, nose, mouth, window, and nameplate.
(11)
The solid-state imaging device according to any one of (1) to (10), wherein the imaging unit acquires the image data by thinning out the unit pixels to be read when reading out a captured image, and the processing unit extracts the specific region from the thinned image data.
(12)
An electronic apparatus including:
a solid-state imaging device having an imaging unit that acquires image data, a processing unit that executes, on data based on the image data, a process of extracting a specific region based on a neural network calculation model, and an output unit that outputs image data processed based on the specific region, or image data read from the imaging unit based on the specific region; and
a control device that executes processing by an application on the processed image data output from the solid-state imaging device or on the image data read from the imaging unit.
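The masking and mosaic processes named in configuration (3) can be illustrated with the sketch below. It assumes a grayscale image held as a nested list and a rectangular specific region that has already been extracted (in the disclosure, by the neural network calculation model); the function names apply_mosaic and apply_mask, the block size, and the sample values are hypothetical.

```python
def apply_mosaic(image, region, block=2):
    """Return a copy of the image with the region (x, y, w, h) pixelated."""
    x0, y0, w, h = region
    out = [row[:] for row in image]
    for by in range(y0, y0 + h, block):
        for bx in range(x0, x0 + w, block):
            ys = range(by, min(by + block, y0 + h))
            xs = range(bx, min(bx + block, x0 + w))
            # Replace every pixel of the block with the block mean.
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def apply_mask(image, region, fill=0):
    """Return a copy of the image with the region (x, y, w, h) painted over."""
    x0, y0, w, h = region
    out = [row[:] for row in image]
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            out[y][x] = fill
    return out


frame = [[0, 10, 20, 30], [40, 50, 60, 70]]
mosaicked = apply_mosaic(frame, region=(0, 0, 2, 2))
masked = apply_mask(frame, region=(2, 0, 2, 1))
```

Both functions leave the input untouched and return processed image data, so either the processed frame or the original frame could be handed to the output stage depending on the selected mode.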

Reference Signs List 1 imaging device 10 image sensor 11 imaging unit 12 control unit 13 signal processing unit 14 DSP
15 memory 16 selector 20 application processor 30 cloud server

Claims

A solid-state imaging device comprising:
an imaging unit that acquires image data;
a processing unit that executes, on the image data or data based on the image data, a process of extracting a specific region based on a neural network calculation model; and
an output unit that outputs image data processed based on the specific region, or image data read from the imaging unit based on the specific region.
The solid-state imaging device according to claim 1, wherein the processing unit extracts the specific region to be processed from the image data by an arithmetic process using a learned learning model.
The solid-state imaging device according to claim 1, wherein the processing unit performs a masking process, a mosaic process, or an avatar process on the specific region to generate the processed image data.
The solid-state imaging device according to claim 1, wherein the processing unit extracts partial image data corresponding to the specific region from the image data.
The solid-state imaging device according to claim 4, wherein the output unit outputs the partial image data to the outside.
The solid-state imaging device according to claim 1, wherein the output unit outputs, to the outside, the partial image data of the specific region extracted by the processing unit as the processed image data or the image data read from the imaging unit.
The solid-state imaging device according to claim 1, wherein the output unit outputs, to the outside, image data other than the specific region extracted by the processing unit as the processed image data or the image data read from the imaging unit.
The solid-state imaging device according to claim 1, further comprising a control unit that controls reading of image data from the imaging unit, wherein the control unit reads partial image data corresponding to the specific region from the imaging unit.
The solid-state imaging device according to claim 1, further comprising a control unit that controls reading of image data from the imaging unit, wherein the control unit reads image data that does not include the specific region from the imaging unit.
The solid-state imaging device according to claim 1, wherein the specific area is an area including at least one of a person's face, eyes, nose, mouth, window, and nameplate.
The solid-state imaging device according to claim 1, wherein the imaging unit acquires the image data by thinning out the unit pixels to be read when reading out a captured image, and the processing unit extracts the specific region from the thinned image data.
An electronic apparatus comprising:
a solid-state imaging device having an imaging unit that acquires image data, a processing unit that executes, on data based on the image data, a process of extracting a specific region based on a neural network calculation model, and an output unit that outputs image data processed based on the specific region, or image data read from the imaging unit based on the specific region; and
a control device that executes processing by an application on the processed image data output from the solid-state imaging device or on the image data read from the imaging unit.