WO2023218936A1 - Image sensor, information processing method, and program - Google Patents
- Publication number
- WO2023218936A1 (PCT/JP2023/016162, JP2023016162W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- processing
- model
- image processing
- inference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/955—Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/147—Details of sensors, e.g. sensor lenses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N25/00—Circuitry of solid-state image sensors [SSIS]; Control thereof
- H04N25/70—SSIS architectures; Circuits associated therewith
- H04N25/79—Arrangements of circuitry being divided between different or multiple substrates, chips or circuit boards, e.g. stacked image sensors
Definitions
- The present technology relates to the technical field of image sensors, information processing methods, and programs that perform inference processing using artificial intelligence models.
- Some image sensors of camera devices are capable of inference processing by deploying an artificial intelligence model (for example, Patent Document 1 below).
- CV Computer Vision
- ISP Image Signal Processor
- This technology was developed in view of these problems, and its purpose is to improve the efficiency of processing related to the input tensor and output tensor of the artificial intelligence model in an image sensor in which the artificial intelligence model is deployed.
- The image sensor includes a pixel array section in which a plurality of pixels are arranged two-dimensionally, a frame memory that stores image data output from the pixel array section, an image processing unit that performs image processing on the image data stored in the frame memory, and an inference processing unit that performs inference processing using an artificial intelligence model, with the image data processed by the image processing unit as the input tensor. This makes it possible to improve the efficiency of processing related to the input tensor and output tensor of the artificial intelligence model in the image sensor in which the artificial intelligence model is deployed.
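The data path described above (pixel array, frame memory, image processing, inference) can be pictured with a minimal Python sketch. Every name here (`ImageSensorSketch`, `downscale_2x`, `mean_brightness`) is hypothetical, and the "image processing" and "inference" steps are trivial stand-ins chosen only to make the flow visible; the source does not specify the actual operations.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Frame = List[List[int]]  # 2-D grid of pixel values, one inner list per row


@dataclass
class ImageSensorSketch:
    """Hypothetical model of: pixel array -> frame memory -> image processing -> inference."""
    preprocess: Callable[[Frame], Frame]   # stand-in for the image processing unit
    infer: Callable[[Frame], float]        # stand-in for the inference processing unit
    frame_memory: List[Frame] = field(default_factory=list)

    def capture(self, frame: Frame) -> None:
        # Store the pixel-array output in frame memory.
        self.frame_memory.append(frame)

    def run(self) -> float:
        # Image-process the most recently stored frame, then use the
        # processed data as the input tensor of the model.
        input_tensor = self.preprocess(self.frame_memory[-1])
        return self.infer(input_tensor)


def downscale_2x(frame: Frame) -> Frame:
    # Assumed preprocessing: keep every second pixel in both directions.
    return [row[::2] for row in frame[::2]]


def mean_brightness(tensor: Frame) -> float:
    # Assumed "model": average of all values in the input tensor.
    values = [v for row in tensor for v in row]
    return sum(values) / len(values)


sensor = ImageSensorSketch(preprocess=downscale_2x, infer=mean_brightness)
sensor.capture([[0, 10, 20, 30], [1, 11, 21, 31], [2, 12, 22, 32], [3, 13, 23, 33]])
result = sensor.run()
print(result)
```

The point of the sketch is only the ordering: the inference step never touches the raw pixel-array output, only the image-processed data read back from frame memory.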
- FIG. 1 is a diagram illustrating a configuration example of an information processing system.
- FIG. 2 is a diagram for explaining each device that registers and downloads an AI model and an AI application via a marketplace function provided in a cloud-side information processing device.
- FIG. 2 is a diagram illustrating an example of the flow of processing executed by each device when registering or downloading an AI model or an AI application via a marketplace function.
- FIG. 2 is a diagram illustrating an example of the flow of processing executed by each device when deploying an AI application or an AI model.
- FIG. 2 is a diagram for explaining a connection mode between a cloud-side information processing device and an edge-side information processing device.
- FIG. 2 is a functional block diagram of a cloud-side information processing device.
- FIG. 2 is a block diagram showing an example of the internal configuration of a camera.
- FIG. 2 is a diagram showing a configuration example of an image sensor.
- FIG. 3 is a functional block diagram of a CPU included in the image sensor. A floor map showing a configuration example of each layer of the image sensor. A floor map showing a first alternative configuration example of each layer of the image sensor. A floor map showing a second alternative configuration example of each layer of the image sensor. A floor map showing a third alternative configuration example of each layer of the image sensor. A floor map showing a modification of the third configuration example of each layer of the image sensor.
- FIG. 3 is a diagram showing a first example of the execution timing of each process. A diagram showing a second example of the execution timing of each process. A diagram showing a third example of the execution timing of each process. Another example of the functional block diagram of the CPU included in the image sensor.
- FIG. 1 is a diagram illustrating a first configuration example of a functional configuration of an image sensor that performs privacy mask processing.
- FIG. 3 is a diagram illustrating a second configuration example of a functional configuration of an image sensor that performs privacy mask processing.
- FIG. 7 is a flowchart illustrating processing executed by the image sensor regarding privacy mask processing.
- FIG. 3 is a functional block diagram showing a modification of the configuration of the image sensor.
- FIG. 2 is a block diagram showing the software configuration of the camera.
- FIG. 2 is a block diagram showing an operating environment of a container when container technology is used.
- FIG. 2 is a block diagram showing an example of a hardware configuration of an information processing device. A diagram explaining the flow of processing in other explanations. A diagram showing an example of the login screen for logging into a marketplace.
- FIG. 3 is a diagram illustrating an example of a developer screen presented to each developer using the marketplace.
- FIG. 3 is a diagram illustrating an example of a user screen presented to an application user who uses a marketplace.
- FIG. 1 is a block diagram showing a schematic configuration example of an information processing system 100 as an embodiment of the present technology.
- The information processing system 100 includes a cloud server 1, a user terminal 2, a plurality of cameras 3, a fog server 4, and a management server 5.
- The cloud server 1, user terminal 2, fog server 4, and management server 5 are configured to be able to communicate with each other via a network 6, such as the Internet.
- The cloud server 1, user terminal 2, fog server 4, and management server 5 are each configured as an information processing device equipped with a microcomputer having a CPU (Central Processing Unit), ROM (Read Only Memory), and RAM (Random Access Memory).
- CPU Central Processing Unit
- ROM Read Only Memory
- RAM Random Access Memory
- The user terminal 2 is an information processing device that is assumed to be used by a user who is a recipient of a service using the information processing system 100.
- The management server 5 is an information processing device that is assumed to be used by a service provider.
- Each camera 3 is equipped with an image sensor such as a CCD (Charge Coupled Device) type image sensor or a CMOS (Complementary Metal Oxide Semiconductor) type image sensor, and captures an image of a subject to obtain image data (captured image data) as digital data.
- The sensors included in the camera 3 include, for example, an RGB sensor that captures an RGB image and a distance measurement sensor that outputs a distance image.
- Each camera 3 also has a function of performing processing using AI (Artificial Intelligence) on the captured image (for example, image recognition processing or subject detection processing).
- Each camera 3 is configured to be capable of data communication with the fog server 4, so that it can transmit various data, such as processing result information indicating the results of AI image processing, to the fog server 4 and receive various data from the fog server 4.
- Each camera 3 may be used as any of a variety of surveillance cameras.
- Examples include surveillance cameras for indoor areas such as stores, offices, and residences,
- surveillance cameras for outdoor areas such as parking lots and streets (including traffic surveillance cameras),
- surveillance cameras on production lines, and surveillance cameras that monitor the inside and outside of vehicles.
- For example, when surveillance cameras are used in a store,
- a plurality of cameras 3 may be placed at predetermined positions in the store so that the user can check customer demographics (gender, age group, etc.) and behavior (flow lines) in the store. In that case, the above-mentioned analysis information could include information on the demographics of these customers, information on their flow lines in the store, and information on congestion at the checkout register (for example, waiting time at the register).
- In a traffic monitoring application, each camera 3 may be placed at a position near the road so that the user can recognize information such as the license plate number (vehicle number), car color, and car model of passing vehicles. In that case, information such as the license plate number, car color, and car model could be generated as the above-mentioned analysis information.
- When monitoring parked vehicles, the cameras may be placed so that they can monitor each parked vehicle and watch for suspicious persons behaving suspiciously around any vehicle.
- If a suspicious person is detected, the user may be notified of the presence of the suspicious person and the suspicious person's attributes (gender, age group, etc.).
- The fog server 4 is arranged for each monitoring target; for example, in the above-mentioned store monitoring application, the fog server 4 is placed in the monitored store together with each camera 3.
- As a result, the cloud server 1 does not need to directly receive the data transmitted from the multiple cameras 3 at the monitoring target, which reduces the processing burden on the cloud server 1.
- The fog server 4 need not be provided one per monitoring target; a single fog server 4 may also be provided for a plurality of monitoring targets.
- The function of the fog server 4 may also be provided elsewhere in the information processing system 100.
- In that case, the fog server 4 may be omitted, each camera 3 may be directly connected to the network 6, and the cloud server 1 may directly receive the data transmitted from the plurality of cameras 3.
- The cloud-side information processing device includes the cloud server 1 and the management server 5, and is a group of devices that provide services expected to be used by a plurality of users.
- The camera 3 and the fog server 4 correspond to the edge-side information processing device, and can be regarded as a group of devices placed in an environment prepared by a user of the cloud service.
- Both the cloud-side information processing device and the edge-side information processing device may be in an environment prepared by the same user.
- The fog server 4 may be an on-premises server.
- The camera 3, which is an information processing device on the edge side, performs AI image processing,
- and the cloud server 1, which is an information processing device on the cloud side, realizes advanced application functions using the result information of the AI image processing on the edge side (for example, result information of image recognition processing using AI).
- Various methods can be considered for registering application functions in the cloud server 1 (which may include the fog server 4), an information processing device on the cloud side.
- An example thereof will be explained with reference to FIG. 2.
- Although the fog server 4 is not shown in FIG. 2, the configuration may include the fog server 4. In this case, the fog server 4 may take on part of the functions on the edge side.
- The cloud server 1 and management server 5 described above are information processing devices that constitute a cloud-side environment. The camera 3 is an information processing device that constitutes an edge-side environment.
- The image sensor IS can also be regarded as an information processing device that constitutes the edge-side environment. That is, the image sensor IS, which is another edge-side information processing device, may be considered to be mounted inside the camera 3, which is itself an edge-side information processing device.
- The user terminals 2 used by users of the various services provided by the cloud-side information processing device include an application developer terminal 2A used by a user who develops an application used for AI image processing,
- an application user terminal 2B used by a user who uses the application, and an AI model developer terminal 2C used by a user who develops an AI model used for AI image processing.
- The application developer terminal 2A may also be used by a user who develops an application that does not use AI image processing.
- The cloud-side information processing device is provided with training datasets for AI learning and AI models that serve as a basis for development.
- A user who develops an AI model communicates with the cloud-side information processing device using the AI model developer terminal 2C and downloads these training datasets and AI models.
- The training datasets may be provided for a fee.
- For example, an AI model developer may register personal information in a marketplace (electronic market) provided as a cloud-side function, which makes it possible to purchase the various functions and materials registered in the marketplace, and may then purchase a training dataset.
- After developing an AI model using the training dataset, the AI model developer registers the developed AI model in the marketplace using the AI model developer terminal 2C. An incentive may then be paid to the AI model developer when the AI model is downloaded.
- A user who develops an application downloads an AI model from the marketplace using the application developer terminal 2A and develops an application that uses the AI model (hereinafter referred to as an "AI application").
- An incentive may be paid to the AI model developer.
- The application developer registers the developed AI application in the marketplace using the application developer terminal 2A. An incentive may then be paid to the user who developed the AI application when the AI application is downloaded.
- A user who uses an AI application uses the application user terminal 2B to deploy an AI application and an AI model from the marketplace to the camera 3, an edge-side information processing device that the user manages. At this time, an incentive may be paid to the AI model developer.
- This makes it possible for the camera 3 to perform AI image processing using the AI application and AI model, so that it can not only capture images but also detect customers and vehicles through AI image processing.
- Here, deployment of the AI application and AI model means installing them on the target (device) that serves as the execution entity so that the target can use the AI application and AI model,
- in other words, so that at least part of the program constituting the AI application can be executed.
- The camera 3 may be configured to be able to extract customer attribute information from the images it captures through AI image processing. This attribute information is transmitted from the camera 3 via the network 6 to the information processing device on the cloud side.
- A cloud application is deployed on the information processing device on the cloud side, and each user can use the cloud application via the network 6.
- Among the cloud applications are applications that analyze the flow lines of customers visiting the store using their attribute information and captured images.
- Such cloud applications are uploaded by application development users and the like.
- By using a cloud application for flow line analysis via the application user terminal 2B,
- application users can analyze the flow lines of customers visiting their store and view the analysis results. The analysis results may be viewed by graphically presenting the flow lines of customers visiting the store on a map of the store.
- The results of the flow line analysis may also be displayed in the form of a heat map, so that the analysis results are viewed as the density of customers visiting the store. The information may also be displayed by category according to the attribute information of customers visiting the store.
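As a rough illustration of the heat map idea, the sketch below bins observed customer positions into a grid and counts how many fall in each cell. The function name `density_grid`, the coordinate convention (x to the right, y downward, in the same unit as `cell_size`), and the sample track are all assumptions for illustration; the source does not describe how the density is computed.

```python
from typing import List, Sequence, Tuple


def density_grid(
    positions: Sequence[Tuple[float, float]],
    cell_size: float,
    rows: int,
    cols: int,
) -> List[List[int]]:
    """Count observed positions per grid cell (a hypothetical store floor grid)."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        col = int(x // cell_size)
        row = int(y // cell_size)
        # Ignore positions that fall outside the mapped floor area.
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] += 1
    return grid


# Hypothetical flow line samples (x, y); the last one lies outside the grid.
track = [(0.5, 0.5), (1.5, 0.5), (1.6, 0.7), (2.5, 1.5), (9.0, 9.0)]
heat = density_grid(track, cell_size=1.0, rows=2, cols=3)
for row in heat:
    print(row)
```

Displaying by attribute category, as mentioned above, could be approximated by building one such grid per attribute value, though that is likewise an assumption rather than something the source spells out.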
- AI models optimized for each user may be registered. For example, captured images from a camera 3 placed in a store managed by a certain user are uploaded as appropriate to an information processing device on the cloud side and stored there.
- The AI model is then retrained, updated, and re-registered in the marketplace.
- The AI model relearning process may, for example, be made available to the user as an option on the marketplace.
- For example, the recognition rate of image processing for images captured in dark places can be improved.
- Likewise, the recognition rate of image processing for images captured in bright places can be improved.
- The application user can always obtain optimized processing result information by deploying the updated AI model to the camera 3 again. Note that the AI model relearning process will be described later.
- AI models optimized for each camera may be registered.
- For example, an AI model applied to a camera 3 that can acquire RGB images, an AI model applied to a camera 3 equipped with a distance measurement sensor that generates distance images, and so on can be considered.
- As the AI model to be used by the camera 3 during bright hours, an AI model trained using images of vehicles captured in a bright environment,
- and as the AI model to be used by the camera 3 during dark hours, an AI model trained using images captured in a dark environment,
- may each be registered in the marketplace. It is desirable that these AI models be updated as appropriate to AI models with improved recognition rates through relearning processing.
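One way to picture per-camera, per-lighting model registration is a lookup keyed by sensor type and lighting condition. The following Python sketch is purely illustrative: the registry contents, the model names, and the assumption that "bright hours" means 06:00 to 17:59 are all hypothetical and not taken from the source.

```python
# Hypothetical registry: (sensor type, lighting condition) -> registered model name.
MODEL_REGISTRY = {
    ("rgb", "bright"): "rgb_bright_model",
    ("rgb", "dark"): "rgb_dark_model",
    ("depth", "any"): "depth_model",
}


def select_model(sensor_type: str, hour: int, bright_hours=range(6, 18)) -> str:
    """Pick a model for a camera by sensor type and time of day.

    The bright/dark boundary is an assumed parameter, not a value from the source.
    """
    lighting = "bright" if hour in bright_hours else "dark"
    # Prefer a lighting-specific model, then fall back to a lighting-independent one.
    for key in ((sensor_type, lighting), (sensor_type, "any")):
        if key in MODEL_REGISTRY:
            return MODEL_REGISTRY[key]
    raise KeyError(f"no model registered for sensor type {sensor_type!r}")


print(select_model("rgb", hour=12))
print(select_model("rgb", hour=22))
print(select_model("depth", hour=3))
```

A real system might switch on measured illuminance rather than the clock; the time-of-day rule is used here only because the text speaks of "bright hours" and "dark hours".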
- From the perspective of privacy protection, data from which privacy-related information has been deleted is uploaded.
- Data from which privacy-related information has been deleted may also be made available to AI model development users and application development users.
- The flow of the above-described processing is shown in the flowcharts of FIGS. 3 and 4. Note that the cloud-side information processing device corresponds to the cloud server 1, management server 5, etc. in FIG.
- The AI model developer uses the AI model developer terminal 2C, which has a display unit such as an LCD (Liquid Crystal Display) or an organic EL (Electro Luminescence) panel, to browse the list of datasets registered in the marketplace and select the desired dataset.
- In response, the AI model developer terminal 2C transmits a download request for the selected dataset to the cloud-side information processing device in step S21.
- The cloud-side information processing device receives the request in step S1 and transmits the requested dataset to the AI model developer terminal 2C in step S2.
- The AI model developer terminal 2C receives the dataset in step S22. This allows the AI model developer to develop an AI model using the dataset.
- After finishing development of the AI model, the AI model developer performs operations to register the developed AI model in the marketplace (for example, specifying the name of the AI model and the location where the AI model is placed).
- In response, the AI model developer terminal 2C sends a request to register the AI model in the marketplace to the cloud-side information processing device.
- The cloud-side information processing device receives the registration request in step S3 and performs registration processing for the AI model in step S4, which makes it possible, for example, to display the AI model on the marketplace.
- This allows users other than AI model developers to download AI models from the marketplace.
- An application developer who wants to develop an AI application uses the application developer terminal 2A to view a list of AI models registered in the marketplace.
- In response to an operation by the application developer (for example, an operation to select one of the AI models on the marketplace), the application developer terminal 2A sends a download request for the selected AI model to the cloud-side information processing device.
- The cloud-side information processing device receives the request in step S5 and transmits the AI model to the application developer terminal 2A in step S6.
- The application developer terminal 2A receives the AI model in step S32. This allows application developers to develop AI applications that use AI models developed by others.
- In step S33, the application developer terminal 2A transmits an AI application registration request to the cloud-side information processing device.
- The cloud-side information processing device accepts the registration request in step S7 and registers the AI application in step S8, which makes it possible, for example, to display the AI application on the marketplace. This allows users other than application developers to select and download AI applications on the marketplace.
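The register-then-download exchange between the developer terminals and the cloud side can be sketched as a tiny in-memory marketplace. This is a hedged illustration only: `MarketplaceSketch`, its method names, and the per-developer download counter (a stand-in for whatever bookkeeping might back the incentive payments mentioned earlier) are all hypothetical.

```python
from collections import defaultdict


class MarketplaceSketch:
    """Hypothetical in-memory stand-in for the cloud-side marketplace."""

    def __init__(self):
        self._items = {}                        # item name -> (developer, payload)
        self.download_counts = defaultdict(int)  # developer -> number of downloads

    def register(self, name, developer, payload):
        # Corresponds loosely to accepting and processing a registration request.
        if name in self._items:
            raise ValueError(f"{name!r} is already registered")
        self._items[name] = (developer, payload)

    def list_items(self):
        # What a developer or user would browse before selecting an item.
        return sorted(self._items)

    def download(self, name):
        # Corresponds loosely to receiving a download request and sending the item.
        developer, payload = self._items[name]
        self.download_counts[developer] += 1  # assumed basis for an incentive
        return payload


market = MarketplaceSketch()
# An AI model developer registers a model.
market.register("person_detector", developer="model_dev", payload=b"model-bytes")
# An application developer downloads it and registers an application built on it.
model = market.download("person_detector")
market.register("flow_line_app", developer="app_dev", payload=b"app-bytes:" + model)
print(market.list_items(), dict(market.download_counts))
```

In the described system these calls are network requests between terminals 2A/2C and the cloud-side information processing device; the sketch collapses them into local method calls for readability.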
- FIG. 4 shows an example in which a user other than an application developer selects and downloads an AI application on the marketplace.
- The application user terminal 2B performs purpose selection in step S41.
- In purpose selection, the selected purpose is transmitted to the cloud-side information processing device.
- The cloud-side information processing device selects an AI application according to the purpose in step S9 and selects an AI model in step S10.
- Table data in which AI applications and AI models are associated with each purpose is stored in the cloud-side information processing device, making it possible to select the AI application and AI model according to the purpose.
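The purpose-based selection in steps S9 and S10 amounts to a table lookup. A minimal Python sketch follows; the purpose keys and the application and model names in `PURPOSE_TABLE` are invented for illustration and do not come from the source.

```python
# Hypothetical table: purpose -> (AI application, AI model).
PURPOSE_TABLE = {
    "store_flow_analysis": ("flow_line_app", "person_detection_model"),
    "traffic_monitoring": ("vehicle_app", "vehicle_recognition_model"),
}


def select_for_purpose(purpose: str):
    """Return the (AI application, AI model) pair associated with a purpose."""
    try:
        return PURPOSE_TABLE[purpose]
    except KeyError:
        raise ValueError(f"no AI application registered for purpose {purpose!r}") from None


app, model = select_for_purpose("traffic_monitoring")
print(app, model)
```

Keeping the association in data rather than code means that new purposes can be supported by adding rows, which matches the text's description of stored table data.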
- In step S11, the cloud-side information processing device performs processing to deploy the selected AI application and AI model.
- The AI application and AI model are transmitted to the camera 3.
- The camera 3 performs deployment processing of the AI application and AI model in step S51. This makes it possible to perform AI image processing on the images captured by the camera 3.
- The camera 3 acquires an image by performing an imaging operation in step S52. Then, in step S53, the camera 3 performs AI image processing on the acquired image to obtain, for example, an image recognition result.
- In step S54, the camera 3 transmits the captured image and the result information of the AI image processing.
- Both the captured image and the AI image processing result information may be transmitted, or only one of them may be transmitted.
- The cloud-side information processing device that has received this information performs analysis processing in step S12. Through this analysis processing, for example, flow line analysis of customers visiting the store or vehicle analysis for traffic monitoring is performed.
- In step S13, the cloud-side information processing device performs analysis result presentation processing. This processing is realized, for example, by the user using the cloud application described above.
- Upon receiving the presented analysis results, the application user terminal 2B displays them on a monitor or the like in step S42.
- The user of the AI application can thus obtain an analysis result according to the purpose selected in step S41.
- The AI model may be updated so as to be optimized for the images captured by the camera 3 managed by the application user. For example, as the camera 3 repeatedly executes the processes of steps S52, S53, and S54, the captured images received from the camera 3 and the result information of AI image processing are accumulated in the cloud-side information processing device.
- The cloud-side information processing device performs AI model update processing in step S14.
- This processing retrains the AI model by giving it new data.
- In step S15, the cloud-side information processing device performs processing to deploy the newly updated AI model.
- The camera 3 executes processing to deploy the new AI model in step S55.
- If the AI application has also been updated, the updated AI application may additionally be deployed in the process of step S55.
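The accumulate, update, and redeploy cycle (steps S52 to S55 on the camera side, S14 and S15 on the cloud side) can be sketched as follows. The classes `CameraSketch` and `CloudSketch`, the integer model version, and the stubbed retraining step are all assumptions made for illustration; actual retraining is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CameraSketch:
    """Hypothetical edge-side camera holding a deployed model version."""
    model_version: int = 0

    def deploy(self, version: int) -> None:
        # Corresponds to the deployment steps on the camera (S51 / S55).
        self.model_version = version

    def capture_and_infer(self, frame_id: int) -> dict:
        # Corresponds to imaging (S52) and AI image processing (S53); inference is stubbed.
        return {"frame": frame_id, "model_version": self.model_version}


@dataclass
class CloudSketch:
    """Hypothetical cloud-side device that accumulates results and updates the model."""
    latest_version: int = 1
    accumulated: list = field(default_factory=list)

    def receive(self, result: dict) -> None:
        # Data sent by the camera in S54 is accumulated on the cloud side.
        self.accumulated.append(result)

    def update_model(self) -> int:
        # Stand-in for S14: retraining is represented only by a version bump.
        self.latest_version += 1
        self.accumulated.clear()
        return self.latest_version


cloud = CloudSketch()
camera = CameraSketch()
camera.deploy(cloud.latest_version)
for frame_id in range(3):
    cloud.receive(camera.capture_and_infer(frame_id))
stored_before_update = len(cloud.accumulated)
camera.deploy(cloud.update_model())  # S15 on the cloud side, S55 on the camera side
print(stored_before_update, camera.model_version)
```

Whether accumulated data is discarded after an update, as the sketch does, is a design choice made here for brevity and is not stated in the source.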
- A service using the information processing system 100 is assumed in which a user as a customer can select the type of function regarding AI image processing of each camera 3.
- The selection of the function type can also be referred to as the above-mentioned purpose setting.
- For example, an image recognition function or an image detection function may be selected, or a more detailed type may be selected so that the image recognition function or image detection function is exercised for a specific subject.
- For example, a service provider sells cameras 3 and a fog server 4 that have an AI image recognition function to a user, and has the user install the cameras 3 and fog server 4 at a location to be monitored.
- The service provider also rolls out a service that provides users with the above-mentioned analysis information.
- The AI image processing function of the camera 3 can be selectively set so that analysis information corresponding to the customer's desired use is obtained.
- For example, AI image processing functions that detect customers and identify their attributes are activated so that the cameras function as store surveillance cameras, and in the event of a disaster the cameras switch to an AI image processing function for determining which products remain on the shelves. At the time of this switching, it is conceivable to change the AI model so that appropriate recognition results can be obtained.
- The management server 5 has a function of selectively setting the AI image processing function of the camera 3.
- The functions of the management server 5 may be provided in the cloud server 1 or the fog server 4.
- The information processing device on the cloud side is equipped with a relearning function, a device management function, and a marketplace function, which are functions that can be used via the Hub.
- The Hub performs secure and highly reliable communication with the edge-side information processing device. Various functions can thereby be provided to the edge-side information processing device.
- The relearning function performs relearning and provides a newly optimized AI model, thereby providing an appropriate AI model based on new learning materials.
- The device management function manages the camera 3 as an edge-side information processing device, and can provide functions such as management and monitoring of the AI model deployed in the camera 3, problem detection, and troubleshooting.
- The device management function also manages information about the camera 3 and fog server 4.
- Information on the camera 3 and fog server 4 includes information on the chip used as the arithmetic processing unit, memory capacity, storage capacity, CPU and memory usage rates, and software information such as the OS (Operating System) installed in each device.
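The device information listed above maps naturally onto a small record type. The following Python sketch is an assumption-laden illustration: the class `DeviceInfo`, its field names and units, and the derived `free_memory_mb` helper are hypothetical and are not defined in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical record of the information the device management function tracks."""
    device_id: str
    chip: str                    # chip used as the arithmetic processing unit
    memory_capacity_mb: int
    storage_capacity_mb: int
    cpu_usage_percent: float
    memory_usage_percent: float
    os_name: str                 # software information such as the installed OS

    def free_memory_mb(self) -> float:
        # Derived value: memory not currently in use, in megabytes.
        return self.memory_capacity_mb * (1 - self.memory_usage_percent / 100)


camera_info = DeviceInfo(
    device_id="camera-3-001",
    chip="example-chip",
    memory_capacity_mb=512,
    storage_capacity_mb=4096,
    cpu_usage_percent=35.0,
    memory_usage_percent=50.0,
    os_name="example-os",
)
print(camera_info.free_memory_mb())
```

A derived figure such as free memory could, for instance, inform whether a given AI model fits on a device before deployment, though the source does not state that such a check is performed.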
- The device management function also protects secure access by authenticated users.
- The marketplace function includes a function to register AI models developed by the above-mentioned AI model developers and AI applications developed by application developers, a function to deploy these developed products to authorized edge-side information processing devices, and so on.
- The marketplace function also provides functions related to the payment of incentives according to the deployment of developed products.
- the camera 3 as an edge-side information processing device is equipped with an edge runtime, an AI application, an AI model, and an image sensor IS.
- the edge runtime functions as embedded software for managing applications deployed on the camera 3 and communicating with the cloud-side information processing device.
- the AI model is a deployed copy of an AI model registered in the marketplace of the cloud-side information processing device, and it allows the camera 3 to obtain, from the captured image, information on the results of AI image processing according to the purpose.
- the cloud-side information processing device collectively refers to devices such as the cloud server 1 and the management server 5.
- the cloud-side information processing device has a license authorization function F1, an account service function F2, a device monitoring function F3, a marketplace function F4, and a camera service function F5.
- the license authorization function F1 is a function that performs various types of authentication-related processing. Specifically, in the license authorization function F1, processing related to device authentication of each camera 3 and processing related to authentication of each of the AI models, software, and firmware used in the cameras 3 are performed.
- the above-mentioned software means software necessary for appropriately realizing AI image processing in the camera 3.
- In order for AI image processing based on captured images to be performed appropriately, and for the results of AI image processing to be sent to the fog server 4 and cloud server 1 in an appropriate format, it is necessary to control the input data to the AI model and also to appropriately process the output data of the AI model.
- the above software includes peripheral processing necessary to appropriately realize AI image processing.
- Such software is software for realizing a desired function using an AI model, and corresponds to the above-mentioned AI application.
- the AI application is not limited to one that uses only one AI model, but may also be one that uses two or more AI models.
- For example, the AI application may be one in which image data obtained as recognition result information by an AI model that executes AI image processing using a captured image as an input tensor is in turn used as an input tensor (such image data and the like are hereinafter referred to as "recognition result information").
- Alternatively, it may be an AI application that performs predetermined image processing, as second AI image processing, on the input tensor of the first AI image processing, using coordinate information obtained as recognition result information of the first AI image processing.
- the input tensor for each AI image process may be a RAW image, or may be an RGB image obtained by performing synchronization processing on the RAW image. The same applies to the following explanation.
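The two-stage arrangement described above, in which coordinate information from the first AI image processing drives a second process on the same input tensor, can be sketched roughly as follows. This is only an illustration: `fake_detector` is a hypothetical stand-in for an AI model, and cropping is assumed as the "predetermined image processing"; neither is specified in the source.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Second-stage processing (assumed): cut out the region given by the coordinates."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def run_two_stage(image: Image,
                  first_stage: Callable[[Image], List[Box]]) -> List[Image]:
    """Run the first model, then process the same input tensor using its coordinates."""
    boxes = first_stage(image)  # recognition result information (coordinates)
    return [crop(image, box) for box in boxes]


def fake_detector(image: Image) -> List[Box]:
    """Hypothetical stand-in for an AI model: always reports one fixed box."""
    return [(1, 1, 3, 3)]


frame = [[r * 10 + c for c in range(4)] for r in range(4)]
crops = run_two_stage(frame, fake_detector)
```

Passing the first stage as a callable keeps the sketch independent of any particular model runtime.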
- In the license authorization function F1, for authentication of the camera 3, a device ID (Identification) is issued for each camera 3 when the camera 3 is connected via the network 6. For authentication of AI models and software, a unique ID (AI model ID, software ID) is issued for each AI model and AI application for which registration has been applied from the AI model developer terminal 2C or the software developer terminal 7.
- the license authorization function F1 also performs processing for issuing various keys, certificates, etc. for secure communication between the camera 3, the AI model developer terminal 2C, the software developer terminal 7, and the cloud server 1 to the manufacturer of the camera 3 (particularly the manufacturer of the image sensor IS to be described later), AI model developers, and software developers, as well as processing for updating and suspending the validity of certificates.
- In the license authorization function F1, when user registration (registration of account information accompanied by issuance of a user ID) is performed by the account service function F2 described below, a process of linking the camera 3 purchased by the user (the device ID above) with the user ID is also performed.
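As a rough illustration of the ID handling described above, the following sketch issues device IDs and AI model IDs and links a device ID to a user ID. The class `LicenseRegistry`, its method names, and the UUID-based ID format are all hypothetical; the source does not describe how IDs are generated or stored.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LicenseRegistry:
    """Hypothetical in-memory stand-in for part of the license authorization function F1."""
    device_ids: Set[str] = field(default_factory=set)
    model_ids: Set[str] = field(default_factory=set)
    devices_by_user: Dict[str, Set[str]] = field(default_factory=dict)

    def issue_device_id(self) -> str:
        """Issue a unique device ID when a camera connects (ID format is assumed)."""
        device_id = f"dev-{uuid.uuid4().hex}"
        self.device_ids.add(device_id)
        return device_id

    def issue_model_id(self) -> str:
        """Issue a unique ID for an AI model submitted for registration."""
        model_id = f"model-{uuid.uuid4().hex}"
        self.model_ids.add(model_id)
        return model_id

    def link_device_to_user(self, user_id: str, device_id: str) -> None:
        """Link a purchased camera (device ID) to the user's account."""
        if device_id not in self.device_ids:
            raise KeyError(f"unknown device ID: {device_id}")
        self.devices_by_user.setdefault(user_id, set()).add(device_id)


registry = LicenseRegistry()
camera_id = registry.issue_device_id()
registry.link_device_to_user("user-1", camera_id)
```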
- the account service function F2 is a function that generates and manages user account information.
- the account service function F2 receives input of user information and generates account information based on the input user information (generates account information including at least user ID and password information).
- the account service function F2 also performs registration processing (registration of account information) for AI model developers and AI application developers (hereinafter sometimes abbreviated as "software developers").
- the device monitoring function F3 is a function that performs processing for monitoring the usage status of the camera 3. For example, it monitors various items related to the usage status of the camera 3, such as the location where the camera 3 is used, the output frequency of AI image processing output data, and the free capacity and usage rate of the CPU and memory used for AI image processing.
- the marketplace function F4 is a function for selling AI models and AI applications. For example, a user can purchase an AI application and an AI model used by the AI application via a sales website (sales site) provided by the marketplace function F4. Additionally, software developers are allowed to purchase AI models for creating AI applications via the sales site mentioned above.
- the camera service function F5 is a function for providing services related to the use of the camera 3 to the user.
- This camera service function F5 is the function related to the generation of analysis information described above. That is, it is a function that generates analysis information of a subject based on processing result information of image processing in the camera 3 and performs processing for allowing the user to view it via the user terminal 2.
- the camera service function F5 includes an imaging setting search function.
- this imaging setting search function is a function of acquiring recognition result information of AI image processing from the camera 3 and searching for imaging setting information of the camera 3 using AI based on the acquired recognition result information.
- the imaging setting information broadly refers to setting information related to an imaging operation for obtaining a captured image.
- For example, it includes optical settings such as focus and aperture, settings related to the readout operation of captured image signals such as frame rate, exposure time, and gain, as well as settings related to image signal processing such as gamma correction processing, noise reduction processing, and super resolution processing.
- With this imaging setting search function, the imaging settings of the camera 3 are optimized according to the purpose set by the user, and good inference results can be obtained.
- the camera service function F5 also includes an AI model search function.
- This AI model search function is a function that acquires recognition result information of AI image processing from the camera 3 and, based on the acquired recognition result information, uses AI to search for the optimal AI model to be used for AI image processing in the camera 3.
- the search for an AI model here refers to processing that, for example when AI image processing is realized using a CNN (Convolutional Neural Network) including convolution operations, optimizes various processing parameters such as weighting coefficients and setting information related to the neural network structure (including, for example, kernel size information).
- the camera service function F5 may include a function of determining processing assignment.
- In the processing allocation determination function, when deploying an AI application to an edge-side information processing device, processing is performed to determine the deployment destination device for each SW component. Note that some SW components may be determined to be executed on the cloud-side device, and in this case the deployment process is unnecessary because they have already been deployed on the cloud-side device.
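The per-component destination decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: free memory is used as the only placement criterion, a first-fit rule is applied, and the names `SWComponent`, `EdgeDevice`, `determine_assignment`, and `needs_deployment` are hypothetical. The source does not specify the decision criteria.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SWComponent:
    name: str
    memory_mb: int               # hypothetical resource requirement
    runs_on_cloud: bool = False  # already deployed on the cloud side


@dataclass(frozen=True)
class EdgeDevice:
    name: str
    free_memory_mb: int


def determine_assignment(components: List[SWComponent],
                         devices: List[EdgeDevice]) -> Dict[str, str]:
    """Assign each SW component to the first edge device with enough free memory."""
    remaining = {d.name: d.free_memory_mb for d in devices}
    plan: Dict[str, str] = {}
    for comp in components:
        if comp.runs_on_cloud:
            plan[comp.name] = "cloud"
            continue
        for dev in devices:
            if remaining[dev.name] >= comp.memory_mb:
                remaining[dev.name] -= comp.memory_mb
                plan[comp.name] = dev.name
                break
        else:
            raise ValueError(f"no edge device can host {comp.name}")
    return plan


def needs_deployment(plan: Dict[str, str]) -> List[str]:
    """Cloud-side components are already deployed, so only edge ones are listed."""
    return [name for name, target in plan.items() if target != "cloud"]


components = [
    SWComponent("person_detection", 8),
    SWComponent("tracking", 64),
    SWComponent("analysis", 0, runs_on_cloud=True),
]
devices = [EdgeDevice("camera 3", 8), EdgeDevice("fog server 4", 256)]
plan = determine_assignment(components, devices)
```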
- By having the imaging settings search function and the AI model search function described above, it is possible to perform imaging settings that yield good results from AI image processing, and AI image processing can be performed using an appropriate AI model according to the actual usage environment.
- By having the processing assignment determination function in addition to these, it is possible to ensure that AI image processing and its analysis processing are executed in an appropriate device.
- the camera service function F5 has an application setting function prior to deploying each SW component.
- the application setting function is a function to set an appropriate AI application according to the user's purpose.
- an appropriate AI application is selected depending on the purpose selected by the user.
- the SW components constituting the AI application are automatically determined.
- the combination of SW components may be different depending on whether the user's request is focused on privacy or speed.
- In the user terminal 2 (corresponding to the application user terminal 2B in FIG. 2), processing such as accepting the user's operation to select the purpose (application) and selecting an appropriate AI application according to the selected purpose is performed.
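As an illustration of the application setting function, the sketch below picks a set of SW components from the user's purpose and from whether the request emphasizes privacy or speed. The catalog contents, the component names, and the function `select_components` are hypothetical examples, not taken from the source.

```python
from typing import Dict, List

# Hypothetical catalog: purpose -> priority -> ordered SW components.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "store_monitoring": {
        "privacy": ["person_detection", "on_sensor_masking", "attribute_analysis"],
        "speed": ["person_detection", "attribute_analysis"],
    },
}


def select_components(purpose: str, priority: str) -> List[str]:
    """Return the SW components for the selected purpose and priority."""
    try:
        return list(CATALOG[purpose][priority])
    except KeyError as exc:
        raise ValueError(f"unsupported purpose/priority: {purpose}/{priority}") from exc


privacy_app = select_components("store_monitoring", "privacy")
speed_app = select_components("store_monitoring", "speed")
```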
- Although a configuration has been illustrated in which the cloud server 1 alone realizes the license authorization function F1, the account service function F2, the device monitoring function F3, the marketplace function F4, and the camera service function F5, it is also possible to adopt a configuration in which a plurality of information processing apparatuses share these functions. For example, it is conceivable that each of the above functions is performed by a separate information processing device. Alternatively, it is also possible that a single function among the above functions is shared and performed by a plurality of information processing apparatuses (for example, the cloud server 1 and the management server 5).
- an AI model developer terminal 2C is an information processing device used by an AI model developer.
- the software developer terminal 7 is an information processing device used by an AI application developer.
- FIG. 7 is a block diagram showing an example of the internal configuration of the camera 3.
- the camera 3 includes an imaging optical system 31, an optical system drive section 32, an image sensor IS, a control section 33, a memory section 34, and a communication section 35.
- the image sensor IS, the control section 33, the memory section 34, and the communication section 35 are connected via a bus 36, and are capable of mutual data communication.
- the imaging optical system 31 includes lenses such as a cover lens, zoom lens, and focus lens, and an aperture (iris) mechanism. This imaging optical system 31 guides light (incident light) from the subject and focuses it on the light receiving surface of the image sensor IS.
- the optical system drive unit 32 comprehensively represents the zoom lens, focus lens, and aperture mechanism drive units included in the imaging optical system 31.
- the optical system drive unit 32 includes actuators for driving each of the zoom lens, focus lens, and aperture mechanism, and a drive circuit for the actuators.
- the control unit 33 is configured with a microcomputer having, for example, a CPU, a ROM, and a RAM, and the CPU executes various processes according to programs stored in the ROM or programs loaded in the RAM, thereby performing overall control of the camera 3.
- control unit 33 instructs the optical system drive unit 32 to drive the zoom lens, focus lens, aperture mechanism, etc.
- the optical system drive unit 32 moves the focus lens and zoom lens, opens and closes the aperture blades of the aperture mechanism, etc. in response to these drive instructions.
- the control unit 33 controls writing and reading of various data to and from the memory unit 34.
- the memory unit 34 is a nonvolatile storage device such as an HDD (Hard Disk Drive) or a flash memory device, and is used as a storage destination (recording destination) for image data output from the image sensor IS.
- control unit 33 performs various data communications with external devices via the communication unit 35.
- the communication unit 35 in this example is configured to be able to perform data communication with at least the fog server 4 (or cloud server 1) shown in FIG.
- the image sensor IS is configured as, for example, a CCD type image sensor, a CMOS type image sensor, or the like.
- the image sensor IS includes an imaging section 41, an image signal processing section 42, an in-sensor control section 43, an AI image processing section 44, a memory section 45, and a communication I/F 46, which are capable of data communication with each other via a bus 47.
- the imaging section 41 includes a pixel array section in which pixels having photoelectric conversion elements such as photodiodes are arranged two-dimensionally, and a readout circuit that reads out electrical signals obtained by photoelectric conversion from each pixel of the pixel array section, and is capable of outputting the electrical signals as a captured image signal.
- the readout circuit performs, for example, CDS (Correlated Double Sampling) processing, AGC (Automatic Gain Control) processing, etc. on the electrical signal obtained by photoelectric conversion, and further performs A/D (Analog/Digital) conversion processing.
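The readout chain described above (CDS, gain, then A/D conversion) can be illustrated numerically for a single pixel. This is a simplified sketch: the function name `read_pixel`, the normalized voltage range, and the 10-bit depth are assumptions, and the gain is passed in directly rather than chosen automatically as AGC would do.

```python
def read_pixel(reset_level: float, signal_level: float, gain: float,
               full_scale: float = 1.0, bits: int = 10) -> int:
    """Apply CDS, gain, and A/D conversion to one pixel (normalized levels assumed)."""
    cds = signal_level - reset_level        # CDS: cancels the offset common to both samples
    amplified = cds * gain                  # gain stage (AGC would select `gain` itself)
    clipped = min(max(amplified, 0.0), full_scale)
    return round(clipped / full_scale * (2 ** bits - 1))  # quantize to an integer code


# Two pixels with the same true signal but different offsets give the same code.
code_a = read_pixel(reset_level=0.125, signal_level=0.375, gain=2.0)
code_b = read_pixel(reset_level=0.25, signal_level=0.5, gain=2.0)
```

The example shows why the reset sample is subtracted: the per-pixel offset drops out before quantization.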
- the image signal processing unit 42 performs preprocessing, synchronization processing, YC generation processing, resolution conversion processing, codec processing, etc. on the captured image signal as digital data after A/D conversion processing.
- In the preprocessing, clamping processing for clamping the R, G, and B black levels to predetermined levels and correction processing between the R, G, and B color channels are performed on the captured image signal.
- In the synchronization processing, color separation processing is performed so that the image data for each pixel includes all R, G, and B color components. For example, in the case of an image sensor using a Bayer array color filter, demosaic processing is performed as color separation processing.
- In the YC generation processing, a luminance (Y) signal and a color (C) signal are generated (separated) from R, G, and B image data.
- In the resolution conversion processing, resolution conversion is performed on image data that has been subjected to various types of signal processing.
- In the codec processing, the image data that has been subjected to the various processes described above is subjected to, for example, encoding processing for recording or communication and file generation.
- In the codec processing, it is possible to generate video files in formats such as MPEG-2 (MPEG: Moving Picture Experts Group) and H.264.
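For the YC generation step mentioned above, a per-pixel conversion might look like the following sketch. The source does not name a conversion matrix; the ITU-R BT.601 luma weights and colour-difference scaling are used here purely as an assumption, with values normalized to the range 0 to 1.

```python
from typing import Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Generate luminance (Y) and colour-difference (Cb, Cr) values from R, G, B.

    Assumes BT.601 coefficients and inputs normalized to [0, 1].
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772   # scaled so that Cb lies in [-0.5, 0.5]
    cr = (r - y) / 1.402   # scaled so that Cr lies in [-0.5, 0.5]
    return y, cb, cr


gray = rgb_to_ycbcr(0.5, 0.5, 0.5)   # neutral gray carries no colour signal
red = rgb_to_ycbcr(1.0, 0.0, 0.0)
```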
- when the image sensor IS functions as, for example, an iToF (indirect Time of Flight) sensor, the image signal processing unit 42 calculates distance information about the subject based on two signals output from the image sensor IS and outputs a distance image.
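The source does not state how the two signals are converted into distance. As one possible illustration, the sketch below assumes a pulsed two-tap scheme in which the two signals are charges `q1` and `q2` collected in consecutive windows of the light pulse width, giving d = (c · Tp / 2) · q2 / (q1 + q2). The function names and the choice of scheme are assumptions.

```python
from typing import List

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def itof_distance(q1: float, q2: float, pulse_width_s: float) -> float:
    """Distance for one pixel from two tap charges (pulsed two-tap scheme assumed)."""
    total = q1 + q2
    if total <= 0:
        raise ValueError("no reflected light collected")
    return 0.5 * SPEED_OF_LIGHT * pulse_width_s * (q2 / total)


def distance_image(tap1: List[List[float]], tap2: List[List[float]],
                   pulse_width_s: float) -> List[List[float]]:
    """Build a distance image by applying the per-pixel formula to both tap images."""
    return [[itof_distance(a, b, pulse_width_s) for a, b in zip(row1, row2)]
            for row1, row2 in zip(tap1, tap2)]


depth = distance_image([[1.0, 1.0]], [[0.0, 1.0]], pulse_width_s=20e-9)
```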
- the in-sensor control unit 43 issues instructions to the imaging unit 41 to control the execution of the imaging operation.
- Similarly, the in-sensor control unit 43 also controls the execution of processing by the image signal processing unit 42.
- the AI image processing unit 44 performs image recognition processing as AI image processing on the captured image.
- the AI image processing unit 44 is realized by a DSP (Digital Signal Processor).
- the image recognition functions that can be realized by the AI image processing unit 44 can be switched by changing the AI image processing algorithm. In other words, by switching the AI model used for AI image processing, the functional type of AI image processing can be switched.
- Various types of functions for AI image processing can be considered, and examples include the following types: class identification, semantic segmentation, person detection, vehicle detection, target tracking, and OCR (Optical Character Recognition).
- class identification is a function that identifies the target class.
- the "class" here refers to information representing the category of an object, such as "person," "car," "plane," "ship," "truck," "bird," "cat," "dog," "deer," "frog," and "horse."
- Target tracking is a function of tracking a subject that is a target, and can be rephrased as a function of obtaining historical information on the position of the subject.
- the switching of the AI model may be performed by an instruction from the cloud-side information processing device, or may be performed based on determination processing by the control unit 33 of the camera 3 or the in-sensor control unit 43. Furthermore, when switching the AI model, the AI model may be selected from a plurality of AI models stored in the memory unit 45, or may be switched by receiving and deploying the AI model from the cloud-side information processing device. By receiving an AI model from the cloud-side information processing device each time switching occurs, the required capacity of the memory unit 45 can be kept small, making it possible to achieve downsizing, power saving, and cost reduction.
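The two switching strategies described above (selecting from models already held in the memory unit, or receiving the model from the cloud side at each switch) can be contrasted with a small sketch. The class `ModelSwitcher`, the `fake_cloud` download stub, and the model names are hypothetical and only illustrate the memory trade-off.

```python
from typing import Callable, Dict, List, Optional


class ModelSwitcher:
    """Hypothetical model switcher; `keep_local` selects between the two strategies."""

    def __init__(self, fetch_from_cloud: Callable[[str], bytes], keep_local: bool):
        self._fetch = fetch_from_cloud
        self._keep_local = keep_local
        self._stored: Dict[str, bytes] = {}   # models held in sensor memory
        self.active_name: Optional[str] = None
        self.active_model: Optional[bytes] = None

    def switch_to(self, name: str) -> None:
        model = self._stored.get(name)
        if model is None:
            model = self._fetch(name)         # receive the model from the cloud side
            if self._keep_local:
                self._stored[name] = model
        self.active_name, self.active_model = name, model

    def stored_bytes(self) -> int:
        """Memory occupied by models kept for later switching."""
        return sum(len(m) for m in self._stored.values())


downloads: List[str] = []


def fake_cloud(name: str) -> bytes:
    """Stand-in for a download from the cloud-side information processing device."""
    downloads.append(name)
    return name.encode() * 4


cached = ModelSwitcher(fake_cloud, keep_local=True)
for model_name in ("person_detection", "shelf_inventory", "person_detection"):
    cached.switch_to(model_name)
downloads_when_cached = list(downloads)

downloads.clear()
streaming = ModelSwitcher(fake_cloud, keep_local=False)
for model_name in ("person_detection", "shelf_inventory", "person_detection"):
    streaming.switch_to(model_name)
```

Keeping models locally avoids repeated downloads, while fetching on every switch leaves no stored models, which corresponds to the reduced memory capacity mentioned in the text.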
- the memory unit 45 can be used as a so-called frame memory in which captured image data (RAW image data) obtained by the image signal processing unit 42 and image data after synchronization processing are stored.
- the memory unit 45 can also be used to temporarily store data used by the AI image processing unit 44 in the process of AI image processing.
- the memory unit 45 also stores information on AI applications and AI models used by the AI image processing unit 44.
- the information on the AI application and the AI model may be deployed in the memory unit 45 as a container using container technology, which will be described later, or may be deployed using microservice technology.
- By deploying the AI model used for AI image processing into the memory unit 45, it is possible to change the function type of AI image processing, or to change to an AI model whose performance has been improved through relearning.
- the explanation here is based on examples of AI models and AI applications used for image recognition, but the invention is not limited to this, and programs and the like executed using AI technology may also be targeted.
- When the capacity of the memory unit 45 is small, the information of the AI application or AI model may be deployed as a container, using container technology, into a memory outside the image sensor IS such as the memory unit 34, and then only the AI model may be stored in the memory unit 45 in the image sensor IS via the communication I/F 46 described below.
- the communication I/F 46 is an interface for communicating with the control unit 33, memory unit 34, etc. located outside the image sensor IS.
- the communication I/F 46 performs communication to acquire, from the outside, programs executed by the image signal processing unit 42 and AI applications and AI models used by the AI image processing unit 44, and stores them in the memory unit 45 included in the image sensor IS. Thereby, the AI model is stored in a part of the memory section 45 included in the image sensor IS, and can be used by the AI image processing section 44.
- the AI image processing unit 44 performs predetermined image recognition processing using the AI application and AI model obtained in this way to recognize the subject according to the purpose.
- the recognition result information of the AI image processing is output to the outside of the image sensor IS via the communication I/F 46.
- the communication I/F 46 of the image sensor IS outputs not only the image data output from the image signal processing section 42 but also the recognition result information of the AI image processing. Note that it is also possible to output only either the image data or the recognition result information from the communication I/F 46 of the image sensor IS.
- captured image data used for the relearning function is uploaded from the image sensor IS to the cloud-side information processing device via the communication I/F 46 and the communication unit 35.
- recognition result information of AI image processing is output from the image sensor IS to another information processing device outside the camera 3 via the communication I/F 46 and the communication unit 35.
- Image sensor configuration
- Various configurations of the image sensor IS described above are possible.
- the image sensor IS has a structure in which three layers are stacked.
- the image sensor IS is configured as a one-chip semiconductor device in which three layers of dies each serving as a semiconductor substrate are stacked.
- the image sensor IS includes a die D1 forming a first layer of a semiconductor substrate, a die D2 forming a second layer, and a die D3 forming a third layer.
- Each layer is electrically connected, for example, by Cu-Cu bonding.
- the image sensor IS includes an imaging section 41, an image signal processing section 42, an in-sensor control section 43, an AI image processing section 44, a memory section 45, and a communication I/F 46, which are classified by function as shown in FIG. Some of these functions may be completed within one layer by mounting their electronic components on a single layer, while others may have their electronic components mounted over multiple layers.
- the imaging section 41 includes a pixel array section 41a provided on the die D1 and an analog circuit section 41b provided on the die D2 (see FIG. 8).
- the analog circuit section 41b includes a transistor as a readout circuit, a vertical drive circuit, a comparator, a circuit that performs CDS processing, AGC processing, etc., an A/D conversion section, and the like.
- the image signal processing section 42 includes a logic circuit section 42a provided on the die D2 and an ISP (Image Signal Processor) 42b provided on the die D3.
- the logic circuit section 42a includes a circuit that performs processing to detect and correct defective pixels in the captured image signal as digital data generated by the A/D conversion section.
- the ISP 42b performs synchronization processing, YC generation processing, resolution conversion processing, codec processing, noise removal processing, etc. Note that some of the processing may be executed by the in-sensor control unit 43.
- the in-sensor control unit 43 is composed of a CPU 43a provided in the die D3, and realizes the control function F11, the authentication function F12, and the encryption function F13 shown in FIG. 9 by executing a predetermined program. Each function will be explained later.
- the AI image processing unit 44 is provided in the die D3 and functions as an inference processing unit.
- CV (Computer Vision) processing such as edge enhancement processing, scaling processing, and affine transformation processing is performed.
- This processing can be performed by the CPU 43a or the like, and the processing time can be reduced compared to when the CV processing is performed by the ISP 42b.
- these CV processes are, for example, processes for generating input images to an AI model. That is, these CV processes are processes for generating image data of a predetermined size defined as an input tensor of an AI model and suitable for AI image processing.
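As an illustration of CV processing that produces image data of the size defined as the input tensor, the sketch below scales an image with nearest-neighbour sampling. The interpolation method and the function name `resize_nearest` are assumptions; the source only states that scaling is one of the CV processes.

```python
from typing import List

Image = List[List[int]]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Scale an image to (out_h, out_w) by nearest-neighbour sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


frame = [[r * 10 + c for c in range(4)] for r in range(4)]
input_tensor = resize_nearest(frame, 2, 2)   # shrink 4x4 to a 2x2 model input
```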
- CV processing does not need to be a process of generating an input image to an AI model, as long as it executes processing using a plurality of lines for each processing unit. For example, it may be a process of drawing (emphasizing) a bounding box in an area where a person is detected by AI image processing.
- the memory section 45 includes a second layer storage section 45a provided on the die D2 and a third layer storage section 45b provided on the die D3.
- the second layer storage unit 45a functions as a frame memory in which RAW image data and image data after synchronization processing by the ISP 42b are stored. Note that even if the frame memory is provided not in the second layer but in the third layer or outside the image sensor IS, it is possible to obtain the effects described above or below.
- the third layer storage unit 45b functions as a working memory in which intermediate data and results of AI image processing by the AI image processing unit 44 are stored. Further, the third layer storage unit 45b stores weighting coefficients, parameters, etc. for the AI model, and functions as a storage unit into which the AI model is deployed.
- By providing the third layer storage unit 45b and the AI image processing unit 44 in the same layer, it is possible to improve the transfer speed and readout speed of various intermediate data generated in the course of inference processing using an AI model, and thus to shorten the time required for inference processing.
- part of the data that would otherwise be stored in the third layer storage section 45b can be stored in the second layer storage section 45a, so that the capacity of the third layer storage section 45b can be reduced.
- As a result, the size of the third layer storage section 45b can be reduced, which makes it possible to reduce the chip size of the semiconductor substrate forming the third layer or to add further functions to the third layer and improve the functionality of the image sensor IS.
- Providing the second layer storage section 45a as a frame memory in the second layer is suitable when it is desired to perform a plurality of different processes on a frame image.
- By providing the second layer storage section 45a as a frame memory in the second layer, processing such as mask processing and addition of bounding boxes can be achieved by rewriting the pixel values of part of the frame image stored in the frame memory. These processes can be realized by the CPU 43a, the memory controller, or their cooperation.
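Rewriting pixel values in a frame buffer for mask processing and bounding boxes can be sketched as below. The frame memory is modelled as a plain list of rows, and the function names, fill values, and exclusive bottom/right convention are assumptions for illustration.

```python
from typing import List

Frame = List[List[int]]


def fill_region(frame: Frame, top: int, left: int, bottom: int, right: int,
                value: int = 0) -> None:
    """Mask processing: overwrite every pixel in the region in place."""
    for y in range(top, bottom):
        for x in range(left, right):
            frame[y][x] = value


def draw_box(frame: Frame, top: int, left: int, bottom: int, right: int,
             value: int = 255) -> None:
    """Bounding box: overwrite only the border pixels of the region in place."""
    for x in range(left, right):
        frame[top][x] = value
        frame[bottom - 1][x] = value
    for y in range(top, bottom):
        frame[y][left] = value
        frame[y][right - 1] = value


frame_memory = [[100] * 5 for _ in range(5)]
draw_box(frame_memory, 1, 1, 4, 4)      # outline a detected subject
boxed_center = frame_memory[2][2]       # interior is untouched by the box
fill_region(frame_memory, 2, 2, 3, 3)   # then mask the interior pixel
```

Both operations modify the stored frame directly, so no second copy of the image is needed.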
- the second layer storage section 45a and the third layer storage section 45b may include not only RAM but also ROM.
- the communication I/F 46 is provided in the die D2.
- By arranging the die D1, which is the first layer provided with the pixel array section 41a, as the outermost layer, light is easily incident on the pixel array section 41a, and the conversion efficiency of photoelectric conversion processing is improved.
- the pixel array section 41a is arranged adjacent to the second layer, which is provided with the analog circuit section 41b that functions as a conversion processing section performing A/D conversion on pixel signals read out from each pixel included in the pixel array section 41a.
- It is also possible to suppress the influence of electromagnetic noise, generated while the AI image processing section 44 executes processing, on the charges accumulated in the pixel array section 41a.
- Since the analog circuit section 41b, which is driven at a high voltage, is not provided in the third layer, cutting-edge semiconductor manufacturing processes can be employed in manufacturing the die D3 as the semiconductor substrate forming the third layer, making it possible to miniaturize the elements.
- Conventionally, an image sensor IS has been known that has a two-layer structure, which includes a first layer in which the pixel array section 41a is mounted, and a second layer in which all other parts are mounted.
- In such a structure, the area of the second layer increases, causing the problem that the first layer becomes larger to match the size of the second layer. In this case, an excess area where no components were mounted was created on the first layer, which could hardly be called appropriate in terms of board utilization efficiency.
- With the three-layer structure, the overall size of the image sensor IS can be matched to the size of the pixel array section 41a, and the overall size of the image sensor IS can be reduced.
- the control function F11 shown in FIG. 9 issues instructions to the imaging section 41 and the image signal processing section 42, and controls the imaging operation so that desired captured image data is obtained. Furthermore, the control function F11 instructs the AI image processing unit 44 to implement AI image processing using the AI model.
- the authentication function F12 requests the cloud-side information processing device to authenticate, using the certificate held in the image sensor IS, that the image sensor IS has been registered, and establishes communication with the cloud-side information processing device.
- the encryption function F13 decrypts, using the decryption key, an AI model that is deployed from outside the image sensor IS. Further, the encryption function F13 performs a process of encrypting image data output from the image sensor IS using an encryption key.
- the certificate handled by the authentication function F12 and the decryption key and encryption key handled by the encryption function F13 are stored in the ROM and RAM of the second layer storage section 45a and the third layer storage section 45b.
- FIG. 10 shows an example of the arrangement of the parts arranged on the dies D1, D2, and D3 forming each layer of the image sensor IS.
- a pixel array section 41a is formed over substantially the entire surface of the die D1 forming the first layer.
- the die D2 forming the second layer is provided with an analog circuit section 41b, a logic circuit section 42a, a second layer storage section 45a, and a communication I/F 46.
- the die D3 forming the third layer is provided with an ISP 42b, a CPU 43a, an AI image processing section 44, and a third layer storage section 45b.
- By providing the third layer storage section 45b adjacent to the ISP 42b, CPU 43a, and AI image processing section 44, it is possible to speed up the processing in each section.
- the AI image processing executed by the AI image processing section 44 may handle a large amount of intermediate data, and so providing the third layer storage section 45b adjacently has a great effect.
- the chip size of each layer is the same.
- An image sensor IS in which the chip size of each layer is the same can be manufactured by the so-called WoW (Wafer on Wafer) method, in which the layers are stacked while still in the form of disk-shaped silicon wafers and then diced, so that dicing is completed in a single step. Furthermore, since silicon wafers, which are large members, are stacked on top of each other, each chip can be easily positioned. Thereby, the difficulty level of the manufacturing process can be lowered and the process can be made smoother.
- When the layers are stacked in the form of wafers and cut out by a single dicing process, the chip size of each layer can be regarded as the same.
- the in-sensor control unit 43 (CPU 43a) is not provided in the third layer.
- Mask processing that fills in people in images to protect privacy, processing that adds bounding boxes to indicate the type of subject detected by AI image processing, and the like can be achieved by directly manipulating the pixel values of the frame images stored in the second layer storage section 45a serving as the frame memory. Since such processing can be realized by a memory controller or the like, the CPU 43a does not need to be provided.
- FIG. 1 Another second configuration example is shown in FIG.
- the chip size of the die D3 constituting the third layer is smaller than the chip size of the dies D1 and D2 constituting the first and second layers.
- the length of the short side of the chip shape is shortened.
- the number of dies D3 formed on one wafer is increased, and the cost of chips can be reduced.
- the chips of the third layer are stacked on the die D2 forming the second layer after dicing. Therefore, it is possible to stack only the dies D3 that are found to be non-defective by inspection, so that the yield of the image sensor IS can be improved.
- the third layer includes two dies D3a and D3b.
- the die D3a is provided with an ISP 42b and an AI image processing section 44
- the die D3b is provided with a third layer storage section 45b.
- The ISP 42b and the DSP serving as the AI image processing section 44, both provided on the die D3a, can be manufactured using an advanced process of several nanometers, while the third layer storage section 45b provided on the die D3b can be manufactured using a highly integrated DRAM (Dynamic Random Access Memory) process. The two dies can thus be manufactured using different manufacturing processes.
- By making the third layer storage section 45b, which is a DRAM, highly integrated, it is possible to increase its storage capacity or to reduce its size.
- When the third layer storage section 45b is downsized, chips for realizing other functions can be arranged in the freed space, which improves the functionality of the image sensor IS.
- the two dies D3a and D3b arranged in the third layer are arranged apart from each other in the direction in which the short sides of the stacked surfaces extend. Thereby, the number of wires between the die D3a and the die D3b can be increased, the data transfer speed between both chips can be improved, and the processing speed can be increased.
- the die D3a and the die D3b may be arranged apart from each other in the direction in which the long sides of the stacked surfaces extend.
- The influence of electromagnetic noise caused by data transfer through the interchip wiring is stronger on the pixel signals of pixels whose readout circuits overlap the interchip wiring in the stacking direction of the chips. Therefore, pixels affected by electromagnetic noise can be easily identified, so that the noise reduction method does not become complicated.
- FIG. 4 Another fourth configuration example is shown in FIG.
- The arrangement of each part of the third layer is different from the above-mentioned examples. Specifically, the size of the region where the analog circuit section 41b provided on the die D2 and the AI image processing section 44 provided on the die D3 overlap when viewed from the stacking direction is reduced.
- A conversion processing section 48, which is provided as part of the analog circuit section 41b and performs A/D conversion, is arranged at a position that does not overlap the AI image processing section 44 arranged in the third layer when viewed from the stacking direction. This reduces the possibility that electromagnetic noise generated during execution of AI image processing by the AI image processing unit 44 will affect the result of A/D conversion. Therefore, captured image data (RAW image data) with less noise can be generated as digital data after A/D conversion. Furthermore, since A/D conversion and inference processing can be performed simultaneously, it is also possible to perform complex AI image processing that requires a long processing time.
- the fifth configuration example includes a CVDSP 42c as the image signal processing section 42 in addition to a logic circuit section 42a and an ISP 42b.
- the CVDSP 42c is constituted by a DSP that performs CV processing, and is provided in the die D3 constituting the third layer as shown in FIG.
- The CVDSP 42c performs image processing on frame images stored in the second layer storage section 45a serving as a frame memory. Therefore, the CVDSP 42c is suitable for processing such as edge enhancement, scaling, and affine transformation, which require calculation using pixel data of pixels on a line different from that of the pixel targeted for image processing.
- the CVDSP 42c is capable of executing these processes without converting the frame image back into line data, and it is possible to improve the processing speed. Further, the CVDSP 42c enables calculations that require parallel processing using a plurality of lines for each processing unit, such as image processing based on a histogram of the entire image surface.
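The dependence on pixels from neighbouring lines can be illustrated with a minimal Python sketch of edge enhancement. The 3x3 sharpening kernel, the function name `sharpen`, and the list-of-lists frame layout are illustrative assumptions and are not taken from the CVDSP 42c described here.

```python
# Hypothetical sketch: 3x3 sharpening over a whole frame held in memory.
KERNEL = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]

def sharpen(frame):
    """Return a sharpened copy of a 2D list of 8-bit pixel values."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]  # border pixels are left unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            # Each output pixel reads the lines above and below it,
            # which is why the full frame must be available.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += KERNEL[dy + 1][dx + 1] * frame[y + dy][x + dx]
            out[y][x] = max(0, min(255, acc))  # clamp to the 8-bit range
    return out

frame = [[10] * 5 for _ in range(5)]
frame[2][2] = 50  # a single brighter pixel acts as an edge
sharpened = sharpen(frame)
```

Because every output pixel needs three input lines at once, a frame memory such as the second layer storage section 45a avoids converting the image back into line data.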
- the CVDSP 42c is provided in the die D3 constituting the third layer.
- the CVDSP 42c and the AI image processing section 44 are arranged adjacent to the third layer storage section 45b.
- the CVDSP 42c and the AI image processing unit 44 are configured to easily access the third layer storage unit 45b, and can speed up processing.
- Example of AI image processing: an example of AI image processing executed by the AI image processing unit 44 will be described. Several configuration examples of the image sensor IS have been given so far; the following explanation covers the case in which AI image processing is executed on the fifth configuration example of the image sensor IS described using FIGS. 16 and 17.
- a first example of AI image processing is to perform mask processing on an image.
- FIG. 18 shows an example of the image Gr1 before mask processing.
- the image Gr1 before mask processing includes a person A as a subject and a box-shaped object B.
- the image Gr1 is input to the AI image processing unit 44 as an input tensor.
- the AI image processing unit 44 performs AI image processing to infer the region where the person A is imaged in the image Gr1 as an input tensor.
- the input tensor of the AI image processing unit 44 is a frame image
- The output tensor of the AI image processing unit 44 is an image in which the image area where a person is imaged has been filled in with a predetermined color, as shown in FIG. That is, the AI image processing unit 44 itself performs the masking of the predetermined area.
- the input tensor of the AI image processing unit 44 is a frame image
- the output tensor of the AI image processing unit 44 is coordinate information for specifying an image area where a person is imaged.
- The CPU 43a or the memory controller of the second layer storage section 45a overwrites the image area specified by the coordinate information, in the frame image stored in the second layer storage section 45a, with a predetermined pixel value (0, 255, etc.).
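The overwriting variant can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mask_region`, the end-exclusive `(x0, y0, x1, y1)` box format, and the list-of-lists frame buffer are hypothetical and do not come from the source.

```python
def mask_region(frame, box, fill=0):
    """Overwrite pixels inside box (x0, y0, x1, y1), end-exclusive, with fill."""
    x0, y0, x1, y1 = box
    height, width = len(frame), len(frame[0])
    # Clamp the box so out-of-range coordinates cannot corrupt memory.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    for y in range(y0, y1):
        for x in range(x0, x1):
            frame[y][x] = fill  # in-place write into the frame memory
    return frame

# Stand-in for a frame held in the second layer storage section 45a.
frame = [[128] * 6 for _ in range(4)]
# Coordinate information as it might arrive in the output tensor (assumed format).
mask_region(frame, (1, 1, 4, 3), fill=0)
```

Since the operation is a plain in-place write, it needs no inference hardware, which is consistent with the statement that a memory controller can perform it.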
- a second example of AI image processing involves superimposing a bounding box on an image.
- the image Gr1 before superimposition is shown in FIG. 18 above.
- a person A and an object B are shown in the image Gr1.
- the image Gr1 is input to the AI image processing unit 44 as an input tensor.
- the AI image processing unit 44 detects a person A and an object B in the image Gr1, and gives a label indicating the classification result of the detected person A and object B.
- an image Gr3 (see FIG. 20) in which the frame image D as a bounding box is superimposed on the frame image is obtained.
- the image Gr3 on which the frame image D is superimposed is output as the output tensor of the AI image processing unit 44.
- Another method is to output coordinate information and label information of person A and object B as an output tensor from the AI image processing unit 44.
- the CPU 43a or the memory controller of the second layer storage section 45a overwrites a rectangle surrounding the image area specified by the coordinate information in the frame image stored in the second layer storage section 45a with predetermined pixel values. By doing so, the frame image D as a bounding box is superimposed.
- The frame image D may include both a frame-shaped image and character information representing label information, or may include only a frame-shaped image. If the frame image D is only a frame-shaped image, the frame-shaped image may have a color according to the classification result of the subject.
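A minimal Python sketch of the coordinate-based superimposition follows. The label-to-value table `LABEL_VALUES`, the function name `draw_bounding_box`, and the inclusive-corner box format are assumptions made for illustration only.

```python
# Hypothetical mapping from classification label to the frame's pixel value.
LABEL_VALUES = {"person": 255, "object": 128}

def draw_bounding_box(frame, box, label):
    """Overwrite the outline of box (x0, y0, x1, y1), corners inclusive."""
    x0, y0, x1, y1 = box
    value = LABEL_VALUES[label]
    for x in range(x0, x1 + 1):
        frame[y0][x] = value  # top edge
        frame[y1][x] = value  # bottom edge
    for y in range(y0, y1 + 1):
        frame[y][x0] = value  # left edge
        frame[y][x1] = value  # right edge
    return frame

frame = [[0] * 8 for _ in range(6)]
# Coordinate and label information as an assumed output tensor layout.
detections = [((1, 1, 3, 4), "person"), ((5, 2, 7, 4), "object")]
for box, label in detections:
    draw_bounding_box(frame, box, label)
```

Only the outline pixels are overwritten, so the subject inside each box remains visible, and the value written encodes the classification result.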
- the AI image processing unit 44 switches and executes a plurality of AI image processes.
- The AI image processing unit 44 performs first AI image processing using the first AI model, then switches the AI model to the second AI model and performs second AI image processing using the second AI model.
- a frame image is used as an input tensor, and age information of a person captured in the image is estimated and output as an output tensor.
- a frame image is used as an input tensor, and gender information of a person captured in the image is estimated and output as an output tensor.
- CV processing by the CVDSP 42c is not performed between the first AI image processing and the second AI image processing.
- the fourth example of AI image processing is similar to the third example, in which a plurality of AI image processes are switched and executed. Furthermore, as a difference from the third example, CV processing by the CVDSP 42c is performed between both AI image processing.
- the image to be processed by the CVDSP 42c at this time is the frame image stored in the second layer storage section 45a.
- the AI image processing unit 44 performs a plurality of AI image processes using a plurality of AI models
- a first AI model specialized for the first AI image processing and a second AI model specialized for the second AI image processing are used.
- a second AI model can be used. Therefore, a highly accurate inference result can be obtained as a whole.
- switching of the AI model is performed by switching the weighting coefficient of the AI model.
- AI models can be switched with simple processing.
- the input tensor input to the second AI model is subjected to appropriate image processing by the CVDSP 42c, thereby making it possible to improve the recognition rate of AI image processing.
- The image processing by the CVDSP 42c is performed according to the recognition result of the first AI image processing, and its input is the data of the frame image stored in the second layer storage unit 45a serving as a frame memory. That is, such processing is possible because the image sensor IS includes the second layer storage section 45a as a frame memory.
- For example, the first AI image processing detects a specific object such as a person, the CVDSP 42c cuts out a predetermined image area according to the detection result, and the second AI image processing is performed using the cut-out partial image as the input tensor.
- face detection is performed in the first AI image processing
- processing to cut out the detected image area is performed in the CVDSP 42c
- processing to detect facial feature amounts is performed in the second AI image processing.
- the first AI image processing performs human body detection
- the CVDSP 42c performs processing to cut out the detected image area
- the second AI image processing performs skeletal estimation and posture estimation.
- the first AI image processing detects the license plate of the vehicle
- the CVDSP 42c performs processing to cut out the detected image area
- the second AI image processing performs processing to estimate the characters written on the license plate. This makes it possible to realize a function of identifying a vehicle that has passed in front of the traffic monitoring camera.
- In this way, the second AI image processing can suitably estimate attribute information about a person, posture information and skeletal information of a person, and the character string on a license plate.
- A fifth example of AI image processing is one in which the AI image processing unit 44 switches between and executes a plurality of AI image processes. The difference from the fourth example is that the image to be processed by the CV processing of the CVDSP 42c is not a frame image stored in the second layer storage unit 45a as a frame memory, but the image output as the output tensor of the first AI image processing.
- the CVDSP 42c performs further changes to the image changed by the first AI image processing.
- the AI image processing unit 44 performs denoising processing to remove noise from the frame image in the first AI image processing using the first AI model.
- the CVDSP 42c performs edge enhancement processing as CV processing on the noise-removed image that is the output tensor of the first AI model.
- the AI image processing unit 44 receives the edge-enhanced image as an input tensor for the second AI model, and performs detection processing such as person detection as second AI image processing.
- By performing CV processing on the output tensor of the first AI image processing by the AI image processing unit 44, the CVDSP 42c can make the image used as the input tensor of the second AI image processing more appropriate. Therefore, the second AI image processing can infer the subject more accurately.
- As the first AI image processing, various deterioration correction processes that correct image deterioration can be applied. Further, as the CV processing by the CVDSP 42c, various types of sharpening processing for sharpening the image can be applied.
- the deterioration correction process may include a dynamic range correction process in addition to the denoising process.
- the sharpening processing may include saturation correction processing and contrast correction processing.
- The second AI image processing takes as its input tensor the image obtained by performing the deterioration correction processing as the first AI image processing and then the sharpening processing by the CVDSP 42c, which makes it possible to perform highly accurate inference in the second AI image processing.
- the process of inferring the image after sharpening may be performed in the first AI image processing, and the CV processing as deterioration correction processing may be performed in the CVDSP 42c.
- a first example of AI image processing is to perform mask processing on a partial area of an image (see FIGS. 18 and 19). Furthermore, a second example of AI image processing involves superimposing a bounding box on a partial area of an image (see FIGS. 18 and 20).
- FIG. 21 shows the flow of processing executed by each unit in the first and second examples of AI image processing.
- The ISP 42b generates an input tensor based on a frame image obtained by the analog circuit section 41b and the logic circuit section 42a processing the pixel signal output from the pixel array section 41a.
- the frame image may be used as the input tensor as it is, or the frame image may be converted to match the format of the input tensor of the subsequent AI model.
- the generated input tensor is stored in the second layer storage section 45a in step S201.
- the AI image processing unit 44 acquires the input tensor from the second layer storage unit 45a in step S202.
- the input tensor is provided to the first AI model in step S301 to perform inference processing as first AI image processing.
- In step S302, the AI image processing unit 44 outputs coordinate information as an output tensor of the first AI model to the CPU 43a.
- The coordinate information may be stored once in the memory section 45 and then output to the CPU 43a via the memory section 45.
- In step S401, the CPU 43a performs overwriting processing according to the coordinate information.
- pixel values are overwritten in step S203 in the second layer storage section 45a. This realizes, for example, a process in which an image area in which a person is captured is replaced with a black image, or a process in which a bounding box is superimposed.
- a third example of AI image processing is one in which a plurality of AI image processes are switched and executed.
- FIG. 22 shows the flow of processing executed by each unit in the third example of AI image processing.
- The ISP 42b generates an input tensor based on a frame image obtained by the analog circuit section 41b and the logic circuit section 42a processing the pixel signal output from the pixel array section 41a.
- the frame image may be used as the input tensor as it is, or the frame image may be converted to match the format of the input tensor of the subsequent AI model.
- the generated input tensor is stored in the second layer storage section 45a in step S201.
- the AI image processing unit 44 acquires the input tensor from the second layer storage unit 45a in step S202.
- the input tensor is provided to the first AI model in step S303 to perform inference processing as first AI image processing.
- the first AI image processing is, for example, processing for inferring the age of a person as a subject.
- In step S304, the AI image processing unit 44 outputs the inference result (for example, estimated age information) as the first output tensor of the first AI model to the outside of the image sensor IS. At this time, a completion notification of the inference process is also sent to the CPU 43a.
- Upon receiving the completion notification, the CPU 43a transmits an AI model switching instruction in step S402.
- the AI image processing unit 44 performs AI model switching in step S305. Thereby, the first AI model is switched to the second AI model.
- the AI image processing unit 44 acquires the input tensor from the second layer storage unit 45a again in step S204.
- This input tensor may be the same as the input tensor input to the first AI model.
- In step S306, the AI image processing unit 44 executes second AI image processing using the second AI model.
- The second AI image processing is, for example, processing for inferring the gender of a person as a subject.
- In step S307, the AI image processing unit 44 outputs the inference result (for example, estimated gender information) as a second output tensor of the second AI model to the outside of the image sensor IS.
- a completion notification may be sent to the CPU 43a.
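The sequence of steps S202 to S307 can be sketched as follows in Python. This is a hedged illustration: the class `AIImageProcessor`, the dictionary used as frame memory, and the dot-product "inference" are stand-ins invented for the example, chosen only to show that switching models amounts to switching weighting coefficients while the same input tensor is read twice.

```python
class AIImageProcessor:
    """Hypothetical stand-in for the AI image processing unit 44."""

    def __init__(self, weight_sets):
        self.weight_sets = weight_sets  # one set of weighting coefficients per model
        self.active = None

    def switch_model(self, name):
        # S305: switching the model only changes which weights are active.
        self.active = name

    def infer(self, tensor):
        # Toy inference: a dot product stands in for the real network.
        weights = self.weight_sets[self.active]
        return sum(w * v for w, v in zip(weights, tensor))

def run_two_models(processor, frame_memory):
    outputs = []
    processor.switch_model("first")
    tensor = frame_memory["input_tensor"]                 # S202
    outputs.append(("first", processor.infer(tensor)))    # S303, S304
    # A completion notification would reach the CPU here, which then
    # issues the switching instruction (S402).
    processor.switch_model("second")                      # S305
    tensor = frame_memory["input_tensor"]                 # S204: same tensor read again
    outputs.append(("second", processor.infer(tensor)))   # S306, S307
    return outputs

processor = AIImageProcessor({"first": [1, 0, 2], "second": [0, 3, 1]})
results = run_two_models(processor, {"input_tensor": [4, 5, 6]})
```

Keeping the input tensor in the frame memory is what allows the second model to run on the same frame without re-reading the pixel array.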
- a fourth example of AI image processing is one in which a plurality of AI image processes are switched and executed. Furthermore, CV processing by the CVDSP 42c is performed between both AI image processing.
- FIG. 23 shows the flow of processing executed by each unit in the fourth example of AI image processing.
- The ISP 42b generates an input tensor based on a frame image obtained by the analog circuit section 41b and the logic circuit section 42a processing the pixel signal output from the pixel array section 41a.
- the frame image may be used as the input tensor as it is, or the frame image may be converted to match the format of the input tensor of the subsequent AI model.
- the generated input tensor is stored in the second layer storage section 45a in step S201.
- the AI image processing unit 44 acquires the input tensor from the second layer storage unit 45a in step S202.
- the input tensor is provided to the first AI model in step S308 to perform inference processing as first AI image processing.
- the first AI image processing is, for example, processing for specifying an image area in which a person's face as a subject is captured.
- In step S309, the AI image processing unit 44 outputs coordinate information as the first output tensor of the first AI model to the CPU 43a.
- After receiving the coordinate information, the CPU 43a instructs the CVDSP 42c to perform the cut-out in step S403. At this time, the CPU 43a transmits the received coordinate information to the CVDSP 42c.
- the CVDSP 42c that has received the coordinate information acquires a frame image from the second layer storage unit 45a in step S205.
- In step S501, the CVDSP 42c cuts out the image area specified by the coordinate information from the acquired frame image. As a result, the CVDSP 42c obtains a partial image including the person's face.
- In step S502, the CVDSP 42c outputs the partial image to the AI image processing unit 44.
- the CPU 43a instructs the AI image processing unit 44 to switch the AI model in step S402, after instructing the cutting out or substantially at the same time as instructing the cutting out.
- the AI image processing unit 44 performs AI model switching in step S305.
- the AI image processing unit 44 performs second AI image processing in step S310 using the partial image received from the CVDSP 42c in step S502 as an input tensor.
- This process is a process of detecting feature amounts of a person's face included in a partial image.
- In step S311, the AI image processing unit 44 outputs the detected feature quantity as a second output tensor to the outside of the image sensor IS.
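The detect, cut out, and re-infer flow of this fourth example can be sketched in Python as below. The functions `detect_region` and `extract_features` are deliberately trivial stand-ins for the first and second AI models, and the end-exclusive box format is an assumption; only `crop` corresponds directly to the cut-out performed by the CVDSP 42c.

```python
def detect_region(frame):
    """Stand-in for the first AI model: bounding box of non-zero pixels."""
    ys = [y for y, row in enumerate(frame) if any(row)]
    xs = [x for row in frame for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)  # end-exclusive box

def crop(frame, box):
    """Cut the region given by box out of the frame held in frame memory."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]

def extract_features(patch):
    """Stand-in for the second AI model: size and pixel sum of the patch."""
    return {
        "width": len(patch[0]),
        "height": len(patch),
        "sum": sum(sum(row) for row in patch),
    }

frame = [[0] * 6 for _ in range(5)]
frame[1][2] = 9
frame[2][3] = 7
coords = detect_region(frame)        # S308, S309: coordinate information
patch = crop(frame, coords)          # S205, S501: cut out from the stored frame
features = extract_features(patch)   # S310, S311: second inference on the patch
```

The second stage sees only the cropped patch, which is the reason the cut-out can raise the recognition rate of the second AI image processing.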
- the AI image processing unit 44 switches between and executes a plurality of AI image processes. Further, the image to be processed by the CV processing of the CVDSP 42c is not a frame image stored in the second layer storage unit 45a as a frame memory, but an image as an output tensor output from the first AI image processing.
- FIG. 24 shows the flow of processing executed by each unit in the fifth example of AI image processing.
- The ISP 42b generates an input tensor based on a frame image obtained by the analog circuit section 41b and the logic circuit section 42a processing the pixel signal output from the pixel array section 41a.
- the frame image may be used as the input tensor as it is, or the frame image may be converted to match the format of the input tensor of the subsequent AI model.
- the generated input tensor is stored in the second layer storage section 45a in step S201.
- the AI image processing unit 44 acquires the input tensor from the second layer storage unit 45a in step S202.
- the input tensor is provided to the first AI model in step S312 to perform inference processing as first AI image processing.
- the first AI image processing is, for example, denoising processing that removes noise from an image.
- In step S313, the AI image processing unit 44 outputs the image data after noise removal as the first output tensor of the first AI model to the third layer storage unit 45b.
- the image data after noise removal is stored in the third layer storage unit 45b in step S601.
- the AI image processing unit 44 sends a completion notification of the denoising process to the CPU 43a.
- In step S404, the CPU 43a that has received the completion notification instructs the CVDSP 42c to perform edge enhancement, which is an example of a process for sharpening an image.
- the CPU 43a transmits instruction information to the CVDSP 42c.
- the CVDSP 42c which has received the instruction information regarding edge enhancement, acquires the image data after noise removal from the third layer storage unit 45b in step S602.
- In step S503, the CVDSP 42c performs image processing to enhance edges on the acquired image data after noise removal. As a result, the CVDSP 42c obtains edge-enhanced image data.
- In step S504, the CVDSP 42c transmits the edge-enhanced image data to the AI image processing unit 44. Note that when transmitting the edge-enhanced image data from the CVDSP 42c to the AI image processing unit 44, the image data may be temporarily stored in the third layer storage unit 45b.
- the CPU 43a instructs the AI image processing unit 44 to switch the AI model in step S402, after instructing edge emphasis or substantially at the same time as instructing edge emphasis.
- the AI image processing unit 44 performs AI model switching in step S305.
- After the AI model switching process, the AI image processing unit 44 performs second AI image processing in step S314 using the edge-enhanced image data received from the CVDSP 42c in step S504 as an input tensor.
- This process is a process for detecting a person included in an image.
- In step S315, the AI image processing unit 44 outputs information about the detected person as a second output tensor to the outside of the image sensor IS.
- FIG. 25 shows an example of the first execution timing.
- the first execution timing example is such that the execution period of the AI image processing by the AI image processing unit 44 does not overlap with the A/D conversion processing by the analog circuit unit 41b.
- The A/D conversion process and the AI image process are executed in a time-sharing manner, and both complete within the frame period Tf during which one frame image is generated.
- processing such as development by the ISP 42b is performed almost simultaneously with A/D conversion. This generates an input tensor to the AI model.
- AI image processing is executed by the AI image processing unit 44, and after the completion of the AI image processing, the inference result is output as an output tensor from the AI model. Since the development process is completed after the A/D conversion process is completed, the AI image processing is necessarily executed after the A/D conversion process is completed.
- When a single AI image process is performed using a single AI model, it is easy to fit the AI image processing into the interval between A/D conversion processes.
- FIG. 26 shows an example of the second execution timing.
- the timing of the A/D conversion process by the analog circuit section 41b and the development process by the ISP 42b is the same as the first example of execution timing.
- the processing timing by the AI image processing unit 44 is different from the first execution timing example.
- the total execution period of the A/D conversion processing by the analog circuit section 41b and the AI image processing by the AI image processing section 44 is set to be longer than the frame period Tf. That is, the AI image processing by the AI image processing unit 44 is performed so that the execution period partially overlaps with the A/D conversion processing.
- the processing time related to the AI image processing tends to be long. Therefore, when the amount of calculation is large, the execution periods of A/D conversion processing and AI image processing may have to partially overlap.
- Image quality deterioration due to electromagnetic noise may be suppressed.
- FIG. 27 shows an example of the third execution timing. Similar to the first execution timing example, the execution period of the A/D conversion process by the analog circuit section 41b and the execution period of the AI image processing by the AI image processing section 44 are made not to overlap. Moreover, the execution period of the memory overwriting process by the CPU 43a and the image output process by the communication I/F 46, which are performed after the AI image processing, is made not to overlap with the A/D conversion process.
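The relation between the frame period Tf and the execution periods in the timing examples can be checked with a small Python sketch. It assumes, for illustration only, that A/D conversion starts at the beginning of each frame period, that AI image processing starts immediately after A/D conversion ends, and that any overrun overlaps the next frame's A/D conversion; the function name `schedule` and the microsecond units are hypothetical.

```python
def schedule(frame_period_us, adc_us, ai_us):
    """Return the end time of AI processing and its overlap with the
    next frame's A/D conversion, under the assumptions stated above."""
    ai_start_us = adc_us               # AI processing follows A/D conversion
    ai_end_us = ai_start_us + ai_us
    # If the total exceeds Tf, the excess runs into the next A/D conversion.
    overlap_us = max(0, ai_end_us - frame_period_us)
    return {
        "ai_end_us": ai_end_us,
        "overlap_us": overlap_us,
        "time_shared": overlap_us == 0,
    }

# Total fits in Tf: time-sharing without overlap (first and third examples).
short_case = schedule(33333, 10000, 15000)
# Total exceeds Tf: partial overlap (second example).
long_case = schedule(33333, 10000, 30000)
```

A non-zero `overlap_us` marks the interval in which noise from AI image processing could affect A/D conversion, which is the case the layout measures described earlier are meant to mitigate.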
- Configuration for privacy protection: the above examples described performing mask processing (hereinafter referred to as "privacy mask processing") on an image area in which a person is imaged in order to protect privacy.
- an example of a configuration provided in the image sensor IS so as not to output an image for which privacy protection is not ensured outside the image sensor IS will be described.
- The CPU 43a serving as the in-sensor control unit 43, provided in the die D3 forming the third layer of the image sensor IS, has a communication control function F14 in addition to a control function F11, an authentication function F12, and an encryption function F13.
- The communication control function F14 performs communication control when captured image data and metadata as inference results are transmitted from the camera 3 to other devices, for example by controlling an antenna provided outside the image sensor IS.
- Communication with other devices realized by the communication control function F14 is, for example, LPWA (Low Power Wide Area) such as SIGFOX or LTE-M (Long Term Evolution Machine).
- The image sensor IS may transmit to the outside of the image sensor IS or the camera 3 not only image data as an output tensor obtained as an inference result, but also image data as an input tensor input to the AI model. This is done for the purpose of checking the operation of the image sensor IS. That is, although the image sensor IS may transmit various image data to the outside, providing the AI image processing section 44 that performs privacy mask processing within the image sensor IS makes it possible to protect privacy strongly.
- FIG. 29 shows a first configuration example of the image sensor IS. Note that FIG. 29 shows only the portions related to privacy mask processing extracted from the various portions of the image sensor IS.
- the image sensor IS includes a pixel array section 41a, a circuit section 49, an ISP 42b, an AI image processing section 44, a memory section 45, and a communication I/F 46 as parts related to privacy mask processing.
- the pixel array section 41a has the same configuration as each of the above-mentioned examples, so a description thereof will be omitted.
- the circuit section 49 includes the above-mentioned analog circuit section 41b and logic circuit section 42a. However, the circuit section 49 may be configured to include only the analog circuit section 41b, or may be configured to include both the analog circuit section 41b and the logic circuit section 42a.
- the ISP 42b performs processing to generate image data as an input tensor for the AI model constructed in the AI image processing unit 44.
- the AI image processing unit 44 is capable of executing first AI image processing using the first AI model M1 and second AI image processing using the second AI model M2.
- the first AI image processing and the second AI image processing may be executed simultaneously, or may be executed in a time-sharing manner by switching the AI model.
- The AI image processing unit 44 is capable of executing first AI image processing as inference processing using the first AI model M1 and second AI image processing as privacy mask processing using the second AI model M2.
- the first AI image processing may be a process of detecting a person, a process of detecting another subject, or a process of detecting a feature amount of a specific subject. Alternatively, it may be a process of character recognition, a process of correcting image deterioration, or a process of sharpening the image.
- the input tensor to the second AI model M2 that implements privacy mask processing is the input tensor of the first AI model M1. Furthermore, the output tensor from the second AI model M2 is image data that has been subjected to privacy mask processing. That is, in the second AI model M2, as the privacy masking process, both a process of specifying an image area in which a person is shown in an image and a process of masking the specified area are performed.
- the memory unit 45 is configured to include a ROM and a RAM, but in this example, the ROM of the memory unit 45 is excerpted and described.
- the ROM serving as the memory unit 45 stores weighting coefficients, parameters, etc. for functioning as the second AI model M2. That is, the various numerical values stored in the memory unit 45 for making the second AI model M2 function cannot be rewritten.
- It is desirable that the ROM storing the various parameters of the second AI model M2 be provided in the die D3 as the third layer, in which the AI image processing unit 44 functioning as the second AI model M2 is provided. This makes it possible to switch quickly to the second AI image processing using the second AI model M2.
- By receiving the output tensor that has been subjected to privacy mask processing by the second AI model M2, the communication I/F 46 is capable of outputting only privacy-protected image data to the outside of the image sensor IS.
- the image sensor IS has a configuration for outputting an image as an input tensor to the first AI model M1 from the communication I/F 46.
- Such a configuration is for evaluating and checking the operation of the first AI model M1 in the image sensor IS. By checking both the input tensor and the output tensor of the first AI model M1, the user can judge whether the inference processing is functioning normally.
- the configuration of the image sensor IS shown in FIG. 29 can be said to be suitable as a configuration for debugging an AI model.
- Because the input tensor to the first AI model M1 is image data that has not been subjected to privacy mask processing, inference processing and the like can be performed appropriately.
- FIG. 30 shows a second configuration example of the image sensor IS. Note that FIG. 30 shows only the portions related to privacy mask processing extracted from the various portions included in the image sensor IS.
- the image sensor IS includes a pixel array section 41a, a circuit section 49, an ISP 42b, an AI image processing section 44, a privacy mask processing section PM, a memory section 45, and a communication I/F 46 as parts related to privacy mask processing.
- the pixel array section 41a and the circuit section 49 are the same as those in configuration example 1, so their explanation will be omitted.
- the ISP 42b performs processing to generate image data as an input tensor for the first AI model M1 constructed in the AI image processing unit 44.
- the image data as the input tensor is also input to the privacy mask processing unit PM.
- the AI image processing unit 44 performs inference processing such as person detection using the first AI model M1, and outputs the inference result to the privacy mask processing unit PM as an output tensor.
- The privacy mask processing unit PM receives the image data supplied from the ISP 42b as the input tensor to the first AI model M1 and the detection result supplied as the output tensor from the first AI model M1, and performs privacy mask processing that masks the image area in which the detected person is captured.
- the privacy mask processing unit PM in this example performs privacy mask processing not by AI image processing using an AI model, but by processing by the CPU 43a or memory controller, for example. That is, for example, the privacy mask processing unit PM performs a process of overwriting the pixel values of a predetermined image area in the input tensor stored in the second layer storage unit 45a with a predetermined value.
- Of the RAM and the ROM included in the memory section 45, FIG. 30 shows only the ROM.
- the ROM stores a program executed by the privacy mask processing unit PM. Thereby, predetermined privacy mask processing can be reliably executed.
- FIG. 31 shows an example of the flow of processing executed by the privacy mask processing unit PM of this example.
- the privacy mask processing unit PM obtains the input tensor and output tensor for the first AI model M1 in step S701.
- In step S702, the privacy mask processing unit PM determines whether a person class is included among the top-ranked inference results. If it is determined that the person class is included, the privacy mask processing unit PM performs, in step S703, privacy mask processing on the image area in which the subject assigned the person class was detected.
- If privacy mask processing is performed only when the person class is the top inference result, only subjects that are highly likely to be people are masked. If it is instead performed when the person class is included in the top five inference results, subjects that are less likely to be people are also masked. In this case, privacy is protected more strongly.
- After performing the privacy mask processing, the privacy mask processing unit PM outputs the masked input tensor to the communication I/F 46 in step S704.
- If the person class is not included, the privacy mask processing unit PM outputs the image data acquired as the input tensor as it is to the communication I/F 46 in step S705.
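- As a minimal sketch of the flow in steps S701 to S705, the following Python code masks each detected area whose top-k inference results include the person class. The `Detection` type, the `"person"` label, the box layout, and the fill value are assumptions made for illustration; the source only states that the pixel values of the area are overwritten with a predetermined value.

```python
from dataclasses import dataclass
from typing import List, Tuple

PERSON = "person"  # assumed label; the source only refers to a "person class"


@dataclass
class Detection:
    """One detected subject (hypothetical layout of the output tensor)."""
    ranked_classes: List[str]        # class candidates, best first
    box: Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive


def privacy_mask(image: List[List[int]], detections: List[Detection],
                 top_k: int = 1, fill: int = 0) -> List[List[int]]:
    """Return a copy of `image` with person areas overwritten by `fill`."""
    out = [row[:] for row in image]  # S701: take the input tensor (copied here)
    for det in detections:
        # S702: is the person class among the top-k inference results?
        if PERSON in det.ranked_classes[:top_k]:
            top, left, bottom, right = det.box
            # S703: overwrite the pixel values of the area with a fixed value
            for y in range(top, bottom):
                for x in range(left, right):
                    out[y][x] = fill
    # S704 (masked) or S705 (unchanged): the result goes to the communication I/F
    return out
```

With `top_k=1` only subjects whose best class is the person class are masked; with `top_k=5` lower-ranked person candidates are masked as well, which corresponds to the stronger privacy protection described above.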
- FIG. 32 shows a third configuration example of the image sensor IS. Note that FIG. 32 shows only the parts related to privacy mask processing extracted from each part of the image sensor IS.
- the image sensor IS includes a pixel array section 41a, a circuit section 49, an ISP 42b, an AI image processing section 44, a memory section 45, and a communication I/F 46 as parts related to privacy mask processing.
- the pixel array section 41a and the circuit section 49 have the same configuration as in configuration example 1, so their description will be omitted.
- the ISP 42b includes an input tensor processing unit 41b1 that performs development processing and CV processing for input tensors, and a normal image processing unit 41b2 that performs CV processing and the like for normal images (for example, high-resolution images).
- the normal image is an image as a through image displayed on the display section of the camera 3, an image recorded in the memory section 45 for viewing, or the like.
- the input tensor processing unit 41b1 performs processing to generate image data as an input tensor for the AI model constructed in the AI image processing unit 44.
- the normal image processing unit 41b2 performs processing to generate image data for recording by performing the above-mentioned synchronization processing, YC generation processing, resolution conversion processing, codec processing, noise removal processing, etc.
- the AI image processing unit 44 is enabled to execute first AI image processing (inference processing) using the first AI model M1 and second AI image processing (privacy mask processing) using the second AI model M2.
- the input tensor generated by the input tensor processing unit 41b1 is input to the first AI model M1.
- the output tensor from the first AI model M1 is not shown because it can be output to each part.
- the input tensor of the first AI model M1 and the image data generated by the normal image processing unit 41b2 can be input as input tensors to the second AI model M2.
- the second AI model M2 performs a privacy mask process for each input tensor to identify and mask an image area in which a person appears.
- the output tensor from the second AI model M2 is supplied to the communication I/F 46 as privacy-protected image data and output to the outside of the image sensor IS.
- Of the RAM and the ROM included in the memory section 45, FIG. 32 shows only the ROM.
- the ROM stores various parameters such as weighting coefficients for the AI image processing unit 44 to function as the second AI model M2.
- The image sensor IS includes a pixel array section 41a, an analog circuit section 41b, a logic circuit section 42a, a second layer storage section 45a as a frame memory, an ISP 42b, an AI image processing section 44, a CPU 43a, a third layer storage section 45b as a working memory, a communication I/F 46a for MIPI ("MIPI" in the figure), and a communication I/F 46b for PCIe (Peripheral Component Interconnect Express) ("PCIe" in the figure).
- The authentication function F12 and the encryption function F13 were described as functions of the CPU 43a, but in this modification they are provided separately from the CPU 43a.
- the authentication function F12 and the encryption function F13 appropriately execute the above-described authentication process, encryption process, and decryption process in response to instructions from the CPU 43a.
- The certificate used for the authentication process, the encryption key used for the encryption process, and the decryption key used for the decryption process may be stored in the third layer storage section 45b, or may be stored in a dedicated storage unit.
- This modified example includes a plurality of buses 47, unlike the example shown in FIG. Specifically, the first one is a memory bus 47a to which the ISP 42b, the AI image processing section 44, the CPU 43a, the third layer storage section 45b, and the MIPI communication I/F 46a are connected.
- the memory bus 47a is mainly used by the ISP 42b, the AI image processing unit 44, and the CPU 43a to access the third layer storage unit 45b as a working memory. Furthermore, the memory bus 47a is used to output MIPI standard image data to the outside of the image sensor IS.
- the second is an APB (Advanced Peripheral Bus) 47b as a low-speed bus to which the ISP 42b, the AI image processing unit 44, and the CPU 43a are connected.
- the APB 47b is mainly used to transmit commands from the CPU 43a to the ISP 42b and the AI image processing unit 44.
- the third is a high-speed AHB (Advanced High-Performance Bus) 47c to which the PCIe communication I/F 46b and the CPU 43a are connected.
- the AHB 47c is used when outputting label information as a recognition result.
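- The three buses and the blocks connected to each of them can be summarized as a small lookup table. The sketch below is only an illustration: the dictionary keys and block names are hypothetical identifiers derived from the reference numerals above, not names used by the sensor itself.

```python
# Bus -> blocks connected to it, as described for this modification.
BUSES = {
    "memory_bus_47a": {"ISP_42b", "AI_44", "CPU_43a", "storage_45b", "MIPI_IF_46a"},
    "APB_47b": {"ISP_42b", "AI_44", "CPU_43a"},   # low-speed, command transfer
    "AHB_47c": {"PCIe_IF_46b", "CPU_43a"},        # high-speed, label output
}


def shared_buses(a: str, b: str) -> list:
    """Return the names of all buses that connect both block `a` and block `b`."""
    return sorted(name for name, blocks in BUSES.items() if a in blocks and b in blocks)
```

For example, `shared_buses("CPU_43a", "PCIe_IF_46b")` contains only the AHB 47c, which matches its role of carrying label information from the CPU 43a to the PCIe communication I/F 46b.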
- The communication I/F 46a for MIPI is an I/F used mainly for transmitting image data. Specifically, it serves as an I/F for outputting frame images stored in the second layer storage section 45a as a frame memory and images that have been subjected to various processing by the ISP 42b, the AI image processing unit 44, and the like.
- The communication I/F 46b for PCIe is an I/F used mainly to send and receive information other than image data. Specifically, it is used when outputting label information and the like as a recognition result of inference processing.
- the communication I/F 46b can also be used as an I/F to which a test image is input when the test image is used as an input tensor.
- AI image processing can be performed using not only an image obtained according to the light receiving operation of the pixel array section 41a but also an image input from outside the image sensor IS as a test image. Therefore, it is possible to verify the AI model.
- By using the communication I/F 46b for PCIe instead of the communication I/F 46a for MIPI, power consumption can be reduced.
- The communication I/F 46b for PCIe can be used when deploying an AI model (weighting coefficients and various parameters) within the image sensor IS. In this case, the setting information of the ISP 42b may be deployed to the image sensor IS together with the AI model in order to make the input tensor to the AI model appropriate.
- The image sensor IS may include a CVDSP 42c. When the CVDSP 42c is provided, the setting information of the CVDSP 42c may be deployed to the image sensor IS together with the AI model.
- the image sensor IS has a three-layer structure, but it may have a four-layer structure or more.
- a layer for cutting electromagnetic noise may be provided between the second layer and the third layer.
- the CV processing is performed after the AI image processing, and then the AI image processing is performed.
- CV processing may be performed after AI image processing, or AI image processing may be performed after CV processing.
- If the first AI image processing is performed, CV processing is performed on its result, and the second AI image processing is performed using the result of the CV processing as an input tensor, it becomes possible to perform multiple AI image processes that require CV processing.
- an operation system 51 is installed on various hardware 50 such as a CPU, GPU (Graphics Processing Unit), ROM, and RAM as the control unit 33 shown in FIG. 7 (see FIG. 34).
- the operation system 51 is basic software that performs overall control of the camera 3 in order to realize various functions in the camera 3.
- General-purpose middleware 52 is installed on the operation system 51.
- The general-purpose middleware 52 is software for realizing basic operations such as a communication function using the communication unit 35 as the hardware 50 and a display function using the display unit (monitor, etc.) as the hardware 50.
- the orchestration tool 53 and the container engine 54 deploy and execute the container 55 by constructing a cluster 56 as an operating environment for the container 55.
- the edge runtime shown in FIG. 5 corresponds to the orchestration tool 53 and container engine 54 shown in FIG. 34.
- the orchestration tool 53 has a function for causing the container engine 54 to appropriately allocate the resources of the hardware 50 and operation system 51 described above.
- The orchestration tool 53 groups the containers 55 into predetermined units (pods, described later), and each pod is deployed to a worker node (described later) in a logically different area.
- the container engine 54 is one of the middleware installed in the operation system 51, and is an engine that operates the container 55. Specifically, the container engine 54 has a function of allocating resources (memory, computing power, etc.) of the hardware 50 and the operation system 51 to the container 55 based on a configuration file included in middleware in the container 55.
- The resources allocated in this embodiment include not only resources such as the control unit 33 included in the camera 3 but also resources such as the in-sensor control unit 43, the memory unit 45, and the communication I/F 46 included in the image sensor IS.
- the container 55 is configured to include middleware such as applications and libraries for realizing predetermined functions.
- the container 55 operates to implement a predetermined function using the resources of the hardware 50 and operation system 51 allocated by the container engine 54.
- the AI application and AI model shown in FIG. 5 correspond to one of the containers 55. That is, one of the various containers 55 deployed in the camera 3 realizes a predetermined AI image processing function using an AI application and an AI model.
- The cluster 56 constructed by the container engine 54 and the orchestration tool 53 will be described with reference to FIG. 35.
- the cluster 56 may be constructed across a plurality of devices so that functions are realized using not only the hardware 50 of one camera 3 but also other hardware resources of other devices.
- the orchestration tool 53 manages the execution environment of the container 55 on a per worker node 57 basis. Further, the orchestration tool 53 constructs a master node 58 that manages all of the worker nodes 57 .
- the pod 59 is configured to include one or more containers 55, and implements a predetermined function.
- the pod 59 is a management unit for managing the container 55 by the orchestration tool 53.
- the operation of the pod 59 on the worker node 57 is controlled by the pod management library 60.
- The pod management library 60 is configured to include a container runtime that allows the pods 59 to use the logically allocated resources of the hardware 50, an agent that receives control from the master node 58, a network proxy that handles communication between the pods 59 and communication with the master node 58, and the like. That is, the pod management library 60 enables each pod 59 to implement a predetermined function using each resource.
- The master node 58 is configured to include an application server 61 that deploys the pod 59, a manager 62 that manages the deployment status of the container 55 by the application server 61, a scheduler 63 that determines the worker node 57 on which the container 55 is placed, and a data sharing section 64 that shares data.
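- The relationship between pods 59, worker nodes 57, and the scheduler 63 can be sketched as follows. This is a minimal illustration, not the scheduler of any particular orchestration tool: the memory-based resource request and the "most free memory first" placement policy are assumptions, since the source does not state how the scheduler 63 chooses a worker node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pod:
    name: str
    containers: List[str]   # one or more containers 55
    memory_mb: int          # hypothetical resource request


@dataclass
class WorkerNode:
    name: str
    memory_mb: int          # hypothetical capacity
    pods: List[Pod] = field(default_factory=list)

    def free_memory(self) -> int:
        """Capacity minus the requests of the pods already placed here."""
        return self.memory_mb - sum(p.memory_mb for p in self.pods)


def schedule(pod: Pod, nodes: List[WorkerNode]) -> Optional[WorkerNode]:
    """Place `pod` on the fitting node with the most free memory (assumed policy)."""
    candidates = [n for n in nodes if n.free_memory() >= pod.memory_mb]
    if not candidates:
        return None          # no worker node can host this pod
    best = max(candidates, key=WorkerNode.free_memory)
    best.pods.append(pod)
    return best
```

A cluster spanning the camera 3 and the image sensor IS could be modeled by passing one `WorkerNode` per device; a pod that does not fit on the camera then falls through to another node or is rejected.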
- the AI model may be stored in the memory unit 45 within the image sensor IS via the communication I/F 46 in FIG. 7, and AI image processing may be executed within the image sensor IS.
- the configuration shown in FIGS. 34 and 35 may be deployed in the memory unit 45 and in-sensor control unit 43 within the image sensor IS, and the above-described AI application and AI model may be executed using container technology within the image sensor IS.
- the container technology can be used even when deploying an AI application and/or an AI model to the fog server 4 or the cloud-side information processing device.
- The information of the AI application and the AI model is deployed as a container or the like in a memory such as the nonvolatile memory unit 74, the storage unit 79, or the RAM 73 in FIG. 36, which will be described later, and is executed.
- the information processing device includes a CPU 71.
- The CPU 71 functions as an arithmetic processing unit that performs the various processes described above, and executes various processes according to programs stored in the ROM 72 or in a nonvolatile memory unit 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or programs loaded into the RAM 73 from the storage unit 79.
- the RAM 73 also appropriately stores data necessary for the CPU 71 to execute various processes.
- The CPU 71 included in the information processing device as the cloud server 1 functions as a license authorization section, an account service providing section, a device monitoring section, a marketplace function providing section, and a camera service providing section in order to realize the above-mentioned functions.
- the CPU 71, ROM 72, RAM 73, and nonvolatile memory section 74 are interconnected via a bus 83.
- An input/output interface (I/F) 75 is also connected to this bus 83.
- the input/output interface 75 is connected to an input section 76 consisting of an operator or an operating device.
- As the input section 76, various operators and operating devices such as a keyboard, a mouse, a key, a dial, a touch panel, a touch pad, and a remote controller are assumed.
- a user's operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
- a display section 77 consisting of an LCD or an organic EL panel, and an audio output section 78 consisting of a speaker etc. are connected to the input/output interface 75 either integrally or separately.
- the display unit 77 is a display unit that performs various displays, and is configured by, for example, a display device provided in the housing of the computer device, a separate display device connected to the computer device, or the like.
- The display unit 77 displays images for various image processing, moving images to be processed, and the like on the display screen based on instructions from the CPU 71. The display unit 77 also displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), based on instructions from the CPU 71.
- the input/output interface 75 may be connected to a storage section 79 made up of a hard disk, solid-state memory, etc., and a communication section 80 made up of a modem or the like.
- the communication unit 80 performs communication processing via a transmission path such as the Internet, and communicates with various devices by wired/wireless communication, bus communication, etc.
- a drive 81 is also connected to the input/output interface 75 as necessary, and a removable storage medium 82 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is appropriately installed.
- the drive 81 can read data files such as programs used for each process from the removable storage medium 82.
- the read data file is stored in the storage section 79, and images and sounds included in the data file are outputted on the display section 77 and the audio output section 78. Further, computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as necessary.
- software for the processing of this embodiment can be installed via network communication by the communication unit 80 or the removable storage medium 82.
- the software may be stored in advance in the ROM 72, storage unit 79, or the like.
- The captured image captured by the camera 3 or the processing result of AI image processing may be received and stored in the storage unit 79 or, via the drive 81, in the removable storage medium 82.
- Each of the cloud server 1, user terminal 2, fog server 4, and management server 5 is not limited to a single computer device as shown in FIG. 36, and may be configured as a system of multiple computer devices.
- the plurality of computer devices may be systemized using a LAN (Local Area Network) or the like, or may be placed at a remote location via a VPN (Virtual Private Network) using the Internet or the like.
- the plurality of computer devices may include computer devices as a server group (cloud) that can be used by a cloud computing service.
- As mentioned above, after the SW components of the AI application and the AI model have been deployed, an operation by the service provider or user can trigger retraining of the AI model and an update of the AI model deployed to each camera 3 or the like (hereinafter referred to as the "edge-side AI model") and of the AI application. The flow of this processing will be described specifically with reference to FIG. 24. Note that FIG. 37 is described focusing on one camera 3 among the plurality of cameras 3.
- the edge-side AI model to be updated is, as an example, the one deployed in the image sensor IS included in the camera 3.
- The edge-side AI model may also be one that is deployed outside the image sensor IS in the camera 3.
- an AI model relearning instruction is given by the service provider or user.
- This instruction is performed using an API function provided in an API (Application Programming Interface) module provided in the cloud-side information processing device.
- The amount of images (for example, the number of images) used for learning will also be referred to as a "predetermined number of images."
- Upon receiving the instruction, the API module transmits a relearning request and image amount information to the Hub (similar to the one shown in FIG. 5) in processing step PS2.
- the Hub transmits an update notification and image amount information to the camera 3 as an edge-side information processing device.
- the camera 3 transmits the captured image data obtained by photographing to the image DB (Database) of the storage group in processing step PS4. This photographing process and transmission process are performed until a predetermined number of images required for relearning is achieved.
- When the camera 3 obtains an inference result by performing inference processing on the captured image data, it may store the inference result in the image DB as metadata of the captured image data in processing step PS4.
- After completing the shooting and transmission of the predetermined number of images, the camera 3 notifies the Hub in processing step PS5 that the transmission of the predetermined number of captured image data has been completed.
- Upon receiving the notification, the Hub notifies the orchestration tool in processing step PS6 that the preparation of data for relearning is complete.
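- The camera-side behavior in processing steps PS4 and PS5 amounts to a bounded capture-and-send loop followed by one completion notice. The sketch below assumes hypothetical callables `capture` and `notify_hub` and models the image DB as a plain list; the actual transport between the camera 3, the image DB, and the Hub is not specified here.

```python
from typing import Callable, List


def collect_for_relearning(capture: Callable[[], bytes], required: int,
                           image_db: List[bytes],
                           notify_hub: Callable[[int], None]) -> None:
    """Send captured images until `required` images are stored, then notify."""
    sent = 0
    while sent < required:
        image_db.append(capture())  # PS4: send one captured image to the image DB
        sent += 1
    notify_hub(sent)                # PS5: report that transmission is complete
```

Here `required` plays the role of the predetermined number of images carried in the image amount information.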
- the orchestration tool transmits an instruction to execute the labeling process to the labeling module.
- the labeling module acquires image data targeted for labeling processing from the image DB (processing step PS8), and performs labeling processing.
- The labeling process referred to here may be a process that performs class identification as described above, a process that estimates the gender or age of the subject of an image and assigns a label, or a process that estimates the behavior of the subject and assigns a label.
- the labeling process may be performed manually or automatically. Further, the labeling process may be completed by the information processing device on the cloud side, or may be realized by using a service provided by another server device.
- After completing the labeling process, the labeling module stores the labeling result information in the dataset DB in processing step PS9.
- The information stored in the dataset DB may be a set of label information and image data, or may contain image ID (Identification) information for identifying the image data instead of the image data itself.
- the storage management unit that detects that the labeling result information is stored notifies the orchestration tool in processing step PS10.
- the orchestration tool that has received the notification confirms that the labeling process for the predetermined number of image data has been completed, and sends a relearning instruction to the relearning module in processing step PS11.
- the relearning module that has received the relearning instruction acquires a dataset to be used for learning from the dataset DB in processing step PS12, and acquires an AI model to be updated from the learned AI model DB in processing step PS13.
- the relearning module retrains the AI model using the acquired data set and AI model.
- the updated AI model obtained in this manner is stored again in the trained AI model DB in processing step PS14.
- the storage management unit that detects that the updated AI model has been stored notifies the orchestration tool in processing step PS15.
- the orchestration tool that has received the notification transmits an AI model conversion instruction to the conversion module in processing step PS16.
- the conversion module that has received the conversion instruction acquires the updated AI model from the learned AI model DB in processing step PS17, and performs the conversion process of the AI model.
- conversion is performed in accordance with the spec information of the camera 3, which is the destination device.
- downsizing is performed so as not to degrade the performance of the AI model as much as possible, and the file format is converted so that it can be operated on the camera 3.
- the AI model that has been converted by the conversion module is the edge-side AI model described above.
- This converted AI model is stored in the converted AI model DB in processing step PS18.
- the storage management unit that detects that the converted AI model has been stored notifies the orchestration tool in processing step PS19.
- the orchestration tool that has received the notification transmits a notification to the Hub to execute the update of the AI model in processing step PS20.
- This notification includes information for specifying the location where the AI model used for the update is stored.
- Upon receiving the notification, the Hub sends an AI model update instruction to the camera 3.
- the update instruction also includes information for specifying the location where the AI model is stored.
- The camera 3 performs a process of acquiring the target converted AI model from the converted AI model DB and deploying it. As a result, the AI model used by the image sensor IS of the camera 3 is updated.
- After completing the update of the AI model by deploying it, the camera 3 transmits an update completion notification to the Hub in processing step PS23.
- the Hub that has received the notification notifies the orchestration tool that the AI model update process for the camera 3 has been completed in processing step PS24.
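- Seen from the orchestration tool, the retraining flow is a chain of notifications, each answered by one instruction to the next module. The following sketch maps the notification steps named above to the instruction sent in response. The table and function names are hypothetical, and the instruction strings are paraphrases rather than actual message formats.

```python
from typing import List, Optional, Tuple

# Notification step received by the orchestration tool
#   -> (destination, instruction) sent in response.
RESPONSES = {
    "PS6": ("labeling module", "execute labeling"),      # data for relearning ready
    "PS10": ("relearning module", "retrain AI model"),   # sent in PS11
    "PS15": ("conversion module", "convert AI model"),   # sent in PS16
    "PS19": ("Hub", "update AI model"),                  # sent in PS20
}


def on_notification(step: str) -> Optional[Tuple[str, str]]:
    """Return the instruction triggered by a notification, or None if there is none."""
    return RESPONSES.get(step)


def run(notifications: List[str]) -> List[Tuple[str, str]]:
    """Replay a sequence of notifications and collect the instructions sent."""
    sent = []
    for step in notifications:
        response = on_notification(step)
        if response is not None:
            sent.append(response)
    return sent
```

The completion notice in PS24 triggers no further instruction in this sketch, which is why it has no entry in the table.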
- the AI model can be updated in the same way.
- The device (location) on which the AI model was deployed is stored in the storage management section on the cloud side. The Hub reads the device (location) where the AI model is deployed from the storage management section and sends an AI model update instruction to that device.
- The device that has received the update instruction performs a process of acquiring the target converted AI model from the converted AI model DB and deploying it. As a result, the AI model of the device that received the update instruction is updated.
- the orchestration tool transmits an instruction to download an AI application such as updated firmware to the deployment control module.
- the deployment control module transmits an AI application deployment instruction to the Hub.
- This instruction includes information to identify where the updated AI application is stored.
- The Hub transmits the deployment instruction to the camera 3 in processing step PS27.
- the camera 3 downloads and deploys the updated AI application from the container DB of the deployment control module.
- An AI application may be defined by multiple SW components such as SW components B1, B2, B3, ... Bn.
- The deployment location of each SW component is stored in the storage management unit on the cloud side. In processing step PS27, the Hub reads the device (location) where each SW component is deployed from the storage management unit and sends a deployment instruction to that device.
- the device that has received the deployment instruction downloads and deploys the updated SW component from the container DB of the deployment control module in processing step PS28.
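- When an AI application consists of several SW components deployed on different devices, the Hub has to turn the stored deployment locations into one instruction per device. The sketch below shows one way to group them; the dictionary shapes, the device names, and the storage paths in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def instructions_by_device(locations: Dict[str, str],
                           container_db: Dict[str, str]
                           ) -> Dict[str, List[Tuple[str, str]]]:
    """Group (component, storage location) pairs by the device that hosts them.

    locations:    SW component -> device where it is deployed
                  (as read from the storage management unit)
    container_db: SW component -> where the updated component is stored
    """
    plan: Dict[str, List[Tuple[str, str]]] = {}
    for component, device in locations.items():
        plan.setdefault(device, []).append((component, container_db[component]))
    return plan
```

For example, with B1 and B3 on the camera 3 and B2 on the fog server 4, the camera receives a single instruction listing both B1 and B3 together with their storage locations.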
- the AI application referred to here is a SW component other than the AI model.
- both the AI model and the AI application may be updated together as one container.
- the AI model and the AI application may be updated simultaneously rather than sequentially. This can be realized by executing each process of processing steps PS25, PS26, PS27, and PS28.
- In this way, the AI models and AI applications can be updated.
- the AI model is retrained using captured image data captured in the user's usage environment. Therefore, it is possible to generate an edge-side AI model that can output highly accurate recognition results in the user's usage environment.
- each of the above-described processes may be executed not only when relearning the AI model but also when operating the system for the first time in the user's usage environment.
- FIG. 38 shows an example of the login screen G1.
- the login screen G1 is provided with an ID input field 91 for inputting a user ID and a password input field 92 for inputting a password.
- a login button 93 for logging in and a cancel button 94 for canceling the login are arranged below the password input field 92.
- The login screen G1 may further be provided with operators such as an operator for transitioning to a page for users who have forgotten their password and an operator for transitioning to a page for new user registration.
- FIG. 39 is an example of a screen presented to, for example, an AI application developer using the application developer terminal 2A or an AI model developer using the AI model developer terminal 2C.
- purchasable learning datasets, AI models, AI applications, etc. are displayed on the left side.
- By using an input device such as a mouse to surround only the desired part of an image with a frame and entering a name, the user can prepare for learning. For example, to perform AI learning with an image of a cat, the user surrounds only the cat in the image with a frame and enters "cat" as the text input, which prepares an image annotated with "cat" for use in AI learning.
- an input field 95 is provided for registering learning datasets collected or created by the developer, and AI models and AI applications developed by the developer.
- An input field 95 is provided for each data item to input the name and data storage location. Furthermore, for the AI model, a check box 96 is provided for setting whether retraining is necessary or not.
- a price setting field (indicated as an input field 95 in the figure), etc. may be provided in which the price required when purchasing data to be registered can be set.
- the user name, last login date, etc. are displayed as part of the user information.
- the amount of currency, number of points, etc. that can be used by the user when purchasing data may be displayed.
- FIG. 40 is an example of a user screen G3 presented to a user (the above-mentioned application user) who performs various analyses by deploying an AI application or an AI model to the camera 3, an edge-side information processing device that the user manages.
- radio buttons 97 are arranged that allow selection of the type and performance of the image sensor IS installed in the camera 3, the performance of the camera 3, and the like.
- the user can purchase an information processing device as the fog server 4 via the marketplace. Therefore, radio buttons 97 for selecting each performance of the fog server 4 are arranged on the left side of the user screen G3. Further, a user who already has a fog server 4 can register the performance of the fog server 4 by inputting the performance information of the fog server 4 here.
- the user achieves the desired function by installing the purchased camera 3 (or the camera 3 purchased without going through the marketplace) at any location such as a store that the user manages.
- radio buttons 98 are arranged that allow selection of environmental information about the environment in which the camera 3 is installed. By appropriately selecting environmental information regarding the environment in which the camera 3 is installed, the user sets the above-mentioned optimal imaging settings for the target camera 3.
- An execution button 99 is provided on the user screen G3. By pressing the execution button 99, the screen changes to a confirmation screen for confirming the purchase and a confirmation screen for confirming the setting of environmental information. This allows the user to purchase the desired camera 3 and fog server 4, and to set environmental information regarding the camera 3.
- the image sensor IS includes a first layer (die D1) provided with a pixel array section 41a in which a plurality of pixels are arranged two-dimensionally,
- a second layer (die D2) provided with a conversion processing unit (analog circuit unit 41b) that performs A/D conversion to convert analog signals based on pixel signals output from the pixel array section 41a into digital signals, and with a second layer storage unit 45a in which image data, which is digital data based on the digital signals, is stored frame by frame, and a third layer (die D3) that includes an inference processing unit (AI image processing unit 44) that performs inference processing using the image data as an input tensor.
- the second layer and the third layer can each be made smaller than when the parts provided in the second layer and the third layer are combined into one layer. Therefore, the size of each layer can be made approximately the same as the size of the pixel array section 41a, and there is no need to leave an extra area on the first layer in which no components are mounted, so the image sensor IS can be made smaller. Furthermore, by providing the second layer storage unit 45a as a frame memory in the second layer, inference processing using frame images and other processing using frame images (for example, output processing of frame image data) can be performed efficiently. In addition, when the third layer storage section 45b is provided in the third layer, the storage capacity of the third layer storage section 45b can be reduced, which makes it possible to reduce the size of the third layer storage section 45b and of the image sensor IS.
- the second layer (die D2) in the image sensor IS may be provided between the first layer (die D1) and the third layer (die D3).
- because a conversion processing unit (analog circuit unit 41b) that performs A/D conversion on pixel signals read from pixels included in the pixel array unit 41a is provided in the second layer adjacent to the first layer, processing up to A/D conversion can be performed smoothly.
- because no other layer is disposed between the first layer and the second layer, wiring between the layers is facilitated and the number of wiring members can be reduced.
- because the pixel array unit 41a provided in the first layer is located away from the inference processing unit (AI image processing unit 44) provided in the third layer in the stacking direction, the influence of electromagnetic noise on charges is reduced, making it possible to reduce noise.
- the third layer (die D3) in the image sensor IS may be provided with the third layer storage section 45b serving as a working memory for inference processing.
- the inference process can be performed using the artificial intelligence model (AI model) stored in the third layer storage unit 45b provided in the same layer, which reduces the time required for the inference process.
- the conversion processing section (analog circuit section 41b) and inference processing section (AI image processing section 44) in the image sensor IS may be arranged at positions that do not overlap in the stacking direction of each layer. This reduces the possibility that electromagnetic noise generated during execution of inference processing by the inference processing unit will affect the result of A/D conversion. Therefore, image data with less noise (RAW image data) can be generated as digital data after A/D conversion. Furthermore, since A/D conversion and inference processing can be executed simultaneously, it is also possible to execute complex inference processing that requires a long processing time.
- the third layer (die D3) in the image sensor IS includes a processor (e.g., CPU) that is different from the processor (e.g., DSP) that functions as the inference processing unit (AI image processing unit 44).
- CV processing such as edge enhancement processing, scaling processing, and affine transformation processing can be performed using the CPU 43a with high processing power. Thereby, the processing time can be reduced compared to when the CV processing is performed by the ISP 42b.
- the third layer (die D3) of the image sensor IS may be provided with an authentication processing unit (authentication function F12) that performs authentication processing regarding whether or not an artificial intelligence model used for inference processing can be deployed.
- the authentication processing unit performs, for example, a process for having the server device (cloud-side information processing device) authenticate that the image sensor IS is permitted to deploy an artificial intelligence model.
- the authentication processing unit manages necessary data such as certificates.
- the image sensor IS may receive an encrypted artificial intelligence model, in which case the authentication processing unit may manage the key used to decrypt the encrypted artificial intelligence model.
- the authentication processing unit may manage keys for encrypting data to be output to the outside.
- various data managed by the authentication processing unit are stored in storage units such as ROM and RAM provided in the second layer (die D2) and the third layer (die D3) (memory unit 45, second layer storage unit 45a, third layer storage unit 45b). Thereby, only an artificial intelligence model received from an authorized server device can be deployed, and security can be improved. Furthermore, security can be improved for output data as well.
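As a rough illustration of the deployment check described above, the following sketch accepts a model package only when its authentication tag verifies against a key held on the sensor side. The use of HMAC-SHA256, the function names `sign_model` and `can_deploy`, and the shared key are all assumptions made for illustration; the source does not specify the authentication algorithm or the key format.

```python
import hashlib
import hmac


def sign_model(model_bytes: bytes, key: bytes) -> bytes:
    """Server side (hypothetical): compute an authentication tag for a model package."""
    return hmac.new(key, model_bytes, hashlib.sha256).digest()


def can_deploy(model_bytes: bytes, tag: bytes, key: bytes) -> bool:
    """Sensor side (hypothetical): permit deployment only if the tag verifies."""
    expected = hmac.new(key, model_bytes, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison of the two tags
    return hmac.compare_digest(expected, tag)


device_key = b"example-shared-key"   # assumption: a key managed by the authentication unit
model = b"\x00\x01example-weights"   # assumption: serialized AI model bytes
tag = sign_model(model, device_key)

accepted = can_deploy(model, tag, device_key)
rejected = can_deploy(model + b"tampered", tag, device_key)
```

In this sketch a model altered after signing is refused, which corresponds to deploying only models received from an authorized server device.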
- the third layer (die D3) in the image sensor IS may be provided with a communication control unit (communication control function F14) that performs communication control for outputting the results of inference processing to the outside.
- a communication control unit that controls an antenna provided outside the image sensor IS
- LPWA such as SIGFOX and LTE-M.
- because a processing unit inside the image sensor IS (the communication control unit) transmits data by executing a program, security can be improved.
- the chip sizes of the first layer (die D1), second layer (die D2), and third layer (die D3) in the image sensor IS may be the same.
- for an image sensor IS with a unified chip size for each layer, the layers can be stacked in the state of silicon wafers before dicing and then diced, so that the dicing process is completed in a single pass and positioning is made easier. Thereby, the manufacturing process can be simplified. Note that chip sizes are regarded as "the same" here when the layers are laminated in the wafer state and cut out by a single dicing.
- the chip size of the third layer (die D3) in the image sensor IS may be smaller than the chip size of the first layer (die D1) and the second layer (die D2). Thereby, the cost of the third layer chip can be reduced.
- when the third layer chips are pasted on one side of the second layer after dicing, only those chips that are found to be non-defective in the inspection after dicing can be used. Therefore, it is possible to improve the yield of the image sensor IS.
- a plurality of chips may be provided in the third layer (die D3) in the image sensor IS.
- the memory provided in the third layer (the third layer storage section 45b) can be a highly integrated DRAM chip, and the chip that functions as a DSP or ISP can be a chip manufactured using a cutting-edge process of 10 nm or smaller. That is, chips produced by different semiconductor manufacturing processes can be mixed in the same third layer. Therefore, the size can be made smaller than when these multiple chips are provided in different layers.
- by making the memory provided in the third layer a highly integrated chip, the size of the memory chip can be reduced, and by providing a chip with a communication function in the space freed by such miniaturization, the image sensor can be made multifunctional.
- each of the plurality of chips in the image sensor IS may have a rectangular shape having a long side and a short side when viewed from above, and the plurality of chips may be arranged with their long sides facing each other. For example, by arranging the long sides of a DRAM chip and a chip equipped with a DSP adjacent to each other, the number of wires between the processor and the memory can be increased, and processing speed can be increased.
- the A/D conversion in the second layer (die D2) and the inference process in the third layer (die D3) in the image sensor IS may be designed so that their execution times do not overlap. Thereby, it is possible to eliminate the possibility that electromagnetic noise generated during execution of inference processing by the inference processing unit (AI image processing unit 44) will affect the result of A/D conversion.
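The time-separated arrangement can be pictured with a small scheduling sketch. It assumes, purely for illustration, that both operations must fit back to back inside one frame period; the function names and the millisecond figures are hypothetical and do not come from the source.

```python
def schedule_frame(frame_period_ms, adc_ms, inference_ms):
    """Place A/D conversion and inference back to back inside one frame period.

    Returns two (start, end) windows in milliseconds, or raises ValueError
    when both operations cannot fit without overlapping.
    """
    if adc_ms + inference_ms > frame_period_ms:
        raise ValueError("A/D conversion and inference do not fit in one frame period")
    adc_window = (0.0, float(adc_ms))
    inference_window = (float(adc_ms), float(adc_ms + inference_ms))
    return adc_window, inference_window


def overlaps(a, b):
    """True when two half-open (start, end) windows share any time."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical numbers: roughly 30 fps, 10 ms of A/D conversion, 20 ms of inference.
adc_window, inference_window = schedule_frame(33.3, 10, 20)
```

Under these assumptions inference starts only after conversion ends, so noise from the inference processor cannot coincide with A/D conversion. A longer inference would need either a lower frame rate or the spatially separated layout described earlier.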
- the image sensor IS includes a pixel array section 41a in which a plurality of pixels are arranged two-dimensionally, and an inference processing unit (AI image processing unit 44) that executes a first inference process using a first artificial intelligence model (first AI model M1) on image data output from the pixel array section 41a, and executes a second inference process using a second artificial intelligence model (second AI model M2) based on the result of the first inference process.
- for example, face detection is performed in the first inference process, and feature amount detection is performed in the second inference process. Alternatively, noise removal is performed in the first inference process, and feature amount detection is performed in the second inference process.
- an artificial intelligence model specialized for each specific inference process can be used, and overall highly accurate inference results can be obtained. Further, by performing multiple inference processes using multiple artificial intelligence models, it is possible to improve the functionality of the image sensor IS.
- switching between the first artificial intelligence model (first AI model M1) and the second artificial intelligence model (second AI model M2) may be performed by switching the setting of the weighting coefficients for the artificial intelligence model. This makes it possible to switch between artificial intelligence models with simple processing.
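A minimal sketch of switching models by swapping weighting coefficients is shown below. The single linear layer, the class name `TinyLinearModel`, and the coefficient values are hypothetical stand-ins; the source does not describe the network structure, only that the coefficient settings are switched.

```python
class TinyLinearModel:
    """Hypothetical inference engine: one fixed linear layer whose behaviour
    is determined entirely by the currently loaded weighting coefficients."""

    def __init__(self):
        self.weights = []
        self.bias = 0.0

    def load(self, coeffs):
        """Switch models by replacing the coefficients; the engine itself is unchanged."""
        self.weights = list(coeffs["weights"])
        self.bias = coeffs["bias"]

    def infer(self, x):
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias


# Two coefficient sets standing in for the first and second AI models (assumed values).
M1 = {"weights": [1.0, 0.0], "bias": 0.0}
M2 = {"weights": [0.0, 2.0], "bias": 1.0}

engine = TinyLinearModel()
engine.load(M1)
first_result = engine.infer([3.0, 4.0])
engine.load(M2)
second_result = engine.infer([3.0, 4.0])
```

The same engine object produces different outputs for the same input once the coefficients are replaced, which is the sense in which the switch needs only simple processing.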
- the image sensor IS includes an image processing unit (CVDSP 42c) that performs image processing based on the result of the first inference process, and the inference processing unit (AI image processing unit 44) may perform the second inference process using the image subjected to image processing by the image processing unit as an input tensor.
- the second inference process can be appropriately performed by the image processing unit performing image processing that improves the accuracy of the inference result of the second inference process.
- the image sensor IS includes a frame memory (second layer storage section 45a) that stores image data, and the image processing section (CVDSP 42c) may perform image processing on the image data stored in the frame memory according to the inference result of the first inference process (first AI image processing). For example, in the first inference process, a process of detecting a predetermined subject from image data is performed.
- the image processing unit then generates a partial image by cutting out the area where the predetermined subject is imaged from the image data (frame image) stored in the frame memory, based on coordinate information about the detected subject.
- the artificial intelligence model is switched and a process of extracting feature points of a predetermined subject from the cut out partial image is performed. Since the image sensor IS includes a frame memory, it is possible to perform image processing using image data before being subjected to the first inference process, that is, image data that is used as an input tensor in the first inference process.
- the first inference process may be a process to detect a specific target object,
- the second inference process may be a process to detect feature amounts of the detected target object, and
- the image processing unit (CVDSP 42c) may perform, as image processing, processing to cut out the image area of the object detected by the first inference process from the image data stored in the frame memory (second layer storage unit 45a).
- the detection target is a person's face, a person's body, a vehicle license plate, or the like.
- processing to detect feature amounts is, for example, a process of detecting feature amounts of a person's face as the detection target, feature amounts for detecting the skeleton and posture of a person's body as the detection target, or feature amounts of the numbers on a license plate.
- the second inference process can suitably perform processes such as detecting attribute information about a person, posture information and skeletal information of a person, and character strings about a license plate using OCR.
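The detect, cut out, and extract flow described above can be sketched as follows. Both inference steps are replaced by trivial placeholders (a brightness threshold and simple statistics) because the source does not disclose the models; only the data flow from frame memory through cropping to the second stage is meant to be illustrative.

```python
def detect_subject(frame, threshold=128):
    """Placeholder for the first inference: bounding box (x0, y0, x1, y1)
    around pixels at or above the threshold, or None if nothing is found."""
    coords = [(x, y) for y, row in enumerate(frame)
              for x, value in enumerate(row) if value >= threshold]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def crop(frame, box):
    """Image processing step: cut the detected area out of the stored frame."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def extract_features(patch):
    """Placeholder for the second inference: simple statistics of the patch."""
    pixels = [value for row in patch for value in row]
    return {"mean": sum(pixels) / len(pixels),
            "width": len(patch[0]), "height": len(patch)}


def two_stage(frame):
    box = detect_subject(frame)
    if box is None:
        return None
    return box, extract_features(crop(frame, box))


# Assumed 4x5 grayscale frame held in frame memory, with one bright 2x2 subject.
frame_memory = [
    [0, 0, 0, 0, 0],
    [0, 0, 200, 200, 0],
    [0, 0, 200, 200, 0],
    [0, 0, 0, 0, 0],
]
result = two_stage(frame_memory)
```

Because the crop is taken from the frame held in memory, the second stage sees the original pixels of the detected area rather than a downscaled or otherwise altered copy.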
- the inference processing unit (AI image processing unit 44) in the image sensor IS may output image data as a result of the first inference process (first AI image processing), and the image processing unit (CVDSP 42c) may perform image processing on the image data output from the first inference process.
- image data obtained by removing noise from image data as a frame image is obtained as an inference result.
- the image processing unit performs image processing such as edge enhancement processing on the image data after noise removal.
- object recognition processing is performed using the image data from which noise has been removed and edges have been emphasized as an input tensor. This makes it possible to infer the subject more accurately.
- the first inference process may be a process for correcting deterioration of image data used as an input tensor, and the image processing unit (CVDSP 42c) may perform, as image processing, processing to sharpen the corrected image data. Thereby, suitable image data from which feature quantities can be easily extracted can be input as the input tensor in the second artificial intelligence model.
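One common way to sharpen an image is a 3x3 convolution, sketched below on a frame held as a list of rows. The specific kernel and the clamping to 0..255 are assumptions; the source names sharpening and edge enhancement but does not give the filter. Note that each output pixel needs the lines above and below it, which is why the operation suits frame-based rather than line-by-line processing.

```python
# A widely used 3x3 sharpening kernel (assumed here; not taken from the source).
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to interior pixels; border pixels are copied unchanged.
    Results are clamped to the 8-bit range 0..255."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = max(0, min(255, acc))
    return out


flat = [[10, 10, 10] for _ in range(3)]
edge = [[0, 0, 100, 100] for _ in range(3)]

flat_out = convolve3x3(flat, SHARPEN)
edge_out = convolve3x3(edge, SHARPEN)
```

A flat area is left unchanged, while the bright side of a step edge is boosted and the dark side is pushed down, which makes edges easier for a later feature-extraction stage to pick up.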
- the image sensor IS includes a mask processing unit (AI image processing unit 44 or privacy mask processing unit PM) that performs mask processing (privacy mask processing) to mask a predetermined area in image data, and a communication control unit (communication control function F14) that performs transmission control to transmit image data subjected to mask processing to other devices.
- the predetermined area may be an area where a person is imaged.
- an image in which the person is masked is output from the image sensor IS, so that the privacy of the subject can be protected.
- the mask processing unit (AI image processing unit 44 or privacy mask processing unit PM) may perform mask processing on the image data after the image processing, and the image data input to the first artificial intelligence model and the second artificial intelligence model may be image data that has not been subjected to mask processing.
- the image sensor IS includes a frame memory (second layer storage section 45a) that stores image data, and the mask processing section (privacy mask processing section PM) may perform mask processing by changing the pixel values of a predetermined area in the image data stored in the frame memory to a predetermined value. Thereby, the masking process can be realized by changing part of the data stored in the frame memory, which requires less processing load.
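Masking by overwriting pixel values can be sketched in a few lines. The list-of-rows frame, the box format (x0, y0, x1, y1), and the fill value 0 are assumptions chosen for illustration.

```python
def mask_region(frame, box, fill=0):
    """Overwrite the pixels inside box (x0, y0, x1, y1) with a fixed value.
    The frame is modified in place, mirroring a write to frame memory."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            frame[y][x] = fill
    return frame


# Assumed 3x3 frame; the lower-right 2x2 area stands in for a detected person.
frame_memory = [[9, 9, 9] for _ in range(3)]
person_box = (1, 1, 3, 3)
mask_region(frame_memory, person_box)
```

Since the write is done in place, any inference that needs the unmasked image has to run, or take its own copy of the frame, before this step.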
- the mask processing unit (privacy mask processing unit PM) in the image sensor IS may perform mask processing (privacy mask processing) using an artificial intelligence model.
- the mask processing unit may use an artificial intelligence model that detects a person included in the image data and performs a process of masking the image area.
- the mask processing unit (privacy mask processing unit PM) in the image sensor IS may perform mask processing using the inference result of the first inference process or the inference result of the second inference process. For example, when a first inference process using an artificial intelligence model for detecting a person is executed in the image sensor IS, the mask processing unit may execute a process of masking a part of the image area using the result of that inference process. Thereby, efficient mask processing can be performed using the inference results. Note that when performing mask processing on the input image data of the first inference process, one artificial intelligence model that performs both the first inference process and the mask process may be used.
- the image sensor IS may include a ROM (memory unit 45) in which a program executed by the mask processing unit (privacy mask processing unit PM) is stored. Thereby, it is possible to make it difficult to perform unauthorized processing such as unauthorized mask processing or avoidance of mask processing for protecting privacy. That is, it is possible to increase the possibility that appropriate mask processing will be reliably executed.
- the information processing method is one in which a computer device serving as the image sensor IS executes a first inference process using a first artificial intelligence model based on image data output from a pixel array unit 41a in which a plurality of pixels are arranged two-dimensionally, and a second inference process using a second artificial intelligence model based on the result of the first inference process.
- the program according to the present technology is a program readable by a computer device, and causes the arithmetic processing unit of the image sensor IS to execute each process shown in FIGS. 22, 23, and 24.
- Such a program can be recorded in advance in an HDD (Hard Disk Drive) as a recording medium built into equipment such as a computer device, or in a ROM in a microcomputer having a CPU.
- the program can be stored (recorded) temporarily or permanently in a removable recording medium such as a flexible disk, CD-ROM (Compact Disc Read Only Memory), MO (Magneto Optical) disk, DVD (Digital Versatile Disc), Blu-ray Disc (registered trademark), magnetic disk, semiconductor memory, or memory card.
- a removable recording medium can be provided as so-called package software.
- it can also be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
- the image sensor IS includes a pixel array section 41a in which a plurality of pixels are arranged two-dimensionally, a frame memory (second layer storage unit 45a) that stores image data output from the pixel array section 41a, an image processing unit (CVDSP 42c) that performs image processing on the image data stored in the frame memory, and an inference processing unit (AI image processing unit 44) that performs inference processing using an artificial intelligence model, with the image data subjected to image processing by the image processing unit as an input tensor.
- image processing is performed not on line-by-line input data but on input data that is image data including pixel data of at least a plurality of lines. Therefore, the image processing unit can perform processing on the entire input image. Compared with similar processing by an ISP, which processes data line by line, processing the entire input image as image data stored in the frame memory does not require the image to be converted into line data, so processing can be sped up and the processing load reduced.
- the image processing section (CVDSP 42c) and the inference processing section (AI image processing section 44) in the image sensor IS may be provided as different processors. Thereby, an appropriate processor can be applied according to the processing content of each of the image processing section and the inference processing section.
- the image processing unit (CVDSP 42c) in the image sensor IS may perform CV processing. Furthermore, in the image sensor IS, the CV processing may include at least a portion of edge enhancement processing, scaling processing, and affine transformation processing.
- when the image processing unit (CVDSP 42c) performs such CV processing on the frame image, efficient processing can be performed. Specifically, when performing CV processing using the ISP, since the ISP processes each line of data, it is necessary to convert the image data to line data before performing the CV processing. On the other hand, by configuring the image processing section with a DSP or the like, CV processing can be performed without converting the frame image into line data, so that processing efficiency can be improved.
- the image processing unit (CVDSP 42c) in the image sensor IS may generate the input tensor of the artificial intelligence model.
- image data and the like suitably corrected by the image processing unit are input to the artificial intelligence model as an input tensor. Therefore, highly accurate inference processing can be performed.
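As an illustration of input tensor generation, the sketch below scales a stored frame to the size a model expects and normalizes pixel values to the range 0 to 1. Nearest-neighbour scaling, the 8-bit pixel assumption, and the function name `make_input_tensor` are choices made for this example only; the source states that scaling is one of the CV processes but does not prescribe the method or the tensor layout.

```python
def make_input_tensor(frame, out_h, out_w):
    """Scale a grayscale frame to out_h x out_w by nearest-neighbour sampling
    and normalize 8-bit pixel values to floats in [0, 1]."""
    in_h, in_w = len(frame), len(frame[0])
    tensor = []
    for y in range(out_h):
        src_y = y * in_h // out_h      # source line for this output row
        row = []
        for x in range(out_w):
            src_x = x * in_w // out_w  # source column for this output pixel
            row.append(frame[src_y][src_x] / 255.0)
        tensor.append(row)
    return tensor


# Assumed 4x4 frame taken from frame memory, reduced to a 2x2 input tensor.
frame_memory = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [51, 51, 102, 102],
    [51, 51, 102, 102],
]
input_tensor = make_input_tensor(frame_memory, 2, 2)
```

The output rows draw on different source lines of the frame, so the whole frame has to be addressable at once, which matches the frame-memory-based processing described in this section.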
- the information processing method is one in which a computer device executes a process of storing image data output from a pixel array section 41a in which a plurality of pixels are arranged two-dimensionally, image processing on the stored image data, and inference processing using an artificial intelligence model with the image data subjected to the image processing as an input tensor.
- the program in the present technology is a program readable by a computer device, and causes the arithmetic processing unit of the image sensor IS to execute each process shown in FIGS. 22, 23, and 24.
- the present technology can also adopt the following configuration.
- a pixel array section in which a plurality of pixels are arranged two-dimensionally; a frame memory that stores image data output from the pixel array section; an image processing unit that performs image processing on the image data stored in the frame memory;
- An image sensor comprising: an inference processing unit that performs inference processing using an artificial intelligence model using image data subjected to image processing by the image processing unit as an input tensor.
- the image sensor according to (1) above, wherein the image processing section and the inference processing section are provided as different processors.
- the CV processing includes at least a portion of edge enhancement processing, scaling processing, and affine transformation processing.
- the image processing unit generates an input tensor for the artificial intelligence model.
- a program readable by a computer device, the program causing the computer device to realize: a function for storing image data output from a pixel array section in which a plurality of pixels are arranged two-dimensionally; a function of executing image processing on the stored image data; and a function of executing inference processing using an artificial intelligence model using image data subjected to the image processing as an input tensor.
- a pixel array section in which a plurality of pixels are arranged two-dimensionally; an image processing unit that outputs second image data obtained by performing image processing on the first image data; an inference processing unit that performs inference processing using an artificial intelligence model on the second image data and outputs an inference result,
- the image processing uses a plurality of lines in the first image data for each processing unit.
- Image sensor
- a pixel array unit in which a plurality of pixels are arranged two-dimensionally to generate and output first image data; an inference processing unit that performs inference processing using an artificial intelligence model on data based on the first image data and outputs an inference result; an image processing unit that outputs second image data obtained by performing image processing on the first image data based on the inference result, The image processing uses a plurality of lines in the first image data for each processing unit.
- Image sensor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Vascular Medicine (AREA)
- Image Processing (AREA)
- Transforming Light Signals Into Electric Signals (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202380036686.7A CN119111076A (zh) | 2022-05-10 | 2023-04-24 | 图像传感器、信息处理方法和程序 |
| US18/860,726 US20250292563A1 (en) | 2022-05-10 | 2023-04-24 | Image sensor, information processing method, and program |
| JP2024520363A JPWO2023218936A1 | 2022-05-10 | 2023-04-24 | |
| EP23803418.5A EP4525473A4 (en) | 2022-05-10 | 2023-04-24 | Image sensor, information processing method, and program |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022077704 | 2022-05-10 | ||
| JP2022-077704 | 2022-05-10 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023218936A1 (ja) | 2023-11-16 |
Family
ID=88730311
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2023/016162 Ceased WO2023218936A1 (ja) | 2022-05-10 | 2023-04-24 | イメージセンサ、情報処理方法、プログラム |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US20250292563A1 |
| EP (1) | EP4525473A4 |
| JP (1) | JPWO2023218936A1 |
| CN (1) | CN119111076A |
| TW (1) | TW202409978A |
| WO (1) | WO2023218936A1 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025134692A1 (ja) * | 2023-12-18 | 2025-06-26 | ソニーセミコンダクタソリューションズ株式会社 | 信号処理装置、信号処理方法、プログラム |
| WO2025225483A1 (ja) * | 2024-04-25 | 2025-10-30 | 株式会社ソニー・インタラクティブエンタテインメント | 画像処理装置、画像処理方法、及びプログラム |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018051809A1 (ja) | 2016-09-16 | 2018-03-22 | ソニーセミコンダクタソリューションズ株式会社 | 撮像装置、及び、電子機器 |
| JP2020025263A (ja) * | 2018-07-31 | 2020-02-13 | ソニーセミコンダクタソリューションズ株式会社 | 積層型受光センサ及び電子機器 |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12034015B2 (en) * | 2018-05-25 | 2024-07-09 | Meta Platforms Technologies, Llc | Programmable pixel array |
| CN115280760A (zh) * | 2020-03-19 | 2022-11-01 | 索尼半导体解决方案公司 | 固态成像装置 |
| JP2023525950A (ja) * | 2020-05-07 | 2023-06-20 | メタ プラットフォームズ テクノロジーズ, リミテッド ライアビリティ カンパニー | スマートセンサ |
-
2023
- 2023-04-24 JP JP2024520363A patent/JPWO2023218936A1/ja active Pending
- 2023-04-24 US US18/860,726 patent/US20250292563A1/en active Pending
- 2023-04-24 CN CN202380036686.7A patent/CN119111076A/zh active Pending
- 2023-04-24 WO PCT/JP2023/016162 patent/WO2023218936A1/ja not_active Ceased
- 2023-04-24 EP EP23803418.5A patent/EP4525473A4/en active Pending
- 2023-05-02 TW TW112116252A patent/TW202409978A/zh unknown
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4525473A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119111076A (zh) | 2024-12-10 |
| US20250292563A1 (en) | 2025-09-18 |
| EP4525473A1 (en) | 2025-03-19 |
| TW202409978A (zh) | 2024-03-01 |
| EP4525473A4 (en) | 2025-08-13 |
| JPWO2023218936A1 | 2023-11-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
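| EP4440130A1 (en) | | Information processing device, information processing method, and program |
| WO2023238723A1 (ja) | | Information processing device, information processing system, information processing circuit, and information processing method |
| WO2023189439A1 (ja) | | Information processing device and information processing system |
| WO2023218936A1 (ja) | | Image sensor, information processing method, and program |
| EP4439294A1 (en) | | Information processing device, information processing method, and program |
| JP2024059428A (ja) | | Signal processing device, signal processing method, and storage medium |
| WO2023218935A1 (ja) | | Image sensor, information processing method, and program |
| EP4439357A1 (en) | | Information processing device, information processing method, image-capturing device, and control method |
| WO2023218934A1 (ja) | | Image sensor |
| WO2025197575A1 (ja) | | Signal processing device and information processing device |
| JP7713507B2 (ja) | | Information processing device, information processing method, and program |
| EP4571545A1 (en) | | Method for processing information, server device, and information processing device |
| EP4567669A1 (en) | | Information processing device, information processing method, and information processing system |
| WO2024202366A1 (ja) | | Information processing device, information processing method, recording medium, inference device, and control method |
| WO2024241917A1 (ja) | | Information processing device, information processing method, and program |
| WO2024202501A1 (ja) | | Imaging device, imaging device system, program protection method, and storage medium |
| TW202416181A (zh) | | Information processing device, information processing method, computer-readable non-transitory storage medium, and terminal device |
| TW202433409A (zh) | | Image processing device, image processing method, and recording medium |
| WO2024014293A1 (ja) | | Transmission device, reception device, and information processing method |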
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23803418; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2024520363; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 202380036686.7; Country of ref document: CN |
| | WWE | WIPO information: entry into national phase | Ref document number: 18860726; Country of ref document: US |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023803418; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023803418; Country of ref document: EP; Effective date: 20241210 |
| | WWP | WIPO information: published in national office | Ref document number: 18860726; Country of ref document: US |