WO2024029347A1 - Information processing device, information processing method, and information processing system - Google Patents

Information processing device, information processing method, and information processing system

Info

Publication number
WO2024029347A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
feature map
data
artificial intelligence
intermediate feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/026530
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
健司 山川
凌平 川崎
良仁 浴
英太 柳沢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Semiconductor Solutions Corp
Original Assignee
Sony Semiconductor Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corp
Priority to CN202380055835.4A (published as CN119768798A)
Priority to US18/998,530 (published as US20260030881A1)
Priority to KR1020257005536A (published as KR20250048266A)
Priority to JP2024538918A (published as JPWO2024029347A1)
Priority to EP23849902.4A (published as EP4567669A4)
Publication of WO2024029347A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G06N3/045 Combinations of networks
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/091 Active learning
    • G06N3/096 Transfer learning
    • G06N3/098 Distributed learning, e.g. federated learning
    • G06V10/72 Data preparation, e.g. statistical preprocessing of image or video features
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V10/776 Validation; Performance evaluation
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats

Definitions

  • The present technology relates to an information processing device, a method thereof, and an information processing system.
  • In particular, the present technology relates to a technique for adapting an artificial intelligence model that has a neural network and uses detection data from a sensor device as input data.
  • AI camera: an imaging device that performs AI processing.
  • AI: Artificial Intelligence.
  • For example, one or more AI cameras may be placed in a store, recognition results regarding customer attributes (e.g., gender, age group) and behavior patterns may be obtained through image recognition processing, and these results may be transmitted via the Internet or the like.
  • A business that provides a system presenting such information to users can be envisaged. Because AI processing is performed on the AI camera side (edge side), processing is distributed and the amount of communication data can be reduced compared with a system in which the server device side (cloud side) performs AI processing on captured images acquired from each camera.
  • In such a system, the server device may retrain the AI model held by the AI camera.
  • Such retraining is expected to adapt the AI model to the environmental conditions in which the AI camera is used, for example differences in installation location, such as whether the camera is placed in a store in a country with a less diverse customer population, such as Japan, or in a store in a country with a diverse customer population, such as the United States.
  • Patent Document 1 discloses a configuration in which an AI model having a DNN (Deep Neural Network) is divided into a first DNN processing unit and a second DNN processing unit, and the second DNN processing unit in the later stage performs inference processing (for example, object recognition processing) using, as input data, a feature map obtained by the first DNN processing unit in the preceding stage.
  • An increase in the number of training iterations required also means an increase in the amount of training input data that must be prepared. If the user of the AI model to be retrained uploads the training input data for retraining to the cloud side, more training input data means more data for the user to upload and, as a result, a larger amount of communication data required to adapt the AI model to the environmental conditions in which the AI camera is used.
  • The present technology was devised in view of the above circumstances, and its purpose is to shorten the time required to adapt an edge-side artificial intelligence model to the operating environment conditions of the sensor device, to reduce the amount of communication data, and to reduce the risk of personal information leakage.
  • A first information processing device according to the present technology includes a subsequent-stage network acquisition unit. The acquisition unit receives, from an external device, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model that has a neural network and uses detection data from a sensor device as input data, when the input data is given to the model. Based on the intermediate feature map, it either selects one subsequent-stage network from among a plurality of candidates for the subsequent-stage network (the network following the predetermined intermediate layer in the model), or generates one subsequent-stage network based on a trained base network and the intermediate feature map.
  • The device further includes a transmission processing unit that performs a process of transmitting setting data of the subsequent-stage network selected or generated by the subsequent-stage network acquisition unit to the outside.
  • In the above configuration, a corresponding subsequent-stage network may be generated from the trained base network by, for example, distillation.
  • The number of training iterations required to generate a candidate subsequent-stage network can be made smaller than that required to retrain the entire network including the front-stage network.
  • Likewise, the number of training iterations required to generate a corresponding subsequent-stage network from the trained base network can be made smaller than that required to retrain the entire network.
  • As a result, the time required to obtain a subsequent-stage network suited to the environmental conditions in which the sensor device is used can be shortened. Reducing the number of training iterations also reduces the amount of training input data needed to generate such a network. Furthermore, with the above configuration, only the intermediate feature map needs to be obtained from the edge side in order to obtain a subsequent-stage network suited to the operating environment conditions of the sensor device. Because the intermediate feature map is the result of processing the input data up to an intermediate layer, personal information is difficult to identify from it directly.
  • A first information processing method according to the present technology is an information processing method in which an information processing device performs a subsequent-stage network acquisition process and a transmission process. In the acquisition process, the device receives, from an external device, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model that has a neural network and uses detection data from a sensor device as input data, when the input data is given to the model, and either selects one subsequent-stage network from among a plurality of candidates for the subsequent-stage network (the network following the predetermined intermediate layer in the model) based on the intermediate feature map, or generates one subsequent-stage network based on a trained base network and the intermediate feature map.
  • In the transmission process, the device transmits setting data of the selected or generated subsequent-stage network to the outside.
  • Such a first information processing method also provides the same effect as the first information processing device according to the present technology described above.
  • A second information processing device according to the present technology includes a transmission processing unit that performs a process of transmitting, to the outside, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model that has a neural network and uses detection data from a sensor device as input data, when the input data is given to the model.
  • The device further includes a reception processing unit that performs a process of receiving setting data of one subsequent-stage network, which is either selected by an external device, based on the intermediate feature map transmitted by the transmission processing unit, from among a plurality of candidates for the subsequent-stage network following the predetermined intermediate layer in the model, or generated by an external device based on a trained base network and the intermediate feature map transmitted by the transmission processing unit.
  • This makes it possible to provide an edge-side information processing device that performs inference processing based on detection data of a sensor device, in which an external device uses the intermediate feature map to determine a subsequent-stage network suited to the operating environment conditions of the sensor device, either by selecting one subsequent-stage network from among a plurality of candidates or by generating one subsequent-stage network based on the trained base network and the intermediate feature map.
  • A second information processing method according to the present technology is an information processing method in which an information processing device performs a transmission process of transmitting, to the outside, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model that has a neural network and uses detection data from a sensor device as input data, when the input data is given to the model, and a reception process of receiving setting data of one subsequent-stage network. That subsequent-stage network is either selected by an external device, based on the intermediate feature map transmitted in the transmission process, from among a plurality of candidates for the subsequent-stage network following the predetermined intermediate layer, or generated by an external device based on a trained base network and the intermediate feature map transmitted in the transmission process.
  • Such a second information processing method also provides the same effect as the second information processing device according to the present technology described above.
  • An information processing system according to the present technology includes: a first transmission processing unit that performs a process of transmitting, to the outside, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model that has a neural network and uses detection data from a sensor device as input data, when the input data is given to the model; a subsequent-stage network acquisition unit, provided in a device external to the device having the first transmission processing unit, that either selects one subsequent-stage network, based on the intermediate feature map, from among a plurality of candidates for the subsequent-stage network following the predetermined intermediate layer in the model, or generates one subsequent-stage network based on a trained base network and the intermediate feature map; a second transmission processing unit, provided in that external device, that performs a process of transmitting setting data of the subsequent-stage network selected or generated by the acquisition unit to the outside; and a reception processing unit, provided in the device having the first transmission processing unit, that performs a process of receiving the setting data,
  • FIG. 1 is a block diagram showing an example of the schematic configuration of an information processing system as an embodiment.
  • FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing device included in the information processing system as an embodiment.
  • FIG. 3 is a block diagram showing a configuration example of a camera including a sensor device as an embodiment.
  • An explanatory diagram of a structural example of the sensor device as an embodiment.
  • An explanatory diagram of an outline of the AI model in the embodiment.
  • An explanatory diagram of the initial AI model in the embodiment.
  • An explanatory diagram of the division of the initial AI model into a front-stage network and a subsequent-stage network.
  • An explanatory diagram of an example of a method for generating candidates for the subsequent-stage network.
  • An explanatory diagram of a plurality of candidates for the subsequent-stage network generated by retraining.
  • A diagram explaining the functions of the server device and the image sensor related to selection of the subsequent-stage network.
  • An explanatory diagram of an example of a method of selecting one subsequent-stage network based on an intermediate feature map.
  • An explanatory diagram of the functions of the server device and the image sensor related to deployment of the selected subsequent-stage network.
  • A diagram showing visualized images of the intermediate feature maps obtained at two different division positions in the initial AI model.
  • An explanatory diagram of an image area related to personal information in an intermediate feature map.
  • An explanatory diagram of another method of setting the division position.
  • A flowchart showing an example of a processing procedure for realizing the adaptation method according to the embodiment.
  • An explanatory diagram of a method of generating a subsequent-stage network using active learning.
  • An explanatory diagram of an example of a method of generating a small subsequent-stage network using knowledge distillation.
  • An explanatory diagram of another example of a method of generating a small subsequent-stage network using knowledge distillation.
  • A flowchart showing an example of a processing procedure for realizing the adaptation method when a small subsequent-stage network is generated by knowledge distillation.
  • A diagram explaining an example of the operation of the information processing system as an embodiment.
  • A diagram explaining another example of the operation of the information processing system as an embodiment.
  • An explanatory diagram of a modification of AI customization.
  • An explanatory diagram of a modification in which the output of the front-stage network is fed back to the control of imaging settings.
  • FIG. 1 is a block diagram showing a schematic configuration example of an information processing system 100 as an embodiment of the present technology.
  • the information processing system 100 includes a server device 1, one or more user terminals 2, a plurality of cameras 3, and a fog server 4.
  • the server device 1 is configured to be able to perform mutual communication with the user terminal 2 and the fog server 4 via a network 5 such as the Internet.
  • The server device 1, the user terminal 2, and the fog server 4 are each configured as an information processing device equipped with a microcomputer having a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory).
  • the user terminal 2 is an information processing device that is assumed to be used by a user who is a recipient of a service using the information processing system 100.
  • the server device 1 is an information processing device that is assumed to be used by a service provider.
  • Each camera 3 includes an image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) image sensor, captures an image of a subject, and obtains image data (captured image data) as digital data. As described later, each camera 3 also has a function of performing AI processing, that is, processing that uses an AI (Artificial Intelligence) model, on captured images. Each camera 3 is capable of data communication with the fog server 4: it can transmit various data, such as processing result information indicating the results of image processing using the AI model, to the fog server 4, and can receive various data from the fog server 4.
  • Each camera 3 may be used as any of various kinds of surveillance camera, for example:
  • surveillance cameras for indoor spaces such as stores, offices, and residences;
  • surveillance cameras for outdoor spaces such as parking lots and streets (including traffic surveillance cameras);
  • cameras for FA (Factory Automation) and IA (Industrial Automation), such as surveillance cameras on production lines;
  • surveillance cameras that monitor the inside and outside of cars.
  • For example, when the cameras 3 are used as surveillance cameras in a store, a plurality of cameras 3 may be placed at predetermined positions in the store so that the user can check the customer demographics (gender, age group, etc.) and customer behavior (flow lines) in the store.
  • In that case, information on customer demographics, information on flow lines in the store, and information on congestion at the checkout registers (for example, waiting times at the registers) can be generated as analysis information.
  • When the cameras 3 are used for traffic monitoring, each camera 3 may be placed near a road so that the user can recognize information such as the license plate number (vehicle number), color, and model of passing vehicles. In that case, information such as the license plate number, car color, and car model may be generated as the analysis information.
  • When a parking lot is monitored, the cameras may be placed so that each parked vehicle can be monitored, to check whether any suspicious person is acting suspiciously around a vehicle.
  • If a suspicious person is detected, the user may be notified of the presence of that person and of the person's attributes (gender, age group, clothing, etc.).
  • The fog server 4 is assumed to be provided for each monitoring target; for example, in the store monitoring application described above, the fog server 4 is placed in the monitored store together with the cameras 3.
  • The fog server 4 is not limited to one per monitoring target; a single fog server 4 may also serve a plurality of monitoring targets.
  • The information processing system 100 may also be configured without the fog server 4, with each camera 3 connected directly to the network 5 and the server device 1 receiving transmission data directly from the cameras 3.
  • the server device 1 is an information processing device that has a function of comprehensively managing the information processing system 100. As shown in the figure, the server device 1 has a license authorization function F1, an account service function F2, and an AI service function F3 as functions related to the management of the information processing system 100.
  • the license authorization function F1 is a function that performs various types of authentication-related processing. Specifically, in the license authorization function F1, processing related to device authentication of each camera 3 and processing related to authentication of data such as an AI model used in the camera 3 are performed.
  • Specifically, for device authentication, a device ID is issued for each camera 3 when the camera 3 is connected via the network 5 (in this example, via the fog server 4). For authentication of AI models, a unique ID (AI model ID) is issued for each AI model for which registration has been applied.
  • The license authorization function F1 also issues various keys and certificates for secure communication between the camera 3 and the server device 1 to the manufacturer of the camera 3 (in particular, the manufacturer of the image sensor 30 described later), and suspends or renews the validity of those certificates. Furthermore, when user registration (registration of account information accompanied by issuance of a user ID) is performed by the account service function F2 described below, the license authorization function F1 links the camera 3 purchased by the user (its device ID) to the user ID.
  • the account service function F2 is a function that generates and manages user account information.
  • the account service function F2 receives input of user information and generates account information based on the input user information (generates account information including at least user ID and password information).
  • the AI service function F3 is a function for providing the user with a service related to the use of the camera 3 as an AI camera.
  • The AI service function F3 includes a function of deploying an AI model to the camera 3 based on instructions from the user. Deployment here means transmission processing for installing an AI model in a target device so that the model can be used there.
  • The AI service function F3 can also include the function related to generating analysis information described above, that is, a function that generates analysis information on a subject based on the processing result information of AI processing in the camera 3 and allows the user to view the generated analysis information via the user terminal 2.
  • The AI service function F3 can further include a function for retraining the AI model installed in the camera 3. In this embodiment, the retraining function performs processing to adapt the AI model to the operating environment conditions of the camera 3 (image sensor); this is explained later.
  • In the above description, the server device 1 alone realizes the license authorization function F1, the account service function F2, and the AI service function F3, but these functions may instead be shared among a plurality of information processing devices. For example, each of the functions may be performed by a separate information processing device, or a single function may be shared among a plurality of information processing devices.
  • FIG. 2 is a block diagram showing an example of the hardware configuration of the server device 1. The user terminal 2 and the fog server 4 have hardware configurations similar to that shown in FIG. 2.
  • the server device 1 includes a CPU 11.
  • The CPU 11 functions as an arithmetic processing unit that performs the various processes of the server device 1 described above. It executes these processes according to programs stored in the ROM 12 or in a nonvolatile memory unit 14 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or according to programs loaded from the storage unit 19 into the RAM 13.
  • the RAM 13 also appropriately stores data necessary for the CPU 11 to execute various processes.
  • the CPU 11, ROM 12, RAM 13, and nonvolatile memory section 14 are interconnected via a bus 23.
  • An input/output interface (I/F) 15 is also connected to this bus 23.
  • the input/output interface 15 is connected to an input section 16 consisting of an operator or an operating device.
  • the input unit 16 may be various operators or operating devices such as a keyboard, mouse, keys, dial, touch panel, touch pad, or remote controller.
  • a user's operation is detected by the input unit 16, and a signal corresponding to the input operation is interpreted by the CPU 11.
  • A display unit 17 formed of an LCD (Liquid Crystal Display), an organic EL (Electro-Luminescence) display, or the like, and an audio output unit 18 formed of a speaker or the like are also connected to the input/output interface 15, either integrally or as separate units.
  • the display unit 17 is used to display various information, and is configured by, for example, a display device provided in the housing of the computer device, a separate display device connected to the computer device, or the like.
  • The display unit 17 displays images for various kinds of image processing, moving images to be processed, and the like on its display screen based on instructions from the CPU 11. The display unit 17 also displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), based on instructions from the CPU 11.
  • A storage unit 19 made up of an HDD (Hard Disk Drive), solid-state memory, or the like, and a communication unit 20 made up of a modem or the like may also be connected to the input/output interface 15.
  • The communication unit 20 performs communication processing via a transmission path such as the Internet, and communicates with various devices by wired or wireless communication, bus communication, and the like.
  • A drive 21 is also connected to the input/output interface 15 as necessary, and a removable recording medium 22 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is mounted in it as appropriate.
  • The drive 21 can read data files, such as programs used for each process, from the removable recording medium 22.
  • A data file that has been read is stored in the storage unit 19, and images and sounds included in the data file are output through the display unit 17 and the audio output unit 18. Computer programs and the like read from the removable recording medium 22 are also installed in the storage unit 19 as necessary.
  • Software for the processing of this embodiment can be installed via network communication by the communication unit 20 or from the removable recording medium 22.
  • Alternatively, the software may be stored in advance in the ROM 12, the storage unit 19, or the like.
  • The server device 1 is not limited to a single computer device as shown in FIG. 2, and may be configured as a system of a plurality of computer devices.
  • The plurality of computer devices may be systemized via a LAN (Local Area Network) or the like, or may be placed at remote locations connected via a VPN (Virtual Private Network) over the Internet or the like.
  • The plurality of computer devices may include computer devices forming a server group (cloud) available through a cloud computing service.
  • FIG. 3 is a block diagram showing an example of the configuration of the camera 3.
  • The camera 3 includes an image sensor 30, an imaging optical system 31, an optical system drive unit 32, a control unit 33, a memory unit 34, a communication unit 35, and a sensor unit 36.
  • The image sensor 30, control unit 33, memory unit 34, communication unit 35, and sensor unit 36 are connected via a bus 37 and can communicate data with one another.
  • The imaging optical system 31 includes lenses such as a cover lens, zoom lens, and focus lens, and an aperture (iris) mechanism. The imaging optical system 31 guides light (incident light) from the subject and focuses it on the light receiving surface of the image sensor 30.
  • The optical system drive unit 32 collectively represents the drive units for the zoom lens, focus lens, and aperture mechanism included in the imaging optical system 31.
  • Specifically, the optical system drive unit 32 includes actuators for driving the zoom lens, the focus lens, and the aperture mechanism, and drive circuits for these actuators.
  • The control unit 33 is configured with a microcomputer having, for example, a CPU, a ROM, and a RAM. The CPU executes various processes according to programs stored in the ROM or programs loaded into the RAM, thereby performing overall control of the camera 3.
  • The control unit 33 instructs the optical system drive unit 32 to drive the zoom lens, focus lens, aperture mechanism, and so on.
  • In response to these drive instructions, the optical system drive unit 32 moves the focus lens and zoom lens, opens and closes the aperture blades of the aperture mechanism, and so on.
  • The control unit 33 also controls writing and reading of various data to and from the memory unit 34.
  • The memory unit 34 is a nonvolatile storage device such as an HDD or a flash memory device, and stores data used by the control unit 33 to execute various processes. The memory unit 34 can also be used as a storage destination (recording destination) for image data output from the image sensor 30.
  • The control unit 33 performs various kinds of data communication with external devices via the communication unit 35.
  • The communication unit 35 in this example is configured to be able to communicate data with at least the fog server 4 shown in FIG. 1.
  • The communication unit 35 may also be able to communicate via the network 5 and exchange data with the server device 1.
  • The sensor unit 36 collectively represents the sensors included in the camera 3 other than the image sensor 30.
  • Examples of sensors included in the sensor unit 36 include a GNSS (Global Navigation Satellite System) sensor and an altitude sensor for detecting the position and altitude of the camera 3, a temperature sensor for detecting the environmental temperature, and motion sensors such as acceleration sensors and angular velocity sensors for detecting the movement of the camera 3.
  • The image sensor 30 is configured as a solid-state imaging element of, for example, the CCD type or the CMOS type. As shown in the figure, it includes an imaging unit 41, an image signal processing unit 42, an in-sensor control unit 43, an AI processing unit 44, a memory unit 45, a computer vision processing unit 46, and a communication interface (I/F) 47. These units can communicate data with one another via a bus 48.
  • The image sensor 30 is an embodiment of the information processing device according to the present technology.
  • The imaging unit 41 includes a pixel array section, in which pixels having photoelectric conversion elements such as photodiodes are arranged two-dimensionally, and a readout circuit that reads out the electrical signals obtained by photoelectric conversion from each pixel of the pixel array section.
  • The readout circuit performs, for example, CDS (Correlated Double Sampling) processing and AGC (Automatic Gain Control) processing on the electrical signals obtained by photoelectric conversion, and then performs A/D (Analog to Digital) conversion processing.
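  • The readout chain described above can be illustrated numerically. The following is a minimal Python sketch, not the sensor's actual circuit. The function names, the sign convention (the signal-level sample falls below the reset level as the pixel collects light), the default gain, and the 10-bit converter are all assumptions made for illustration. In a real sensor, CDS and AGC act on analog voltages before conversion.

```python
def cds(reset_level: float, signal_level: float) -> float:
    """Correlated double sampling: subtract the signal-level sample from the
    reset-level sample of the same pixel, cancelling the offset both share."""
    return reset_level - signal_level


def agc(value: float, gain: float) -> float:
    """Automatic gain control, modeled here as a fixed multiplicative gain."""
    return value * gain


def adc(voltage: float, full_scale: float = 1.0, bits: int = 10) -> int:
    """Quantize a voltage to an unsigned n-bit code, clipping out-of-range values."""
    max_code = (1 << bits) - 1
    code = round(voltage / full_scale * max_code)
    return max(0, min(max_code, code))


def read_pixel(reset_level: float, signal_level: float,
               gain: float = 1.0, full_scale: float = 1.0, bits: int = 10) -> int:
    """Hypothetical readout of one pixel: CDS, then AGC, then A/D conversion."""
    return adc(agc(cds(reset_level, signal_level), gain), full_scale, bits)


print(read_pixel(0.9, 0.5))             # 0.4 V swing -> code 409
print(read_pixel(1.0, 0.0, gain=4.0))   # saturates at 1023
```

  • Performing CDS before the gain stage means the pixel's reset offset is removed before it can be amplified.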
  • The image signal processing unit 42 performs preprocessing, synchronization processing, YC generation processing, resolution conversion processing, codec processing, and the like on the captured image signal, which is digital data after the A/D conversion processing.
  • The preprocessing includes clamp processing that clamps the black levels of R (red), G (green), and B (blue) in the captured image signal to a predetermined level, and correction processing among the R, G, and B color channels.
  • In the synchronization processing, color separation processing is performed so that the image data for each pixel has all of the R, G, and B color components. For example, in an image sensor using a Bayer array color filter, demosaic processing is performed as the color separation processing.
  • In the YC generation processing, a luminance (Y) signal and a color (C) signal are generated (separated) from the R, G, and B image data.
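  • The patent does not specify the conversion matrix used in YC generation. The following Python sketch assumes one common choice: the full-range ITU-R BT.601 coefficients used by JPEG/JFIF. It converts 8-bit R, G, and B values into a luminance (Y) value and two color-difference (Cb, Cr) values.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def rgb_to_ycbcr(r: float, g: float, b: float) -> Pixel:
    """Full-range BT.601 (JFIF) conversion; chroma is offset by 128 for 8-bit data."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def yc_generate(image: List[List[Pixel]]) -> List[List[Pixel]]:
    """Apply the conversion to every pixel of a demosaiced RGB image."""
    return [[rgb_to_ycbcr(*pixel) for pixel in row] for row in image]


# A 1 x 3 image: black, white, and pure red.
print(yc_generate([[(0, 0, 0), (255, 255, 255), (255, 0, 0)]]))
```

  • Gray pixels (R = G = B) map to Cb = Cr = 128, so the color information ends up entirely in the two chroma channels.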
  • In the resolution conversion processing, resolution conversion is performed on image data that has undergone the various kinds of signal processing.
  • In the codec processing, the image data that has undergone the various processes described above is encoded for recording or communication, and, for example, a file is generated.
  • For example, files can be generated in video file formats such as MPEG-2 (MPEG: Moving Picture Experts Group) and H.264. It is also conceivable to generate still image files in formats such as JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), or GIF (Graphics Interchange Format).
  • The in-sensor control unit 43 includes, for example, a microcomputer configured with a CPU, ROM, RAM, and so on, and performs overall control of the operation of the image sensor 30. For example, the in-sensor control unit 43 instructs the imaging unit 41 to control the execution of imaging operations. It also controls the execution of processing by the AI processing unit 44, the image signal processing unit 42, and the computer vision processing unit 46. Further, when an AI model (AI model setting data) is deployed from the server device 1 to the image sensor 30 as described later, the in-sensor control unit 43 performs processing for setting the AI model in the AI processing unit 44. That is, it sets the AI model in the AI processing unit 44 so that the AI processing unit 44 can execute AI processing using the AI model.
  • The AI processing unit 44 includes a programmable arithmetic processing device such as a CPU, an FPGA (Field Programmable Gate Array), or a DSP (Digital Signal Processor), and performs AI processing on the captured image.
  • An example of the AI processing performed by the AI processing unit 44 is image recognition processing.
  • Image recognition processing here broadly refers to processing that recognizes image content. Examples include recognizing the type of a subject (e.g., a person, animal, car, or building) and recognizing the presence or absence of a subject and its range (so-called object detection processing).
  • The AI processing function of the AI processing unit 44 can be switched by changing the AI model (AI processing algorithm) set in the AI processing unit 44.
  • The memory unit 45 is made up of nonvolatile memory and stores data necessary for the AI processing unit 44 to perform AI processing. Specifically, the setting data of the AI model (for example, the weighting coefficients used in convolution calculations in the neural network and data indicating the structure of the neural network) is stored in the memory unit 45. In this example, the memory unit 45 is also used to hold captured image data processed by the image signal processing unit 42.
  • The computer vision processing unit 46 performs rule-based image processing on the captured image data.
  • An example of such rule-based image processing is super-resolution processing.
  • The communication interface 47 communicates with units outside the image sensor 30 that are connected via the bus 37, such as the control unit 33 and the memory unit 34. For example, under the control of the in-sensor control unit 43, the communication interface 47 performs communication for acquiring, from outside, the AI model used by the AI processing unit 44. Processing result information of the AI processing by the AI processing unit 44 is also output outside the image sensor 30 via the communication interface 47.
  • The in-sensor control unit 43 can communicate data with the server device 1 via the communication interface 47, the communication unit 35, and the fog server 4.
  • This allows the in-sensor control unit 43 to receive various data, such as AI model setting data, from the server device 1 as described later. It can also send various data, such as the processing results of the AI processing unit 44, to the server device 1.
  • As described above, the image sensor 30 includes not only the imaging unit 41 having the pixel array section but also the AI processing unit 44 that performs AI processing, the in-sensor control unit 43 serving as a computer device, and so on. An example of the structure of such an image sensor 30 will be explained with reference to FIG. 4. Note that the structure shown in FIG. 4 is merely an example, and other structures can of course be adopted.
  • The image sensor 30 in this example has a two-layer structure (stacked structure) in which two dies, a die D1 and a die D2, are stacked.
  • Specifically, the image sensor 30 in this example is configured as a one-chip semiconductor device in which the die D1 and the die D2 are bonded together.
  • The die D1 is a die on which the imaging unit 41 is formed.
  • The die D2 is a die on which the image signal processing unit 42, in-sensor control unit 43, AI processing unit 44, memory unit 45, computer vision processing unit 46, and communication interface 47 are formed. The die D1 and the die D2 are physically and electrically connected by, for example, a chip-to-chip bonding technique such as Cu-Cu bonding.
  • Next, the AI model 50 used in the camera 3 of this embodiment will be described.
  • The AI model 50 is assumed to be an AI model having a neural network, specifically a DNN (Deep Neural Network). More specifically, it takes a captured image (an RGB image in this example) obtained by the imaging unit 41 of the image sensor 30 as input data and performs inference processing as image recognition processing.
  • A post-processing unit 55 decodes recognition result information indicating the result of the image recognition processing based on the output data of the AI model 50.
  • For example, suppose the image recognition processing recognizes attributes such as the age and gender of a person as the target subject. In that case, the post-processing unit 55 decodes information indicating the recognition results for attributes such as age and gender.
  • The AI model 50 also calculates a value representing a score (likelihood) for the recognition result, and the post-processing unit 55 decodes this value to obtain recognition result information that includes the score.
  • It is assumed that an AI model 50 that performs inference processing as image recognition processing, as described above, is first deployed as an initial AI model 51 to the image sensor 30 in the camera 3.
  • FIG. 6 is an explanatory diagram of the initial AI model 51.
  • The initial AI model 51 is a general-purpose AI model 50 that has been machine-learned so as to be compatible with the various usage environment conditions of the image sensor 30.
  • FIG. 6A is an explanatory diagram of the machine learning for generating the initial AI model 51.
  • For this learning, a DNN network 50n is prepared that has a predetermined network structure for realizing inference processing as image recognition processing. A training data set is also prepared. It consists of a plurality of pieces of image data as learning input data and, for each piece of image data, label data indicating the correct answer for the image recognition result.
  • To give the trained initial AI model 51 versatility, the image data prepared as learning input data should correspond to each of the assumed usage environment conditions of the image sensor 30. That is, captured images obtained by imaging under each usage environment condition are prepared. For example, if the assumed usage environment conditions relate to usage regions such as Japan, the United States, and Europe, then for each region, captured images that would be obtained by imaging in that region are prepared.
  • A method is adopted in which the initial AI model is divided at a predetermined position, and only the network after a predetermined intermediate layer is customized.
  • FIG. 7 is an explanatory diagram of the division of the initial AI model 51 into a front-stage network and a rear-stage network.
  • The initial AI model 51 is divided at a predetermined position between two of its intermediate layers, and this position is called the division position Dv (FIG. 7A).
  • The network before the division position Dv is called the "front-stage network", and the network after it is called the "rear-stage network" (also referred to below as the downstream network) (FIG. 7B).
  • The front-stage network outputs the feature map obtained in the intermediate layer immediately before the division position Dv.
  • This feature map, output by the front-stage network, is hereinafter referred to as the "intermediate feature map IRM".
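  • The division at the division position Dv can be pictured with a toy model. This minimal Python sketch makes several illustrative assumptions rather than modeling the actual DNN: a model is a plain list of layer functions, and `dv` counts the layers placed in the front-stage network. The toy layers, the names, and the `min_dv=2` guard are also assumptions. The guard mirrors the embodiment's choice of the second or a later interlayer position.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]


def split_model(layers: Sequence[Layer], dv: int,
                min_dv: int = 2) -> Tuple[List[Layer], List[Layer]]:
    """Divide a layer list at division position dv.

    layers[:dv] form the front-stage network and layers[dv:] the downstream
    network, so the front stage outputs the intermediate feature map (IRM).
    """
    if not min_dv <= dv < len(layers):
        raise ValueError(f"dv must satisfy {min_dv} <= dv < {len(layers)}")
    return list(layers[:dv]), list(layers[dv:])


def run(layers: Sequence[Layer], x: List[float]) -> List[float]:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


# Toy layers: pairwise average pooling halves the vector length.
def halve(x: List[float]) -> List[float]:
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def head(x: List[float]) -> List[float]:
    return [sum(x)]


model = [halve, relu, halve, head]
front, downstream = split_model(model, dv=2)
irm = run(front, [1, -3, 2, 2, 5, 1, -4, 0])   # intermediate feature map
print(irm, run(downstream, irm))               # [0.0, 2.0, 3.0, 0.0] [2.5]
```

  • Running the front stage and then the downstream network gives the same result as running the undivided model. Only the downstream list needs to change when customizing.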
  • When the downstream network is customized, the intermediate feature map IRM is transmitted from the image sensor 30 side (edge side) to the server device 1 side (cloud side).
  • In its raw form, the intermediate feature map IRM is data from which personal information is difficult to identify. Therefore, even when the intermediate feature map IRM must be transmitted from the image sensor 30 side to the server device 1 side as described above, the possibility of personal information leaking can be reduced. In other words, adapting the edge-side artificial intelligence model to the usage environment conditions of the sensor device can be done with a reduced possibility of personal information leaking.
  • The division position Dv is set at the second or a later interlayer position among the interlayer positions of the intermediate layers in the initial AI model 51.
  • That is, the intermediate feature map IRM is the output data of the second or a later intermediate layer in the initial AI model 51.
  • The second and subsequent intermediate layers compress the input data further in dimension. Their output data therefore makes personal information harder to identify than the output data of the first intermediate layer. Setting the division position Dv as described above therefore enhances the effect of reducing the possibility of personal information leaking.
  • To customize the initial AI model 51, a plurality of downstream network candidates are prepared, and one of them is selected based on the intermediate feature map IRM acquired from the target image sensor 30.
  • FIG. 8 is an explanatory diagram of an example of a method for generating candidates for a subsequent network.
  • The plurality of downstream network candidates are generated by relearning only the downstream network of the initial AI model 51 using different training data sets.
  • Specifically, N types of training data sets (first to Nth training data sets) are prepared for relearning. The downstream network of the initial AI model 51 is relearned N times, each time with a different one of these N training data sets, to generate N types of downstream networks.
  • The relearning of only the downstream network can be performed with the weighting coefficients of the front-stage network fixed.
  • The training data sets differ from one another in the type of image data they contain as learning input data.
  • Specifically, each training data set contains, as learning input data, image data captured in a different environment.
  • For example, the first training data set includes, as learning input data, a plurality of pieces of image data captured in a first environment. The second training data set includes a plurality of pieces of image data captured in a second environment different from the first environment.
  • As a result, as shown in FIG. 9, N downstream network candidates are generated, each compatible with one of the N types of environments.
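  • The generation of N candidates by relearning only the downstream network can be sketched as follows. This is a deliberately small stand-in, not the patent's training procedure. The front-stage network is assumed to be a fixed function that is only called and never updated, so its weights stay frozen. Each downstream network is assumed to be a single linear unit trained with the perceptron rule on one environment's data set. The toy data and names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, int]          # (input data, correct label 0 or 1)
Head = Tuple[Vector, float]          # (weights, bias) of a linear downstream unit


def predict(head: Head, feature: Vector) -> int:
    w, b = head
    return 1 if sum(wi * fi for wi, fi in zip(w, feature)) + b > 0 else 0


def relearn_downstream(front: Callable[[Vector], Vector], data: Sequence[Sample],
                       epochs: int = 50, lr: float = 0.1) -> Head:
    """Perceptron training of the downstream unit only; `front` stays fixed."""
    features = [(front(x), y) for x, y in data]   # front-stage output per sample
    w, b = [0.0] * len(features[0][0]), 0.0
    for _ in range(epochs):
        errors = 0
        for f, y in features:
            pred = predict((w, b), f)
            if pred != y:
                step = lr * (y - pred)
                w = [wi + step * fi for wi, fi in zip(w, f)]
                b += step
                errors += 1
        if errors == 0:                            # training set fitted
            break
    return w, b


def build_candidates(front: Callable[[Vector], Vector],
                     datasets: Sequence[Sequence[Sample]]) -> List[Head]:
    """One relearned downstream network per environment-specific data set."""
    return [relearn_downstream(front, data) for data in datasets]


def front(x: Vector) -> Vector:                    # frozen front stage (toy)
    return [x[0] - x[1], x[0] + x[1]]


env_a = [([2, 0], 1), ([0, 2], 0), ([3, 1], 1), ([1, 3], 0)]
env_b = [([2, 2], 1), ([0.5, 0.5], 0), ([3, 0], 1), ([0, 1], 0)]
candidates = build_candidates(front, [env_a, env_b])
```

  • Each candidate fits its own environment's data but need not fit the other's. This is why a selection step based on data from the target sensor is needed.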
  • The N downstream network candidates generated in this way are stored in a storage device readable by the CPU 11 of the server device 1, such as the storage unit 19 of the server device 1.
  • The server device 1 selects one downstream network suited to the usage environment conditions of the image sensor 30 based on the intermediate feature map IRM acquired from the target image sensor 30.
  • FIG. 10 is a diagram explaining the functions of the server device 1 and the image sensor 30 related to this selection of a downstream network.
  • The server device 1 functions as a downstream network acquisition unit F11, and the image sensor 30 functions as a transmission processing unit F31.
  • The function of the downstream network acquisition unit F11 is realized by software processing by the CPU 11 of the server device 1. The function of the transmission processing unit F31 is realized by software processing by the in-sensor control unit 43 in the image sensor 30.
  • The transmission processing unit F31 transmits to the outside the intermediate feature map IRM obtained when input data is given to the initial AI model 51.
  • Specifically, a captured image obtained by the imaging unit 41 (that is, an image captured in the usage environment of the image sensor 30) is given as input data to the initial AI model 51. The transmission processing unit F31 transmits the resulting intermediate feature map IRM to the server device 1.
  • In this case, the transmission processing unit F31 (in-sensor control unit 43) outputs the intermediate feature map IRM outside the image sensor 30 via the communication interface 47. It also instructs, for example, the control unit 33 to transmit the intermediate feature map IRM to the server device 1.
  • The intermediate feature map IRM is thus transmitted to the server device 1 via the communication unit 35 in the camera 3 and the fog server 4.
  • The downstream network acquisition unit F11 receives the intermediate feature map IRM transmitted from the image sensor 30 side in this way. Based on it, the unit selects one downstream network from the plurality of downstream network candidates.
  • Various methods can be considered for selecting one downstream network from the plurality of candidates based on the intermediate feature map IRM.
  • One possible method is to give the intermediate feature map IRM as input data to each downstream network candidate and execute inference processing. One downstream network is then selected based on the score for the inference result calculated by the post-processing unit 55.
  • In this case, the downstream network with the best score is selected as the one suited to the usage environment of the target image sensor 30.
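  • The best-score selection described above can be sketched in Python as follows. Each candidate is assumed to be a callable that maps an intermediate feature map to output logits. For illustration only, the score calculated by the post-processing unit 55 is assumed to be the top-class softmax probability, averaged over one or more intermediate feature maps. The actual score definition is not specified here.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Candidate = Callable[[Vector], Vector]   # downstream network: IRM -> logits


def post_process_score(logits: Vector) -> float:
    """Hypothetical score: probability of the top class under a softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return max(exps) / sum(exps)


def select_best(candidates: Sequence[Candidate], irms: Sequence[Vector]) -> int:
    """Return the index of the candidate with the highest mean score."""
    def mean_score(candidate: Candidate) -> float:
        return sum(post_process_score(candidate(irm)) for irm in irms) / len(irms)
    return max(range(len(candidates)), key=lambda i: mean_score(candidates[i]))


candidates = [
    lambda f: [f[0], f[1]],          # toy candidate for environment 1
    lambda f: [3 * f[1], 0.0],       # toy candidate for environment 2
    lambda f: [0.0, 0.0],            # toy candidate for environment 3
]
print(select_best(candidates, [[1.0, 2.0]]))   # 1
```

  • This exhaustive variant evaluates all N candidates, so its cost grows linearly with the number of environments.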
  • Alternatively, a binary search method or the like may be employed to reduce the amount of processing involved in the selection.
  • Scores can also be calculated by giving the intermediate feature map IRM as input data to the N candidates in a predetermined order. As soon as a downstream network with a score equal to or greater than a predetermined threshold is found, it can be selected as the adapted downstream network.
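  • The threshold-based variant can be sketched as follows. It stops at the first candidate whose score reaches the threshold, so fewer candidates may need to be evaluated. Each candidate is assumed to be a callable that already includes post-processing and returns a score, and the threshold value is illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
ScoredCandidate = Callable[[Vector], float]   # downstream network + post-processing


def select_first_above(candidates: Sequence[ScoredCandidate], irm: Vector,
                       threshold: float) -> Tuple[Optional[int], int]:
    """Return (index of the first candidate scoring >= threshold, evaluations used).

    The index is None when no candidate reaches the threshold.
    """
    for evaluated, candidate in enumerate(candidates, start=1):
        if candidate(irm) >= threshold:
            return evaluated - 1, evaluated
    return None, len(candidates)


toy_scores = [0.40, 0.85, 0.95]
candidates = [lambda f, s=s: s for s in toy_scores]   # constant toy scores
print(select_first_above(candidates, [0.0], threshold=0.8))   # (1, 2)
```

  • Unlike the best-score method, this variant can return a candidate that is acceptable but not the best (here index 1 rather than 2). It trades selection quality for fewer evaluations.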
  • The selection of the downstream network based on the intermediate feature map IRM may also be performed using AI.
  • In that case, an AI model can be used that has been obtained by machine learning in which the learning input data are intermediate feature maps IRM. The correct answer information for each intermediate feature map IRM is the downstream network that should be selected for it.
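  • As a stand-in for the AI-based selection, the following Python sketch trains a nearest-centroid classifier, one of the simplest supervised learners. Each training pair consists of an intermediate feature map and the index of the downstream network that should be selected for it. The choice of classifier, the flattened-vector representation of the IRM, and the toy data are assumptions. The patent does not specify the model.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def fit_centroids(samples: Sequence[Tuple[Vector, int]]) -> Dict[int, Vector]:
    """Average the IRMs labelled with each downstream network index."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for irm, label in samples:
        acc = sums.setdefault(label, [0.0] * len(irm))
        for i, v in enumerate(irm):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def select_by_model(centroids: Dict[int, Vector], irm: Vector) -> int:
    """Pick the downstream network whose centroid is nearest to the IRM."""
    def sq_dist(label: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroids[label], irm))
    return min(centroids, key=sq_dist)


training = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([5.0, 5.0], 1), ([6.0, 5.0], 1)]
model = fit_centroids(training)
print(select_by_model(model, [5.0, 6.0]))   # 1
```

  • With a learned selector, choosing a downstream network costs one prediction instead of running every candidate.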
  • FIG. 12 is a diagram for explaining the functions of the server device 1 and the image sensor 30 related to the deployment of the selected downstream network.
  • As functions related to the deployment of the selected downstream network, the server device 1 has a transmission processing unit F12, and the image sensor 30 has a reception processing unit F32.
  • The function of the transmission processing unit F12 is realized by software processing by the CPU 11 of the server device 1. The function of the reception processing unit F32 is realized by software processing by the in-sensor control unit 43 of the image sensor 30.
  • The transmission processing unit F12 transmits the setting data of the downstream network selected by the downstream network acquisition unit F11 to the outside. Specifically, it transmits the setting data of the selected downstream network to the image sensor 30 via the communication unit 20. As a result, the setting data of the downstream network reaches the image sensor 30 via the fog server 4.
  • The reception processing unit F32 receives the setting data of the downstream network transmitted by the transmission processing unit F12. This is the setting data of the one downstream network that the server device 1 has selected from the plurality of candidates. The selection was based on the intermediate feature map IRM sent by the transmission processing unit F31 described above (see FIG. 10).
  • In the image sensor 30, the downstream network of the initial AI model 51 set in the AI processing unit 44 is updated based on the setting data received in this way by the reception processing unit F32.
  • As a result, the AI processing unit 44 performs inference processing using the updated downstream network, that is, the downstream network selected by the server device 1 as compatible with the usage environment conditions of the image sensor 30.
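  • On the image sensor side, updating the downstream network amounts to keeping the front-stage layers up to the division position Dv and replacing the rest with the received setting data. For illustration, the following Python sketch assumes that each layer's setting data is a dictionary with hypothetical `name`, `in`, and `out` fields. It adds a shape check so that the new downstream network accepts the intermediate feature map produced by the front stage.

```python
from typing import Dict, List, Union

LayerSetting = Dict[str, Union[str, int]]   # hypothetical per-layer setting data


def update_downstream(layers: List[LayerSetting], dv: int,
                      received: List[LayerSetting]) -> List[LayerSetting]:
    """Return a new layer list: front stage kept, downstream replaced."""
    if not 0 < dv <= len(layers):
        raise ValueError("dv must leave at least one front-stage layer")
    if not received:
        raise ValueError("received setting data contains no layers")
    front = layers[:dv]
    if received[0]["in"] != front[-1]["out"]:
        raise ValueError("downstream input size does not match the IRM size")
    return front + list(received)


initial = [
    {"name": "conv1", "in": 3, "out": 16},
    {"name": "conv2", "in": 16, "out": 32},
    {"name": "head_generic", "in": 32, "out": 10},
]
received = [{"name": "head_env3", "in": 32, "out": 10}]
updated = update_downstream(initial, dv=2, received=received)
print([layer["name"] for layer in updated])   # ['conv1', 'conv2', 'head_env3']
```

  • Because the front stage is untouched, intermediate feature maps computed before and after the update remain comparable.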
  • FIG. 13 shows visualized images of the intermediate feature map IRM obtained at two different division positions Dv in the initial AI model 51. In one, the division position Dv is between the first and second intermediate layers. In the other, it is between the third and fourth intermediate layers.
  • FIG. 13 shows that the image size of the intermediate feature map IRM tends to decrease in the intermediate layers after the first. It also shows that personal information becomes difficult to identify from the image content when the map is visualized.
  • The division position Dv may also be set as follows.
  • FIG. 14 compares a captured image given as input data to the initial AI model 51 with a visualized image of the intermediate feature map IRM that the initial AI model 51 produces for that input.
  • The captured image includes an image area related to personal information, namely an area in which a person's face appears (referred to as "image area Ar1").
  • Let "image area Ar2" denote the area of the visualized intermediate feature map IRM that corresponds to image area Ar1. The division position Dv is determined so that the number of pixels in image area Ar2 is less than a predetermined number of pixels.
  • The "predetermined number of pixels" may be a number of pixels at which personal information is difficult to identify when the intermediate feature map IRM is visualized. It is desirably 144 pixels, and more preferably 64 pixels, corresponding to 8 × 8 pixels.
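  • The pixel-count criterion can be checked with simple arithmetic if each intermediate layer is assumed to downsample the image by a known stride. The following Python sketch makes that assumption and ignores receptive-field overlap; the strides and face sizes are hypothetical. It returns the first interlayer position at which image area Ar2 falls below the pixel limit. For example, for a 96 × 96 pixel face and four layers of stride 2, Ar2 shrinks to 48 × 48, 24 × 24, 12 × 12, and 6 × 6 pixels. With a limit of 144 pixels, the position after the fourth layer is therefore chosen.

```python
import math
from typing import Optional, Sequence


def ar2_pixels(face_w: int, face_h: int, cumulative_stride: int) -> int:
    """Approximate size of image area Ar2 after downsampling image area Ar1."""
    return math.ceil(face_w / cumulative_stride) * math.ceil(face_h / cumulative_stride)


def choose_dv(face_w: int, face_h: int, strides: Sequence[int],
              limit: int = 144) -> Optional[int]:
    """Return the first position dv (after layer dv) where Ar2 < limit pixels.

    strides[i] is the assumed downsampling factor of intermediate layer i + 1.
    """
    cumulative = 1
    for dv, stride in enumerate(strides, start=1):
        cumulative *= stride
        if ar2_pixels(face_w, face_h, cumulative) < limit:
            return dv
    return None


print(choose_dv(96, 96, [2, 2, 2, 2]))             # 4 (12 x 12 = 144 is not < 144)
print(choose_dv(96, 96, [2, 2, 2, 2], limit=64))   # 4
```

  • In practice, the face size would come from the largest face expected in the scene. A larger face pushes the division position further toward the rear stage.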
  • Another way to set the division position Dv is to make the intermediate feature map IRM data that cannot be decoded by the decoder of an autoencoder. The autoencoder is obtained through self-encoding learning of the target AI model.
  • Self-encoding learning here is preliminary learning for creating an autoencoder. Specifically, it means unsupervised learning that makes the output data match the input data.
  • FIG. 15 is an explanatory diagram of this alternative setting method.
  • First, self-encoding learning is performed on the DNN network 50n used for the initial AI model 51 to generate an autoencoder 60.
  • The intermediate feature map IRM is input to the decoding section 60a of the autoencoder 60 to determine whether the intermediate feature map IRM can be decoded. This determination may be made by comparing the output data of the decoding section 60a with the image data that was used as input data to obtain the intermediate feature map IRM.
  • The division position Dv is moved step by step toward the rear stage. For each division position Dv, the decoding section 60a is used as described above to determine whether the intermediate feature map IRM can be decoded. The division position Dv at which the intermediate feature map IRM is determined to be undecodable can then be adopted.
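  • The search over division positions can be sketched as a loop. At each candidate position, the loop encodes an image up to that position, decodes the resulting intermediate feature map with the autoencoder's decoder, and compares the reconstruction with the original. In this Python sketch, the encoder and decoder are passed in as hypothetical callables. "Cannot be decoded" is assumed to mean a PSNR below a chosen threshold (20 dB here), since the patent leaves the comparison criterion open.

```python
import math
from typing import Callable, List, Optional, Sequence

Image = List[float]


def psnr(reference: Image, reconstruction: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (infinite for a perfect reconstruction)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def find_undecodable_dv(image: Image, positions: Sequence[int],
                        encode_to: Callable[[int, Image], List[float]],
                        decode_from: Callable[[int, List[float]], Image],
                        min_psnr: float = 20.0) -> Optional[int]:
    """Move dv toward the rear stage; return the first dv whose IRM cannot be decoded."""
    for dv in positions:
        irm = encode_to(dv, image)
        if psnr(image, decode_from(dv, irm)) < min_psnr:
            return dv
    return None


# Toy stand-ins: the reconstruction error grows by 10 per position.
def toy_encode(dv: int, image: Image) -> List[float]:
    return list(image)


def toy_decode(dv: int, irm: List[float]) -> Image:
    return [v + 10 * dv * (-1) ** i for i, v in enumerate(irm)]


print(find_undecodable_dv([100.0] * 4, [1, 2, 3, 4], toy_encode, toy_decode))   # 3
```

  • With the toy stand-ins, the PSNR drops from about 28 dB at position 1 to about 22 dB at position 2 and about 18.6 dB at position 3. Position 3 is therefore the first that falls below the 20 dB threshold.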
  • FIG. 16 is a flowchart showing an example of a processing procedure for realizing the adaptation method as the embodiment described above.
  • In FIG. 16, the processing labeled "server device" is executed by the CPU 11 of the server device 1 based on a program stored in a predetermined storage device such as the ROM 12. The processing labeled "image sensor" is executed by the CPU of the in-sensor control unit 43 in the image sensor 30, based on a program stored in a predetermined storage device such as the ROM of the in-sensor control unit 43.
  • In step S101, the CPU 11 waits until a target edge is designated, that is, until the image sensor 30 whose downstream network is to be adapted is designated.
  • This target edge designation is sent from the user terminal 2 to the server device 1 based on an operation input by the user on the user terminal 2.
  • In step S102, the CPU 11 instructs the target image sensor 30 to execute an intermediate feature map generation operation, that is, the operation of generating the intermediate feature map IRM.
  • the in-sensor control unit 43 waits for an instruction to execute such an intermediate feature map generation operation in step S201, and when there is an instruction to execute such an intermediate feature map generation operation, the generation operation execution process is performed in step S202.
  • the intermediate feature map IRM is generated by causing the imaging unit 41 to perform an imaging operation and by providing the captured image obtained by the imaging operation as input data to the initial AI model 51 in the AI processing unit 44. .
  • step S203 following step S202 the in-sensor control unit 43 performs a process of transmitting the intermediate feature map IRM to the server device 1. This corresponds to the processing of the transmission processing section F31 described above.
  • the CPU 11 waits to receive the intermediate feature map IRM from the image sensor 30 side in step S103, and when the intermediate feature map IRM is received, in step S104, the CPU 11 waits to receive the intermediate feature map IRM from the image sensor 30 side.
  • the subsequent network selection process is executed based on the following. Specifically, as described above as the subsequent network acquisition unit F11, it performs a process of selecting one subsequent network from among a plurality of subsequent network candidates based on the received intermediate feature map IRM. Note that a specific example of the method for selecting a subsequent network based on the received intermediate feature map IRM has already been explained, so a redundant explanation will be avoided.
  • step S105 the CPU 11 performs a process of transmitting the setting data of the selected downstream network to the image sensor 30 (that is, a process corresponding to the above-described transmission processing unit F12), and ends the series of processes shown in FIG. .
  • In step S204, the in-sensor control unit 43 waits to receive the setting data transmitted in step S105. When the setting data is received, the unit performs the downstream network setting process in step S205; that is, it updates the downstream network of the initial AI model 51 in the AI processing unit 44 based on the setting data.
  • After executing step S205, the in-sensor control unit 43 ends the series of processes shown in FIG. 16.
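The exchange in steps S102 to S105 and S201 to S205 can be sketched as a small simulation. This is a minimal, hypothetical Python model: the class names, the `select` callback, and the representation of setting data as a dictionary of NumPy weight arrays are assumptions for illustration, not the embodiment's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

import numpy as np

# Hypothetical representation of setting data: layer name -> weight array.
SettingData = Dict[str, np.ndarray]


@dataclass
class EdgeSensor:
    """Stand-in for the image sensor 30 and its in-sensor control unit 43."""
    upstream: Callable[[np.ndarray], np.ndarray]
    downstream: SettingData = field(default_factory=dict)

    def generate_irm(self, captured: np.ndarray) -> np.ndarray:
        # S202: run the captured image through the upstream part of the model.
        return self.upstream(captured)

    def apply_setting_data(self, setting: SettingData) -> None:
        # S205: replace the downstream network with the received one.
        self.downstream = dict(setting)


@dataclass
class Server:
    """Stand-in for the server device 1 (CPU 11)."""
    candidates: Dict[str, SettingData]
    select: Callable[[np.ndarray, Dict[str, SettingData]], str]

    def handle_irm(self, irm: np.ndarray) -> SettingData:
        name = self.select(irm, self.candidates)  # S104: selection (unit F11)
        return self.candidates[name]              # S105: data to send (unit F12)


def adapt_once(edge: EdgeSensor, server: Server, captured: np.ndarray) -> None:
    irm = edge.generate_irm(captured)   # S201-S203: only the IRM leaves the sensor
    setting = server.handle_irm(irm)    # S103-S105
    edge.apply_setting_data(setting)    # S204-S205


# Usage with toy stand-ins: the "upstream network" averages groups of pixel rows,
# and the selector picks a candidate with a simple brightness rule.
edge = EdgeSensor(upstream=lambda img: img.reshape(4, -1).mean(axis=1))
server = Server(
    candidates={"indoor": {"fc": np.zeros((4, 2))}, "outdoor": {"fc": np.ones((4, 2))}},
    select=lambda irm, cands: "outdoor" if irm.mean() > 0.5 else "indoor",
)
adapt_once(edge, server, np.full((8, 8), 0.9))
print(edge.downstream["fc"].sum())  # -> 8.0 (the "outdoor" candidate was applied)
```

The selector is deliberately pluggable, because the concrete selection criterion is described elsewhere in the specification and is not reproduced here.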
  • In the above example, the downstream network selected by the server device 1 is sent to the edge side as is. Alternatively, the selected downstream network may be re-trained as fine-tuning, and the re-trained downstream network may be sent to the edge side.
  • FIG. 17 is an explanatory diagram of a method for generating a downstream network using active learning.
  • In this method, a small network is used as the downstream network.
  • For each of the N types of environments assumed as usage environments of the image sensor 30, machine learning is performed using images of that environment as learning input data. As a result, N environment-aware trained networks are obtained as small downstream networks.
  • However, because the downstream network in this case is a small network, high inference performance cannot be expected even with the environment-aware learning described above. Therefore, machine learning using active learning is performed to generate a downstream network that is small yet specialized for its environment.
  • Specifically, input image data for which the score is medium (for example, in the range of about 0.5 to about 0.7) is presented to an annotator (see FIG. 17A). In the figure, image data of the type that yields a medium score is denoted as "type B" image data.
  • Relearning is then performed on the entire network, including the downstream network formed by the small network (see FIG. 17B).
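Picking samples for the annotator, as in FIG. 17A, is a form of uncertainty sampling. The sketch below assumes that the "score" is the highest class probability of the current model's softmax output; that definition, the function name `select_for_annotation`, and the default band taken from the example range of about 0.5 to 0.7 are assumptions.

```python
import numpy as np


def select_for_annotation(probs: np.ndarray, low: float = 0.5, high: float = 0.7) -> np.ndarray:
    """Return indices of samples whose confidence score is "medium".

    probs: array of shape (num_samples, num_classes) holding softmax outputs.
    The confidence score of a sample is taken to be its largest class probability.
    """
    scores = probs.max(axis=1)
    mask = (scores >= low) & (scores <= high)
    return np.flatnonzero(mask)


probs = np.array([
    [0.95, 0.03, 0.02],  # confident: not sent to the annotator
    [0.60, 0.30, 0.10],  # medium score: a "type B" candidate
    [0.40, 0.35, 0.25],  # low score: outside the band
    [0.55, 0.45, 0.00],  # medium score
])
print(select_for_annotation(probs))  # -> [1 3]
```

Only the selected samples would then be labeled and used in the relearning step, which keeps the annotation effort focused on inputs the small network finds ambiguous.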
  • The downstream network acquisition unit F11 in the server device 1 then selects one downstream network from among the multiple downstream network candidates generated by active learning as described above, based on the intermediate feature map IRM input from the target image sensor 30.
  • A technique known as knowledge distillation can also be used to generate a small downstream network.
  • One specific method is to prepare a general-purpose, large master AI as a teacher model and perform distillation from it.
  • Here, the downstream network of the initial AI model 51 serves as the teacher model (the general-purpose, large master AI), and distillation from this teacher model generates a small downstream network.
  • This distillation serves not only to reduce the size of the network but also to adapt the model to the usage environment conditions of the target image sensor 30.
  • Distillation basically uses the teacher model's inference result for given input data as a soft target and trains the student model so that its inference result for the same input data approaches that soft target.
  • Consequently, the network is downsized, and the model is also adapted to the usage environment conditions of the image sensor 30.
  • The downstream network acquisition unit F11 in this case generates one downstream network based on the base trained network and the intermediate feature map IRM.
  • Specifically, the downstream network acquisition unit F11 uses the downstream network of the initial AI model 51 as the teacher model and performs distillation based on the intermediate feature map IRM input from the image sensor 30, thereby generating a compact downstream network suited to the usage environment conditions.
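Distillation with the IRM as input can be sketched in NumPy as below. Assumptions: the teacher is an arbitrary callable returning logits, the student is a single linear layer standing in for the small downstream network, the temperature `T` and learning rate are illustrative, and no ground-truth labels are used, since only the IRM is received from the edge. The loss is the cross-entropy between the teacher's temperature-softened outputs (the soft targets) and the student's softened outputs, which differs from the KL divergence only by a constant.

```python
import numpy as np


def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_target_loss(student_logits, teacher_logits, T):
    """Mean cross-entropy between teacher soft targets and student soft outputs."""
    q = softmax(teacher_logits, T)
    log_p = np.log(softmax(student_logits, T) + 1e-12)
    return float(-(q * log_p).sum(axis=1).mean())


def distill_linear_student(irm, teacher, num_classes, T=2.0, lr=0.5, steps=300, seed=0):
    """Train a linear student on IRM vectors so its softened outputs match the teacher's."""
    rng = np.random.default_rng(seed)
    n, d = irm.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    q = softmax(teacher(irm), T)            # soft targets, computed once
    for _ in range(steps):
        p = softmax(irm @ W + b, T)
        grad_logits = (p - q) / (T * n)     # gradient of the loss w.r.t. student logits
        W -= lr * irm.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b


# Usage: a larger two-layer "teacher" and flattened IRM vectors (random stand-ins).
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(16, 64)), rng.normal(size=(64, 3))


def teacher(x):
    return np.maximum(x @ W1, 0.0) @ W2


irm = rng.normal(size=(256, 16))
W, b = distill_linear_student(irm, teacher, num_classes=3)
before = soft_target_loss(np.zeros((256, 3)), teacher(irm), T=2.0)
after = soft_target_loss(irm @ W + b, teacher(irm), T=2.0)
print(before > after)  # -> True
```

Because the teacher and student both consume the IRM, the server never needs the original captured images to perform this adaptation.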
  • Another method for generating a small downstream network by distillation uses, as the teacher model, a downstream network that has already been trained to be environment-specific to some extent, rather than a general-purpose downstream network.
  • FIG. 19 is an explanatory diagram of this alternative method.
  • In this method, a plurality of large downstream networks, each trained to suit a different usage environment, are prepared as candidates for the teacher model (see FIG. 19A).
  • Based on the intermediate feature map IRM input from the target image sensor 30, the downstream network acquisition unit F11 selects the corresponding large downstream network from among these candidates as the teacher model (see FIG. 19B). That is, the usage environment of the image sensor 30 is estimated from, for example, the numerical distribution of the input intermediate feature map IRM, and the large downstream network corresponding to the estimated environment is selected.
  • The downstream network acquisition unit F11 then performs distillation with the selected large downstream network as the teacher model, using the intermediate feature map IRM input from the image sensor 30 as the input data for both the teacher model and the student model. As a result, one downstream network suited to the usage environment of the image sensor 30 is generated (see FIG. 19C).
  • With this alternative distillation method, a wider range of usage environments can be supported than when distillation uses a general-purpose large downstream network as the teacher model, as shown in FIG. 18.
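One way to estimate the usage environment "from the numerical distribution of the IRM" is to compare a histogram of the received IRM values with reference histograms recorded for each environment-specific teacher candidate. The sketch below does this with the total variation distance. The reference histograms, bin edges, distance measure, and candidate names are all assumptions, since the source does not specify the estimation method.

```python
import numpy as np


def irm_histogram(irm: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Normalized histogram of all activation values in an IRM."""
    counts, _ = np.histogram(irm.ravel(), bins=edges)
    return counts / max(counts.sum(), 1)


def select_teacher(irm, reference_hists, edges):
    """Pick the candidate whose reference histogram is closest in total variation distance."""
    h = irm_histogram(irm, edges)
    distances = {name: 0.5 * np.abs(h - ref).sum() for name, ref in reference_hists.items()}
    return min(distances, key=distances.get)


edges = np.linspace(0.0, 4.0, 17)
rng = np.random.default_rng(0)
# Hypothetical references built beforehand from IRMs collected in each environment.
reference_hists = {
    "daytime_outdoor": irm_histogram(rng.gamma(2.0, 0.8, size=(32, 8, 8)), edges),
    "night_indoor": irm_histogram(rng.gamma(1.0, 0.3, size=(32, 8, 8)), edges),
}
received = rng.gamma(1.0, 0.3, size=(32, 8, 8))  # IRM from the target image sensor
print(select_teacher(received, reference_hists, edges))  # -> night_indoor
```

The selected name would index the corresponding large downstream network, which then serves as the teacher in the distillation step.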
  • FIG. 20 shows an example processing procedure of the server device 1 and the image sensor 30 for the case where a small downstream network is generated by distillation as described above. The processing on the image sensor 30 side is the same as that described with reference to FIG. 16, so it is not repeated here.
  • On the server side, the CPU 11 executes step S110 instead of step S104 shown in FIG. 16, and step S111 instead of step S105.
  • In step S110, the CPU 11 performs distillation on the large downstream network based on the received intermediate feature map IRM.
  • This distillation may use a general-purpose large downstream network as the teacher model, as explained with reference to FIG. 18, or a large downstream network selected according to the usage environment as the teacher model, as explained with reference to FIG. 19.
  • In step S111, following step S110, the CPU 11 transmits the setting data of the small downstream network obtained by the distillation process to the image sensor 30.
  • Step 1, labeled "initial learning", is the training of the initial AI model 51.
  • In this example, the initial AI model 51 is trained in the operator cloud of country B.
  • In this case, the image data used for learning crosses a national border (from country A to country B).
  • The AI vendor in country C instructs the operator cloud to train the initial AI model 51.
  • Step 2 prepares multiple customized downstream networks.
  • Specifically, the initial AI model 51 obtained in Step 1 consists of an upstream network and a downstream network (tentative version). The downstream network is re-trained by machine learning using different training datasets ("post-stage learning" #1, #2, #3, and so on), which yields multiple customized downstream networks.
  • For example, the learning for country D uses the training dataset for country D stored in database #1 in country D, and the learning for country E uses the training dataset stored in database #2 in country E.
  • In Step 2, for example, an AI vendor in country C may issue the instruction to execute the post-stage learning. The post-stage learning in Step 2 may also be performed in the operator cloud of country G instead of country B (that is, in a country different from the one used in Step 1).
  • Step 3, labeled "IRM generation", is a step in which the edge in country I (the AI usage site), which is the camera 3 in this example, generates an intermediate feature map IRM based on a captured image.
  • Generating the intermediate feature map IRM requires the upstream network. In this example, the upstream network is transmitted in advance from the operator cloud in country B to the image sensor 30 at the edge, and the image sensor 30 (AI processing unit 44) generates the intermediate feature map IRM based on the captured image.
  • The generated intermediate feature map IRM is sent from the CPU (control unit 33) to the fog server (fog server 4), then from the fog server to the customer cloud in country I, and from the customer cloud to the operator cloud. At this time, no image data crosses the border, because the data has been sanitized into an intermediate feature map IRM.
  • the instruction to generate the intermediate feature map IRM may be given by, for example, an AI user in country I.
  • The AI processing may instead be performed by the CPU or the fog server, as described later; in that case, the upstream network for generating the intermediate feature map IRM is transmitted to the CPU or the fog server.
  • The customer cloud may also exist in country J, which differs from country I where the AI is used.
  • Step 4, labeled "post-stage selection", is the selection of a downstream network based on the intermediate feature map IRM.
  • the operator cloud in country B (or country G or country H) selects a corresponding downstream network from among the multiple downstream networks prepared in Step 2, based on the intermediate feature map IRM sent from the edge side in Step 3.
  • The AI user in country I instructs the operator cloud to execute the downstream network selection.
  • In Step 5, the operator cloud deploys to the edge (the image sensor 30, the CPU, or the fog server) a combined network, that is, the entire AI model formed by combining the selected downstream network with the upstream network, and inference processing using this AI model is performed.
  • In the example above, Step 2 uses image data as input data for learning. However, as in the operational example of FIG. 22, the "post-stage learning" of Step 2 may instead use the intermediate feature map IRM rather than image data as input data for learning.
  • In this case, the upstream network of the initial AI model 51 obtained in Step 1 is distributed in advance to the training data collection sites in countries D, E, and F, and each site generates intermediate feature maps IRM in advance for use as input data for training.
  • In the "post-stage learning" of Step 2, the downstream network (tentative version) of the initial AI model 51 is re-trained using training datasets containing the intermediate feature maps IRM obtained in each of these countries, which generates multiple custom downstream networks.
  • FIG. 23A shows the generation of the initial AI model 51.
  • In this example, an AI model that can absorb variations in the characteristics of the image sensor 30 in the camera 3 is created as the initial AI model 51.
  • To this end, image data reflecting the characteristic variations of the image sensor 30 is used as the image data in the training dataset.
  • As this set of image data, for example, a set mixing image data captured by a plurality of image sensors 30 may be prepared. This prevents inference accuracy from varying with the characteristic variations of the image sensor 30.
  • A dataset with variations other than the sensor characteristics of the image sensor 30, for example a dataset with variations in camera installation conditions and environmental conditions, may also be used as a training dataset for absorbing variations in inference accuracy.
  • The initial AI model 51 generated as described above is re-trained using training datasets for overall customization, as shown in FIG. 23B.
  • Specifically, retraining is performed using different custom training datasets (two types, #A and #B, in the figure) to generate custom overall models 51-1 and 51-2, that is, overall AI models customized for each purpose.
  • Training datasets #a, #b, ... are prepared for customizing the downstream network of the custom overall model 51-1, and a separate series of training datasets is prepared for customizing the downstream network of the custom overall model 51-2. Re-training each downstream network with its datasets generates custom downstream networks #a, #b, ... for the custom overall model 51-1 and a corresponding series of custom downstream networks for the custom overall model 51-2 (see FIG. 23C).
  • For example, the relearning of the entire AI model explained with reference to FIG. 23B may be performed to customize for each application, such as indoor monitoring and outdoor monitoring, followed by the relearning described with reference to FIG. 23C.
  • In the second-stage relearning, the division position Dv between the upstream and downstream networks can differ from that used in the first-stage relearning.
  • For example, when the overall AI model (custom overall model) is divided into blocks [A], [B], and [C], the division position Dv for the second-stage relearning may be set to [A]/[B][C].
  • With such step-by-step network relearning, customization can be performed roughly by region, such as the United States, Europe, and Japan, in the first stage, and by smaller divisions, such as individual states of the United States, in the second stage.
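The division position Dv can be pictured as an index into the ordered list of blocks of the overall model: blocks before Dv form the upstream network, which stays fixed, and blocks from Dv onward form the downstream network, which is re-trained. The sketch below uses hypothetical toy blocks named after [A], [B], and [C] to show how moving Dv changes which blocks fall on the downstream side, and how the two parts recombine into the whole model, as in the combined network of Step 5. The first-stage split [A][B]/[C] is a hypothetical choice for illustration only.

```python
from functools import reduce
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def split_at(layers: Sequence[Layer], dv: int) -> Tuple[List[Layer], List[Layer]]:
    """Split an ordered layer list into (upstream, downstream) at division position dv."""
    if not 0 < dv < len(layers):
        raise ValueError("dv must leave at least one layer on each side")
    return list(layers[:dv]), list(layers[dv:])


def compose(layers: Sequence[Layer]) -> Layer:
    """Chain layers so that the output of each one feeds the next."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)


# Toy blocks [A], [B], [C] of a custom overall model.
block_a = lambda x: x + 1.0
block_b = lambda x: 2.0 * x
block_c = lambda x: x - 3.0
model = [block_a, block_b, block_c]

up1, down1 = split_at(model, 2)  # hypothetical first-stage split: [A][B] / [C]
up2, down2 = split_at(model, 1)  # second-stage split: [A] / [B][C]

irm = compose(up2)(5.0)          # what the edge would send: the output of [A]
whole = compose(up2 + down2)     # combined network deployed to the edge
print(irm, whole(5.0))           # -> 6.0 9.0
```

Recombining the two parts always reproduces the original model, whatever value Dv takes; only the set of blocks that are re-trained changes.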
  • In the above, the in-sensor control unit 43 executes the processing of the transmission processing unit F31 and the reception processing unit F32, because inference processing using the downstream network is performed inside the image sensor 30.
  • Alternatively, inference processing using the downstream network may be performed by the processor serving as the control unit 33 (a processor in the camera 3 outside the image sensor 30). In that case, the processing of the transmission processing unit F31 and the reception processing unit F32 is performed by the processor serving as the control unit 33.
  • A configuration in which the processor of the fog server 4 performs inference processing using the downstream network is also possible; in that case, the processor of the fog server 4 performs the processing of the transmission processing unit F31 and the reception processing unit F32.
  • In this description, "imaging" broadly means obtaining image data that captures a subject.
  • the image data referred to here is a general term for data consisting of multiple pixel data, and the pixel data includes not only data indicating the intensity of the amount of light received from the subject, but also information such as the distance to the subject, polarization information, and temperature.
  • Accordingly, the "captured image" obtained by the "imaging sensor" includes gradation image data indicating the received light intensity for each pixel, distance image data indicating the distance to the subject for each pixel, polarization image data indicating polarization information of incident light for each pixel, thermal image data indicating temperature information for each pixel, and so on.
  • The input data for the AI model is not limited to images captured by an image sensor; data other than captured images, such as sound data collected by a microphone, may also be used.
  • The present technology is particularly suitable when data containing personally identifiable information is used as input data for an AI model.
  • In the above, the configuration of the upstream network remains unchanged, but the upstream network may also be selected from a plurality of candidates.
  • For example, the configuration of the upstream network may need to change depending on the type of image sensor (e.g., an RGB sensor or an IR (infrared) sensor) or on the race of the people in the image. Selecting an upstream network suited to such conditions can therefore improve inference performance.
  • The machine learning for generating the initial AI model 51 and that for generating the multiple candidate downstream networks may be performed in the same country or in different countries. The machine learning for generating the candidate downstream networks may also be divided among different countries.
  • The estimation information IS here means estimated information about the subject, obtained as a result of AI processing performed on the input image by the AI model serving as the upstream network.
  • Examples include estimated information about the imaging scene, such as whether imaging takes place under clear skies or in rainy weather, and estimated information about the range of the target subject, such as an ROI (Region of Interest).
  • The estimation information IS obtained by the upstream network is given to the imaging control unit 65 in the camera 3 and used to control predetermined imaging settings, for example brightness-related settings such as shutter speed, aperture value, and ISO sensitivity, as well as focus, white balance, and noise reduction strength.
  • The intermediate feature map IRM output from the upstream network as the InputTensor of the downstream network may also be used as the estimation information IS.
  • In that case, either the intermediate feature map IRM alone, or the intermediate feature map IRM together with other estimation information, may be fed back as the estimation information IS to control the imaging settings.
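As an illustration of feeding the estimation information IS back to imaging control, the sketch below assumes that the IS is an ROI box and computes an exposure correction that moves the mean luminance inside the ROI toward a target. The 0.18 mid-grey target, the clamping limit, and the function name are assumptions; the source does not specify the control law used by the imaging control unit 65.

```python
import math

import numpy as np


def roi_exposure_correction(luma: np.ndarray, roi: tuple, target: float = 0.18,
                            max_step_ev: float = 2.0) -> float:
    """Return an exposure correction in EV that moves the ROI mean toward `target`.

    luma: 2-D array of normalized luminance values in [0, 1].
    roi:  (top, left, bottom, right) box, e.g. estimated by the upstream network.
    """
    top, left, bottom, right = roi
    mean = float(luma[top:bottom, left:right].mean())
    ev = math.log2(target / max(mean, 1e-6))  # +1 EV doubles the exposure
    return max(-max_step_ev, min(max_step_ev, ev))


frame = np.full((120, 160), 0.5)
frame[40:80, 60:100] = 0.09  # the subject inside the ROI is underexposed
ev = roi_exposure_correction(frame, (40, 60, 80, 100))
print(round(ev, 3))  # -> 1.0
```

The resulting EV value could then be distributed over shutter speed, aperture value, and ISO sensitivity; how that split is made is outside the scope of this sketch.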
  • As described above, the first information processing device of the embodiment (server device 1) includes a downstream network acquisition unit (F11) and a transmission processing unit (F12). The downstream network acquisition unit receives, from an external device, the intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device (image sensor 30) as input data, when input data is given to that model. Based on the intermediate feature map, the unit either selects one downstream network from among multiple candidates for the downstream network, that is, the network following the predetermined intermediate layer in the model, or generates one downstream network based on a base trained network and the intermediate feature map. The transmission processing unit transmits the setting data of the downstream network selected or generated by the downstream network acquisition unit to the outside.
  • In the generating case, a corresponding downstream network may be generated from the base trained network by, for example, distillation.
  • The number of training iterations required to generate the candidate downstream networks can be made smaller than that required to re-train the entire network including the upstream network.
  • Likewise, the number of training iterations required to generate a corresponding downstream network from the base trained network can be made smaller than that required to re-train the entire network.
  • Therefore, the time required to generate a downstream network suited to the usage environment conditions of the sensor device can be shortened. Reducing the required number of training iterations also reduces the amount of learning input data needed to generate such a downstream network. Furthermore, with the above configuration, obtaining one downstream network suited to the usage environment conditions of the sensor device requires only the intermediate feature map from the edge side. Because the intermediate feature map is data obtained by processing the input data up to an intermediate layer, it is difficult to identify personal information from it as is.
  • Thus, the first information processing device of the embodiment can shorten the time required to adapt the edge-side artificial intelligence model to the usage environment conditions of the sensor device. It also reduces the amount of learning input data that the user of the model must send to the cloud for adaptation, and hence the amount of communication data required. Furthermore, since the intermediate feature map is data from which personal information is difficult to identify as is, the possibility of leaking personal information when adapting the edge-side model to the usage environment conditions of the sensor device can be reduced.
  • The intermediate feature map is output data of the second or a later intermediate layer of the artificial intelligence model. Such output data makes personal information harder to identify than the output data of the first intermediate layer, so the effect of reducing the possibility of leaking personal information can be enhanced.
  • The sensor device is an image sensor, and the artificial intelligence model performs image recognition processing using a captured image obtained by the image sensor as input data.
  • The intermediate feature map is data in which, when visualized, the image area related to personal information spans fewer than 144 pixels. That is, it is output data of an intermediate layer that is dimensionally compressed to the extent that an image area related to personal information, such as the face area when the target subject is a person, has fewer than 144 pixels. This enhances the effect of reducing the possibility of leaking personal information.
  • The intermediate feature map may also be data in which that image area spans fewer than 64 pixels when visualized, that is, output data of an intermediate layer dimensionally compressed to the extent that the image area related to personal information has fewer than 64 pixels.
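Whether an IRM meets the 144-pixel or 64-pixel condition depends on how strongly the upstream network has spatially downsampled the input. Under the simplifying assumption that each feature-map cell covers a `stride` x `stride` patch of the input (the cumulative stride of the layers before the division position), and ignoring grid alignment, the visualized size of a face region can be estimated as below. The face size and strides are illustrative values, not figures from the source.

```python
import math


def region_pixels_in_irm(region_h: int, region_w: int, cumulative_stride: int) -> int:
    """Feature-map cells covered by an input region, counting partially covered cells."""
    return math.ceil(region_h / cumulative_stride) * math.ceil(region_w / cumulative_stride)


face = (96, 96)  # hypothetical face bounding box in the captured image, in pixels
for stride in (4, 8, 16):
    cells = region_pixels_in_irm(*face, stride)
    print(stride, cells, cells < 144, cells < 64)
# stride 4  -> 576 cells: neither condition holds
# stride 8  -> 144 cells: just misses "fewer than 144"
# stride 16 -> 36 cells: satisfies both conditions
```

In this model, halving the spatial resolution quarters the area, so moving the division position one downsampling layer deeper can be enough to cross either threshold.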
  • The intermediate feature map is data that cannot be decoded by the decoder of an autoencoder obtained by self-encoding learning of the artificial intelligence model (see FIG. 15, etc.).
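The source does not define numerically when an IRM "cannot be decoded". One hypothetical way to check it is to train a decoder that tries to reconstruct the input image from the IRM, then treat reconstructions below a PSNR threshold as non-decodable. The sketch below covers only that final check; the 20 dB threshold and the function names are assumptions, and training the decoder itself is out of scope.

```python
import numpy as np


def psnr(original: np.ndarray, reconstruction: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = float(np.mean((original.astype(np.float64) - reconstruction) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def is_decodable(original, reconstruction, threshold_db: float = 20.0) -> bool:
    """Treat the IRM as decodable if the decoder's reconstruction reaches the threshold."""
    return psnr(original, reconstruction) >= threshold_db


rng = np.random.default_rng(0)
image = rng.random((64, 64))
good = np.clip(image + rng.normal(scale=0.01, size=image.shape), 0, 1)  # faithful output
poor = rng.random((64, 64))                                             # unrelated output
print(is_decodable(image, good), is_decodable(image, poor))  # -> True False
```

A PSNR threshold is only one possible proxy; a stricter check might instead test whether a face recognizer can still match the reconstruction to the original person.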
  • The downstream network acquisition unit selects one downstream network from among the multiple downstream network candidates based on the intermediate feature map (see FIG. 16, etc.).
  • Accordingly, the time required to adapt the edge-side artificial intelligence model to the usage environment conditions of the sensor device can be shortened. Because the amount of learning input data that the user of the model must send to the cloud is reduced, the amount of communication data required for adapting the model can also be reduced.
  • The multiple downstream network candidates are generated by machine learning in the form of active learning (see FIG. 17, etc.). This makes it possible to generate small networks as downstream candidates, which is suitable when the device performing inference with the downstream network has limited hardware resources.
  • Alternatively, the downstream network acquisition unit generates one downstream network based on the base trained network and the intermediate feature map (see FIGS. 18 to 20, etc.).
  • In this case, adapting the edge-side artificial intelligence model to the usage environment conditions of the sensor device requires only generating a corresponding downstream network from the base trained network, which reduces the number of training iterations compared with re-training the entire network including the upstream network.
  • As a result, as in the selection-based configuration, both the time required for adaptation and the amount of communication data required for adapting the model can be reduced.
  • The downstream network acquisition unit generates the downstream network by knowledge distillation based on the intermediate feature map, using the base trained network as the teacher model.
  • That is, the knowledge distillation that uses the base trained network as the teacher model is performed based on the intermediate feature map input from the external device.
  • The first information processing method of the embodiment is a method in which an information processing device performs: a downstream network acquisition process of receiving, from an external device, the intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when input data is given to the model, and, based on the intermediate feature map, either selecting one downstream network from among multiple candidates for the network following the predetermined intermediate layer or generating one downstream network based on a base trained network and the intermediate feature map; and a transmission process of transmitting to the outside the setting data of the downstream network selected or generated in the downstream network acquisition process.
  • This first information processing method provides the same operation and effects as the first information processing device of the embodiment described above.
  • The second information processing device of the embodiment (image sensor 30) includes: a transmission processing unit (F31) that transmits to the outside the intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when input data is given to the model; a reception processing unit (F32) that receives the setting data of either one downstream network that an external device selected, based on the transmitted intermediate feature map, from among multiple candidates for the network following the predetermined intermediate layer, or one downstream network that the external device generated based on a base trained network and the transmitted intermediate feature map; and an inference processing unit that performs inference processing using the downstream network realized by the setting data received by the reception processing unit.
  • This provides an information processing device that allows the external device to determine a downstream network suited to the usage environment conditions of the sensor device, either by selecting one of multiple downstream candidates based on the intermediate feature map or by generating one downstream network based on the base trained network and the intermediate feature map. Therefore, like the first information processing device described above, the second information processing device of the embodiment can shorten the time required to adapt the edge-side artificial intelligence model to the usage environment conditions of the sensor device.
  • In addition, the amount of learning input data that the user of the model must send to the cloud for adaptation is reduced, which reduces the amount of communication data required for adapting the model. Furthermore, because the intermediate feature map is used, the possibility of leaking personal information when adapting the edge-side model to the usage environment conditions of the sensor device can be reduced.
  • As in the first device, the intermediate feature map is output data of the second or a later intermediate layer of the artificial intelligence model. Such output data tends to make personal information harder to identify than the output data of the first intermediate layer, so the effect of reducing the possibility of leaking personal information can be enhanced.
  • The sensor device is an image sensor, and the artificial intelligence model performs image recognition processing using a captured image obtained by the image sensor as input data.
  • The intermediate feature map is data in which, when visualized, the image area related to personal information, such as the face area when the target subject is a person, spans fewer than 144 pixels. This enhances the effect of reducing the possibility of leaking personal information.
  • The intermediate feature map may also be data in which that image area spans fewer than 64 pixels when visualized, that is, output data of an intermediate layer dimensionally compressed to that extent.
  • the intermediate feature map is data that cannot be decoded by the decoding unit of the autoencoder obtained by self-encoding learning of the artificial intelligence model.
  • The second information processing method of the embodiment is a method in which an information processing device performs: a transmission process of transmitting to the outside the intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when input data is given to the model; a reception process of receiving the setting data of either one downstream network that an external device selected from among multiple candidates for the network following the predetermined intermediate layer, based on the intermediate feature map sent in the transmission process, or one downstream network that the external device generated based on a base trained network and that intermediate feature map; and an inference process using the downstream network realized by the setting data received in the reception process.
  • This second information processing method provides the same operation and effects as the second information processing device of the embodiment described above.
  • An information processing system (information processing system 100) according to an embodiment of the present technology includes the following units.
  • a first transmission processing unit (transmission processing unit F31) that performs a process of transmitting to the outside an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when the input data is given to the model.
  • a downstream network acquisition unit (post-stage network acquisition unit F11), provided in a device external to the device having the first transmission processing unit, that, based on the intermediate feature map transmitted by the first transmission processing unit, either selects one downstream network from among a plurality of candidates for the downstream network (the network following the predetermined intermediate layer in the artificial intelligence model) or generates one downstream network based on a base trained network and that intermediate feature map.
  • a second transmission processing unit (transmission processing unit F12), provided in the external device, that performs a process of transmitting to the outside the setting data of the downstream network selected or generated by the downstream network acquisition unit.
  • a reception processing unit (reception processing unit F32), provided in the device having the first transmission processing unit, that performs a process of receiving the setting data.
  • an inference processing unit (AI processing unit 44), provided in the device having the first transmission processing unit, that performs inference processing using the downstream network realized by the setting data received by the reception processing unit.
  • The present technology can also adopt the following configurations. (1) An information processing device comprising: a downstream network acquisition unit that receives, from an external device, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when the input data is given to the model, and that, based on the intermediate feature map, either selects one downstream network from among a plurality of candidates for the downstream network (the network following the predetermined intermediate layer in the artificial intelligence model) or generates one downstream network based on a base trained network and the intermediate feature map; and a transmission processing unit that performs a process of transmitting to the outside setting data of the downstream network selected or generated by the downstream network acquisition unit.
  • the intermediate feature map is output data of a second or subsequent intermediate layer in the artificial intelligence model.
  • the sensor device is an image sensor, and the artificial intelligence model is an artificial intelligence model for performing image recognition processing using a captured image obtained by the image sensor as input data.
  • (4) The information processing device according to (3), wherein the intermediate feature map is data such that, when visualized, the image area related to personal information has fewer than 144 pixels.
  • the intermediate feature map is data such that the number of pixels in an image area related to personal information when visualized is less than 64 pixels.
  • the intermediate feature map is data that cannot be decoded by a decoding unit of an autoencoder obtained by self-encoding learning of the artificial intelligence model.
  • the downstream network acquisition unit selects one downstream network from among the plurality of downstream-network candidates based on the intermediate feature map.
  • the plurality of downstream-network candidates are generated by machine learning performed as active learning.
  • An information processing method in which an information processing device performs: a downstream network acquisition process of receiving, from an external device, an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when the input data is given to the model, and, based on the intermediate feature map, either selecting one downstream network from among a plurality of candidates for the downstream network (the network following the predetermined intermediate layer in the artificial intelligence model) or generating one downstream network based on a base trained network and the intermediate feature map; and a transmission process of transmitting to the outside setting data of the downstream network selected or generated in the downstream network acquisition process.
  • An information processing device comprising: a reception processing unit that performs a process of receiving setting data of the downstream network; and an inference processing unit that performs inference processing using the downstream network realized by the setting data received by the reception processing unit.
  • The information processing device according to (12), wherein the intermediate feature map is output data of a second or subsequent intermediate layer in the artificial intelligence model.
  • the sensor device is an image sensor
  • the artificial intelligence model is an artificial intelligence model for performing image recognition processing using a captured image obtained by the image sensor as input data.
  • the information processing device is an image sensor
  • the artificial intelligence model is an artificial intelligence model for performing image recognition processing using a captured image obtained by the image sensor as input data.
  • An information processing method in which an information processing device performs: a transmission process of transmitting to the outside an intermediate feature map obtained in a predetermined intermediate layer of an artificial intelligence model, which has a neural network and takes data detected by a sensor device as input data, when the input data is given to the model; a reception process of receiving setting data for one downstream network that an external device either selects, based on the intermediate feature map transmitted in the transmission process, from among a plurality of candidates for the downstream network (the network following the predetermined intermediate layer in the artificial intelligence model), or generates based on a base trained network and the intermediate feature map transmitted in the transmission process; and an inference process using the downstream network realized by the setting data received in the reception process.
  • a first transmission processing unit; a downstream network acquisition unit that is provided in a device external to the device having the first transmission processing unit and that, based on the intermediate feature map transmitted by the first transmission processing unit, either selects one downstream network from among a plurality of candidates for the downstream network (the network following the predetermined intermediate layer in the artificial intelligence model) or generates one downstream network based on a base trained network and the intermediate feature map transmitted by the first transmission processing unit; a second transmission processing unit that is provided in the external device and performs a process of transmitting to the outside setting data of the downstream network selected or generated by the downstream network acquisition unit; a reception processing unit that is provided in the device having the first transmission processing unit and performs a process of receiving the setting data; and an inference processing unit that is provided in the device having the first transmission processing unit and performs inference processing using the downstream network realized by the setting data received by the reception processing unit.
  • Reference signs: 100 information processing system; 1 server device; 3 camera; 4 fog server; 5 network; 11 CPU; 30 image sensor; 33 control unit; 35 communication unit; 37 bus; 41 imaging unit; 43 in-sensor control unit; 44 AI processing unit; 47 communication interface; 48 bus; D1, D2 die; 50 AI model; 50n DNN network; 51 initial AI model; 55 post-processing unit; 60 autoencoder; 60a decoding unit; Dv division position; F11 post-stage network acquisition unit; F12 transmission processing unit; F31 transmission processing unit; F32 reception processing unit
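
The 144-pixel and 64-pixel thresholds correspond to regions of 12 × 12 and 8 × 8 pixels, and whether a given layer meets them depends on how far the layers up to the division position have downsampled the input. The following is an illustrative estimate only, not the method described in this publication: it assumes that each pixel of the visualized feature map covers a square patch of `total_stride` × `total_stride` input pixels and that the face bounding box is aligned to that grid. The names `region_pixels_in_feature_map`, `is_below_threshold` and `total_stride` are hypothetical.

```python
import math


def region_pixels_in_feature_map(box_w: int, box_h: int, total_stride: int) -> int:
    """Estimate how many feature-map pixels a personal-information region covers.

    box_w, box_h: size of the region (e.g. a face bounding box) in input-image pixels.
    total_stride: hypothetical cumulative downsampling factor of the layers up to
    the division position; each feature-map pixel covers a
    total_stride x total_stride patch of the input.
    Assumes the box is aligned to the feature-map grid.
    """
    if total_stride <= 0:
        raise ValueError("total_stride must be positive")
    # A partially covered cell still shows up as one pixel when visualized.
    cells_w = math.ceil(box_w / total_stride)
    cells_h = math.ceil(box_h / total_stride)
    return cells_w * cells_h


def is_below_threshold(box_w: int, box_h: int, total_stride: int, limit: int = 144) -> bool:
    """True if the region stays under `limit` pixels (the text uses 144 and 64)."""
    return region_pixels_in_feature_map(box_w, box_h, total_stride) < limit
```

For example, a 96 × 96-pixel face at a cumulative stride of 8 maps to 12 × 12 = 144 feature-map pixels, which does not satisfy the "fewer than 144" condition, whereas a stride of 16 maps it to 6 × 6 = 36 pixels, below both thresholds. A box that is not grid-aligned can straddle at most one extra row and one extra column of cells, so a conservative check would add one to each dimension.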

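The division of work described above (upstream layers on the device, downstream-network selection on an external device, inference on the device with the returned setting data) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the publication's implementation: the two-layer upstream network, the nearest-prototype selection rule, and the names `DownstreamCandidate`, `select_downstream`, `indoor` and `outdoor` are all hypothetical, because the text above does not say how the external device chooses among candidates. The alternative path, generating a downstream network from a base trained network, is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: y = W x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


# --- Device side (e.g. the image sensor) -------------------------------
def upstream(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Layers up to the (hypothetical) division position; returns the intermediate feature map."""
    return relu(matvec(w2, relu(matvec(w1, x))))


# --- External device side (e.g. a server) ------------------------------
@dataclass
class DownstreamCandidate:
    name: str
    prototype: Vector  # hypothetical: a typical feature map this candidate suits
    weights: Matrix    # the "setting data" returned to the device


def select_downstream(fmap: Vector, candidates: Dict[str, DownstreamCandidate]) -> DownstreamCandidate:
    """Pick the candidate whose prototype is closest (squared L2) to the received feature map."""
    def dist(c: DownstreamCandidate) -> float:
        return sum((a - b) ** 2 for a, b in zip(fmap, c.prototype))
    return min(candidates.values(), key=dist)


# --- Device side again --------------------------------------------------
def downstream(fmap: Vector, setting_weights: Matrix) -> Vector:
    """Inference with the downstream network realized from the received setting data."""
    return matvec(setting_weights, fmap)


if __name__ == "__main__":
    fmap = upstream([2.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]])
    candidates = {
        "indoor": DownstreamCandidate("indoor", [0.0, 0.0], [[1.0, 0.0]]),
        "outdoor": DownstreamCandidate("outdoor", [3.0, 1.0], [[0.0, 2.0]]),
    }
    chosen = select_downstream(fmap, candidates)
    print(chosen.name, downstream(fmap, chosen.weights))  # outdoor [2.0]
```

Selecting by distance to a per-candidate prototype is only one plausible criterion; in a real system the setting data would describe the weights and configuration of a complete network rather than a single matrix.
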
Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
PCT/JP2023/026530 2022-08-04 2023-07-20 Information processing device, information processing method, and information processing system Ceased WO2024029347A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN202380055835.4A CN119768798A (zh) 2022-08-04 2023-07-20 Information processing device, information processing method, and information processing system
US18/998,530 US20260030881A1 (en) 2022-08-04 2023-07-20 Information processing device, information processing method, and information processing system
KR1020257005536A KR20250048266A (ko) 2022-08-04 2023-07-20 Information processing device, information processing method, and information processing system
JP2024538918A JPWO2024029347A1 (ja) 2022-08-04 2023-07-20
EP23849902.4A EP4567669A4 (en) 2022-08-04 2023-07-20 INFORMATION PROCESSING DEVICE, INFORMATION PROCESS AND INFORMATION PROCESSING SYSTEM

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2022-125012 2022-08-04
JP2022125012 2022-08-04
JP2022170177 2022-10-24
JP2022-170177 2022-10-24

Publications (1)

Publication Number Publication Date
WO2024029347A1 true WO2024029347A1 (ja) 2024-02-08

Family

ID=89848827

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/026530 Ceased WO2024029347A1 (ja) 2022-08-04 2023-07-20 Information processing device, information processing method, and information processing system

Country Status (6)

Country Link
US (1) US20260030881A1 (en)
EP (1) EP4567669A4 (en)
JP (1) JPWO2024029347A1 (ja)
KR (1) KR20250048266A (ko)
CN (1) CN119768798A (zh)
WO (1) WO2024029347A1 (ja)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
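JPH09120434A (ja) * 1995-10-24 1997-05-06 Suzuki Motor Corp Character recognition device
JP2021135739A (ja) * 2020-02-27 2021-09-13 Hitachi, Ltd. Operating state classification system and operating state classification method
WO2022064656A1 (ja) * 2020-09-25 2022-03-31 Nippon Telegraph and Telephone Corporation Processing system, processing method, and processing program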

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP6695947B2 (ja) 2018-09-21 2020-05-20 Sony Semiconductor Solutions Corporation Solid-state imaging system, image processing method, and program

Non-Patent Citations (1)

Title
See also references of EP4567669A4 *

Also Published As

Publication number Publication date
EP4567669A4 (en) 2025-12-17
CN119768798A (zh) 2025-04-04
KR20250048266A (ko) 2025-04-08
EP4567669A1 (en) 2025-06-11
US20260030881A1 (en) 2026-01-29
JPWO2024029347A1 (ja) 2024-02-08

Similar Documents

Publication Publication Date Title
EP4440130A1 (en) Information processing device, information processing method, and program
US20250199789A1 (en) Information processing apparatus and information processing system
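WO2023238723A1 (ja) Information processing device, information processing system, information processing circuit, and information processing method
JP2025084825A (ja) Imaging system
WO2023218936A1 (ja) Image sensor, information processing method, and program
US20250028506A1 (en) Information processing device, information processing method, and program
JP2024059428A (ja) Signal processing device, signal processing method, and storage medium
WO2024029347A1 (ja) Information processing device, information processing method, and information processing system
WO2023218935A1 (ja) Image sensor, information processing method, and program
US20240414007A1 (en) Information processing device, information processing method, imaging device, and control method
CN121532793A (zh) Efficient image data processing
WO2025150483A1 (en) Information processing apparatus, information processing method, program, and recording medium
WO2025197575A1 (ja) Signal processing device and information processing device
WO2025126714A1 (ja) Information processing device, information processing method, and program
WO2025127023A1 (ja) Information processing device and information processing system
WO2026063181A1 (ja) Image processing device and image processing method
WO2024202366A1 (ja) Information processing device, information processing method, recording medium, inference device, and control method
WO2023218934A1 (ja) Image sensor
US20260004122A1 (en) Information processing device, information processing method, computer-readable non-transitory storage medium, and terminal device
JP2024059288A (ja) Image processing device, image processing method, and recording medium
WO2024034413A1 (ja) Information processing method, server device, and information processing device
WO2024202501A1 (ja) Imaging device, imaging device system, program protection method, and storage medium
WO2024241917A1 (ja) Information processing device, information processing method, and program
JP2025040847A (ja) Imaging device, transmission method, information processing device, information processing method, and feature distribution method
WO2026058652A1 (ja) Information processing device, information processing method, and information processing system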

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23849902

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2024538918

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 202380055835.4

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 18998530

Country of ref document: US

ENP Entry into the national phase

Ref document number: 20257005536

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2023849902

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2023849902

Country of ref document: EP

Effective date: 20250304

WWP Wipo information: published in national office

Ref document number: 202380055835.4

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 1020257005536

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2023849902

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 18998530

Country of ref document: US