WO2022190157A1 - Imaging device and video processing system - Google Patents

Imaging device and video processing system Download PDF

Info

Publication number
WO2022190157A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
video processing
imaging device
feature amount
processing system
Prior art date
Application number
PCT/JP2021/008913
Other languages
French (fr)
Japanese (ja)
Inventor
嵩臣 神田
Original Assignee
株式会社日立国際電気
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立国際電気 filed Critical 株式会社日立国際電気
Priority to PCT/JP2021/008913 priority Critical patent/WO2022190157A1/en
Priority to JP2023504880A priority patent/JP7448721B2/en
Publication of WO2022190157A1 publication Critical patent/WO2022190157A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present invention relates to an imaging device and a video processing system, and more particularly to an imaging device and a video processing system that are capable of inference processing by machine learning and have video processing functions for privacy protection.
  • Patent Document 1 discloses a method of protecting privacy by performing processing such as reversible mosaic processing and mask processing on captured images.
  • the processed image can be restored to the original image by performing corresponding restoration processing.
  • In Patent Document 1, however, if the data, including the restoration information needed to perform the restoration processing, is leaked to the outside, a malicious third party can perform the restoration processing and obtain the original image. To prevent this, an irreversible image would have to be distributed over the LAN, but in that case the original image cannot be restored, and it becomes impossible to perform face recognition or action recognition using image recognition technology or the like.
  • In view of the above problem, it is an object of the present invention to provide an imaging device and a video processing system capable of conveying predetermined information about an image while protecting the image information at a higher level.
  • To achieve the above object, a representative imaging device of the present invention captures video to acquire an image, detects a predetermined area from the image, resizes the detected area to extract a feature amount of the detection area, and outputs an image in which a mask image, formed by arranging the extracted feature amount two-dimensionally, is placed on the detection area of the acquired image.
  • Further, a video processing system of the present invention includes an imaging device and a video processing device. The imaging device captures video to acquire an image, detects a predetermined area from the image, resizes the detected area to extract a feature amount of the detection area, and outputs an image in which a mask image, formed by arranging the extracted feature amount two-dimensionally, is placed on the detection area of the acquired image. The video processing device receives the image output by the imaging device, obtains the feature amount from the mask image, and performs inference processing based on the feature amount.
  • FIG. 1 is a block diagram showing one embodiment of the video processing system of the present invention.
  • FIG. 2 is a block diagram showing an example of the processing system section of FIG.
  • FIG. 3 is a diagram showing an example of processing for calculating feature amounts applied in the video processing system of the present invention.
  • FIG. 4 is a diagram showing an example of processing of the imaging device in the video processing system of the present invention.
  • FIG. 5 is a diagram showing an example of processing of the video processing device in the video processing system of the present invention.
  • FIG. 1 is a block diagram showing one embodiment of the video processing system of the present invention.
  • the video processing system in FIG. 1 includes an imaging device 1 and a video processing device 5 .
  • the imaging device 1 includes an imaging section 2 and a processing system section 3 .
  • the video processing device 5 also includes a processing system section 6 and a display output section 7 .
  • the display output unit 7 may be configured separately from the video processing device 5 instead of being provided in the video processing device 5 .
  • a personal computer, a tablet computer, a server, or the like can be applied to the video processing device 5 .
  • the imaging device 1 has a configuration of one or more cameras, and can be arranged in various places. For example, it may be installed at a monitoring location as a monitoring camera.
  • the imaging unit 2 is a camera configuration that obtains information by forming an image of incident light on an imaging device via a lens and a diaphragm.
  • the imaging device here include a CCD (Charge-Coupled Device) image sensor and a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
  • the obtained information is sent to the processing system section 3 .
  • the imaging unit 2 can perform imaging processing using a video processing IC (Integrated Circuit) such as an FPGA (Field Programmable Gate Array). On the other hand, this video processing IC may be integrated with the processing system section 3 .
  • the processing system unit 3 acquires information captured by the imaging unit 2 and performs the processing of FIG. 4, which will be described later. A specific configuration example will be described later with reference to FIG. 2, and specific processing contents will be described later with reference to FIG.
  • the processed information is sent to the processing system section 6 .
  • the processing system section 6 acquires information from the processing system section 3 and performs the processing of FIG. 5, which will be described later.
  • a specific configuration example will be described later with reference to FIG. 2, and specific processing contents will be described later with reference to FIG.
  • The display output unit 7 is a device that can display the content processed by the processing system unit 6, for example a liquid crystal display (LCD), an organic EL (OEL) display, or a touch panel.
  • Information can be exchanged between the imaging device 1 and the video processing device 5 via the Internet network or the like. For example, it is connected to a LAN or the like. Alternatively, information may be exchanged via a dedicated communication line. That is, it is possible to check the processing contents of the imaging device 1 at a remote location with the video processing device 5 .
  • The imaging device 1 and the video processing device 5 need not be in one-to-one correspondence: one imaging device 1 may correspond to a plurality of video processing devices 5, and a plurality of imaging devices 1 may correspond to one video processing device 5.
  • The video processing device 5 may also be configured to allow setting and operation of the imaging device 1.
  • FIG. 2 is a block diagram showing an example of the processing system section of FIG. A specific example of the processing system units 3 and 6 will be described as a computer system 300 in FIG.
  • the major components of computer system 300 include one or more processors 302 , memory 304 , terminal interfaces 312 , storage interfaces 314 , I/O (input/output) device interfaces 316 , and network interfaces 318 . These components may be interconnected via memory bus 306 , I/O bus 308 , bus interface 309 and I/O bus interface 310 .
  • Computer system 300 may include one or more processing units 302 A and 302 B, collectively referred to as processor 302 .
  • processor 302 executes instructions stored in memory 304 and may include an on-board cache.
  • As the processing device, a CPU (Central Processing Unit), an FPGA (Field-Programmable Gate Array), a GPU (Graphics Processing Unit), or the like can be applied.
  • Memory 304 may include random access semiconductor memory, storage devices, or storage media (either volatile or non-volatile) for storing data and programs. Memory 304 also represents the entire virtual memory of computer system 300 and may include the virtual memory of other computer systems connected to computer system 300 over a network. Memory 304 may conceptually be viewed as a single entity, but may be more complex arrangements, such as hierarchies of caches and other memory devices.
  • the memory 304 may store all or part of the programs, modules, and data structures that implement the functions described in this embodiment.
  • memory 304 may store application 350 .
  • Application 350 may include instructions or descriptions that perform the functions described below on processor 302, or may include instructions or descriptions that are interpreted by other instructions or descriptions.
  • Application 350 may be implemented in hardware via semiconductor devices, chips, logic gates, circuits, circuit cards, and/or other physical hardware devices instead of or in addition to processor-based systems.
  • Application 350 may include data other than instructions or descriptions.
  • Other data input devices such as cameras and sensors may also be provided in direct communication with bus interface 309 , processor 302 , or other hardware of computer system 300 .
  • Computer system 300 may include bus interface 309 that provides communication between processor 302 , memory 304 , display system 324 , and I/O bus interface 310 .
  • I/O bus interface 310 may couple to I/O bus 308 for transferring data to and from various I/O units.
  • I/O bus interface 310 connects via I/O bus 308 to a plurality of I/O interfaces 312, 314, 316, and 318, also known as I/O processors (IOPs) or I/O adapters (IOAs).
  • Display system 324 may include a display controller, display memory, or both. The display controller can provide video, audio, or both data to display device 326 .
  • Computer system 300 may also include one or more sensors or other devices configured to collect data and provide such data to processor 302 .
  • the display system 324 may be connected to a display device 326 such as a single display screen, television, tablet, or handheld device.
  • Display device 326 may include speakers for rendering audio.
  • speakers for rendering audio may be connected to the I/O interface.
  • the functionality provided by display system 324 may be implemented by an integrated circuit that includes processor 302 .
  • bus interface 309 may be implemented by an integrated circuit including processor 302 .
  • the I/O interface has the ability to communicate with various storage or I/O devices.
  • The terminal interface 312 allows the attachment of user I/O devices 320 such as user output devices (for example, a video display or a speaker television) and user input devices (for example, a keyboard, mouse, keypad, touchpad, trackball, buttons, light pen, or other pointing device).
  • A user may input data and instructions to the user I/O device 320 and the computer system 300 by operating a user input device through the user interface, and may receive output data from the computer system 300.
  • the user interface may be displayed on a display device or played by speakers via the user I/O device 320, for example.
  • the storage interface 314 allows attachment of one or more disk drives or direct access storage devices 322 .
  • Storage device 322 may be implemented as any secondary storage device.
  • the contents of memory 304 may be stored in storage device 322 and read from storage device 322 as needed.
  • I/O device interface 316 may provide an interface to other I/O devices.
  • Network interface 318 may provide a communication pathway to allow computer system 300 and other devices to communicate with each other. This communication path may be, for example, network 330 .
  • Although the computer system 300 shown here includes a bus structure that provides direct communication paths between the processor 302, the memory 304, the bus interface 309, the display system 324, and the I/O bus interface 310, the computer system 300 may instead include point-to-point links in hierarchical, star, or web configurations, multiple hierarchical buses, or parallel or redundant communication paths.
  • Furthermore, although the I/O bus interface 310 and the I/O bus 308 are shown as single units, the computer system 300 may in practice include multiple I/O bus interfaces 310 or multiple I/O buses 308.
  • Also, although multiple I/O interfaces are shown separating the I/O bus 308 from the various communication paths leading to the various I/O devices, some or all of the I/O devices may instead be connected directly to a single system I/O bus.
  • Computer system 300 may be a device that receives requests from other computer systems (clients) that do not have a direct user interface, such as a multi-user mainframe computer system, a single-user system, or a server computer.
  • When the computer system 300 of FIG. 2 is applied to the processing system unit 3 of FIG. 1, the display device 326 is optional and may or may not be provided, and the imaging unit 2 can be applied as the user I/O device 320.
  • When the computer system 300 of FIG. 2 is applied as the processing system unit 6 of FIG. 1, the display device 326 can be applied as the display output unit 7. The network 330 can be applied as the network interposed between the processing system unit 3 and the processing system unit 6.
  • FIG. 3 is a diagram showing an example of processing for calculating feature amounts applied in the video processing system of the present invention.
  • FIG. 3 shows a configuration example of machine learning by CNN (Convolution Neural Networks) for estimating a person from a face image.
  • the number above each layer is the number of neurons in that layer, but these are just examples.
  • A portion of a specific image is input at the input layer 11 and transmitted to the first convolution layer 12 and pooling layer 13, which are connected to the subsequent convolution layer and pooling layer. After these processes come the fully connected layers: an input layer 16, an intermediate layer 17, and an output layer 18.
  • The number of neurons in the output layer 18 is equivalent to the number of classes. When face recognition is performed, it is roughly equivalent to the number of people that can be identified.
  • When a portion of a specific image is input at the input layer 11, a 200 × 200 image, for example, is resized to 64 × 64 before being input.
  • The input layer 11 acquires image information of a specific size (64 × 64 pixels in FIG. 3).
  • the example in FIG. 3 is an image of a person's face captured by face detection.
  • the convolution layer 12 performs convolution processing.
  • The image acquired in the input layer 11 is filtered. Filtering reduces the size (60 × 60 in FIG. 3). Then, the number of filters prepared (eight in FIG. 3) is output.
  • pooling processing is performed in the pooling layer 13 .
  • The information output from the convolution layer 12 is compressed. This halves the size (30 × 30 in FIG. 3).
  • the convolution layer 14 performs convolution processing.
  • The information compressed by the pooling layer 13 is further filtered to reduce the size (26 × 26 in FIG. 3). Then, the number of filters prepared (16 in FIG. 3) is output.
  • the pooling layer 15 performs pooling processing.
  • The information output from the convolution layer 14 is compressed. This halves the size (13 × 13 in FIG. 3).
  • Next, in the input layer 16 of the fully connected layer, the three-dimensional information (13 × 13 × 16) from the pooling layer 15 is rearranged into one-dimensional information (2704).
  • This information indicates the feature amount.
  • the convolution layer and the pooling layer are repeated twice (two layers), but the number of repetitions is not limited to this and may be more.
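  • As an illustrative sketch, the layer sizes above can be traced with NumPy; the 5 × 5 filter size, the random weights, and the helper names below are assumptions chosen only so that the sizes 64, 60, 30, 26, and 13 and the 2704-element feature vector come out as in FIG. 3.

      import numpy as np

      def conv_valid(x, kernels):
          """'Valid' 2-D convolution of an (H, W, C_in) array with (k, k, C_in, C_out) kernels."""
          h, w, _ = x.shape
          k, c_out = kernels.shape[0], kernels.shape[3]
          out = np.zeros((h - k + 1, w - k + 1, c_out))
          for i in range(h - k + 1):
              for j in range(w - k + 1):
                  patch = x[i:i + k, j:j + k, :]
                  out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
          return out

      def max_pool(x, s=2):
          """Non-overlapping s x s max pooling."""
          h, w, c = x.shape
          return x[:h // s * s, :w // s * s, :].reshape(h // s, s, w // s, s, c).max(axis=(1, 3))

      face = np.random.rand(64, 64, 1)      # detection area resized to 64 x 64 (input layer 11)
      k1 = np.random.randn(5, 5, 1, 8)      # 8 filters  -> 60 x 60 x 8  (convolution layer 12)
      k2 = np.random.randn(5, 5, 8, 16)     # 16 filters -> 26 x 26 x 16 (convolution layer 14)
      x = max_pool(conv_valid(face, k1))    # 30 x 30 x 8  (pooling layer 13)
      x = max_pool(conv_valid(x, k2))       # 13 x 13 x 16 (pooling layer 15)
      feature = x.reshape(-1)               # 2704 values  (input layer 16 of the fully connected layer)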
  • a mask image can be formed from the input layer 16 of the fully connected layer.
  • the mask image here means an image whose original image cannot be identified (if it is a face, it is not possible to identify someone from the image alone). This processing is irreversible video processing, and once the mask image is formed, the original image cannot be restored.
  • Specifically, as shown in FIG. 3, the one-dimensional information 16-1 (2704 in FIG. 3), which is the information of the input layer 16 of the fully connected layer, is rearranged into two-dimensional image information 16-2 (52 × 52 in FIG. 3).
  • the information at this time can be held as image information, such as color density information in the case of a black and white image, and color type and color density information in the case of a color image.
  • For example, a black-and-white image can be converted as 8-bit information per pixel, and an RGB color image as 24-bit information per pixel.
  • The 52 × 52 pixel image information is expanded to a 200 × 200 pixel mask image 16-3. This is conversion processing for adjusting to the size of the face image that was originally captured.
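  • A minimal sketch of this rearrangement and enlargement, for illustration only: the 8-bit normalisation and the nearest-neighbour enlargement are assumptions, and only the sizes 2704, 52 × 52, and 200 × 200 follow the text.

      import numpy as np

      def to_mask_image(feature, side=52, out=200):
          """Arrange a 1-D feature vector two-dimensionally and enlarge it to the detection-area size."""
          f = np.asarray(feature, dtype=np.float64)
          lo, hi = f.min(), f.max()
          img = ((f - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8).reshape(side, side)  # 52 x 52, 8 bits/pixel
          idx = np.arange(out) * side // out      # nearest-neighbour map: each value covers a small block of pixels
          return img[idx][:, idx]                 # 200 x 200 mask image 16-3

      feature = np.random.rand(2704)              # stand-in for the input layer 16 values
      mask = to_mask_image(feature)               # shape (200, 200), dtype uint8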
  • the created mask image 16-3 is returned to the original one-dimensional information for inference processing.
  • Specifically, the mask image 16-3 (200 × 200 in FIG. 3) is resized back to the two-dimensional image information 16-4 (52 × 52 in FIG. 3) that it had before being stretched, and is then rearranged into the one-dimensional information 16-1 (2704 in FIG. 3). This makes it possible to convert the information of the input layer 16 of the fully connected layer into the mask image 16-3 once and place it on the image.
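  • The inverse step can be sketched in the same illustrative way; sampling the centre of each enlarged block is an assumption, chosen so that the 52 × 52 grid and its 2704 values are recovered up to the 8-bit quantisation of the previous sketch (before any codec is applied).

      import numpy as np

      def from_mask_image(mask, side=52):
          """Shrink a square mask image back to side x side (image information 16-4) and flatten it."""
          out = mask.shape[0]
          idx = ((np.arange(side) + 0.5) * out / side).astype(int)   # sample the centre of each block
          return mask[idx][:, idx].reshape(-1)                       # 2704 values again

      mask = np.random.randint(0, 256, (200, 200), dtype=np.uint8)   # stand-in for mask image 16-3
      recovered = from_mask_image(mask)                              # shape (2704,)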
  • In the intermediate layer 17 of the fully connected layer, 1000 neurons are applied in FIG. 3. This is only an example, and a suitable number can be applied as needed. The number of intermediate layers 17 may also be increased to form a plurality of layers.
  • In the next output layer 18 of the fully connected layer, 100 neurons are applied. Here, the number of neurons is the number of classes and corresponds to how many categories can be classified.
  • For example, in face recognition, the person is estimated from the neuron that fires most strongly, such as Mr. A, Mr. B, or Mr. C.
  • Such inference processing can classify 100 people. Alternatively, 99 people can be classified and the remaining class can be used for anyone else.
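  • For illustration only, the fully connected part (input layer 16, intermediate layer 17, output layer 18) amounts to two matrix multiplications. The layer sizes 2704/1000/100 follow the figures above; the ReLU activation and the random placeholder weights are assumptions, since the text does not specify them.

      import numpy as np

      rng = np.random.default_rng(0)
      w1, b1 = rng.normal(scale=0.01, size=(2704, 1000)), np.zeros(1000)   # input layer 16 -> intermediate layer 17
      w2, b2 = rng.normal(scale=0.01, size=(1000, 100)), np.zeros(100)     # intermediate layer 17 -> output layer 18

      def classify(feature):
          """Return the index of the most strongly firing output neuron, i.e. the estimated class."""
          hidden = np.maximum(feature @ w1 + b1, 0.0)   # ReLU (an assumption)
          return int(np.argmax(hidden @ w2 + b2))

      person = classify(np.random.rand(2704))           # e.g. 0 = Mr. A, 1 = Mr. B, ..., 99 = someone else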
  • FIG. 4 is a diagram showing an example of processing of an imaging device in the video processing system of the present invention.
  • the processing here is performed by the imaging device 1 side, and is performed by the processing system unit 3 of the imaging device 1 unless otherwise specified.
  • irreversible image processing is performed.
  • The imaging device 1 first performs video shooting 21. This is performed by the imaging unit 2 and can be realized by an imaging element and a video processing IC such as an FPGA. Shooting is performed as video, for example at 30 frames per second (30 fps) or more.
  • the image captured by the imaging unit 2 is sent to the processing system unit 3 for each image of one frame, and can be processed.
  • Next, the processing system unit 3 performs face detection 22 on the input video. Face detection 22 is a process of identifying the shape of a human face and detecting a range containing the face. This is done automatically using existing techniques. If a human face is identified, that area is detected. In addition, because of the processing described later, detection can be limited to cases where the range identified as a face has at least a certain number of pixels. If the number of bits handled by one neuron of the input layer 16 is the same as the number of bits of one pixel, the minimum range is set to 52 × 52 pixels in the example of FIG. 4.
  • Resizing 23 of the detection area is performed in the detection area resizing section. This resizes the area detected by face detection 22 to a predetermined size. Since the area detected by the face detection 22 is not constant, this resizing converts the area into a predetermined size suitable for the following feature amount calculation. In the example of FIG. 4, a process of converting 200 × 200 pixels to 64 × 64 pixels is performed.
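  • One possible sketch of this resizing step (nearest-neighbour sampling is an assumption; any standard image-resize routine would serve):

      import numpy as np

      def resize_nearest(region, out=64):
          """Resize a square detection area (e.g. 200 x 200) to out x out by nearest-neighbour sampling."""
          side = region.shape[0]
          idx = ((np.arange(out) + 0.5) * side / out).astype(int)
          return region[idx][:, idx]

      detected = np.random.randint(0, 256, (200, 200), dtype=np.uint8)   # area found by face detection 22
      resized = resize_nearest(detected)                                  # 64 x 64 input for the feature calculation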
  • feature quantity calculation 24 of the detection area is performed in the feature quantity calculation unit.
  • a feature amount necessary for face recognition is obtained using CNN or the like.
  • the calculation of this feature amount is the same as the processing from the input layer 11 to the input layer 16 of the fully connected layer described with reference to FIG.
  • Next, rearrangement/resizing 25 of the feature amount is performed; this converts the data into a format whose size can be applied to the region where face detection was performed.
  • The number of feature amount neurons calculated in the input layer 16 of the fully connected layer is 2704, and when this is converted to two dimensions, it becomes a 52 × 52 area.
  • On the other hand, the area detected by face detection 22 is 200 × 200.
  • To fit the 52 × 52 two-dimensional data into the 200 × 200 face detection area, the data of one neuron is expanded to about 4 pixels and allocated. As a result, the data of the 52 × 52 area are converted to the data of the 200 × 200 area.
  • the processing of rearranging/resizing the feature quantity 25 here is the same as the processing from the one-dimensional information 16-1 to the mask image 16-3 described with reference to FIG.
  • Note that the larger this enlargement ratio is, the smaller the changes between pixels and between frames in the mask area become, which softens abrupt changes and makes processing by a lossy codec easier. The feature amount must also fit in the data area of the minimum image size for which face detection is performed; depending on this minimum size, the output of a pooling layer in the middle of the CNN can, for example, be treated as the feature amount.
  • the rearranged feature amount is subjected to mask processing 26 on the original image detected by face detection 22 .
  • This is arranged on the original image by applying the rearranged feature amount (200 × 200) to the area detected by the face detection 22 as the mask image 16-3. Since the mask image 16-3 is an image with color types and densities based on feature amounts, it differs from the original image of the area detected by the face detection 22 and has information different from that of a human face.
  • masking metadata addition 27 is performed on the image that has undergone masking 26 .
  • the index number of the image subjected to the mask processing, the coordinates of the starting point on the image, the length of one side thereof, and the like are given.
  • information for specifying the masked area and information for specifying the masked image are added.
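  • A minimal sketch of the mask processing 26 and the metadata addition 27, for illustration only; the dictionary keys below are assumptions, since the text only states that an index number, the start-point coordinates on the image, and the side length are attached.

      import numpy as np

      def apply_mask(frame, mask, x, y, index):
          """Place the mask image over the detected region of the frame and build its metadata."""
          side = mask.shape[0]
          out = frame.copy()
          out[y:y + side, x:x + side] = mask                        # mask processing 26 on the original image
          meta = {"index": index, "origin": (x, y), "side": side}   # mask processing metadata 27 (assumed keys)
          return out, meta

      frame = np.zeros((1080, 1920), dtype=np.uint8)                # one captured frame (grayscale for simplicity)
      mask = np.random.randint(0, 256, (200, 200), dtype=np.uint8)  # mask image 16-3 built from the feature amount
      masked_frame, meta = apply_mask(frame, mask, x=640, y=300, index=0)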
  • external output 28 is performed.
  • codec processing is performed in order to compress the transmission capacity.
  • a lossy codec is generally used, but depending on the application, only intermittent transmission of images is sufficient, in which case a lossless codec may be used.
  • the information externally output here is sent to the video processing device 5 via the Internet network or the like.
  • FIG. 5 is a diagram showing an example of processing of the video processing device in the video processing system of the present invention.
  • the processing here is performed on the video processing device 5 side, and is performed by the processing system unit 6 of the video processing device 5 unless otherwise specified.
  • inference processing by machine learning is performed to identify a person.
  • The feature amount extraction/resizing/rearrangement unit performs extraction/resizing/rearrangement 32 of the feature amount from the video data and its metadata.
  • the mask image 16-3 is extracted from the video data.
  • the range can be specified from the assigned metadata.
  • It is returned to two-dimensional image information 16-4 (52 × 52 in FIG. 5), and further rearranged into one-dimensional information 16-5 (2704 in FIG. 5).
  • This gives the value of the feature amount. It should be noted that, because this value passes through processing such as resizing and a codec on the way, the data values may deviate slightly and may not match completely. However, this deviation does not affect the subsequent process of obtaining the inference result from the feature amount, and a value that is the same as or close to the original feature amount (one-dimensional information 16-1) is obtained.
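  • Continuing the earlier sketches, and still as an illustration only, the extraction/resizing/rearrangement 32 and the acquisition of the inference result 33 could look like the code below; the metadata keys match the assumed keys above, and the random weights stand in for the parameters that, as noted below, are shared with the imaging device in advance.

      import numpy as np

      def recover_feature(frame, meta, side=52):
          """Cut out the mask region named by the metadata, shrink it back to 52 x 52 and flatten it."""
          x, y = meta["origin"]
          s = meta["side"]
          region = frame[y:y + s, x:x + s].astype(np.float64)
          idx = ((np.arange(side) + 0.5) * s / side).astype(int)   # nearest-neighbour shrink
          return region[idx][:, idx].reshape(-1)                   # roughly the one-dimensional information 16-5

      frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)   # stand-in for a received, masked frame
      meta = {"index": 0, "origin": (640, 300), "side": 200}
      feature = recover_feature(frame, meta)                            # 2704 values, close to the original ones

      rng = np.random.default_rng(0)                                    # placeholder shared parameters
      w1, b1 = rng.normal(scale=0.01, size=(2704, 1000)), np.zeros(1000)
      w2, b2 = rng.normal(scale=0.01, size=(1000, 100)), np.zeros(100)
      person_class = int(np.argmax(np.maximum(feature @ w1 + b1, 0.0) @ w2 + b2))   # inference result 33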
  • the above processing can be performed by storing the information about the individual's face in the video processing device 5. For example, when outputting classes for 100 people, it is possible to store information for 100 people and identify individuals from feature amounts. Also, if the person does not correspond to the person recorded in advance, it is possible to prepare one class for outputting that the person is another person.
  • the data structure of the feature amount and the parameters for extracting the feature amount such as the parameters of the neural network are shared in advance between the imaging device 1 and the video processing device 5 .
  • The video processing device 5 may also have a function of setting the imaging device 1.
  • In another example, the imaging device 1 has a human detection function, detects the whole person, and masks the whole person with a two-dimensional image containing the feature amount. The video processing device 5 then infers what the behavior of the masked person is from the feature amount. In this case, a class is output for each type of human action.
  • If the masked portion could be decoded to restore, for example, the original image of a person, then a leak of that image would expose all the personal information contained in it.
  • Label information such as the names associated with face recognition can be omitted, so that only minimal information consisting of action label information is handled.
  • the transmission capacity can be reduced by embedding the feature amount in the mask image 16-3.
  • the present invention is not limited to the above-described embodiments, and includes various modifications.
  • the above-described embodiments have been described in detail in order to explain the present invention in an easy-to-understand manner, and are not necessarily limited to those having all the configurations described.
  • the process of embedding the feature amount in the mask image 16-3 is performed in order to reduce the transmission capacity.
  • SYMBOLS: 1... Imaging device, 2... Imaging unit, 3... Processing system unit, 5... Video processing device, 6... Processing system unit, 7... Display output unit, 11... Input layer, 12... Convolution layer, 13... Pooling layer, 14... Convolution layer, 15... Pooling layer, 16... Input layer of fully connected layer, 17... Intermediate layer of fully connected layer, 18... Output layer of fully connected layer, 21... Video shooting, 22... Face detection, 23... Resizing of detection area, 24... Feature amount calculation of detection area, 25... Rearrangement/resizing of feature amount, 26... Mask processing on original image, 27... Addition of mask processing metadata, 28... External output, 31... Video input, 32... Extraction/resizing/rearrangement of feature amount, 33... Acquisition of inference result from feature amount, 300... Computer system, 302... Processor, 302A, 302B... Processing device, 304... Memory, 306... Memory bus, 308... I/O bus, 309... Bus interface, 310... I/O bus interface, 312... Terminal interface, 314... Storage interface, 316... I/O device interface, 318... Network interface, 320... User I/O device, 322... Storage device, 324... Display system, 326... Display device, 330... Network, 350... Application

Abstract

The purpose of the present invention is to provide an imaging device and a video processing system that can obtain predetermined information related to an image while protecting image information at a higher level. This invention involves an imaging device (1) and a video processing device (5). The imaging device (1) captures a video to obtain an image, detects a predetermined area in the image, resizes the detected area to extract feature values of the resized detected area, and outputs an image obtained by placing a mask image (16-3), in which the extracted feature values are arranged two-dimensionally, on the detected area of the obtained image. The video processing device (5) receives the image output by the imaging device, obtains the feature values from the mask image (16-3), and performs an inference process on the basis of the feature values.

Description

撮像装置及び映像処理システム Imaging device and video processing system
 本発明は、撮像装置及び映像処理システムに関し、特に、機械学習で推論処理可能でプライバシー保護のための映像加工処理機能を有する撮像装置及び映像処理システムに関する。 The present invention relates to an imaging device and a video processing system, and more particularly to an imaging device and a video processing system that are capable of inference processing by machine learning and have video processing functions for privacy protection.
 近年、監視カメラなどで多数の人物を撮影するカメラの需要が増えている。これらのカメラはLAN(Local Area Network)に接続され、遠隔から映像監視ができるというメリットがある。一方で、セキュリティを突破された場合は、撮影された情報が流出する等して、プライバシー保護の観点で問題となることもある。 In recent years, there has been an increase in demand for cameras that capture a large number of people, such as surveillance cameras. These cameras are connected to a LAN (Local Area Network) and have the advantage of being able to monitor images remotely. On the other hand, if security is breached, captured information may be leaked, which may pose a problem in terms of privacy protection.
 そこで、特許文献1では撮影画像に対して、可逆型のモザイク処理やマスク処理などの加工処理を行うことによって、プライバシー保護を行う手法が開示されている。加工処理された画像は、対応する復元処理を行うことによって、元画像を復元することができる。 Therefore, Patent Document 1 discloses a method of protecting privacy by performing processing such as reversible mosaic processing and mask processing on captured images. The processed image can be restored to the original image by performing corresponding restoration processing.
特開2009-33738号公報JP-A-2009-33738
 特許文献1では、仮に復元処理を行うための復元情報も含めて外部に流失した場合、悪意のある第三者が復元処理を行い元の画像を入手することが可能となる。これを防ぐためには非可逆の画像をLAN上に配信する必要があるが、その場合は、元画像を復元することができない。このため、画像認識技術などによる顔認識や行動認識を行うことができなくなる。 In Patent Document 1, if restoration information including restoration information for performing restoration processing is leaked to the outside, a malicious third party can perform restoration processing and obtain the original image. In order to prevent this, it is necessary to distribute the irreversible image over the LAN, but in that case the original image cannot be restored. As a result, it becomes impossible to perform face recognition and action recognition using image recognition technology or the like.
 本発明は、上記課題に鑑みて、画像情報のより高い保護を行いながら画像に関する所定の情報を伝えることができる撮像装置及び映像処理システムを提供することを目的とする。 In view of the above problems, it is an object of the present invention to provide an imaging device and a video processing system capable of transmitting predetermined information about images while protecting image information at a higher level.
 上記目的を達成するため、代表的な本発明の撮像装置の一つは、映像を撮影して画像を取得し、前記画像内から所定の領域を検出し、検出した検出領域をリサイズして検出領域の特徴量を抽出し、前記抽出した特徴量を二次元に配列したマスク画像として前記取得した画像の検出領域に配置した画像を出力することを特徴とする。 In order to achieve the above object, one of the representative imaging devices of the present invention acquires an image by photographing a video, detects a predetermined area from the image, resizes the detected area, and detects it. It is characterized by extracting a feature amount of an area and outputting an image arranged in a detection area of the obtained image as a mask image in which the extracted feature amount is arranged two-dimensionally.
 さらに本発明の映像処理システムの一つは、撮像装置と、映像処理装置とを備え、前記撮像装置は、映像を撮影して画像を取得し、前記画像内から所定の領域を検出し、検出した検出領域をリサイズして検出領域の特徴量を抽出し、前記抽出した特徴量を二次元に配列したマスク画像として前記取得した画像の検出領域に配置した画像を出力し、前記映像処理装置は、前記撮像装置が出力した画像を入力して、前記マスク画像から特徴量を取得し、この特徴量に基づく推論処理を行うことを特徴とする。 Further, one of the video processing systems of the present invention includes an imaging device and a video processing device, wherein the imaging device acquires an image by capturing a video, detects a predetermined region from the resizing the detected area, extracting a feature amount of the detection area, and outputting an image arranged in the detection area of the acquired image as a mask image in which the extracted feature amount is arranged two-dimensionally, , the image output by the imaging device is input, the feature amount is obtained from the mask image, and an inference process is performed based on the feature amount.
 本発明によれば、撮像装置及び映像処理システムにおいて、画像情報のより高い保護を行いながら画像に関する所定の情報を伝えることができる。
 上記以外の課題、構成及び効果は、以下の実施形態により明らかにされる。
Advantageous Effects of Invention According to the present invention, in an imaging device and a video processing system, predetermined information regarding an image can be transmitted while protecting the image information at a higher level.
Problems, configurations, and effects other than those described above will be clarified by the following embodiments.
図1は、本発明の映像処理システムの一実施形態を示すブロック図である。FIG. 1 is a block diagram showing one embodiment of the video processing system of the present invention. 図2は、図1の処理システム部の一例を示すブロック図である。FIG. 2 is a block diagram showing an example of the processing system section of FIG. 図3は、本発明の映像処理システムで適用する特徴量を算出する処理の一例を示す図である。FIG. 3 is a diagram showing an example of processing for calculating feature amounts applied in the video processing system of the present invention. 図4は、本発明の映像処理システムにおける撮像装置の処理の一例を示す図である。FIG. 4 is a diagram showing an example of processing of the imaging device in the video processing system of the present invention. 図5は、本発明の映像処理システムにおける映像処理装置の処理の一例を示す図である。FIG. 5 is a diagram showing an example of processing of the video processing device in the video processing system of the present invention.
 本発明を実施するための形態を説明する。 A form for carrying out the present invention will be described.
 図1は、本発明の映像処理システムの一実施形態を示すブロック図である。図1の映像処理システムは、撮像装置1と映像処理装置5を備えている。そして、撮像装置1は、撮像部2と、処理システム部3を備えている。また、映像処理装置5は、処理システム部6と、表示出力部7を備えている。なお、表示出力部7は、映像処理装置5に備えず映像処理装置5とは別体で構成してもよい。映像処理装置5はパソコン、タブレット型コンピュータ、サーバなどを適用可能である。 FIG. 1 is a block diagram showing one embodiment of the video processing system of the present invention. The video processing system in FIG. 1 includes an imaging device 1 and a video processing device 5 . The imaging device 1 includes an imaging section 2 and a processing system section 3 . The video processing device 5 also includes a processing system section 6 and a display output section 7 . Note that the display output unit 7 may be configured separately from the video processing device 5 instead of being provided in the video processing device 5 . A personal computer, a tablet computer, a server, or the like can be applied to the video processing device 5 .
 撮像装置1は、1個以上のカメラの構成を備えており、様々な場所に配置可能である。例えば、監視カメラとして監視箇所に配置するなどである。 The imaging device 1 has a configuration of one or more cameras, and can be arranged in various places. For example, it may be installed at a monitoring location as a monitoring camera.
 撮像部2は、レンズや絞りを介して撮像素子に入射光を結像して情報を得るカメラの構成である。ここでの撮像素子の例としては、CCD(Charge-Coupled Device)イメージセンサやCMOS(Complementary Metal Oxide Semiconductor)イメージセンサ等があげられる。得られた情報は処理システム部3へ送られる。また、撮像部2は、FPGA(Field Programmable Gate Array)などの映像処理用IC(Integrated Circuit)を用い撮影処理を行うことができる。一方この映像処理用ICは、処理システム部3と一体化してもよい。 The imaging unit 2 is a camera configuration that obtains information by forming an image of incident light on an imaging device via a lens and a diaphragm. Examples of the imaging device here include a CCD (Charge-Coupled Device) image sensor and a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The obtained information is sent to the processing system section 3 . In addition, the imaging unit 2 can perform imaging processing using a video processing IC (Integrated Circuit) such as an FPGA (Field Programmable Gate Array). On the other hand, this video processing IC may be integrated with the processing system section 3 .
 処理システム部3は、撮像部2で撮影した情報を取得して後述する図4の処理を行う。具体的な構成例については図2で後述し、具体的な処理の内容は図4で後述する。処理した情報は、処理システム部6へ送られる。 The processing system unit 3 acquires information captured by the imaging unit 2 and performs the processing of FIG. 4, which will be described later. A specific configuration example will be described later with reference to FIG. 2, and specific processing contents will be described later with reference to FIG. The processed information is sent to the processing system section 6 .
 処理システム部6は、処理システム部3からの情報を取得して後述する図5の処理を行う。具体的な構成例については図2で後述し、具体的な処理の内容は図5で後述する。 The processing system section 6 acquires information from the processing system section 3 and performs the processing of FIG. 5, which will be described later. A specific configuration example will be described later with reference to FIG. 2, and specific processing contents will be described later with reference to FIG.
 表示出力部7は、処理システム部6で処理した内容を表示できる装置である。例えば液晶ディスプレイ(LCD)、有機EL(OEL)ディスプレイ、タッチパネル等の構成により表示させる。 The display output unit 7 is a device that can display the content processed by the processing system unit 6. For example, it is displayed by a structure such as a liquid crystal display (LCD), an organic EL (OEL) display, a touch panel, or the like.
 撮像装置1と映像処理装置5の間は、インターネット網などを介して情報のやりとりを行える。例えばLAN等に接続する。この他、専用の通信回線を介して情報をやりとりしてもよい。すなわち、遠隔地にある撮像装置1の処理内容を映像処理装置5で確認できる。また、撮像装置1と映像処理装置5は1対1でなくともよく、1つの撮像装置1に対して複数の映像処理装置5が対応してもよく、複数の撮像装置1に対して1つの映像処理装置5が対応してもよい。また、映像処理装置5は、撮像装置1の設定や操作を可能に構成してもよい。 Information can be exchanged between the imaging device 1 and the video processing device 5 via the Internet network or the like. For example, it is connected to a LAN or the like. Alternatively, information may be exchanged via a dedicated communication line. That is, it is possible to check the processing contents of the imaging device 1 at a remote location with the video processing device 5 . Further, the imaging device 1 and the video processing device 5 may not be one-to-one, and one imaging device 1 may correspond to a plurality of video processing devices 5. A plurality of imaging devices 1 may correspond to one imaging device. The video processing device 5 may correspond. Also, the image processing device 5 may be configured to enable setting and operation of the imaging device 1 .
 図2は、図1の処理システム部の一例を示すブロック図である。処理システム部3、6の具体例として図2のコンピュータシステム300として説明する。 FIG. 2 is a block diagram showing an example of the processing system section of FIG. A specific example of the processing system units 3 and 6 will be described as a computer system 300 in FIG.
 コンピュータシステム300の主要コンポーネントは、1つ以上のプロセッサ302、メモリ304、端末インターフェース312、ストレージインターフェース314、I/O(入出力)デバイスインターフェース316、及びネットワークインターフェース318を含む。これらのコンポーネントは、メモリバス306、I/Oバス308、バスインターフェース309、及びI/Oバスインターフェース310を介して、相互的に接続されてもよい。 The major components of computer system 300 include one or more processors 302 , memory 304 , terminal interfaces 312 , storage interfaces 314 , I/O (input/output) device interfaces 316 , and network interfaces 318 . These components may be interconnected via memory bus 306 , I/O bus 308 , bus interface 309 and I/O bus interface 310 .
 コンピュータシステム300は、プロセッサ302と総称される1つ又は複数の処理装置302A及び302Bを含んでもよい。各プロセッサ302は、メモリ304に格納された命令を実行し、オンボードキャッシュを含んでもよい。処理装置としては、CPU(Central Processing Unit)、FPGA(Field-Programmable Gate Array)、GPU(Graphics Processong Unit)等を適用できる。 Computer system 300 may include one or more processing units 302 A and 302 B, collectively referred to as processor 302 . Each processor 302 executes instructions stored in memory 304 and may include an on-board cache. As the processing device, CPU (Central Processing Unit), FPGA (Field-Programmable Gate Array), GPU (Graphics Processing Unit), etc. can be applied.
 メモリ304は、データ及びプログラムを記憶するためのランダムアクセス半導体メモリ、記憶装置、又は記憶媒体(揮発性又は不揮発性のいずれか)を含んでもよい。また、メモリ304は、コンピュータシステム300の仮想メモリ全体を表しており、ネットワークを介してコンピュータシステム300に接続された他のコンピュータシステムの仮想メモリを含んでもよい。メモリ304は、概念的には単一のものとみなされてもよいが、キャッシュおよび他のメモリデバイスの階層など、より複雑な構成となる場合もある。 Memory 304 may include random access semiconductor memory, storage devices, or storage media (either volatile or non-volatile) for storing data and programs. Memory 304 also represents the entire virtual memory of computer system 300 and may include the virtual memory of other computer systems connected to computer system 300 over a network. Memory 304 may conceptually be viewed as a single entity, but may be more complex arrangements, such as hierarchies of caches and other memory devices.
 メモリ304は、本実施形態で説明する機能を実施するプログラム、モジュール、及びデータ構造のすべて又は一部を格納してもよい。例えば、メモリ304は、アプリケーション350を格納していてもよい。アプリケーション350は、後述する機能をプロセッサ302上で実行する命令又は記述を含んでもよく、あるいは別の命令又は記述によって解釈される命令又は記述を含んでもよい。アプリケーション350は、プロセッサベースのシステムの代わりに、またはプロセッサベースのシステムに加えて、半導体デバイス、チップ、論理ゲート、回路、回路カード、および/または他の物理ハードウェアデバイスを介してハードウェアで実施されてもよい。アプリケーション350は、命令又は記述以外のデータを含んでもよい。また、カメラやセンサ等の他のデータ入力デバイスが、バスインターフェース309、プロセッサ302、またはコンピュータシステム300の他のハードウェアと直接通信するように提供されてもよい。 The memory 304 may store all or part of the programs, modules, and data structures that implement the functions described in this embodiment. For example, memory 304 may store application 350 . Application 350 may include instructions or descriptions that perform the functions described below on processor 302, or may include instructions or descriptions that are interpreted by other instructions or descriptions. Application 350 may be implemented in hardware via semiconductor devices, chips, logic gates, circuits, circuit cards, and/or other physical hardware devices instead of or in addition to processor-based systems. may be Application 350 may include data other than instructions or descriptions. Other data input devices such as cameras and sensors may also be provided in direct communication with bus interface 309 , processor 302 , or other hardware of computer system 300 .
 コンピュータシステム300は、プロセッサ302、メモリ304、表示システム324、及びI/Oバスインターフェース310間の通信を行うバスインターフェース309を含んでもよい。I/Oバスインターフェース310は、様々なI/Oユニットとの間でデータを転送するためのI/Oバス308と連結していてもよい。I/Oバスインターフェース310は、I/Oバス308を介して、I/Oプロセッサ(IOP)又はI/Oアダプタ(IOA)としても知られる複数のI/Oインターフェース312、314、316、及び318と通信してもよい。表示システム324は、表示コントローラ、表示メモリ、又はその両方を含んでもよい。表示コントローラは、ビデオ、オーディオ、又はその両方のデータを表示装置326に提供することができる。また、コンピュータシステム300は、データを収集し、プロセッサ302に当該データを提供するように構成された1つまたは複数のセンサ等のデバイスを含んでもよい。表示システム324は、単独のディスプレイ画面、テレビ、タブレット、又は携帯型デバイスなどの表示装置326に接続されてもよい。表示装置326は、オーディオをレンダリングするためスピーカを含んでもよい。あるいは、オーディオをレンダリングするためのスピーカは、I/Oインターフェースと接続されてもよい。これ以外に、表示システム324が提供する機能は、プロセッサ302を含む集積回路によって実現されてもよい。同様に、バスインターフェース309が提供する機能は、プロセッサ302を含む集積回路によって実現されてもよい。 Computer system 300 may include bus interface 309 that provides communication between processor 302 , memory 304 , display system 324 , and I/O bus interface 310 . I/O bus interface 310 may couple to I/O bus 308 for transferring data to and from various I/O units. I/O bus interface 310 connects via I/O bus 308 to a plurality of I/O interfaces 312, 314, 316, and 318, also known as I/O processors (IOPs) or I/O adapters (IOAs). may communicate with Display system 324 may include a display controller, display memory, or both. The display controller can provide video, audio, or both data to display device 326 . Computer system 300 may also include one or more sensors or other devices configured to collect data and provide such data to processor 302 . The display system 324 may be connected to a display device 326 such as a single display screen, television, tablet, or handheld device. Display device 326 may include speakers for rendering audio. Alternatively, speakers for rendering audio may be connected to the I/O interface. Alternatively, the functionality provided by display system 324 may be implemented by an integrated circuit that includes processor 302 . Similarly, the functionality provided by bus interface 309 may be implemented by an integrated circuit including processor 302 .
 I/Oインターフェースは、様々なストレージ又はI/Oデバイスと通信する機能を備える。例えば、端末インターフェース312は、ビデオ表示装置、スピーカテレビ等のユーザ出力デバイスや、キーボード、マウス、キーパッド、タッチパッド、トラックボール、ボタン、ライトペン、又は他のポインティングデバイス等のユーザ入力デバイスのようなユーザI/Oデバイス320の取り付けが可能である。ユーザは、ユーザインターフェースを使用して、ユーザ入力デバイスを操作することで、ユーザI/Oデバイス320及びコンピュータシステム300に対して入力データや指示を入力し、コンピュータシステム300からの出力データを受け取ってもよい。ユーザインターフェースは例えば、ユーザI/Oデバイス320を介して、表示装置に表示されたり、スピーカによって再生されたりしてもよい。 The I/O interface has the ability to communicate with various storage or I/O devices. For example, terminal interface 312 may be a user output device such as a video display, speaker television, etc., or a user input device such as a keyboard, mouse, keypad, touchpad, trackball, buttons, light pen, or other pointing device. Any user I/O device 320 can be attached. A user inputs input data and instructions to the user I/O device 320 and the computer system 300 by operating the user input device using the user interface, and receives output data from the computer system 300. good too. The user interface may be displayed on a display device or played by speakers via the user I/O device 320, for example.
 ストレージインターフェース314は、1つ又は複数のディスクドライブや直接アクセス記憶装置322の取り付けが可能である。記憶装置322は、任意の二次記憶装置として実装されてもよい。メモリ304の内容は、記憶装置322に記憶され、必要に応じて記憶装置322から読み出されてもよい。I/Oデバイスインターフェース316は、他のI/Oデバイスに対するインターフェースを提供してもよい。ネットワークインターフェース318は、コンピュータシステム300と他のデバイスが相互的に通信できるように、通信経路を提供してもよい。この通信経路は、例えば、ネットワーク330であってもよい。 The storage interface 314 allows attachment of one or more disk drives or direct access storage devices 322 . Storage device 322 may be implemented as any secondary storage device. The contents of memory 304 may be stored in storage device 322 and read from storage device 322 as needed. I/O device interface 316 may provide an interface to other I/O devices. Network interface 318 may provide a communication pathway to allow computer system 300 and other devices to communicate with each other. This communication path may be, for example, network 330 .
 コンピュータシステム300は、プロセッサ302、メモリ304、バスインターフェース309、表示システム324、及びI/Oバスインターフェース310の間の直接通信経路を提供するバス構造を備えているが、コンピュータシステム300は、階層構成、スター構成、又はウェブ構成のポイントツーポイントリンク、複数の階層バス、平行又は冗長の通信経路を含んでもよい。さらに、I/Oバスインターフェース310及びI/Oバス308が単一のユニットとして示されているが、実際には、コンピュータシステム300は複数のI/Oバスインターフェース310又は複数のI/Oバス308を備えてもよい。また、I/Oバス308を様々なI/Oデバイスに繋がる各種通信経路から分離するための複数のI/Oインターフェースが示されているが、I/Oデバイスの一部または全部が、1つのシステムI/Oバスに直接接続されてもよい。 Although computer system 300 includes a bus structure that provides a direct communication path between processor 302, memory 304, bus interface 309, display system 324, and I/O bus interface 310, computer system 300 is hierarchically organized. , star or web configuration point-to-point links, multiple hierarchical buses, parallel or redundant communication paths. Further, although I/O bus interface 310 and I/O bus 308 are shown as a single unit, in reality computer system 300 may include multiple I/O bus interfaces 310 or multiple I/O buses 308 . may be provided. Also, although multiple I/O interfaces are shown to separate the I/O bus 308 from various communication paths leading to various I/O devices, some or all of the I/O devices may be connected to a single interface. It may be connected directly to the system I/O bus.
 コンピュータシステム300は、マルチユーザメインフレームコンピュータシステム、シングルユーザシステム、又はサーバコンピュータ等の、直接的ユーザインターフェースを有しない、他のコンピュータシステム(クライアント)からの要求を受信するデバイスであってもよい。 Computer system 300 may be a device that receives requests from other computer systems (clients) that do not have a direct user interface, such as a multi-user mainframe computer system, a single-user system, or a server computer.
 図2のコンピュータシステム300を図1の処理システム部3に適用する場合は、表示装置326は任意の構成であり、備えていてもいなくてもよい。また、撮像部2はユーザI/Oデバイス320として適用可能である。また、図2のコンピュータシステム300を図1の処理システム部6として適用した場合は、表示装置326は表示出力部7として適用可能である。また、ネットワーク330は、処理システム部3と処理システム部6との間に介在するネットワークとして適用可能である。 When the computer system 300 in FIG. 2 is applied to the processing system section 3 in FIG. 1, the display device 326 is an arbitrary configuration and may or may not be provided. Also, the imaging unit 2 can be applied as the user I/O device 320 . 2 is applied as the processing system unit 6 in FIG. 1, the display device 326 can be applied as the display output unit 7. FIG. Also, the network 330 can be applied as a network interposed between the processing system section 3 and the processing system section 6 .
 図3は、本発明の映像処理システムで適用する特徴量を算出する処理の一例を示す図である。図3は、顔の画像から人物を推定するCNN(Convolution Neural Networks)による機械学習の構成例を示す。各層の上部に記載した数はその層のニューロンの数であるが、これらは一例を示している。 FIG. 3 is a diagram showing an example of processing for calculating feature amounts applied in the video processing system of the present invention. FIG. 3 shows a configuration example of machine learning by CNN (Convolution Neural Networks) for estimating a person from a face image. The number above each layer is the number of neurons in that layer, but these are just examples.
 入力層11から特定の画像の一部分が入力され、それが1層目の畳込み層12、プーリング層13と伝達され、後段の層である畳込み層12、プーリング層13とつながっている。これらの処理の後には全結合層があり、入力層16、中間層17、出力層18が存在する。出力層18のニューロンの数はクラスの数と等価である。顔認識を行う場合は特定できる人の数とほぼ等価となる。尚、入力層11から特定の画像の一部分が入力される場合、例として200×200の画像が64×64にリサイズされたのちに入力されている。 A portion of a specific image is input from the input layer 11, transmitted to the first convolution layer 12 and pooling layer 13, and connected to the convolution layer 12 and pooling layer 13, which are subsequent layers. After these processes there are fully connected layers, an input layer 16, an intermediate layer 17 and an output layer 18. FIG. The number of neurons in the output layer 18 is equivalent to the number of classes. When face recognition is performed, it is almost equivalent to the number of people that can be specified. When a part of a specific image is input from the input layer 11, for example, a 200×200 image is input after being resized to 64×64.
 入力層11では、特定の大きさの画像情報(図3では64×64ピクセル)を取得する。図3の例では、顔検出により取り込んだ人の顔の画像である。 The input layer 11 acquires image information of a specific size (64×64 pixels in FIG. 3). The example in FIG. 3 is an image of a person's face captured by face detection.
 次に、畳込み層12では畳み込み処理を行う。入力層11で取得した画像に対してフィルタをかけていく。フィルタをかけることにより、サイズは小さくなる(図3では60×60)。そして、用意したフィルタの数(図3では8個)分だけ出力される。 Next, the convolution layer 12 performs convolution processing. The image acquired in the input layer 11 is filtered. Filtering reduces the size (60×60 in FIG. 3). Then, the number of filters prepared (eight in FIG. 3) is output.
 次に、プーリング層13ではプーリング処理を行う。畳込み層12で出力した情報に対して圧縮をかけていく。これにより、サイズは半分となる(図3では30×30)。 Next, pooling processing is performed in the pooling layer 13 . The information output from the convolutional layer 12 is compressed. This halves the size (30×30 in FIG. 3).
 次に、畳込み層14では畳み込み処理を行う。プーリング層13で圧縮した情報に対して、さらにフィルタをかけて、サイズを小さくする(図3では26×26)。そして、用意したフィルタの数(図3では16個)分だけ出力される。 Next, the convolution layer 14 performs convolution processing. The information compressed by the pooling layer 13 is further filtered to reduce the size (26×26 in FIG. 3). Then, the number of filters prepared (16 in FIG. 3) is output.
 次に、プーリング層15ではプーリング処理を行う。畳込み層14で出力した情報に対して圧縮をかけていく。これにより、サイズは半分となる(図3では13×13)。 Next, the pooling layer 15 performs pooling processing. The information output from the convolution layer 14 is compressed. This halves the size (13×13 in FIG. 3).
 次に、全結合層の入力層16では、プーリング層15で三次元の情報(13×13×16)を一次元の情報(2704)に並べなおしたものである。ここでの情報は特徴量を示している。なお、図3では、畳込み層とプーリング層の繰り返しは、2回(2層)での繰り返しで示したが、これに限ることはなく、さらに多くの繰り返しとしてもよい。 Next, in the input layer 16 of the fully connected layer, the pooling layer 15 rearranges the three-dimensional information (13×13×16) into one-dimensional information (2704). The information here indicates the feature amount. In FIG. 3, the convolution layer and the pooling layer are repeated twice (two layers), but the number of repetitions is not limited to this and may be more.
 全結合層の入力層16から、マスク画像を形成することができる。マスク画像は、ここでは元の画像が特定できない(顔であれば画像のみから誰かを特定できない)画像を意味する。この処理は、非可逆な映像加工処理であり、一度マスク画像を形成すると元の画像を復元することはできなくなる。 A mask image can be formed from the input layer 16 of the fully connected layer. The mask image here means an image whose original image cannot be identified (if it is a face, it is not possible to identify someone from the image alone). This processing is irreversible video processing, and once the mask image is formed, the original image cannot be restored.
 具体的には、図3に示すように全結合層の入力層16の情報である一次元の情報16-1(図3では2704)を二次元の画像情報16-2(図3では52×52)に並べなおす。このときの情報は、画像の情報として、白黒画像であれば色の濃さの情報として、カラー画像であれば、色の種類と濃さの情報として、保持することができる。例えば、白黒の画像であれば1ピクセルが8ビットの情報として、RGBのカラー画像であれば1ピクセルが24ビットの情報として変換可能である。その52×52ピクセルの画像情報を200×200ピクセルのマスク画像16-3に引き延ばす。これは、もともと取り込んだ顔の画像の大きさに合わせるための変換処理である。 Specifically, as shown in FIG. 3, one-dimensional information 16-1 (2704 in FIG. 3), which is the information of the input layer 16 of the fully connected layer, is converted to two-dimensional image information 16-2 (52× in FIG. 3). 52). The information at this time can be held as image information, such as color density information in the case of a black and white image, and color type and color density information in the case of a color image. For example, a black-and-white image can be converted as 8-bit information per pixel, and an RGB color image can be converted as 24-bit information per pixel. The 52×52 pixel image information is expanded to a 200×200 pixel mask image 16-3. This is conversion processing for adjusting the size of the face image that was originally captured.
 そして、作成されたマスク画像16-3は推論処理のため元の一次元の情報に戻す。具体的には、マスク画像16-3(図3では200×200)を、引き延ばす前の二次元の画像情報16-4(図3では52×52)をリサイズにより戻して、さらに、一次元の情報16-1(図3では2704)に並べなおす。このことにより、全結合層の入力層16の情報を、一旦マスク画像16-3に変換して、画像に載せることが可能となる。 Then, the created mask image 16-3 is returned to the original one-dimensional information for inference processing. Specifically, the mask image 16-3 (200×200 in FIG. 3) is restored by resizing the two-dimensional image information 16-4 (52×52 in FIG. 3) before stretching, and then the one-dimensional The information 16-1 (2704 in FIG. 3) is rearranged. This makes it possible to temporarily convert the information of the input layer 16 of the fully connected layer into the mask image 16-3 and put it on the image.
 次に、全結合層の中間層17では、図3では1000個のニューロン数を適用している。これは、一例であり、必要に応じてふさわしい数が適用できる。また、中間層17の数を増やして、複数の層で構成してもよい。 Next, in the intermediate layer 17 of the fully connected layer, 1000 neurons are applied in FIG. This is an example and suitable numbers can be applied as needed. Also, the number of intermediate layers 17 may be increased to form a plurality of layers.
 次の、全結合層の出力層18では、100個のニューロン数を適用している。ここでは、このニューロン数はクラス数となり、分類可能な数に相当する。例えば、顔の認識であれば、Aさん、Bさん、Cさんというようにして、一番発火したニューロンから誰であるかを推定する。このような推論処理により、100人の人の分類が可能である。もしくは、99人の分類として、残りの1つはその他とすることも可能である。 In the output layer 18 of the next fully connected layer, 100 neurons are applied. Here, the number of neurons is the number of classes and corresponds to the number of classes that can be classified. For example, in the case of face recognition, the person is estimated from the neuron that fires the most, such as Mr. A, Mr. B, and Mr. C. Such an inference process can classify 100 people. Alternatively, it is possible to classify the 99 persons and the remaining one as others.
 図4は、本発明の映像処理システムにおける撮像装置の処理の一例を示す図である。ここでの処理は、撮像装置1側で行い、特に記載がない場合は撮像装置1の処理システム部3で行われる。ここでは、非可逆な映像加工処理が行われる。 FIG. 4 is a diagram showing an example of processing of an imaging device in the video processing system of the present invention. The processing here is performed by the imaging device 1 side, and is performed by the processing system unit 3 of the imaging device 1 unless otherwise specified. Here, irreversible image processing is performed.
 撮像装置1ではまず初めに映像撮影21を行う。これは撮像部2により行い、撮像素子とFPGAなどの映像処理用ICなどで実現できる。撮影は映像で撮影される。例えば、1秒間に30フレーム(30fps)以上等の撮影とする等である。撮像部2で撮影された映像は1フレームの画像ごとに処理システム部3へ送られそれぞれ処理を行うことができる。 The imaging device 1 first performs video shooting 21 . This is performed by the image pickup unit 2 and can be realized by an image pickup device and a video processing IC such as an FPGA. Filming is done on video. For example, shooting is performed at 30 frames per second (30 fps) or more. The image captured by the imaging unit 2 is sent to the processing system unit 3 for each image of one frame, and can be processed.
 次に、処理システム部3では、この入力された映像に対して顔検出22を行う。顔検出22は、人間の顔の形を識別し、顔を含む範囲を検出する処理である。これは既存の手法を用いて自動で行われる。人間の顔と識別した場合はその領域を検出する。また、後述する処理を行うため、顔と識別した範囲が、ある程度の画素数以上の場合に検出する処理とすることができる。入力層16の1つのニューロンが扱うビット数が、1ピクセルのビット数と同じ場合、図4の例では、最小の範囲が52×52ピクセルに設定されている。 Next, the processing system unit 3 performs face detection 22 on the input video. Face detection 22 is a process of identifying the shape of a human face and detecting a range containing the face. This is done automatically using existing techniques. If a human face is identified, that area is detected. In addition, since the processing described later is performed, it is possible to perform detection processing when the range identified as a face has a certain number of pixels or more. If the number of bits handled by one neuron in the input layer 16 is the same as the number of bits in one pixel, the minimum range is set to 52×52 pixels in the example of FIG.
 次に、検出領域のリサイズ部で検出領域のリサイズ23を行う。これは、顔検出22で検出された領域をあらかじめ決めたサイズにリサイズする。このリサイズは、顔検出22で検出される領域は一定でないため次の特徴量の計算に適した所定のサイズへの変換を行うものである。図4の例では、200×200ピクセルを64×64ピクセルへ変換する処理を行う。 Next, resizing 23 of the detection area is performed in the detection area resizing section. This resizes the area detected by face detection 22 to a predetermined size. Since the area detected by the face detection 22 is not constant, this resizing is to convert the area into a predetermined size suitable for the calculation of the next feature amount. In the example of FIG. 4, a process of converting 200×200 pixels to 64×64 pixels is performed.
 Next, the feature amount calculation unit performs feature amount calculation 24 on the detection area. The feature amount required for face recognition is obtained here using a CNN or the like. This calculation corresponds to the processing from the input layer 11 up to the input layer 16 of the fully connected layers described with reference to FIG. 3.
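 The following sketch only fixes the data-shape contract of resizing 23 and feature amount calculation 24 (64×64 input, 2704-dimensional output in the example of FIG. 4); extract_features is a hypothetical placeholder for the convolution and pooling layers of FIG. 3, whose actual weights are not specified here.

```python
import cv2
import numpy as np

def resize_detection(face_region, size=64):
    # Resizing 23: normalize the variable-size detected region to a fixed size.
    return cv2.resize(face_region, (size, size))

def extract_features(resized_face):
    # Feature amount calculation 24 (placeholder): in the actual system this is
    # the CNN from input layer 11 to the fully connected input layer 16,
    # producing 2704 values (later arranged as 52 x 52) in the example.
    cnn_output = np.zeros(2704, dtype=np.float32)  # stand-in for the CNN output
    return cnn_output
```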
 Next, rearrangement/resizing 25 of the feature amount is performed. Here the data is converted into a format sized to fit the region where face detection was performed. The number of feature amount neurons calculated at the input layer 16 of the fully connected layers is 2704, which becomes a 52×52 region when converted to two dimensions. The region detected by face detection 22, on the other hand, is 200×200. To fit the 52×52 two-dimensional data derived from the feature amount neurons into the 200×200 face detection region, the data of one neuron is enlarged and assigned to approximately four pixels. The 52×52 data is thereby converted into 200×200 data. The rearrangement/resizing 25 performed here corresponds to the processing from the one-dimensional information 16-1 to the mask image 16-3 described with reference to FIG. 3.
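 A minimal sketch of rearrangement/resizing 25, assuming nearest-neighbour enlargement so that each neuron's value is simply repeated over a small block of pixels; the interpolation method is an assumption, the embodiment only requires that the feature values be spread over the detected region.

```python
import cv2
import numpy as np

def features_to_mask(feature_vector, det_width, det_height, side=52):
    # Rearrangement: 2704 one-dimensional values -> 52 x 52 two-dimensional grid.
    grid = feature_vector.reshape(side, side)
    # Resizing: enlarge the grid to the detected region (e.g. 200 x 200) so each
    # neuron's value covers roughly a 4 x 4 block of pixels.
    mask = cv2.resize(grid, (det_width, det_height),
                      interpolation=cv2.INTER_NEAREST)
    return mask
```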
 Here, the larger the enlargement ratio described above, the smaller the change between pixels within the mask region and between frames. Abrupt pixel-to-pixel and frame-to-frame changes are thereby mitigated, which makes processing by a lossy codec easier. The feature amount must fit within the data area of the minimum image size at which face detection is performed; depending on this minimum size, the output of an intermediate pooling layer of the CNN, for example, can also be used as the feature amount.
 Next, the rearranged feature amount undergoes mask processing 26 onto the original image in which the face was detected by face detection 22. The rearranged feature amount (200×200) is placed on the original image by fitting it, as mask image 16-3, into the region detected by face detection 22. Because mask image 16-3 is an image whose color types and densities are based on the feature amount, it differs from the original image of the region detected by face detection 22 and no longer carries the information of a human face.
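 A sketch of mask processing 26, assuming a colour (BGR) frame and a simple min-max mapping of the feature values into the 8-bit pixel range; the exact mapping from feature values to colour type and density is not specified in the embodiment.

```python
import numpy as np

def apply_mask(original, mask, x, y):
    # Mask processing 26: overwrite the detected face region with the
    # feature-amount mask image, discarding the original face pixels.
    h, w = mask.shape[:2]
    m = mask - mask.min()
    if m.max() > 0:
        m = m / m.max()
    pixels = (m * 255).astype(np.uint8)   # feature values as pixel intensities (assumed mapping)
    masked = original.copy()
    if masked.ndim == 3:
        masked[y:y + h, x:x + w] = pixels[..., None]   # broadcast over the colour channels
    else:
        masked[y:y + h, x:x + w] = pixels
    return masked
```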
 Next, mask processing metadata addition 27 is applied to the image that has undergone mask processing 26. Here, the index number of the masked image, the coordinates of the starting point of the mask on the image, the length of one side of the mask, and so on are attached. This provides the information needed to identify the masked region and the information needed to identify the image on which mask processing was performed.
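 The metadata could, for example, be carried as a small record like the one below; the field names are illustrative assumptions, and the embodiment only requires the index number, the starting-point coordinates, and the side length.

```python
def make_mask_metadata(frame_index, x, y, side):
    # Mask processing metadata 27: enough information for the receiver to
    # locate the mask image within the frame it belongs to.
    return {
        "frame_index": frame_index,  # index number of the masked image
        "start_x": x,                # starting-point coordinates of the mask region
        "start_y": y,
        "side_length": side,         # length of one side of the mask region
    }
```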
 Next, external output 28 is performed. When outputting to the outside, codec processing is applied to compress the transmission volume. For video, a lossy codec is generally used; depending on the application, intermittent transmission of still images may be sufficient, in which case a lossless codec may be used. The externally output information is sent to the video processing device 5 via the Internet or another network.
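 For still-image transmission, the compression step might look like the following sketch; JPEG as the lossy codec and PNG as the lossless one are assumptions for illustration, and a real deployment would more likely use a video codec such as H.264.

```python
import cv2

def encode_for_output(masked_frame, lossless=False):
    # External output 28: compress before transmission over the network.
    ext = ".png" if lossless else ".jpg"
    ok, payload = cv2.imencode(ext, masked_frame)
    if not ok:
        raise RuntimeError("encoding failed")
    return payload.tobytes()
```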
 FIG. 5 is a diagram showing an example of the processing performed by the video processing device in the video processing system of the present invention. This processing is carried out on the video processing device 5 side and, unless otherwise noted, by the processing system unit 6 of the video processing device 5. Here, inference processing by machine learning is performed to identify a person.
 First, the video data containing the image output from the imaging device 1 at external output 28 in FIG. 4 is supplied to the video input unit of the video processing device 5 as video input 31.
 Next, the feature amount extraction/resizing/rearrangement unit performs extraction, resizing, and rearrangement 32 of the feature amount using the metadata of the video data. In this processing, mask image 16-3 is first extracted from the video data; its range can be identified from the attached metadata. The mask image is then returned to the two-dimensional image information 16-4 (52×52 in FIG. 5) and further rearranged into the one-dimensional information 16-5 (2704 in FIG. 5), in the same way as in FIG. 3. The feature amount values are thereby obtained. Because resizing, codec processing, and so on have been applied along the way, the data values may deviate slightly and not match the originals exactly. This deviation, however, is small enough not to affect the subsequent processing that obtains the inference result from the feature amount, and values equal or close to the original feature amount (one-dimensional information 16-1) are obtained.
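 A sketch of extraction/resizing/rearrangement 32 on the receiving side, reusing the illustrative metadata fields from the earlier sketch; converting the mask region to grayscale before shrinking it is an assumption about how pixel values map back to feature values.

```python
import cv2
import numpy as np

def recover_features(masked_frame, meta, side=52):
    # Cut out the mask image using the attached metadata, shrink it back to
    # 52 x 52, and flatten it into the 2704-value one-dimensional form.
    x, y, s = meta["start_x"], meta["start_y"], meta["side_length"]
    region = masked_frame[y:y + s, x:x + s]
    if region.ndim == 3:
        region = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
    grid = cv2.resize(region, (side, side), interpolation=cv2.INTER_NEAREST)
    # The recovered values may deviate slightly from the originals because of
    # the resizing and codec processing applied along the way.
    return grid.astype(np.float32).reshape(-1)
```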
 Next, acquisition 33 of the inference result from the feature amount is performed. This corresponds to the processing of the fully connected layers 16 to 18 in FIG. 3. Here, the inference result acquisition unit identifies the class from the feature amount. In the example of FIG. 5, the inference processing can identify an individual from the face.
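 A minimal sketch of inference 33, assuming the fully connected layers of FIG. 3 reduced to one hidden layer (2704 → 1000 → 100 classes); the weight matrices are assumed to have been trained in advance and shared with the video processing device.

```python
import numpy as np

def classify(features, w1, b1, w2, b2):
    # Fully connected layers 16-18: hidden layer with ReLU, then class scores.
    hidden = np.maximum(0.0, features @ w1 + b1)   # e.g. 2704 -> 1000
    logits = hidden @ w2 + b2                      # e.g. 1000 -> 100 classes
    return int(np.argmax(logits))                  # the most strongly firing class
```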
 Note that the above processing can be carried out by storing information about individuals' faces in the video processing device 5. For example, when outputting classes for 100 people, information for those 100 people is held, and an individual can then be identified from the feature amount. It is also possible to prepare one additional class that outputs "other" when the person does not correspond to anyone recorded in advance.
 The conventions for extracting the feature amount, such as the data structure of the feature amount and the neural network parameters, are shared in advance between the imaging device 1 and the video processing device 5. This makes it possible, when mask image 16-3 is sent to the video processing device 5, to restore the one-dimensional information 16-5 and output the class from the feature amount. The system may also provide a function that lets the video processing device 5 configure these parameters on the imaging device 1.
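 The shared conventions might be captured in a small configuration record like the one below; the keys are illustrative assumptions, and the same record could be pushed from the video processing device 5 to the imaging device 1 when the parameter-setting function is provided.

```python
# Conventions agreed in advance between the imaging device 1 and the
# video processing device 5 (illustrative keys).
SHARED_FEATURE_CONFIG = {
    "feature_length": 2704,       # size of the one-dimensional feature amount
    "feature_grid": (52, 52),     # two-dimensional arrangement used for the mask image
    "min_face_size": 52,          # minimum detection size in pixels
    "cnn_parameters_id": "v1",    # identifies the neural network parameters used on both sides
}
```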
 The embodiment above described processing that identifies a person by face detection, but a person's actions can be identified in the same way. For example, the imaging device 1 is given a person detection function, detects the whole person, and masks the whole person with a two-dimensional image containing the feature amount. The video processing device 5 then infers from that feature amount what action the masked person is performing. In this case, a class is output for each type of human action.
(Effect)
 In the above embodiment, irreversible mask processing can be realized for person regions (a face or a whole person) where privacy protection matters. At the same time, the transmission destination also receives the data needed to identify the person or the action and can execute post-processing inference as needed. Even for a masked region, it is therefore possible to determine who the person is and what the action is.
 When conventional reversible mask processing is used, decoding the masked portion restores, for example, the original image of the person, and if that image leaks, all personal information it contains leaks with it. With the method of this embodiment, by contrast, even if information leaks and is decoded by a malicious third party, the exposure is limited to minimal information: only label information such as the name associated with a face in the case of face recognition, or only the action label in the case of action recognition.
 Furthermore, if inference up to the person recognition or action recognition result were performed on the imaging device side, transmitting that result would expose the label information whenever the communication is intercepted. In this embodiment, by contrast, the receiving video processing device 5 performs the inference from the feature amount. Even if the data from the imaging device 1 leaks, inference cannot be performed unless the conventions such as the data structure of the feature amount and the structure of the neural network parameters are known. The information from the imaging device 1 is thus doubly protected in addition to communication encryption, making the data more difficult to decode. Embedding the feature amount in mask image 16-3 also reduces the transmission volume.
 The embodiments of the present invention have been described above, but the present invention is not limited to these embodiments and includes various modifications. For example, the embodiments above are described in detail to explain the present invention clearly and are not necessarily limited to configurations having all of the described elements. Part of the configuration of each embodiment can also be added to, deleted from, or replaced with another configuration.
 For example, in the embodiment above, the feature amount is embedded in mask image 16-3 to reduce the transmission volume. However, a configuration is also applicable in which the image is given an ordinary mask that does not embed the feature amount information (for example, a mask of uniform color and density) and the feature amount information and the image are transmitted separately.
 In the embodiment above, an example using a CNN was shown, but the present invention can also be applied using a DNN (Deep Neural Networks) approach for the machine learning.
DESCRIPTION OF SYMBOLS: 1... imaging device, 2... imaging unit, 3... processing system unit, 5... video processing device, 6... processing system unit, 7... display output unit, 11... input layer, 12... convolution layer, 13... pooling layer, 14... convolution layer, 15... pooling layer, 16... input layer of fully connected layers, 17... intermediate layer of fully connected layers, 18... output layer of fully connected layers, 21... video shooting, 22... face detection, 23... resizing of detection area, 24... feature amount calculation of detection area, 25... rearrangement/resizing of feature amount, 26... mask processing on original image, 27... mask processing metadata addition, 28... external output, 31... video input, 32... extraction/resizing/rearrangement of feature amount, 33... acquisition of inference result from feature amount, 300... computer system, 302... processor, 302A, 302B... processing devices, 304... memory, 306... memory bus, 308... I/O bus, 309... bus interface, 310... I/O bus interface, 312... terminal interface, 314... storage interface, 316... I/O device interface, 318... network interface, 320... user I/O device, 322... storage device, 324... display system, 326... display device, 330... network, 350... application

Claims (8)

  1. An imaging device characterized by: capturing video to acquire an image; detecting a predetermined region from within the image; resizing the detected region and extracting a feature amount of the detection region; and outputting an image in which the extracted feature amount, arranged two-dimensionally as a mask image, is placed in the detection region of the acquired image.
  2. The imaging device according to claim 1, wherein the extraction of the feature amount is performed using a CNN (Convolution Neural Networks) or DNN (Deep Neural Networks) technique.
  3. The imaging device according to claim 1, wherein the predetermined region is a region of a person's face.
  4. The imaging device according to claim 1, wherein the output image is given information specifying the range of the mask image within that image.
  5. A video processing system comprising an imaging device and a video processing device, wherein
     the imaging device captures video to acquire an image, detects a predetermined region from within the image, resizes the detected region and extracts a feature amount of the detection region, and outputs an image in which the extracted feature amount, arranged two-dimensionally as a mask image, is placed in the detection region of the acquired image, and
     the video processing device receives the image output by the imaging device, acquires the feature amount from the mask image, and performs inference processing based on the feature amount.
  6. The video processing system according to claim 5, wherein the extraction of the feature amount and the inference processing are performed using a CNN (Convolution Neural Networks) or DNN (Deep Neural Networks) technique.
  7. The video processing system according to claim 5, wherein the predetermined region is a region of a person's face and the inference processing is processing that identifies a person.
  8. The video processing system according to claim 5, having a function of setting, from the video processing device, parameters used for the extraction of the feature amount in the imaging device.
PCT/JP2021/008913 2021-03-08 2021-03-08 Imaging device and video processing system WO2022190157A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/008913 WO2022190157A1 (en) 2021-03-08 2021-03-08 Imaging device and video processing system
JP2023504880A JP7448721B2 (en) 2021-03-08 2021-03-08 Imaging device and video processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/008913 WO2022190157A1 (en) 2021-03-08 2021-03-08 Imaging device and video processing system

Publications (1)

Publication Number Publication Date
WO2022190157A1 true WO2022190157A1 (en) 2022-09-15

Family

ID=83227551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/008913 WO2022190157A1 (en) 2021-03-08 2021-03-08 Imaging device and video processing system

Country Status (2)

Country Link
JP (1) JP7448721B2 (en)
WO (1) WO2022190157A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009033738A (en) * 2007-07-04 2009-02-12 Sanyo Electric Co Ltd Imaging apparatus, data structure of image file
JP2016162099A (en) * 2015-02-27 2016-09-05 富士通株式会社 Image determination device, image determination method, and program
JP2017033529A (en) * 2015-03-06 2017-02-09 パナソニックIpマネジメント株式会社 Image recognition method, image recognition device and program
JP2019185548A (en) * 2018-04-13 2019-10-24 キヤノン株式会社 Information processing apparatus, and information processing method

Also Published As

Publication number Publication date
JPWO2022190157A1 (en) 2022-09-15
JP7448721B2 (en) 2024-03-12

Similar Documents

Publication Publication Date Title
JP7057651B2 (en) Encoding privacy masked images
US9875530B2 (en) Gradient privacy masks
JP4701356B2 (en) Privacy protection image generation device
EP3816929B1 (en) Method and apparatus for restoring image
JP2017098879A (en) Monitoring device, monitoring system and monitoring method
JP2022118201A (en) Image processing system, image processing method, and program
CN109635803A (en) Image processing method and equipment based on artificial intelligence
US11521473B2 (en) Audio/video electronic device
US10863113B2 (en) Image processing apparatus, image processing method, and storage medium
US20150382000A1 (en) Systems And Methods For Compressive Sense Imaging
US10713797B2 (en) Image processing including superimposed first and second mask images
WO2023138629A1 (en) Encrypted image information obtaining device and method
Farid Image forensics
JP2018041293A (en) Image processing apparatus, image processing method, and program
WO2020238484A1 (en) Method and apparatus for detecting object in image, and vehicle and robot
CN111757172A (en) HDR video acquisition method, HDR video acquisition device and terminal equipment
TW201337835A (en) Method and apparatus for constructing image blur pyramid, and image feature extracting circuit
US20220189036A1 (en) Contour-based privacy masking apparatus, contour-based privacy unmasking apparatus, and method for sharing privacy masking region descriptor
WO2022190157A1 (en) Imaging device and video processing system
JP2006129152A (en) Imaging device and image distribution system
WO2018154827A1 (en) Image processing device, image processing system and program
EP3933776A1 (en) Method and image-processing device for anonymizing a digital colour image
CN114445864A (en) Gesture recognition method and device and storage medium
CN113965687A (en) Shooting method and device and electronic equipment
CN113891036A (en) Adaptive high-resolution low-power-consumption vision system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21930016

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023504880

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21930016

Country of ref document: EP

Kind code of ref document: A1