WO2019017720A1

WO2019017720A1 - Camera system for protecting privacy and method therefor

Info

Publication number: WO2019017720A1
Application number: PCT/KR2018/008196
Authority: WO
Inventors: 양현종; 유상원; 김기윤; 김명언
Original assignee: 주식회사 이고비드
Priority date: 2017-07-20
Filing date: 2018-07-20
Publication date: 2019-01-24

Abstract

The present invention relates to a technology for recognizing human activity in a low-resolution image, and a camera system for automatically adjusting the resolution of a camera on the basis of the technology, and the present invention also relates to a camera for anonymizing a video by using a low-resolution face recognition/detection technology, and an anonymizing method therefor.

Description

Camera system and method for protecting privacy

More particularly, the present invention relates to a technology for recognizing human behavior in a low-resolution image, a camera system for automatically adjusting a shooting resolution of the camera based on the technology, and a low- To a camera system capable of anonymizing video.

Conventional camera systems have a problem of invading the privacy of a subject because they can take continuous pictures without the consent of the subject. The problem of privacy invasion can be variously as follows.

As a first example, privacy invasion of a subject by continuous shooting of unsupposed high-resolution images is emerging as an important social issue in recent years. For example, if a home (home security or smart home service) camera is hacked, there is a risk that your personal life will be watched or recorded by someone else throughout the 24 hours. In addition, high-resolution cameras such as a robot camera and a wearable camera have a problem that the battery consumption is large due to continuous high-resolution image capturing and the storage space is required, which makes it difficult to use for a long time.

Therefore, it is necessary to develop a secure camera device from the privacy invasion problem, and to design a privacy vision computer vision algorithm which makes it possible. In other words, there is a need to recognize important events and human behaviors in the image, and to prevent random high-resolution image capturing based on the recognition. In this regard, there have been recent attempts to recognize human behavior from low resolution images. However, most of the previous researches have limited in that they do not consider the intrinsic properties of low-resolution sensors. Referring to FIG. 6, when converting a high-resolution image to a low-resolution image, images originating from exactly the same scene often have completely different pixel (i.e., RGB) values due to inherent limitations that a single pixel can capture from a scene . With low-resolution transforms, even a plurality of low-resolution images originating from exactly the same scene can be completely different visual data. In other words, most of the previous studies did not take into account the characteristics of these low resolution transformations.

As a second example, there is a risk that privacy cameras may be exposed by personal hacking in a personal place.

In order to solve such privacy invasion problem, there has been a technique of intentionally shooting only low quality video, but it is difficult to judge the specific situation in video with low quality video. In other words, if unconditionally low resolution processing is applied to the entire image, the privacy is protected but the situation in the image can not be grasped because the human action or object can not be distinguished. On the contrary, when a high-resolution image is unconditionally captured, it is easy to recognize a person's behavior or an object, but it is difficult to avoid privacy problems. Therefore, there is a need for video anonymization technology that preserves detailed information in the image and low-resolution processing only the area (eg, face area) that needs to protect the privacy of the subject.

The present invention relates to a camera system for shooting a low-resolution image in a normal state to protect the privacy of a subject, a camera system for capturing a high-resolution image when a dangerous situation is determined through a behavior recognition technology for a low-resolution image, And an automatic behavior recognition method.

In addition, the present invention provides a camera system capable of automatically reducing the camera battery consumption by automatically adjusting the shooting resolution of the camera according to the importance of the situation through the low-resolution image behavior recognition technology, And to provide the above objects.

In addition, the present invention takes into account the characteristics of low-resolution image conversion, and learns a convolution neural network by mapping a plurality of low-resolution images transformed from a high-resolution image into an embedding space, And to provide an automatic behavior recognition method capable of increasing the human behavior recognition rate.

It is another object of the present invention to provide a camera system and an image anonymization method based on face recognition, which can protect the privacy of a subject by automatically performing an anonymization process on a subject in the process of capturing an image.

It is another object of the present invention to provide a camera system and an image anonymization method based on face recognition that can protect against hacking by being processed in real time without being stored on a memory of a CPU / GPU during an anonymization process on a face area in an image do.

Another object of the present invention is to provide a camera system and a facial recognition-based image anonymization method that can protect privacy by separating a face region and an area other than a face region with respect to a high-resolution image by performing an ultra-low resolution processing only on the face region do.

In order to solve the above problems, a camera system according to the present invention generates learning data based on recognition of human behavior in a low-resolution image based on a high-resolution learning image, ; Resolution image, and when a crisis is detected through the first behavior recognition that recognizes human behavior in a low-resolution image using the human behavior recognition model transmitted from the data server, the image is changed to a high- And a camera for photographing.

In addition, in the camera system, the camera determines whether the image captured by changing to the high-resolution mode is a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model, And stores the image photographed at a high resolution when the situation is not satisfied and the image photographed at a low resolution when the situation is not a crisis.

In the camera system, the camera may further include: a photographing unit photographing a low-resolution image; A communication unit for receiving a human behavior recognition model used for recognizing human behavior in the low resolution image from the data server and transmitting the received human behavior recognition model to the image analysis unit; An image analyzing unit for determining whether the user is in a crisis state through a first behavior recognition process for recognizing a human behavior in a low resolution image using the human behavior recognition model received from the communication unit; A control unit for controlling the photographing unit to change to a higher resolution mode than the existing resolution and to photograph the photographing unit if a crisis state is determined by the image analysis unit; And a storage unit for storing a low-resolution image or a high-resolution image photographed by the photographing unit.

At this time, the image analyzing unit can determine whether the image is a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model for the image changed to the high-resolution mode.

In the camera system, the data server may include: a resolution conversion unit that converts a high-resolution human behavior image into a plurality of low-resolution images; A plurality of low resolution images generated by the resolution conversion unit are received and divided into a spatial stream and a time stream, and convolution and pooling are performed for each stream, and a fully connected layer is added A convolution neural network learning unit for learning a convolutional neural network; And a human behavior recognition model generation unit for generating a human behavior recognition model based on the data learned by the convolution neural network learning unit.

According to another aspect of the present invention, there is provided an automatic resolution adjusting method comprising the steps of: (a) capturing an image of a camera at a low resolution; (b) receiving a human behavior recognition model used by the communication unit of the camera for recognizing human behavior in a low-resolution image from an external data server and transmitting the human behavior recognition model to an image analysis unit of the camera; (c) determining whether the image analyzing unit is in a crisis state through a first behavior recognition that recognizes human behavior in a low-resolution image using the human behavior recognition model received from the camera communication unit; (d) when the image analyzing unit determines that the image capturing unit is in a crisis state, the control unit of the camera controls the image capturing unit to change the image capturing unit to a higher resolution mode than the existing resolution image capturing unit; And (e) storing a low-resolution image or a high-resolution image captured by the photographing unit by a storage unit of the camera.

Also, the automatic resolution adjusting method may further include a step of, between the steps (d) and (e), performing a human action in the image using the human behavior recognition model And a second step of recognizing whether or not the user is in a crisis state by recognizing the second behavior.

According to another aspect of the present invention, there is provided an automatic behavior recognition method comprising: (a) converting a high-resolution human behavior image into a plurality of low-resolution images by a resolution conversion unit of a data server; (b) a convolutional neural network learning unit of the data server receives the plurality of low-resolution images and divides them into a spatial stream and a time stream, performs convolution and pooling for each stream, layer is further applied to learn a Convolutional Neural Network; (c) generating a human behavior recognition model based on data learned by the convolutional neural network learning unit; (d) photographing a low-resolution image of a photographing part of the camera; And (e) recognizing human behavior in the low-resolution image photographed by the photographing unit using the human behavior recognition model received from the data server by the image analysis unit of the camera.

According to another aspect of the present invention, there is provided a camera system including: a data server that performs learning for face recognition in an image using a learning image and generates a face recognition model; And a controller for receiving the facial recognition model from the data server, capturing a high resolution image to perform low resolution conversion, recognizing a face in the converted image using the facial recognition model transmitted from the data server, And a camera for outputting the anonymized image to the face region by re-converting the detected face region and the region other than the face region so as to have different resolutions.

According to another aspect of the present invention, there is provided a camera including: a communication unit for receiving a face recognition model for face recognition in an image from an external data server; A photographing unit for photographing a high-resolution image; A low-resolution conversion module that receives the high-resolution image photographed by the photographing unit, receives the position information of the face region and the target resolution value from the processor, and converts the high-resolution image; A processor for recognizing a face in the image converted by the low resolution conversion module using the face recognition model transmitted from the communication unit and updating the position of the face region and the target resolution value according to the face recognition result; And an output unit for outputting the image converted by the low resolution conversion module.

In this case, when the face is detected in the image converted by the low resolution conversion module through face recognition, the processor records the detected face region and stores the position information, and the remaining region excluding the stored face region And the low resolution conversion module controls to convert the remaining area excluding the stored face area to the target resolution to be converted again.

According to another aspect of the present invention, there is provided a camera including: a communication unit for receiving a face recognition model for face recognition in an image from an external data server; A photographing unit for photographing a high-resolution image; A low-resolution conversion module that receives a high-resolution image captured by the imaging unit, receives a target resolution value from the processor, and converts the high-resolution image; A processor for recognizing a face in the image converted by the low resolution conversion module using the face recognition model transmitted from the communication unit and updating the target resolution value according to the face recognition result; And an output unit for outputting the image converted by the low resolution conversion module.

In the camera, the processor may control the low-resolution conversion module to output the low-resolution image converted at the detection time when a face is detected in the image converted by the low-resolution conversion module.

According to another aspect of the present invention, there is provided an image anonymizing method, comprising the steps of: (a) capturing a high resolution image of a photographing unit of a camera; (b) a communication unit of the camera receives a face recognition model for face recognition in an image from an external data server; (c) receiving a high-resolution image photographed by the photographing unit by the low resolution conversion module of the camera, receiving the position of the face region and the target resolution value from the processor, and converting the high resolution image; (d) recognizing a face in the image converted by the low resolution conversion module using the face recognition model received from the communication unit of the camera, and updating the position of the face region and the target resolution value according to the face recognition result ; And (e) the output of the camera outputting the image converted by the low resolution conversion module.

According to another aspect of the present invention, there is provided an image anonymizing method, comprising the steps of: (a) capturing a high resolution image of a photographing unit of a camera; (b) a communication unit of the camera receives a face recognition model for face recognition in an image from an external data server; (c) receiving a high resolution image photographed by the photographing unit by the low resolution conversion module of the camera, receiving the target resolution value from the processor, and converting the high resolution image; (d) recognizing a face in the image converted by the low resolution conversion module using the face recognition model received from the communication unit of the camera processor, and updating the target resolution value according to the face recognition result; And (e) the output of the camera outputting the image converted by the low resolution conversion module.

According to the present invention, when a low-resolution image is normally photographed and a behavior recognition technology for a low-resolution image is determined to be a crisis, a high-resolution image is photographed, thereby protecting privacy of a subject.

In addition, according to the present invention, it is possible to reduce the camera battery consumption and efficiently utilize the storage space by automatically adjusting the shooting resolution of the camera according to the importance of the situation through the low-resolution image behavior recognition technology.

According to the present invention, a plurality of low-resolution images transformed from a high-resolution image are mapped into an embedding space in consideration of the characteristics of low-resolution image conversion, thereby learning a convolution neural network, The behavior recognition rate can be increased.

In addition, according to the present invention, it is possible to automatically protect the privacy of a subject by automatically performing an anonymization process on the subject in the process of shooting the image.

In addition, according to the present invention, in an anonymizing process for a face area in an image, it can be protected from hacking by real-time processing without being stored on the memory of the CPU / GPU.

In addition, according to the present invention, it is possible to protect the privacy of the high-resolution image by distinguishing the face area from the area other than the face area and performing the ultra-low resolution processing only on the face area.

1 is a block diagram of a camera system according to a first embodiment of the present invention.

2 is a view for explaining the operational concept of the camera system according to the first embodiment of the present invention.

FIG. 3 is a diagram illustrating a configuration of a camera according to a first embodiment of the present invention.

4 is a flowchart illustrating a flow of operation of the camera according to the first embodiment of the present invention.

5 is a detailed block diagram of a data server in a camera system according to a first embodiment of the present invention.

FIG. 6 is a view for explaining that even when a high-resolution image is converted to a low-resolution image, a low-resolution image derived from the same high-resolution image may have a different value.

FIG. 7 is a view showing a structured manner in which the convolutional neural network learning unit according to the first embodiment of the present invention divides an input image into a spatial stream and a temporal stream.

8 is a diagram showing the structure of a two-stream convolutional neural network applied to convolutional neural network learning according to the first embodiment of the present invention.

9 is a diagram for explaining how the convolutional neural network learning unit according to the first embodiment of the present invention performs learning for intra-image behavior recognition using an embedding space.

10 is a diagram illustrating a structure of a multi-siamese convolutional neural network applied to convolutional neural network learning according to the first embodiment of the present invention.

11 is a flowchart illustrating a process of learning a convolutional neural network among methods of automatically recognizing behaviors in a video camera system according to the first embodiment of the present invention.

FIG. 12 is a diagram showing a comparison of accuracy of behavior recognition between a case in which a method of automatically detecting an intra-video motion in a camera system according to the first embodiment of the present invention is applied and a conventional technique.

13 is a block diagram illustrating the configuration of a camera system according to a second embodiment of the present invention.

FIG. 14 is a diagram showing the structure of a convolutional neural network (CNN) used by a data server of a camera system according to a second embodiment of the present invention to generate a face recognition model.

15 is a diagram illustrating an example of a video image processed by the video anonymization method according to the second embodiment of the present invention.

16 is a block diagram showing a configuration of a camera according to a second embodiment of the present invention.

FIG. 17 is a view simply showing an anonymizing process of an image photographed by a camera according to the second embodiment of the present invention in real time.

18 is a flowchart illustrating a process of performing an image anonymization method according to the second embodiment of the present invention.

FIG. 19 is a flowchart illustrating a process of performing an image anonymizing method according to the third embodiment of the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment according to the present invention will be described in detail with reference to the accompanying drawings.

The embodiments disclosed herein should not be construed or interpreted as limiting the scope of the present invention. It will be apparent to those of ordinary skill in the art that the description including the embodiments of the present specification has various applications. Accordingly, any embodiment described in the Detailed Description of the Invention is illustrative for a better understanding of the invention and is not intended to limit the scope of the invention to embodiments.

The functional blocks shown in the drawings and described below are merely examples of possible implementations. In other implementations, other functional blocks may be used without departing from the spirit and scope of the following detailed description. Also, although one or more functional blocks of the present invention are represented as discrete blocks, one or more of the functional blocks of the present invention may be a combination of various hardware and software configurations that perform the same function.

In addition, the expression "including any element" is merely an expression of an open-ended expression, and is not to be construed as excluding the additional elements.

Further, when a component is referred to as being connected or connected to another component, it may be directly connected or connected to the other component, but it should be understood that there may be other components in between.

Also, the expressions such as 'first, second', etc. are used only to distinguish a plurality of configurations, and do not limit the order or other features between configurations.

Hereinafter, various embodiments according to the present invention will be described with reference to the drawings. 1 to 12 relate to a camera system for automatically adjusting a resolution on the basis of a behavior recognition, and FIGS. 13 to 19 relate to a second embodiment, And a camera system for recognizing and anonymizing a corresponding part.

&Lt; Embodiment 1 >

FIG. 1 is a view illustrating a configuration of a camera system according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating an operation concept of a camera system according to an embodiment of the present invention.

Referring to FIG. 1, the camera system 1 of the present invention includes a behavior recognition-based resolution automatic adjustment camera 10 and a data server 20. The behavior recognition-based resolution automatic adjustment camera 10 photographs a subject, the data server 20 receives a high-resolution image, performs learning for human behavior recognition, and recognizes human behavior in a low-resolution image based on the learned data To generate a human behavior recognition model. The motion recognition based resolution control camera 10 of the present invention may include various camera devices. For example, it can be a smart home camera, a robot camera, a security CCTV, a wearable camera, a car black box camera, and the like.

The data server 20 can perform learning for recognizing human behavior in a low-resolution image from a high-resolution image stored on-line or a high-resolution image on-line. The data server 20 can generate learning data by learning a convolutional neural network, and can generate a human behavior recognition model based on the generated learning data. Detailed description of the detailed configuration of the data server 20 and a technique for recognizing human behavior in an image will be described below with reference to FIGS. 5 to 10. FIG.

Referring to FIG. 2, a camera 10 (hereinafter, referred to as a 'camera') photographs a low-resolution moving image in a usual manner to prevent privacy invasion of a subject. The camera 10 of the present invention recognizes human behavior in a low-resolution image (for example, 16x12 pixels) by applying a behavior recognition technique to the photographed image. The camera 10 can pick up an image of a high resolution (for example, 4K pixels) by selecting an optimal camera resolution according to a result of behavior recognition, and can store the captured image in a memory or transmit it to the data server 20. Therefore, the camera 10 of the present invention can solve the problem that the privacy of the subject can be violated by continuously photographing a high-resolution image even in a conventional camera or the like. In addition, the present invention reduces battery consumption of the camera 10 and enables efficient use of the storage space. The specific operation of the camera 10 of the present invention will be described with reference to FIGS. 3 to 4. FIG.

3, the camera 10 may include a photographing unit 100, a communication unit 200, an image analysis unit 300, a storage unit 400, and a control unit 500.

When the control unit 500 receives a control signal for changing from the control unit 500 to the high-resolution image shooting mode while shooting a subject at a low resolution, the shooting unit 100 can shoot a subject at a high resolution.

The communication unit 200 receives the human behavior recognition model used for recognizing human behavior in the low resolution image from the data server 20 and transmits the received human behavior recognition model to the image analysis unit 300. The data server 20 can perform learning to recognize human behavior in a low-resolution image from a high-resolution image stored on-line or a high-resolution image on-line. The high-resolution image is used to learn a Convolutional Neural Network to perform human behavior recognition in a low-resolution image photographed by the photographing unit 100. [ A high-resolution image as a learning image may be a publicly available source. A publicly available source can be, for example, an open source such as YouTube. In addition, high-resolution images can be obtained from sources that publicly provide video online. The communication unit 200 may transmit the image stored in the storage unit 400 to the data server 20.

The communication unit 200 may include at least one of a wireless communication module and a wired communication module. The wireless communication module may include a wireless network communication module, a wireless local area network (WLAN), a wireless communication module such as a Wi-Fi, a wireless fidelity or WiMAX, and a wireless personal area network (WPAN) Or the like.

The image analysis unit 300 can receive the human behavior recognition model from the communication unit 200 and recognize the human behavior in the low resolution image photographed by the photographing unit 100. [ The image analyzing unit 300 can recognize the human behavior in the low-resolution image photographed by the photographing unit 100 using the human behavior recognition model, and judge whether or not the recognized human behavior corresponds to the crisis situation recognition).

The storage unit 400 may store a low-resolution image or a high-resolution image photographed by the photographing unit 100. The storage unit 400 stores images (weapons, naked images, drugs, blood, etc.) to be recognized or analyzed by the image or image analysis unit 300 taken by the image sensing unit 100 as a dangerous situation, And may be in the form of a flash memory for providing images in a real-time streaming manner.

To this end, the storage unit 400 includes a main storage unit and an auxiliary storage unit, and stores an application program necessary for the functional operation of the camera 10. [ The storage unit 400 may include a program area and a data area. The program area is an area for storing an operating system or the like for booting the camera 10, and the data area is an area for storing an image photographed by the photographing part 100.

The control unit 500 may control the photographing unit 100, the communication unit 200, the image analysis unit 300, and the storage unit 400. The control unit 500 can control the photographing unit 100 to change the mode to a higher resolution mode than the existing resolution and take a picture if the crisis is determined by the image analysis unit 300. [ This is to maximize the privacy of the photographed subject and to minimize the battery consumption of the camera 10 and the utilization of the storage space. That is, the normal photographing unit 100 photographs with a low resolution image (ex. 16x12 pixels), and when receiving the control signal, it can photograph with a high resolution image (ex. 720x480, 1280x720, 1920x1080, 4K, etc.). The control unit 500 may control the photographing unit 100 to operate at a predetermined cycle to minimize battery consumption.

4 is a flowchart for explaining the flow of operation of the camera according to the first embodiment of the present invention.

Referring to FIG. 4, the photographing unit 100 photographs a normal low-resolution image and transmits the photographed image to the image analysis unit 300 (S210). The image analyzing unit 300 determines whether the user is in a crisis state by recognizing a first action in the low resolution image using the human behavior recognition model received from the communication unit 200 at step S220. The control unit 500 transmits a control signal so that the low resolution image is stored in the storage unit 400 when the image analysis unit 300 determines that the human action in the low resolution image does not correspond to the crisis situation.

If the human behavior in the low-resolution image is determined to be a crisis by the image analysis unit 300, the control unit 500 transmits a control signal to the imaging unit 100 to capture the subject as a high-resolution image (S240). The image analyzing unit 300 determines whether a high-resolution image photographed by the photographing unit 100 is in a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model at step S250. When the image analysis unit 300 determines that the human action in the high-resolution image is a crisis state, the control unit 500 transmits a control signal to store the high-resolution image in the storage unit 400, A control signal is transmitted so that an image photographed at a low resolution is stored in the storage unit 400 (S260).

As described above, the camera 10 of the present invention can recognize the human behavior by performing two human behavior recognizations on the photographed image, and finally store the photographed image. Therefore, it is possible to reduce the error of the behavior recognition, to protect the privacy of the object and to appropriately use the storage space.

5, the data server 20 may include a resolution conversion unit 21, a convolutional neural network learning unit 22, and a human behavior recognition model generation unit 23.

The resolution conversion unit 21 can convert the high-resolution human behavior image received from the communication unit 200 into a plurality of low-resolution images. Multiple low-resolution images can be obtained using a technique called Inverse Super Resolution (ISR). The main motivation of Inverse Super Resolution (ISR) is that a high resolution image contains a quantity of information corresponding to a low resolution image set, and a behavior recognition method can be performed by applying another low resolution image transformation to a single high resolution image. (M. S. Ryoo, B. Rothrock, C. Fleming, and H. J. Yang. Privacy-preserving human activity recognition from extreme low resolution. In AAAI, 2017.)

Resolution conversion section 21 generates n low-resolution images (i.e., V _ik ) for each high-resolution training image X _i by applying transform sets F _k and D _k as shown in the following equation (1) .

Where _Fk is the camera motion transformation and _Dk is the downsampling operator. Although F _k can be any affine transformation, the present invention regards the combination of transformation, scaling, and rotation as motion transformation. Also, standard mean downsampling is used for D _k . A sufficient number of low resolution transformed images V _ik generated from the learning samples X _{i are} provided to the convolutional neural network CNN and the classifier 330 so that efficient learning can be performed.

The convolutional neural network learning unit 22 receives a plurality of low resolution images generated by the resolution conversion unit 21 and divides them into a spatial stream and a time stream (optical flow), and performs convolution and pooling ) And apply a fully connected layer to learn the Convolutional Neural Network.

Referring to FIG. 7, the convolutional neural network learning unit 22 can know that spatial streams and time streams derived from low-resolution images are used as parameters. At this time, the spatial resolution of the low-resolution image is 16 × 12 pixels. More specifically, the spatial stream is configured to input the RGB pixel values of each frame (i.e., the input dimension is 16x12x3), and the temporal stream is the 10-frame connection of the X and Y optical flow images (i.e., 16x12x20) Lt; / RTI > The X and Y optical flow images are constructed by calculating the " x (and y) optical flow magnitude " per pixel.

The human behavior recognition model generation unit 23 can generate a human behavior recognition model for recognizing human behavior in an image photographed by the photographing unit 100 based on data learned by the convolution neural network learning unit 22 have. The human behavior recognition model may correspond to a classifier. The classifier can shorten the image information and motion information included in the image based on the learned data when a new image is input, and can recognize the behavior in the image using the reduced information. The image analysis technique of the present invention can be applied not only to a convolutional neural network (CNN) but also to a support vector machine (SVM).

8 is a diagram illustrating a structure of a two-stream convolutional neural network applied to convolutional neural network learning according to an embodiment of the present invention.

Referring to FIG. 8, the convolutional neural network learning unit 22 can perform learning for behavior recognition using a two-stream convolutional neural network. A two-stream convolution neural network is applied to each frame of the image. They are summarized using time pyramids and generate a single video representation. Let a 2-stream network applied to each frame V ^t of video V at time t be h (V ^t ). Then, the expression f (V;?) Is calculated according to the following equation (2).

Here, "," denotes the vector concatenation operator, T is the number of frames in the image V, and fc represents the set of fully connected layers to be applied at the top of the connection. The size of h (V ^t ) is 512-D: 256 x 2. θ is a set of parameters in CNN that require learning from learning data. max is a temporary maximum pooling operator that computes the maximum value of each element. In the example, four levels of time pyramids were used (i.e., a total of 15 maximum poolings).

A fully connected layer and a softmax layer can be applied to f (V) to learn the classifier. Let g be such a layer, y = g (f (V;?)). Where y is an activity class label. The learning g (f (V; [theta]) having classification loss using the low resolution image generated by the resolution converting unit 21 can provide a basic image classification model. Figure 8 shows the overall structure of a two-stream CNN model using this time pyramid.

FIG. 9 is a diagram for explaining how the convolutional neural network learning unit 22 according to the first embodiment of the present invention performs learning for intra-image behavior recognition using an embedding space.

The two-stream network design of FIG. 8 can classify behavioral images by learning model parameters optimized for classification, but does not consider the characteristics of ultra-low resolution images in which different low resolution data are generated due to different transformations applied to the same scene . In order for the classifier to better reflect the characteristics of the ultralow-resolution image, the embedding space, which maps to the same embedding location, regardless of the conversion to different low-resolution images with the same semantic content, . Embedding learning is commonly optimized for both embedded learning and classification using the learned embedding in an end-to-end manner, and more generalized (i.e., overfilled) Enables learning of classifiers.

Referring to FIG. 9, embedding learning is performed by minimizing the embedding distance between negative pairs while maximizing the embedding distance between positive pairs.

10 is a diagram illustrating a structure of a multi-siamese convolutional neural network applied to convolutional neural network learning according to an embodiment of the present invention.

Referring to FIG. 10, the resolution converting unit 21 converts a first batch composed of n low-resolution images having different resolutions and a plurality of different high-resolution images converted from the same high-resolution image, A second batch composed of n low-resolution images having the same size as the training data, and transmits the training data to the convolutional neural network learning unit 22. [

The convolutional neural network learning unit 22 receives the first and second arrangements, divides each of the images into a spatial stream and a temporal stream, performs convolution and pooling for each stream, After the layer (Fully connected layer) is further applied, the embedding distance is mapped by mapping to the embedding space, so that learning for motion recognition in an image can be performed.

The Siamese neural network is a concept often used to learn similarity measures between two inputs, with two networks sharing the same parameters. The goal of the Siamese network is to learn the embedding space in which similar items (low-resolution images in the case of the present invention) are placed nearby. More specifically, a sample corresponding to a positive pair that should be located close in the embedding space and a sample corresponding to a negative pair that should be located remotely are used for learning.

(V;?) is applied to the arbitrary low-resolution images V _i and V _j twice during the learning, x _i = f (V;?) ;?) and x _j = f (V _j ;?) are obtained. Where (x _i, x _j) may ssangil the positive or negative of the pair. The contrast loss for learning the network parameter [theta] is expressed by the following equation (3).

Where m is a predetermined margin, B is the batch of low resolution learning examples used, i and j are the index of the learning pair in the batch. y ' ₍ _{i, j} ₎ is a binary number, 1 for positive pairs and 0 for negative pairs.

In embedding learning for low-resolution intra-image behavior recognition, the positive pair is composed of two low-resolution images derived from the same high-resolution image, and the negative pair is composed of two low-resolution images derived from different high-resolution images. Figure 9 shows embedding learning with contrast loss. In addition, since the goal is to finally classify the low-resolution images by learning y = g (f (V;?)), The network must be learned by using the combining loss function as shown in the following equation (4).

Where L _class (θ) is the standard classification loss of the network y = g (f (V; θ)), and λ ₁ , λ ₂ are the weights.

The multi-siamese convolutional neural network of the present invention has 2n network copies sharing the same parameter θ for f (V; θ), unlike the standard cyamises network with only two network sharing parameters. The multi-siamese convolutional neural network maps each copy to each of the n different low resolution transforms (i.e., F _k ). Therefore, the embedding distance can be kept small by using the contrast loss. And the multi-siamuse convolutional neural network has n more network copies using images that do not match the scene of the first n branch to form a negative learning pair.

Let X _ik = f (V _ik;? ), where V _ik is obtained by applying the transform F _k to X _i . Two types of placement are randomly generated based on placement B of the original high resolution learning image. B ₁ is the arrangement of the low-resolution images generated in the single high-resolution image Xi, and B ₂ is the arrangement of the randomly selected low-resolution images. B ₁ produces a positive pair, and B ₂ produces a negative pair. The size of B ₁ and B ₂ should be n. B ₁ is obtained by applying n different low-resolution transforms to each example Xi of B, and each result V _ik = D _k F _k X _i is provided to the first n branches of the multi-siamese network. The low-resolution example Vj of B ₂ is provided directly to the remaining n branches of the siamese network. Therefore, the new loss function is expressed by the following equation (5).

That is, the multi-siamese convolution neural network model of the present invention simultaneously considers a plurality of low-resolution transforms for embedding learning. The new loss function essentially takes all pairs of n low resolution transforms as positive pairs and considers the same number of negative pairs using a separate arrangement.

The final loss function is calculated by combining the multi-cimase contrast loss with the standard classification loss performed in equation (4). FIG. 10 shows the overall process of learning multi-cyamics embedding and classifier. This can be seen as Siamese CNN, which combines multiple contrast losses of different low resolution pairs.

The multi-siamese convolution neural network model of the present invention uses three fully connected layers for embedding learning and classification. After applying the time pyramid, we obtain an intermediate representation of 7680-D (ie, 15 × 256 × 2) per image. Next we get a fully connected layer of size 8192. Embedding learning occurs after the second full-connect layer, where x has a dimension of 8192-D. Classification is performed by adding another fully connected layer and a soft max layer at the top.

11 is a flowchart illustrating a step of learning a convolutional neural network among methods of automatically recognizing behaviors in a video camera system according to an embodiment of the present invention.

Referring to FIG. 11, the data server 20 receives a high-resolution learning image (S1101). The resolution conversion unit 21 converts the received high-resolution human behavior image into a plurality of low-resolution images (S1102). At this time, the resolution converting unit 21 converts a first batch composed of n low resolution images having different resolutions and a plurality of different high resolution images converted from the same high resolution image, (S1103), and transmits the second batch to the convolutional neural network learning unit 22. The convolutional neural network learning unit 22 generates a second batch of low-resolution images as learning data (S1103). The convolutional neural network learning unit 22 receives a plurality of low-resolution images and divides them into a spatial stream and a temporal stream, performs a convolution and a pooling for each stream, and performs au, a fully connected layer, And the convolutional neural network is learned (S1104). At this time, the convolutional neural network learning unit 22 receives the first and second arrangements, divides each of the images into a spatial stream and a time stream, and performs convolution and pooling for each stream After the Fully connected layer is further applied, the embedding distance is mapped to the embedding space to perform learning for intra-image motion recognition (S1105).

FIG. 12 is a view comparing the accuracy of the behavior recognition between the case where the method of automatically detecting the behavior in the camera of the camera system according to the embodiment of the present invention and the accuracy of the behavior recognition between the prior art.

FIG. 12 shows the results of comparing the accuracy of classification using a 16x12 HMDB dataset or a DogCentric activity dataset in the case of (1) applying basic 1-stream CNN, (2) applying 2-stream CNN, Table. For the above two CNNs, we divided into (i) learning without using multiple low-resolution transforms, (ii) learning with multiple low-resolution transformations applied but without embedding learning, and (iii) learning with Siamese embedding learning. Respectively.

The number of low-resolution transformations used in the experiment is 15. These transformations were randomly selected from a uniform pool. A total of 25 motion transforms F _k were provided in the X and Y directions of {-5, -2.5, 0, +2.5, + 5}%. In addition, a total of 75 transforms were provided with three different rotations of the angle {-5, 0, 5} degrees. 27 transforms out of 75 were randomly selected.

Referring to FIG. 12A, it can be seen that the case of applying the siamese learning to the 2-stream CNN has the highest accuracy. In other words, it can be seen that learning the embedding space using the cyamuse network structure has a great effect on the behavior classification. The use of contrast loss based on multiple low resolution transformations is more robust in classifier learning and reduces overfitting of training data.

12B is a table showing the result of comparing the present invention with the conventional technique. As shown in the table, it can be seen that the classification accuracy according to the present invention is higher than 8%. That is, the method of applying the embedding learning to the 2-stream multi-Siamese CNN, which is one embodiment of the present invention, yields the best result in the 16x12 resolution motion recognition.

12C and 12D are tables showing the results of experiments using DogCentric activity dataset. As a result, it can be seen that the automatic detection method of the behavior in the camera system of the camera according to the embodiment of the present invention shows the best accuracy in the behavior recognition, and thus is the most effective in privacy protection.

&Lt; Embodiment 2 >

Referring to FIG. 13, the privacy camera system 5 may include a camera 50 and a data server 60.

The camera 50 receives the face recognition model from the data server 60, captures a high-resolution image and performs low-resolution conversion. The camera 50 uses the face recognition model transmitted from the data server 60, And an image that has been anonymized with respect to the face region can be output by re-converting the detected face region and the region other than the face region according to the face recognition result to have different resolutions. The camera 50 of the present invention may include various camera devices. For example, it can be a smart home camera, a robot camera, a security CCTV, a wearable camera, a car black box camera, and the like. The specific configuration and function of the camera 50 will be described below with reference to FIG. 15 to FIG.

The data server 60 can perform learning for face recognition in an image using a learning image and generate a face recognition model. The learning image may include both a high-resolution image and a low-resolution image. The data server 60 uses a convolutional neural network (CNN) in generating a face recognition model. Convolutional neural network (CNN) is an algorithm with a structure that learns to analyze and classify input image (image frame).

In the present invention, the Fully convolutional network (FCN) is newly applied to the ultra-low resolution face recognition. Such an FCN model can analyze and classify whether or not a specific pixel of an input image (image frame) is a face pixel using learning data (a plurality of images including a 10x10 face).

The Fully convolutional network (FCN) structure for performing face recognition in a low-resolution image can be confirmed from FIG.

14A, the Fully Convolutional Network (FCN) is composed of 19 convolution layers and 3 deconvolution layers, and the number of channels of each layer (depth) is shown in the table of Fig. 14B.

This FCN is learned to be specialized for 10x10 size face recognition (detection), and learning can be performed using a face image database. The learning follows a backpropagation method in the convolutional neural network. When the learning of the FCN is completed, the FCN judges whether a new input image corresponds to a face for each pixel to derive a probability that the corresponding pixel is a face. The size of the face pixel may be 10 x 10 or larger.

If it is determined that the FCN corresponds to the face pixel, a sliding window method is applied to find the position and size of the face. A sliding window is a method of checking how many face pixels are present in a box by applying a bounding box of a predetermined size to all positions of the image. Using this, we find a box containing a lot of pixels with a high probability of a face, and non-maximum suppression of such boxes to find a final bounding box. Non-maximum suppression refers to thinning edges obtained from image processing. Non-maximum suppression is performed to find a sharper line because the edges found using Gaussian mask and sob mask are blurred edges that are easily crumpled. In other words, non-maximum suppression is a process of comparing the pixel values of 8 directions based on the center pixel and removing the center pixel when the center pixel is the largest.

At this time, using an integral image saves computation time. An integral image is simply an image with the sum of the next pixel plus the previous pixel. When the integral image is used, the sum of pixel values of a specific area can be obtained very easily.

FIG. 15 is a diagram illustrating an example of an image that has been anonymized by the face recognition based real-time automatic image anonymization method of the present invention.

Referring to FIG. 15, the leftmost image is a high-resolution image, the middle image is a low-resolution image, and the rightmost image corresponds to an anonymized image. As shown in FIG. 15, the camera and image anonymizing method according to the second embodiment of the present invention can convert only a face area into an ultra-low resolution image and photograph a region outside the face area while maintaining a high resolution.

16 is a block diagram showing the configuration of a camera 50 according to the second embodiment of the present invention.

16, the camera 50 of the present invention includes a communication unit 150, a photographing unit 250, a low resolution conversion module 350, a processor 450, an output unit 550, and a storage unit 650 can do.

The communication unit 150 may receive the face recognition model for performing face recognition in the image from the data server 60 and transmit the received face recognition model to the processor 450. As described above, the face recognition model used for recognizing a face in an image captured by the camera 50 is generated by the data server 60. [ In addition, the communication unit 150 may transmit the image captured and stored by the camera 50 to the data server 60.

The communication unit 150 may include at least one of a wireless communication module and a wired communication module. The wireless communication module may include a wireless network communication module, a wireless local area network (WLAN), a wireless communication module such as a Wi-Fi, a wireless fidelity or WiMAX, and a wireless personal area network (WPAN) Or the like.

The photographing unit 250 can photograph a high-resolution image with respect to a subject. The photographing unit 250 transmits the photographed high resolution image to the low resolution conversion module 350.

Resolution conversion module 350 receives the high resolution image from the image sensing unit 250 and receives the position of the face region and the target resolution value from the processor 450 to convert the high resolution image. In this case, the position of the initial face region to be input may be an empty value that is not set, and the initial target resolution value may be a predetermined low resolution value. The ultralow resolution value may be 16x12 pixels. That is, the low resolution conversion module 350 initially performs a low resolution conversion on the entire image without distinguishing the regions other than the face region and the face region.

The low-resolution conversion module 350 receives the position of the face region stored from the processor 450, and converts the face region and the non-face region in the image. At this time, the face area can be converted into a predetermined low resolution resolution, and the area outside the face area can be converted into the target resolution value. The low resolution conversion module 350 may be implemented as a circuit and only receives the face area position information and the target resolution value from the processor 450 and converts the image frame continuously supplied from the photographing unit 250, It does not have storage space or memory.

The low resolution conversion module 350 may convert the high resolution image by receiving only the target resolution value from the processor 450. That is, the low resolution conversion module 350 can perform the low resolution conversion on the entire image without distinguishing the face region separately.

The processor 450 can recognize a face in the image converted by the low resolution conversion module 350 using the face recognition model transmitted from the communication unit 150. [ The processor 450 may be a processing unit such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The processor 450 updates the position of the face area and the target resolution value according to the face recognition result. That is, when a face is detected in the image converted by the low resolution conversion module 350, the processor 450 records the detected face region, stores the position information, and searches for a target Increase the resolution value.

That is, the processor 450 causes the low resolution conversion module 350 to convert again to the increased resolution for the remaining area outside the detected face area. Thereafter, the processor 450 performs face recognition again on the converted image with the increased resolution, and repeats the above process.

When the resolution of the image converted again by the low resolution conversion module 350 in accordance with the increased target resolution value matches the resolution of the high resolution image photographed by the photographing unit 250, And stops outputting the image and outputs the image.

(S * N) x (s * N) size or more on the processor 450, if the detected face size is NxN and the resolution is increased to the previous s times It is not recorded. For example, if s is 1.5 times and N is 10, an anonymized video can be taken without ever writing a face image larger than 15x15 in the CPU / GPU memory.

That is, the camera 50 of the present invention repeatedly performs face recognition / detection and resolution conversion while maintaining the resolution of the face area in the image to NxN and increasing the resolution of the area outside the face area to the final target resolution. An image output through this process is called an " anonymizing video ".

The processor 450 stops increasing the resolution of the instantaneous image in which the face is detected, instead of increasing only the resolution of the remaining region except for the detected face region when a face is detected in the image, and immediately outputs the low- . In this case, the resolution of the area other than the face area is increased to speed up the calculation speed compared with the case of repeating resolution conversion and face recognition, but there is a disadvantage in that the anonymization processing is not precise.

The output unit 550 outputs the image converted by the low resolution conversion module 350. That is, the output unit 550 outputs the anonymized image.

The storage unit 650 may store an image output by the output unit 550. The storage unit 650 includes a main storage device and an auxiliary storage device, and stores an application program necessary for the functional operation of the camera 50. [ The storage unit 650 may include a program area and a data area.

FIG. 17 is a view simply showing an anonymizing process of an image taken by a camera 50 according to the second embodiment of the present invention in real time.

17, the photographing unit 250 transmits the photographed high resolution image to the low resolution conversion module 350. The low resolution conversion module 350 converts the received high resolution image into a low resolution image and transmits the low resolution image to the processor 450 do. The processor 450 performs face recognition on the low-resolution image, transfers the face region location information and the target resolution value according to the face recognition result to the low resolution conversion module 350, and performs the conversion again. When the target resolution value reaches the final target resolution, the image converted by the low resolution conversion module 350 is output.

Referring to FIG. 18, the photographing unit 250 transmits the photographed high-resolution image to the low resolution conversion module 350 (S1810). The low resolution conversion module 350 receives the position information of the face region and the target resolution value from the processor 450 and converts the face region in the high resolution image to the super low resolution in operation S1820. Resolution conversion (S1830). However, since there is no face region position information at the initial conversion, the resolution conversion is performed at a very low resolution with respect to the entire image. The processor 450 performs facial recognition on the image converted by the low resolution conversion module 350 using the facial recognition model transmitted from the communication unit 150 (S1840). If a new face is recognized in the face recognition result image, the processor 450 adds the position information of the detected face region to the low resolution conversion module 350 (S1850). If there is no new face in the face recognition result image, the processor 450 increases only the target resolution value and transmits it to the low resolution conversion module 350 (S1860). When the resolution of the region other than the face region in the image matches the resolution of the high resolution image, which is the original image, the processor 450 outputs the image at that point of time and ends the processing. However, if the resolution of the region other than the face region in the converted image does not match the resolution of the high resolution image, which is the original image, the face recognition is performed again (S1870).

FIG. 19 is a flowchart illustrating a process of performing a face recognition based real-time automatic image anonymizing method according to another embodiment of the present invention.

Referring to FIG. 19, the photographing unit 250 transmits the photographed high resolution image to the low resolution conversion module 350 (S1910). The low resolution conversion module 350 receives the target resolution value from the processor 450 and performs low resolution conversion on the entire image of the high resolution image (S1920). However, in the initial conversion, the resolution conversion is performed at a very low resolution for the entire image. The processor 450 performs facial recognition on the image converted by the low resolution conversion module 350 using the facial recognition model transmitted from the communication unit 150 (S1930). If there is no new face in the face recognition result image, the processor 450 increases the target resolution value and transmits it to the low resolution conversion module 350 (S1940). If a new face is recognized in the face recognition result image, the processor 450 controls the output unit 550 to output the converted low resolution image at the face detection time and ends the processing in step S1950. When the resolution of the image converted by the low resolution conversion module 350 in accordance with the increased target resolution value coincides with the resolution of the high resolution image that is the original image, the processor 450 outputs the image at that point of time and ends the processing .

The camera system according to the first and second embodiments and the method of driving the camera system have been described above. The automatic recognition of behavior-based resolution, the automatic behavior recognition method, and the face recognition based image anonymization method according to the present invention can be implemented in the form of a program command that can be executed through various computer means and recorded in a computer readable medium . The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the present invention or may be available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks and magnetic tape; optical media such as CD-ROMs and DVDs; magnetic media such as floppy disks; Magneto-optical media, and hardware devices specifically configured to store and execute program instructions such as ROM, RAM, flash memory, and the like. Examples of program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

Claims

A data server for generating learning data based on recognition of human behavior in the low resolution image based on the high resolution learning image and generating a human behavior recognition model from the learning data; And

A low-resolution image is taken,

And a camera for capturing an image by changing to a high resolution mode than a conventional resolution if it is determined through a first behavior recognition that recognizes human behavior in a low resolution image using a human behavior recognition model transmitted from the data server, Camera system.
The method according to claim 1,

The camera comprises:

Determining whether the captured image is a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model,

If the situation is a crisis, you can save the images shot at high resolution,

And stores the image photographed at a low resolution in the case of a non-crisis situation.
The method according to claim 1,

The camera comprises:

A photographing unit for photographing a low-resolution image;

A communication unit for receiving a human behavior recognition model used for recognizing human behavior in the low resolution image from the data server and transmitting the received human behavior recognition model to the image analysis unit;

An image analyzing unit for determining whether the user is in a crisis state through a first behavior recognition process for recognizing a human behavior in a low resolution image using the human behavior recognition model received from the communication unit;

A control unit for controlling the photographing unit to change to a higher resolution mode than the existing resolution and to photograph the photographing unit if a crisis state is determined by the image analysis unit; And

And a storage unit for storing a low-resolution image or a high-resolution image photographed by the photographing unit.
The method of claim 3,

Wherein the image analyzing unit comprises:

And determining whether the image is a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model with respect to the image photographed by changing to the high resolution mode.
The method according to claim 1,

The data server comprising:

A resolution converter for converting a high-resolution human behavior image into a plurality of low-resolution images;

A plurality of low resolution images generated by the resolution conversion unit are received and divided into a spatial stream and a time stream, and convolution and pooling are performed for each stream, and a fully connected layer is added A convolution neural network learning unit for learning a convolutional neural network; And

And a human behavior recognition model generation unit that generates a human behavior recognition model based on data learned by the convolution neural network learning unit.
A method for automatically adjusting a resolution based on a behavior recognition by a camera,

(a) photographing an image of a camera at a low resolution;

(b) receiving a human behavior recognition model used by the communication unit of the camera for recognizing human behavior in a low-resolution image from an external data server and transmitting the human behavior recognition model to an image analysis unit of the camera;

(c) determining whether the image analyzing unit is in a crisis state through a first behavior recognition that recognizes human behavior in a low-resolution image using the human behavior recognition model received from the camera communication unit;

(d) when the image analyzing unit determines that the image capturing unit is in a crisis state, the control unit of the camera controls the image capturing unit to change the image capturing unit to a higher resolution mode than the existing resolution image capturing unit; And

(e) storing a low-resolution image or a high-resolution image captured by the photographing unit in a storage unit of the camera.
The method according to claim 6,

Between step (d) and step (e)

And a step of determining whether the image is a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model for the image captured by changing the image analysis unit to the high resolution mode, Way.
A method for automatically recognizing an action in a video camera system,

(a) converting a high-resolution human behavior image into a plurality of low-resolution images by resolution conversion of a data server;

(b) a convolutional neural network learning unit of the data server receives the plurality of low-resolution images and divides them into a spatial stream and a time stream, performs convolution and pooling for each stream, layer is further applied to learn a Convolutional Neural Network;

(c) generating a human behavior recognition model based on data learned by the convolutional neural network learning unit;

(d) photographing a low-resolution image of a photographing part of the camera; And

(e) recognizing human behavior in the low-resolution image photographed by the photographing unit using the human behavior recognition model received from the data server by the image analysis unit of the camera.
A data server that performs learning for face recognition in an image using a learning image and generates a face recognition model; And

Receiving the face recognition model from the data server,

Resolution conversion is performed by capturing a high-resolution image,

Recognizing a face in the converted image using the face recognition model transmitted from the data server,

And a camera for outputting the anonymized image to the face region by re-converting the detected face region and the region other than the face region according to the face recognition result to have different resolutions.
A communication unit for receiving a face recognition model for face recognition in an image from an external data server;

A photographing unit for photographing a high-resolution image;

A low-resolution conversion module that receives the high-resolution image photographed by the photographing unit, receives the position information of the face region and the target resolution value from the processor, and converts the high-resolution image;

A processor for recognizing a face in the image converted by the low resolution conversion module using the face recognition model transmitted from the communication unit and updating the position of the face region and the target resolution value according to the face recognition result; And

And an output unit for outputting the image converted by the low resolution conversion module.
11. The method of claim 10,

The processor comprising:

When a face is detected in the image converted by the low resolution conversion module through face recognition,

The detected face area is recorded, the position information is stored,

The target resolution value for the remaining area excluding the stored face area is increased,

Wherein the low-resolution conversion module controls to convert the remaining region except for the stored face region back to an increased target resolution.
A communication unit for receiving a face recognition model for face recognition in an image from an external data server;

A photographing unit for photographing a high-resolution image;

A low-resolution conversion module that receives a high-resolution image captured by the imaging unit, receives a target resolution value from the processor, and converts the high-resolution image;

A processor for recognizing a face in the image converted by the low resolution conversion module using the face recognition model transmitted from the communication unit and updating the target resolution value according to the face recognition result; And

And an output unit for outputting the image converted by the low resolution conversion module.
13. The method of claim 12,

The processor comprising:

When a face is detected in the image converted by the low resolution conversion module,

And outputs the converted low-resolution image at the detection time point.
In a method for a camera to anonymize an image,

(a) capturing a high-resolution image of a photographing part of the camera;

(b) a communication unit of the camera receives a face recognition model for face recognition in an image from an external data server;

(c) receiving a high-resolution image photographed by the photographing unit by the low resolution conversion module of the camera, receiving the position of the face region and the target resolution value from the processor, and converting the high resolution image;

(d) recognizing a face in the image converted by the low resolution conversion module using the face recognition model received from the communication unit of the camera, and updating the position of the face region and the target resolution value according to the face recognition result ; And

(e) outputting an image converted by an output unit of the camera into the low resolution conversion module.
In a method for a camera to anonymize an image,

(a) capturing a high-resolution image of a photographing part of the camera;

(b) a communication unit of the camera receives a face recognition model for face recognition in an image from an external data server;

(c) receiving a high resolution image photographed by the photographing unit by the low resolution conversion module of the camera, receiving the target resolution value from the processor, and converting the high resolution image;

(d) recognizing a face in the image converted by the low resolution conversion module using the face recognition model received from the communication unit of the camera processor, and updating the target resolution value according to the face recognition result; And

(e) outputting an image converted by an output unit of the camera into the low resolution conversion module.