WO2019017720A1 - Camera system for protecting privacy and method therefor - Google Patents

Camera system for protecting privacy and method therefor

Info

Publication number
WO2019017720A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
resolution
low
unit
camera
Prior art date
Application number
PCT/KR2018/008196
Other languages
French (fr)
Korean (ko)
Inventor
양현종
유상원
김기윤
김명언
Original Assignee
주식회사 이고비드
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020170092204A external-priority patent/KR101911900B1/en
Priority claimed from KR1020170092203A external-priority patent/KR101876433B1/en
Application filed by 주식회사 이고비드 filed Critical 주식회사 이고비드
Publication of WO2019017720A1 publication Critical patent/WO2019017720A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • The present invention relates to a technology for recognizing human behavior in a low-resolution image, a camera system that automatically adjusts the shooting resolution of the camera based on that technology, and a camera system capable of anonymizing video using low-resolution face recognition technology.
  • The present invention also relates to a camera system that shoots a low-resolution image in the normal state to protect the privacy of a subject and captures a high-resolution image when a dangerous situation is determined through behavior recognition on the low-resolution image, and to an automatic behavior recognition method therefor.
  • An object of the present invention is to provide a camera system capable of reducing camera battery consumption by automatically adjusting the shooting resolution of the camera according to the importance of the situation through low-resolution image behavior recognition technology.
  • Another object of the present invention is to provide an automatic behavior recognition method capable of increasing the human behavior recognition rate by learning a convolutional neural network that maps a plurality of low-resolution images transformed from a high-resolution image into an embedding space, taking the characteristics of low-resolution image conversion into account.
  • Another object of the present invention is to provide a camera system and a face recognition-based image anonymization method that can protect privacy by separating the face region from the region outside the face in a high-resolution image and performing ultra-low-resolution processing only on the face region.
  • According to an aspect of the present invention, a camera system includes: a data server that generates learning data for recognizing human behavior in a low-resolution image based on high-resolution learning images and generates a human behavior recognition model; and a camera that normally shoots a low-resolution image and, when a crisis is detected through a first behavior recognition that recognizes human behavior in the low-resolution image using the human behavior recognition model transmitted from the data server, changes to a high-resolution mode and shoots.
  • The camera determines whether the image captured after changing to the high-resolution mode shows a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model, and stores the high-resolution image when the crisis state is confirmed and the low-resolution image when it is not.
  • The camera may include: a photographing unit that captures a low-resolution image; a communication unit that receives, from the data server, a human behavior recognition model used for recognizing human behavior in the low-resolution image and transmits it to the image analysis unit; an image analysis unit that determines whether a crisis state exists through a first behavior recognition that recognizes human behavior in the low-resolution image using the human behavior recognition model received from the communication unit; a control unit that, when the image analysis unit determines a crisis state, controls the photographing unit to change to a higher-resolution mode than the existing resolution and to shoot; and a storage unit that stores the low-resolution or high-resolution image captured by the photographing unit.
  • The image analysis unit can determine whether the image captured in the high-resolution mode shows a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model.
  • The data server may include: a resolution conversion unit that converts a high-resolution human behavior image into a plurality of low-resolution images; a convolutional neural network learning unit that receives the plurality of low-resolution images generated by the resolution conversion unit, divides them into a spatial stream and a temporal stream, performs convolution and pooling for each stream, and applies a fully connected layer to learn a convolutional neural network; and a human behavior recognition model generation unit that generates a human behavior recognition model based on the data learned by the convolutional neural network learning unit.
  • An automatic resolution adjustment method comprises the steps of: (a) a photographing unit of a camera capturing a low-resolution image; (b) a communication unit of the camera receiving, from an external data server, a human behavior recognition model used for recognizing human behavior in a low-resolution image, and transmitting it to an image analysis unit of the camera; (c) the image analysis unit determining whether a crisis state exists through a first behavior recognition that recognizes human behavior in the low-resolution image using the human behavior recognition model received from the communication unit; (d) when the image analysis unit determines a crisis state, a control unit of the camera controlling the photographing unit to change to a higher-resolution mode than the existing resolution and to shoot; and (e) a storage unit of the camera storing the low-resolution or high-resolution image captured by the photographing unit.
  • The automatic resolution adjustment method may further include, between steps (d) and (e), a step of determining whether the image captured in the high-resolution mode shows a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model.
  • An automatic behavior recognition method comprises the steps of: (a) a resolution conversion unit of a data server converting a high-resolution human behavior image into a plurality of low-resolution images; (b) a convolutional neural network learning unit of the data server receiving the plurality of low-resolution images, dividing them into a spatial stream and a temporal stream, performing convolution and pooling for each stream, and applying a fully connected layer to learn a convolutional neural network; (c) generating a human behavior recognition model based on the data learned by the convolutional neural network learning unit; (d) a photographing unit of a camera capturing a low-resolution image; and (e) an image analysis unit of the camera recognizing human behavior in the low-resolution image captured by the photographing unit, using the human behavior recognition model received from the data server.
  • A camera system includes: a data server that performs learning for face recognition in images using learning images and generates a face recognition model; and a camera that receives the face recognition model from the data server, captures a high-resolution image and performs low-resolution conversion, recognizes a face in the converted image using the face recognition model transmitted from the data server, and outputs an image anonymized with respect to the face region by re-converting the detected face region and the region outside the face region to different resolutions according to the face recognition result.
  • A camera includes: a communication unit that receives a face recognition model for face recognition in images from an external data server; a photographing unit that captures a high-resolution image; a low-resolution conversion module that receives the high-resolution image captured by the photographing unit, receives the position information of the face region and the target resolution value from the processor, and converts the high-resolution image; a processor that recognizes a face in the image converted by the low-resolution conversion module using the face recognition model transmitted from the communication unit and updates the position of the face region and the target resolution value according to the face recognition result; and an output unit that outputs the image converted by the low-resolution conversion module.
  • When a face is detected through face recognition in the image converted by the low-resolution conversion module, the processor records the detected face region and stores its position information, and controls the low-resolution conversion module to re-convert the remaining region excluding the stored face region to the target resolution.
  • A camera includes: a communication unit that receives a face recognition model for face recognition in images from an external data server; a photographing unit that captures a high-resolution image; a low-resolution conversion module that receives the high-resolution image captured by the photographing unit, receives a target resolution value from the processor, and converts the high-resolution image; a processor that recognizes a face in the image converted by the low-resolution conversion module using the face recognition model transmitted from the communication unit and updates the target resolution value according to the face recognition result; and an output unit that outputs the image converted by the low-resolution conversion module.
  • the processor may control the low-resolution conversion module to output the low-resolution image converted at the detection time when a face is detected in the image converted by the low-resolution conversion module.
  • An image anonymization method comprises the steps of: (a) a photographing unit of a camera capturing a high-resolution image; (b) a communication unit of the camera receiving a face recognition model for face recognition in images from an external data server; (c) a low-resolution conversion module of the camera receiving the high-resolution image captured by the photographing unit, receiving the position of the face region and the target resolution value from the processor, and converting the high-resolution image; (d) the processor recognizing a face in the image converted by the low-resolution conversion module using the face recognition model received from the communication unit, and updating the position of the face region and the target resolution value according to the face recognition result; and (e) an output unit of the camera outputting the image converted by the low-resolution conversion module.
  • An image anonymization method comprises the steps of: (a) a photographing unit of a camera capturing a high-resolution image; (b) a communication unit of the camera receiving a face recognition model for face recognition in images from an external data server; (c) a low-resolution conversion module of the camera receiving the high-resolution image captured by the photographing unit, receiving the target resolution value from the processor, and converting the high-resolution image; (d) the processor recognizing a face in the image converted by the low-resolution conversion module using the face recognition model received from the communication unit, and updating the target resolution value according to the face recognition result; and (e) an output unit of the camera outputting the image converted by the low-resolution conversion module.
  • According to the present invention, a low-resolution image is photographed in the normal state, and a high-resolution image is photographed only when a crisis is determined through behavior recognition on the low-resolution image, thereby protecting the privacy of the subject.
  • According to the present invention, it is possible to reduce camera battery consumption and use the storage space efficiently by automatically adjusting the shooting resolution of the camera according to the importance of the situation through low-resolution image behavior recognition.
  • According to the present invention, a plurality of low-resolution images transformed from a high-resolution image are mapped into an embedding space in consideration of the characteristics of low-resolution image conversion, and a convolutional neural network is learned accordingly, so that the behavior recognition rate can be increased.
  • According to the present invention, it is possible to protect privacy by distinguishing the face region from the region outside the face in a high-resolution image and performing ultra-low-resolution processing only on the face region.
  • FIG. 1 is a block diagram of a camera system according to a first embodiment of the present invention.
  • FIG. 2 is a view for explaining the operational concept of the camera system according to the first embodiment of the present invention.
  • FIG. 3 is a diagram illustrating a configuration of a camera according to a first embodiment of the present invention.
  • FIG. 4 is a flowchart illustrating a flow of operation of the camera according to the first embodiment of the present invention.
  • FIG. 5 is a detailed block diagram of a data server in a camera system according to a first embodiment of the present invention.
  • FIG. 6 is a view for explaining that, even when the same high-resolution image is converted to low resolution, the resulting low-resolution images may have different pixel values.
  • FIG. 7 is a view showing the structure in which the convolutional neural network learning unit according to the first embodiment of the present invention divides an input image into a spatial stream and a temporal stream.
  • FIG. 8 is a diagram showing the structure of a two-stream convolutional neural network applied to convolutional neural network learning according to the first embodiment of the present invention.
  • FIG. 9 is a diagram for explaining how the convolutional neural network learning unit according to the first embodiment of the present invention performs learning for intra-image behavior recognition using an embedding space.
  • FIG. 10 is a diagram illustrating a structure of a multi-siamese convolutional neural network applied to convolutional neural network learning according to the first embodiment of the present invention.
  • FIG. 11 is a flowchart illustrating a process of learning a convolutional neural network among methods of automatically recognizing behaviors in a video camera system according to the first embodiment of the present invention.
  • FIG. 12 is a diagram showing a comparison of accuracy of behavior recognition between a case in which a method of automatically detecting an intra-video motion in a camera system according to the first embodiment of the present invention is applied and a conventional technique.
  • FIG. 13 is a block diagram illustrating the configuration of a camera system according to a second embodiment of the present invention.
  • FIG. 14 is a diagram showing the structure of a convolutional neural network (CNN) used by a data server of a camera system according to a second embodiment of the present invention to generate a face recognition model.
  • FIG. 15 is a diagram illustrating an example of a video image processed by the image anonymization method according to the second embodiment of the present invention.
  • FIG. 16 is a block diagram showing a configuration of a camera according to the second embodiment of the present invention.
  • FIG. 17 is a view simply showing an anonymizing process of an image photographed by a camera according to the second embodiment of the present invention in real time.
  • FIG. 18 is a flowchart illustrating a process of performing an image anonymization method according to the second embodiment of the present invention.
  • FIG. 19 is a flowchart illustrating a process of performing an image anonymizing method according to the third embodiment of the present invention.
  • When a component is referred to as being connected or coupled to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.
  • FIGS. 13 to 19 relate to the second embodiment, a camera system that recognizes a face in an image and anonymizes the corresponding part.
  • FIG. 1 is a view illustrating a configuration of a camera system according to an embodiment of the present invention
  • FIG. 2 is a diagram illustrating an operation concept of a camera system according to an embodiment of the present invention.
  • the camera system 1 of the present invention includes a behavior recognition-based resolution automatic adjustment camera 10 and a data server 20.
  • the behavior recognition-based resolution automatic adjustment camera 10 photographs a subject
  • The data server 20 receives high-resolution images, performs learning for human behavior recognition, and generates a human behavior recognition model for recognizing human behavior in a low-resolution image based on the learned data.
  • The behavior recognition-based automatic resolution adjustment camera 10 of the present invention may be implemented with various camera devices, for example, a smart home camera, a robot camera, a security CCTV, a wearable camera, a car black box camera, and the like.
  • The data server 20 can perform learning for recognizing human behavior in a low-resolution image from stored high-resolution images or from high-resolution images obtained online.
  • the data server 20 can generate learning data by learning a convolutional neural network, and can generate a human behavior recognition model based on the generated learning data.
  • The detailed configuration and operation of the data server 20 will be described below with reference to FIGS. 5 to 10.
  • The camera 10 (hereinafter referred to as the 'camera') normally photographs a low-resolution moving image to prevent invasion of the subject's privacy.
  • the camera 10 of the present invention recognizes human behavior in a low-resolution image (for example, 16x12 pixels) by applying a behavior recognition technique to the photographed image.
  • The camera 10 can capture a high-resolution image (for example, 4K) by selecting the optimal camera resolution according to the behavior recognition result, and can store the captured image in a memory or transmit it to the data server 20. Therefore, the camera 10 of the present invention solves the problem of conventional cameras, which may violate the privacy of the subject by continuously photographing high-resolution images.
  • In addition, the present invention reduces the battery consumption of the camera 10 and enables efficient use of the storage space. The specific operation of the camera 10 of the present invention will be described with reference to FIGS. 3 and 4.
  • FIG. 3 is a diagram illustrating a configuration of a camera according to a first embodiment of the present invention.
  • the camera 10 may include a photographing unit 100, a communication unit 200, an image analysis unit 300, a storage unit 400, and a control unit 500.
  • The photographing unit 100 is capable of photographing a subject at high resolution.
  • the communication unit 200 receives the human behavior recognition model used for recognizing human behavior in the low resolution image from the data server 20 and transmits the received human behavior recognition model to the image analysis unit 300.
  • The data server 20 can perform learning to recognize human behavior in a low-resolution image from stored high-resolution images or from high-resolution images obtained online.
  • the high-resolution image is used to learn a Convolutional Neural Network to perform human behavior recognition in a low-resolution image photographed by the photographing unit 100.
  • The high-resolution learning images may come from a publicly available source.
  • A publicly available source can be, for example, an open video platform such as YouTube.
  • high-resolution images can be obtained from sources that publicly provide video online.
  • the communication unit 200 may transmit the image stored in the storage unit 400 to the data server 20.
  • the communication unit 200 may include at least one of a wireless communication module and a wired communication module.
  • The wireless communication module may include a mobile network communication module, a wireless local area network (WLAN) module such as Wi-Fi (wireless fidelity) or WiMAX, a wireless personal area network (WPAN) module, or the like.
  • The image analysis unit 300 can receive the human behavior recognition model from the communication unit 200 and recognize human behavior in the low-resolution image captured by the photographing unit 100. The image analysis unit 300 can recognize the human behavior in the low-resolution image using the human behavior recognition model and judge whether the recognized behavior corresponds to a crisis situation (first behavior recognition).
  • the storage unit 400 may store a low-resolution image or a high-resolution image photographed by the photographing unit 100.
  • The storage unit 400 may store the images captured by the photographing unit 100 and reference images of objects that the image analysis unit 300 should recognize or analyze as dangerous situations (weapons, nudity, drugs, blood, etc.), and may take the form of a flash memory that provides images in a real-time streaming manner.
  • the storage unit 400 includes a main storage unit and an auxiliary storage unit, and stores an application program necessary for the functional operation of the camera 10.
  • the storage unit 400 may include a program area and a data area.
  • the program area is an area for storing an operating system or the like for booting the camera 10
  • the data area is an area for storing an image photographed by the photographing part 100.
  • the control unit 500 may control the photographing unit 100, the communication unit 200, the image analysis unit 300, and the storage unit 400.
  • The control unit 500 can control the photographing unit 100 to change to a higher-resolution mode than the existing resolution and to shoot when a crisis state is determined by the image analysis unit 300. This is to maximize the privacy protection of the photographed subject while minimizing the battery consumption of the camera 10 and the use of storage space. That is, the photographing unit 100 normally shoots a low-resolution image (e.g., 16x12 pixels) and, upon receiving the control signal, can shoot a high-resolution image (e.g., 720x480, 1280x720, 1920x1080, 4K, etc.).
  • the control unit 500 may control the photographing unit 100 to operate at a predetermined cycle to minimize battery consumption.
  • FIG. 4 is a flowchart for explaining the flow of operation of the camera according to the first embodiment of the present invention.
  • The photographing unit 100 normally captures a low-resolution image and transmits the captured image to the image analysis unit 300 (S210).
  • The image analysis unit 300 determines whether a crisis state exists through a first behavior recognition on the low-resolution image using the human behavior recognition model received from the communication unit 200 (S220).
  • If the image analysis unit 300 determines that the human behavior in the low-resolution image does not correspond to a crisis situation, the control unit 500 transmits a control signal so that the low-resolution image is stored in the storage unit 400.
  • If a crisis situation is determined, the control unit 500 transmits a control signal to the photographing unit 100 to capture the subject as a high-resolution image (S240).
  • The image analysis unit 300 determines whether the high-resolution image captured by the photographing unit 100 shows a crisis state through a second behavior recognition that recognizes human behavior in the image using the human behavior recognition model (S250).
  • If the crisis state is confirmed, the control unit 500 transmits a control signal so that the high-resolution image is stored in the storage unit 400; otherwise, a control signal is transmitted so that the image captured at low resolution is stored in the storage unit 400 (S260).
  • As described above, the camera 10 of the present invention performs human behavior recognition twice on the captured images before finally storing them. Therefore, it is possible to reduce behavior recognition errors, protect the privacy of the subject, and use the storage space appropriately.
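  • To make the two-stage flow concrete, the following is a minimal sketch of the loop described in steps S210-S260; capture(), recognize(), and store() are hypothetical helpers, not names from the patent:

```python
# Minimal sketch of the two-stage recognition loop (S210-S260).
# capture(resolution) -> clip, recognize(clip) -> True on crisis, and
# store(clip) are assumed, hypothetical helpers.
LOW_RES, HIGH_RES = (16, 12), (1920, 1080)

def camera_loop(capture, recognize, store):
    while True:
        low_clip = capture(LOW_RES)       # normal operation: low resolution
        if not recognize(low_clip):       # first behavior recognition
            store(low_clip)               # no crisis: keep only the low-res clip
            continue
        high_clip = capture(HIGH_RES)     # crisis suspected: switch mode
        if recognize(high_clip):          # second behavior recognition
            store(high_clip)              # crisis confirmed: keep high-res clip
        else:
            store(low_clip)               # false alarm: keep the low-res clip
```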
  • FIG. 5 is a detailed block diagram of a data server in a camera system according to a first embodiment of the present invention.
  • the data server 20 may include a resolution conversion unit 21, a convolutional neural network learning unit 22, and a human behavior recognition model generation unit 23.
  • the resolution conversion unit 21 can convert the high-resolution human behavior image received from the communication unit 200 into a plurality of low-resolution images. Multiple low-resolution images can be obtained using a technique called Inverse Super Resolution (ISR).
  • The main motivation of Inverse Super Resolution (ISR) is that a single high-resolution image contains an amount of information corresponding to an entire set of low-resolution images, so that behavior recognition can be performed by applying multiple different low-resolution transformations to a single high-resolution image. (M. S. Ryoo, B. Rothrock, C. Fleming, and H. J. Yang. Privacy-preserving human activity recognition from extreme low resolution. In AAAI, 2017.)
  • The resolution conversion unit 21 generates n low-resolution images V_ik for each high-resolution training image X_i by applying the transform sets F_k and D_k, as in the following equation (1): V_ik = D_k(F_k(X_i)).
  • F_k is the camera motion transform and D_k is the downsampling operator.
  • F_k can be any affine transformation; the present invention considers combinations of translation, scaling, and rotation as the motion transforms.
  • Standard mean downsampling is used for D_k.
  • A sufficiently large number of low-resolution transformed images V_ik generated from the training samples X_i are provided to the convolutional neural network (CNN) and the classifier 330 so that efficient learning can be performed.
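  • The ISR conversion of equation (1) can be sketched as follows, assuming an OpenCV implementation; the parameter layout (dx, dy, scale, angle) is illustrative, not from the patent:

```python
# Sketch of equation (1): V_ik = D_k(F_k(X_i)). F_k is an affine motion
# transform (translation/scaling/rotation) and D_k is mean downsampling.
import numpy as np
import cv2

def inverse_super_resolution(x_hr, transforms, size=(16, 12)):
    h, w = x_hr.shape[:2]
    variants = []
    for dx, dy, scale, angle in transforms:          # F_k parameters
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        m[:, 2] += (dx * w, dy * h)                  # shift by a frame fraction
        moved = cv2.warpAffine(x_hr, m, (w, h))
        # D_k: mean (area) downsampling to the target low resolution
        variants.append(cv2.resize(moved, size, interpolation=cv2.INTER_AREA))
    return variants

# e.g. four transforms: (dx, dy, scale, angle in degrees)
lowres_set = inverse_super_resolution(
    np.zeros((240, 320, 3), np.uint8),
    [(0, 0, 1.0, 0), (0.05, 0, 1.0, 0), (0, -0.025, 1.0, 5), (0.025, 0.025, 0.9, -5)])
```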
  • The convolutional neural network learning unit 22 receives the plurality of low-resolution images generated by the resolution conversion unit 21, divides them into a spatial stream and a temporal stream (optical flow), performs convolution and pooling for each stream, and applies a fully connected layer to learn the convolutional neural network.
  • That is, the convolutional neural network learning unit 22 uses spatial streams and temporal streams derived from the low-resolution images as its inputs.
  • the spatial resolution of the low-resolution image is 16 ⁇ 12 pixels.
  • The spatial stream takes the RGB pixel values of each frame as input (i.e., the input dimension is 16x12x3), and the temporal stream takes a stack of 10 consecutive X and Y optical flow images as input (i.e., 16x12x20).
  • The X and Y optical flow images are constructed by calculating the x (and y) optical flow magnitude per pixel.
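  • A sketch of how the two input tensors could be assembled is shown below; the use of Farneback optical flow is an assumption, since the patent only specifies the input shapes:

```python
# Sketch of the two network inputs for one time step. Using Farneback
# optical flow is an assumption; the patent specifies only the shapes
# (spatial 16x12x3 RGB, temporal 16x12x20 flow stack).
import numpy as np
import cv2

def two_stream_inputs(frames):          # frames: list of >= 11 16x12 RGB frames
    spatial = frames[10].astype(np.float32) / 255.0         # 16x12x3 input
    flows = []
    for a, b in zip(frames[:10], frames[1:11]):             # 10 frame pairs
        prev = cv2.cvtColor(a, cv2.COLOR_RGB2GRAY)
        nxt = cv2.cvtColor(b, cv2.COLOR_RGB2GRAY)
        f = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                         0.5, 1, 5, 3, 5, 1.1, 0)
        flows.extend([f[..., 0], f[..., 1]])                # x and y components
    temporal = np.stack(flows, axis=-1)                     # 16x12x20 stack
    return spatial, temporal
```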
  • The human behavior recognition model generation unit 23 can generate a human behavior recognition model for recognizing human behavior in images captured by the photographing unit 100, based on the data learned by the convolutional neural network learning unit 22.
  • the human behavior recognition model may correspond to a classifier.
  • When a new image is input, the classifier can summarize the appearance and motion information contained in the image based on the learned data, and recognize the behavior in the image using the summarized information.
  • The image analysis technique of the present invention can be implemented not only with a convolutional neural network (CNN) but also with a support vector machine (SVM).
  • FIG. 8 is a diagram illustrating a structure of a two-stream convolutional neural network applied to convolutional neural network learning according to an embodiment of the present invention.
  • the convolutional neural network learning unit 22 can perform learning for behavior recognition using a two-stream convolutional neural network.
  • A two-stream convolutional neural network is applied to each frame of the video, and the per-frame outputs are summarized using a temporal pyramid to generate a single video representation. Let h(V_t) be the output of the 2-stream network applied to frame V_t of video V at time t. Then the representation f(V; θ) is computed according to the following equation (2): f(V; θ) = fc([max_{t∈P_1} h(V_t), max_{t∈P_2} h(V_t), …, max_{t∈P_15} h(V_t)]), where P_1, …, P_15 are the intervals of the temporal pyramid.
  • "," denotes the vector concatenation operator
  • T is the number of frames in the image V
  • fc represents the set of fully connected layers applied on top of the concatenation.
  • The size of h(V_t) is 512-D (i.e., 256 × 2).
  • θ is the set of CNN parameters that must be learned from the training data.
  • max is a temporal max pooling operator that computes the element-wise maximum. In this example, a four-level temporal pyramid was used (i.e., a total of 15 max poolings).
  • a fully connected layer and a softmax layer can be applied to f (V) to learn the classifier.
  • Let g be such layers; then y = g(f(V; θ)).
  • y is an activity class label.
  • Training g(f(V; θ)) with a classification loss using the low-resolution images generated by the resolution conversion unit 21 provides a baseline image classification model.
  • FIG. 8 shows the overall structure of the two-stream CNN model using this temporal pyramid.
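  • The temporal pyramid summarization of equation (2) can be sketched in PyTorch as follows (a minimal sketch; the interval schedule is an assumption consistent with the 15-pooling figure above):

```python
# Sketch of the temporal pyramid summarization in equation (2), in
# PyTorch. h_seq holds h(V_t) for every frame; a 4-level pyramid gives
# 1 + 2 + 4 + 8 = 15 max-pooled intervals, i.e. 15 x 512 = 7680-D.
import torch

def temporal_pyramid(h_seq, levels=4):    # h_seq: (T, 512) tensor
    T = h_seq.shape[0]
    pooled = []
    for level in range(levels):
        parts = 2 ** level                 # 1, 2, 4, 8 intervals per level
        for k in range(parts):
            lo = (k * T) // parts
            hi = max(((k + 1) * T) // parts, lo + 1)
            pooled.append(h_seq[lo:hi].max(dim=0).values)
    return torch.cat(pooled)               # 7680-D video representation

video_repr = temporal_pyramid(torch.randn(32, 512))   # e.g. a 32-frame clip
```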
  • FIG. 9 is a diagram for explaining how the convolutional neural network learning unit 22 according to the first embodiment of the present invention performs learning for intra-image behavior recognition using an embedding space.
  • The two-stream network design of FIG. 8 can classify behavior videos by learning model parameters optimized for classification, but it does not consider the characteristic of ultra-low-resolution imaging that different low-resolution data are generated from the same scene by different transformations.
  • In order for the classifier to better reflect the characteristics of ultra-low-resolution images, an embedding space is learned that maps low-resolution images with the same semantic content to the same embedding location, regardless of which transformation produced them.
  • Embedding learning and classification using the learned embedding are jointly optimized in an end-to-end manner, which enables learning of more generalized (i.e., less overfitted) classifiers.
  • Embedding learning is performed by minimizing the embedding distance between positive pairs while maximizing the embedding distance between negative pairs.
  • FIG. 10 is a diagram illustrating a structure of a multi-siamese convolutional neural network applied to convolutional neural network learning according to an embodiment of the present invention.
  • In order to perform this learning, the resolution conversion unit 21 generates, as training data, a first batch composed of n low-resolution images converted from the same high-resolution image through different transformations, and a second batch composed of n low-resolution images converted from a plurality of different high-resolution images, and transmits them to the convolutional neural network learning unit 22.
  • The convolutional neural network learning unit 22 receives the first and second batches, divides each image into a spatial stream and a temporal stream, performs convolution and pooling for each stream, applies a fully connected layer, and then maps the results into the embedding space to learn the embedding distances, thereby performing learning for behavior recognition in images.
  • the Siamese neural network is a concept often used to learn similarity measures between two inputs, with two networks sharing the same parameters.
  • The goal of the Siamese network is to learn an embedding space in which similar items (low-resolution images, in the case of the present invention) are placed close together. More specifically, samples corresponding to positive pairs, which should be located close together in the embedding space, and samples corresponding to negative pairs, which should be located far apart, are used for learning.
  • B is the batch of low resolution learning examples used
  • i and j are the index of the learning pair in the batch.
  • y'(i, j) is a binary label: 1 for positive pairs and 0 for negative pairs.
  • the positive pair is composed of two low-resolution images derived from the same high-resolution image
  • the negative pair is composed of two low-resolution images derived from different high-resolution images.
  • FIG. 9 shows embedding learning with the contrastive loss.
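  • The patent's equations (3)-(5) are not reproduced in this text; a standard contrastive loss consistent with the definitions of B, i, j, and y'(i, j) above takes the following form (a reconstruction under that assumption, not necessarily the patent's exact formula):

```latex
\[
L_{\text{contrastive}}(B) \;=\; \sum_{i,j \in B}
    y'(i,j)\, d_{ij}^{2}
    \;+\; \bigl(1 - y'(i,j)\bigr)\, \max\!\bigl(0,\; m - d_{ij}\bigr)^{2}
\]
% d_ij: Euclidean distance between the embeddings of examples i and j;
% m: margin hyperparameter that pushes negative pairs apart.
```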
  • The network must then be trained using the combined loss function shown in the following equation (4).
  • Unlike a standard Siamese network, which has only two parameter-sharing networks, the multi-Siamese convolutional neural network of the present invention has 2n network copies sharing the same parameters θ of f(V; θ).
  • The multi-Siamese convolutional neural network maps each copy to one of the n different low-resolution transforms (i.e., F_k), so that the embedding distance can be kept small by using the contrastive loss.
  • The multi-Siamese convolutional neural network has n additional network copies that use images not matching the scene of the first n branches, in order to form negative learning pairs.
  • Each branch computes f(V_ik; θ), where V_ik is obtained by applying the transform F_k to X_i.
  • Two types of batches are randomly generated based on the batch B of original high-resolution training images.
  • B_1 is a batch of low-resolution images generated from a single high-resolution image X_i,
  • and B_2 is a batch of randomly selected low-resolution images.
  • B_1 produces positive pairs,
  • while B_2 produces negative pairs.
  • The sizes of B_1 and B_2 should both be n.
  • The low-resolution examples V_j of B_2 are provided directly to the remaining n branches of the Siamese network. The new loss function is therefore expressed by the following equation (5).
  • the multi-siamese convolution neural network model of the present invention simultaneously considers a plurality of low-resolution transforms for embedding learning.
  • The new loss function essentially takes all pairs of the n low-resolution transforms as positive pairs and considers the same number of negative pairs using the separate batch.
  • The final loss function is computed by combining this multi-Siamese contrastive loss with the standard classification loss, as in equation (4).
  • FIG. 10 shows the overall process of learning the multi-Siamese embedding and classifier. This can be seen as a Siamese CNN that combines multiple contrastive losses over different low-resolution pairs.
  • The multi-Siamese convolutional neural network model of the present invention uses three fully connected layers for embedding learning and classification. After applying the temporal pyramid, an intermediate representation of 7680-D (i.e., 15 × 256 × 2) is obtained per video. Next, a fully connected layer of size 8192 is applied. Embedding learning occurs after the second fully connected layer, where x has a dimension of 8192-D. Classification is performed by adding another fully connected layer and a softmax layer on top.
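  • The head layers just described can be sketched as follows (a minimal sketch; the ReLU placement and class count are assumptions):

```python
# Sketch of the three fully connected head layers described above:
# 7680-D pyramid representation -> 8192-D layer -> 8192-D embedding x
# (where the contrastive loss is applied) -> classification layer.
import torch.nn as nn

class MultiSiameseHead(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.fc1 = nn.Sequential(nn.Linear(7680, 8192), nn.ReLU())
        self.fc2 = nn.Linear(8192, 8192)     # embedding x lives here
        self.classifier = nn.Linear(8192, num_classes)

    def forward(self, video_repr):           # video_repr: (N, 7680)
        x = self.fc2(self.fc1(video_repr))   # 8192-D embedding
        logits = self.classifier(x)          # softmax is applied in the loss
        return x, logits
```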
  • FIG. 11 is a flowchart illustrating a step of learning a convolutional neural network among methods of automatically recognizing behaviors in a video camera system according to an embodiment of the present invention.
  • the data server 20 receives a high-resolution learning image (S1101).
  • the resolution conversion unit 21 converts the received high-resolution human behavior image into a plurality of low-resolution images (S1102).
  • The resolution conversion unit 21 generates, as training data, a first batch composed of n low-resolution images converted from the same high-resolution image through different transformations, and a second batch composed of n low-resolution images converted from different high-resolution images, and transmits both batches to the convolutional neural network learning unit 22 (S1103).
  • The convolutional neural network learning unit 22 receives the plurality of low-resolution images, divides them into a spatial stream and a temporal stream, performs convolution and pooling for each stream, applies a fully connected layer, and learns the convolutional neural network (S1104). At this time, the convolutional neural network learning unit 22 receives the first and second batches, divides each image into a spatial stream and a temporal stream, performs convolution and pooling for each stream, applies a fully connected layer, and then maps the results into the embedding space to learn the embedding distances, performing learning for behavior recognition in images (S1105).
  • FIG. 12 compares the behavior recognition accuracy of the automatic behavior recognition method of the camera system according to the first embodiment of the present invention with that of prior techniques.
  • FIG. 12 shows tables comparing classification accuracy on a 16x12 HMDB dataset and a DogCentric activity dataset for the cases of (1) applying a basic 1-stream CNN and (2) applying a 2-stream CNN.
  • For each of the two CNNs, three settings were compared: (i) learning without multiple low-resolution transforms, (ii) learning with multiple low-resolution transforms applied but without embedding learning, and (iii) learning with multi-Siamese embedding learning.
  • The number of low-resolution transforms used in the experiments was 15, randomly selected from a uniform pool. A total of 25 motion transforms F_k were provided, with X- and Y-direction translations of {-5, -2.5, 0, +2.5, +5}%. Combined with three rotations of {-5, 0, 5} degrees, a total of 75 transforms were provided, of which 27 were randomly selected.
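  • The transform pool just described can be enumerated as in the sketch below; the tuple layout matches the ISR sketch earlier and is an illustrative assumption:

```python
# Enumerating the transform pool described above: 5 x 5 = 25 X/Y
# translations of {-5, -2.5, 0, +2.5, +5}% combined with 3 rotations of
# {-5, 0, 5} degrees gives 75 transforms, from which a random subset is
# drawn for training.
import random
from itertools import product

shifts = [-0.05, -0.025, 0.0, 0.025, 0.05]   # fraction of the frame size
angles = [-5, 0, 5]                          # degrees
pool = [(dx, dy, 1.0, a) for dx, dy, a in product(shifts, shifts, angles)]
assert len(pool) == 75
chosen = random.sample(pool, 27)             # the randomly selected subset
```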
  • FIG. 12B is a table comparing the present invention with prior techniques. As shown in the table, the classification accuracy of the present invention is higher by 8%. That is, applying embedding learning to the 2-stream multi-Siamese CNN, which is one embodiment of the present invention, yields the best results for 16x12-resolution behavior recognition.
  • FIGS. 12C and 12D are tables showing the results of experiments using the DogCentric activity dataset.
  • FIG. 13 is a block diagram illustrating the configuration of a camera system according to a second embodiment of the present invention.
  • the privacy camera system 5 may include a camera 50 and a data server 60.
  • The camera 50 receives the face recognition model from the data server 60, captures a high-resolution image, and performs low-resolution conversion. The camera 50 recognizes a face in the converted image using the face recognition model transmitted from the data server 60, and can output an image anonymized with respect to the face region by re-converting the detected face region and the region outside it to different resolutions according to the face recognition result.
  • The camera 50 of the present invention may include various camera devices, for example, a smart home camera, a robot camera, a security CCTV, a wearable camera, a car black box camera, and the like. The specific configuration and functions of the camera 50 will be described below with reference to FIGS. 15 to 19.
  • the data server 60 can perform learning for face recognition in an image using a learning image and generate a face recognition model.
  • the learning image may include both a high-resolution image and a low-resolution image.
  • the data server 60 uses a convolutional neural network (CNN) in generating a face recognition model.
  • A convolutional neural network (CNN) is an algorithm structured to learn to analyze and classify input images (image frames).
  • In the present invention, a fully convolutional network (FCN) is newly applied to ultra-low-resolution face recognition. Such an FCN model can analyze and classify whether a specific pixel of an input image (image frame) is a face pixel, using learning data (a set of images containing 10x10 faces).
  • This FCN is trained to specialize in recognizing (detecting) faces of 10x10 size, and the learning can be performed using a face image database.
  • The learning follows the backpropagation method of convolutional neural networks.
  • For a new input image, the FCN judges for each pixel whether it corresponds to a face, deriving the probability that the pixel is a face pixel.
  • the size of the face pixel may be 10 x 10 or larger.
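  • A minimal fully convolutional sketch of this per-pixel face classifier follows; the layer sizes are illustrative assumptions, and only the idea (every output pixel carries a face probability) comes from the text above:

```python
# Minimal fully convolutional sketch of the per-pixel face classifier.
# Layer widths and kernel sizes are assumed, not from the patent.
import torch.nn as nn

class FacePixelFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),   # 1-channel probability map
            nn.Sigmoid())

    def forward(self, image):                  # image: (N, 3, H, W)
        return self.net(image)                 # face probabilities: (N, 1, H, W)
```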
  • a sliding window method is applied to find the position and size of the face.
  • Sliding window is a method of checking how many face pixels are contained in a box by applying a bounding box of a predetermined size at every position in the image. Using this, boxes containing many pixels with a high face probability are found, and non-maximum suppression is applied to those boxes to determine the final bounding box.
  • Non-maximum suppression refers to thinning the edges obtained from image processing. It is performed to find sharper lines, because the edges found using Gaussian and Sobel masks are blurred and smeared. In other words, non-maximum suppression compares the pixel values in the 8 directions around a center pixel and keeps the center pixel only when it is the largest, removing it otherwise.
  • An integral image is an image in which each pixel stores the cumulative sum of all pixel values above and to the left of it. When an integral image is used, the sum of the pixel values in a specific area can be obtained very easily.
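  • The box search described above can be sketched as follows, scoring a 10x10 window at every position with an integral image and applying a crude greedy suppression; the threshold and suppression rule are illustrative assumptions:

```python
# Sketch of the sliding-window search: an integral image scores a
# box x box window at every position of the face-probability map, and a
# greedy pass keeps the best non-overlapping boxes.
import numpy as np

def find_faces(prob_map, box=10, thresh=0.5):
    ii = np.pad(prob_map.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # integral image
    h, w = prob_map.shape
    scores = (ii[box:h + 1, box:w + 1] - ii[box:h + 1, :w - box + 1]
              - ii[:h - box + 1, box:w + 1] + ii[:h - box + 1, :w - box + 1])
    order = np.dstack(np.unravel_index(
        np.argsort(scores, axis=None)[::-1], scores.shape))[0]
    kept = []
    for y, x in order:                          # greedy non-maximum suppression
        if scores[y, x] < thresh * box * box:
            break
        if all(max(abs(y - ky), abs(x - kx)) >= box for ky, kx in kept):
            kept.append((y, x))                 # keep non-overlapping boxes only
    return [(x, y, box, box) for y, x in kept]  # (x, y, w, h) boxes
```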
  • FIG. 15 is a diagram illustrating an example of an image that has been anonymized by the face recognition based real-time automatic image anonymization method of the present invention.
  • the leftmost image is a high-resolution image
  • the middle image is a low-resolution image
  • the rightmost image corresponds to an anonymized image.
  • The camera and image anonymization method according to the second embodiment of the present invention can convert only the face region into an ultra-low-resolution image while keeping the region outside the face at high resolution.
  • FIG. 16 is a block diagram showing the configuration of a camera 50 according to the second embodiment of the present invention.
  • The camera 50 of the present invention may include a communication unit 150, a photographing unit 250, a low-resolution conversion module 350, a processor 450, an output unit 550, and a storage unit 650.
  • the communication unit 150 may receive the face recognition model for performing face recognition in the image from the data server 60 and transmit the received face recognition model to the processor 450.
  • the face recognition model used for recognizing a face in an image captured by the camera 50 is generated by the data server 60.
  • the communication unit 150 may transmit the image captured and stored by the camera 50 to the data server 60.
  • the communication unit 150 may include at least one of a wireless communication module and a wired communication module.
  • The wireless communication module may include a mobile network communication module, a wireless local area network (WLAN) module such as Wi-Fi (wireless fidelity) or WiMAX, a wireless personal area network (WPAN) module, or the like.
  • the photographing unit 250 can photograph a high-resolution image with respect to a subject.
  • the photographing unit 250 transmits the photographed high resolution image to the low resolution conversion module 350.
  • The low-resolution conversion module 350 receives the high-resolution image from the photographing unit 250, receives the position of the face region and the target resolution value from the processor 450, and converts the high-resolution image.
  • The initially input face-region position may be an empty, unset value, and the initial target resolution value may be a predetermined ultra-low-resolution value, for example 16x12 pixels. That is, the low-resolution conversion module 350 initially performs low-resolution conversion on the entire image without distinguishing the face region from the region outside it.
  • The low-resolution conversion module 350 receives the stored face-region position from the processor 450 and converts the face region and the region outside it separately. At this time, the face region can be converted to a predetermined ultra-low resolution, and the region outside the face region can be converted to the target resolution value.
  • The low-resolution conversion module 350 may be implemented as a circuit that only receives the face-region position information and the target resolution value from the processor 450 and converts the image frames continuously supplied from the photographing unit 250; it has no separate storage space or memory.
  • the low resolution conversion module 350 may convert the high resolution image by receiving only the target resolution value from the processor 450. That is, the low resolution conversion module 350 can perform the low resolution conversion on the entire image without distinguishing the face region separately.
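  • The conversion step performed by this module can be sketched as follows; the (x, y, w, h) box layout and the pixelation-by-resize approach are illustrative assumptions:

```python
# Sketch of the conversion step: the whole frame is rendered at the
# current target resolution (then blown back up so face boxes still
# index correctly), while each stored face box is pixelated down to the
# ultra-low face resolution.
import cv2

def pixelate(region, res):
    h, w = region.shape[:2]
    small = cv2.resize(region, res, interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

def convert_frame(frame_hr, face_boxes, target_res, face_res=(16, 12)):
    out = pixelate(frame_hr, target_res)       # region outside the faces
    for x, y, w, h in face_boxes:              # faces stay ultra-low-res
        out[y:y + h, x:x + w] = pixelate(frame_hr[y:y + h, x:x + w], face_res)
    return out
```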
  • the processor 450 can recognize a face in the image converted by the low resolution conversion module 350 using the face recognition model transmitted from the communication unit 150.
  • the processor 450 may be a processing unit such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit).
  • The processor 450 updates the position of the face region and the target resolution value according to the face recognition result. That is, when a face is detected in the image converted by the low-resolution conversion module 350, the processor 450 records the detected face region, stores its position information, and increases the target resolution value for the remaining region excluding the stored face region.
  • the processor 450 causes the low resolution conversion module 350 to convert again to the increased resolution for the remaining area outside the detected face area. Thereafter, the processor 450 performs face recognition again on the converted image with the increased resolution, and repeats the above process.
  • In other words, the camera 50 of the present invention repeatedly performs face recognition/detection and resolution conversion, keeping the resolution of the face region in the image at NxN while increasing the resolution of the region outside the face up to the final target resolution.
  • An image output through this process is called an "anonymized video".
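  • The iterative loop just described can be sketched as follows; detect_faces() is the FCN-based detector and is assumed given, convert_frame() is the sketch above, and the doubling schedule is an assumption:

```python
# Sketch of the iterative anonymization loop: start fully ultra-low-res,
# detect faces, pin detected boxes at the face resolution, and raise the
# resolution elsewhere until the final target is reached.
def anonymize(frame_hr, detect_faces, final_res, start_res=(16, 12)):
    faces, res = [], start_res
    while True:
        converted = convert_frame(frame_hr, faces, res)
        faces += [b for b in detect_faces(converted) if b not in faces]
        if res == final_res:
            return converted                   # the anonymized frame
        res = (min(res[0] * 2, final_res[0]),  # e.g. double the resolution
               min(res[1] * 2, final_res[1]))  # each pass (schedule assumed)
```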
  • Alternatively, when a face is detected in the image, the processor 450 may stop increasing the resolution at the moment of detection, instead of increasing only the resolution of the region outside the detected face region, and immediately output the low-resolution image converted at the detection time.
  • This approach speeds up computation compared with repeatedly performing resolution conversion and face recognition while increasing the resolution outside the face region, but has the disadvantage that the anonymization is less precise.
  • the output unit 550 outputs the image converted by the low resolution conversion module 350. That is, the output unit 550 outputs the anonymized image.
  • the storage unit 650 may store an image output by the output unit 550.
  • the storage unit 650 includes a main storage device and an auxiliary storage device, and stores an application program necessary for the functional operation of the camera 50.
  • the storage unit 650 may include a program area and a data area.
  • FIG. 17 is a view simply showing an anonymizing process of an image taken by a camera 50 according to the second embodiment of the present invention in real time.
  • the photographing unit 250 transmits the photographed high resolution image to the low resolution conversion module 350.
  • The low-resolution conversion module 350 converts the received high-resolution image into a low-resolution image and transmits the low-resolution image to the processor 450.
  • the processor 450 performs face recognition on the low-resolution image, transfers the face region location information and the target resolution value according to the face recognition result to the low resolution conversion module 350, and performs the conversion again.
  • When the target resolution value reaches the final target resolution, the image converted by the low-resolution conversion module 350 is output.
  • FIG. 18 is a flowchart illustrating a process of performing an image anonymization method according to the second embodiment of the present invention.
  • the photographing unit 250 transmits the photographed high-resolution image to the low resolution conversion module 350 (S1810).
  • The low-resolution conversion module 350 receives the position information of the face region and the target resolution value from the processor 450, converts the face region in the high-resolution image to ultra-low resolution (S1820), and converts the region outside the face region to the target resolution (S1830). However, since there is no face-region position information at the initial conversion, the entire image is converted at ultra-low resolution.
  • the processor 450 performs facial recognition on the image converted by the low resolution conversion module 350 using the facial recognition model transmitted from the communication unit 150 (S1840).
  • When a face is detected, the processor 450 transmits the position information of the detected face region to the low-resolution conversion module 350 (S1850). If no new face appears in the face recognition result, the processor 450 increases only the target resolution value and transmits it to the low-resolution conversion module 350 (S1860). When the resolution of the region outside the face region matches that of the original high-resolution image, the processor 450 outputs the image at that point and ends the processing; otherwise, face recognition is performed again (S1870).
  • FIG. 19 is a flowchart illustrating a process of performing the face recognition-based real-time automatic image anonymization method according to the third embodiment of the present invention.
  • the photographing unit 250 transmits the photographed high resolution image to the low resolution conversion module 350 (S1910).
  • the low resolution conversion module 350 receives the target resolution value from the processor 450 and performs low resolution conversion on the entire image of the high resolution image (S1920). However, in the initial conversion, the resolution conversion is performed at a very low resolution for the entire image.
  • the processor 450 performs facial recognition on the image converted by the low resolution conversion module 350 using the facial recognition model transmitted from the communication unit 150 (S1930). If there is no new face in the face recognition result image, the processor 450 increases the target resolution value and transmits it to the low resolution conversion module 350 (S1940).
  • When a face is detected, the processor 450 controls the output unit 550 to output the low-resolution image converted at the detection time, and ends the processing (S1950).
  • If no face is detected and the resolution of the converted image reaches that of the original high-resolution image, the processor 450 outputs the image at that point and ends the processing.
  • the camera system according to the first and second embodiments and the method of driving the camera system have been described above.
  • The behavior recognition-based automatic resolution adjustment method, the automatic behavior recognition method, and the face recognition-based image anonymization method according to the present invention can be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium.
  • the computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
  • the program instructions recorded on the medium may be those specially designed and constructed for the present invention or may be available to those skilled in the art of computer software.
  • Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Examples of program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.


Abstract

The present invention relates to a technology for recognizing human activity in low-resolution video, to a camera system that automatically adjusts the shooting resolution of a camera on the basis of this technology, and to a camera and anonymization method for anonymizing video by using low-resolution face recognition/detection technology.

Description

Camera system and method for protecting privacy
The present invention relates to a camera system and method for protecting privacy, and more particularly, to a technology for recognizing human behavior in low-resolution video, to a camera system that automatically adjusts the shooting resolution of a camera based on this technology, and to a camera system capable of anonymizing video using low-resolution face recognition technology.
Conventional camera systems can record continuously without the consent of the subject, and therefore risk invading the subject's privacy. Such privacy invasion problems can arise in various ways, as follows.
As a first example, invasion of subjects' privacy by continuous recording of high-resolution video without consent has recently emerged as an important social issue. For example, if a home camera (for home security or a smart home service) is hacked, there is a risk that one's personal life will be watched or recorded by someone else around the clock. In addition, high-resolution cameras such as robot cameras and wearable cameras consume considerable battery power and require large amounts of storage space because of continuous high-resolution recording, which makes them difficult to use for long periods.
Therefore, it is necessary to develop a camera device that is safe from such privacy invasion, and to design the privacy-preserving computer vision algorithms that make it possible. In other words, there is a need to recognize important events and human behavior in video, and to prevent indiscriminate high-resolution recording on that basis. In this regard, there have been recent attempts to recognize human behavior from low-resolution video. However, most previous studies were limited in that they barely considered the intrinsic characteristics of low-resolution sensors. Referring to FIG. 6, when a high-resolution image is converted to a low resolution, images originating from exactly the same scene can often have completely different pixel (i.e., RGB) values, owing to the inherent limit on what a single pixel can capture from the scene. Depending on the low-resolution transformation, multiple low-resolution images derived from exactly the same scene can thus become completely different visual data. Most previous studies did not take these characteristics of low-resolution transformation into account.
As a second example, even a camera that a person has installed in a private place can expose that person's private life if it is hacked.
To solve this privacy invasion problem, there have been techniques that intentionally capture only low-quality video, but with low-quality video it is difficult to determine the specific situation in the video. That is, if low-resolution processing is applied unconditionally to the entire image, privacy is protected, but the situation in the video cannot be grasped because human actions and objects cannot be distinguished. Conversely, if high-resolution video is captured unconditionally, it becomes easy to recognize human actions and objects, but the privacy invasion problem is hard to avoid. Therefore, a video anonymization technology is needed that preserves the detailed information in the image while applying low-resolution processing only to the regions that require privacy protection (e.g., face regions).
An object of the present invention is to provide a camera system that normally captures low-resolution video to protect the privacy of subjects and switches to high-resolution video when a crisis situation is determined through behavior recognition on the low-resolution video, together with a behavior-recognition-based automatic resolution adjustment method and an automatic behavior recognition method.
Another object of the present invention is to provide a camera system and a behavior-recognition-based automatic resolution adjustment method that automatically adjust the shooting resolution of the camera according to the importance of the situation through low-resolution video behavior recognition, thereby reducing camera battery consumption and enabling efficient use of storage space.
Another object of the present invention is to provide an automatic behavior recognition method that, in consideration of the characteristics of low-resolution image conversion, trains a convolutional neural network by mapping a plurality of low-resolution images converted from a high-resolution image into an embedding space, thereby increasing the human behavior recognition rate in captured low-resolution video.
It is another object of the present invention to provide a camera system and a face-recognition-based image anonymization method that can protect the privacy of subjects by automatically performing anonymization processing on them while video is being captured.
It is another object of the present invention to provide a camera system and a face-recognition-based image anonymization method that can protect against hacking by processing the face region in an image in real time, without storing it in CPU/GPU memory during anonymization.
It is another object of the present invention to provide a camera system and a face-recognition-based image anonymization method that can protect privacy by distinguishing the face region from the rest of a high-resolution image and applying ultra-low-resolution processing only to the face region.
To solve the above problems, a camera system according to the present invention includes: a data server that generates, from high-resolution training video, the learning data on which recognition of human behavior in low-resolution video is based, and generates a human behavior recognition model from the learning data; and a camera that captures low-resolution video and, when a crisis situation is determined through first behavior recognition, which recognizes human behavior in the low-resolution video using the human behavior recognition model received from the data server, switches to a higher-resolution mode than the existing resolution and captures video.
In the camera system, the camera may determine, through second behavior recognition, which recognizes human behavior in the video captured after switching to the high-resolution mode using the human behavior recognition model, whether the situation is a crisis; if so, it stores the video captured at high resolution, and if not, it stores the video captured at low resolution.
In the camera system, the camera may include: a photographing unit that captures low-resolution video; a communication unit that receives, from the data server, the human behavior recognition model used to recognize human behavior in low-resolution video and transmits it to an image analysis unit; the image analysis unit, which determines whether the situation is a crisis through first behavior recognition, recognizing human behavior in the low-resolution video using the human behavior recognition model received from the communication unit; a control unit that, when the image analysis unit determines a crisis situation, controls the photographing unit to switch to a higher-resolution mode than the existing resolution; and a storage unit that stores the low-resolution or high-resolution video captured by the photographing unit.
At this time, the image analysis unit may determine whether the situation is a crisis through second behavior recognition, which recognizes human behavior in the video captured after switching to the high-resolution mode, using the human behavior recognition model.
In the camera system, the data server may include: a resolution conversion unit that converts high-resolution human behavior video into a plurality of low-resolution videos; a convolutional neural network learning unit that receives the plurality of low-resolution videos generated by the resolution conversion unit, separates them into a spatial stream and a temporal stream, performs convolution and pooling on each stream, and additionally applies fully connected layers to train a convolutional neural network; and a human behavior recognition model generation unit that generates a human behavior recognition model based on the data learned by the convolutional neural network learning unit.
Meanwhile, an automatic resolution adjustment method according to the present invention includes the steps of: (a) a photographing unit of a camera capturing video at low resolution; (b) a communication unit of the camera receiving, from an external data server, the human behavior recognition model used to recognize human behavior in low-resolution video, and transmitting it to an image analysis unit of the camera; (c) the image analysis unit determining whether the situation is a crisis through first behavior recognition, which recognizes human behavior in the low-resolution video using the human behavior recognition model received from the communication unit; (d) when the image analysis unit determines a crisis situation, a control unit of the camera controlling the photographing unit to switch to a higher-resolution mode than the existing resolution; and (e) a storage unit of the camera storing the low-resolution or high-resolution video captured by the photographing unit.
The automatic resolution adjustment method may further include, between steps (d) and (e), the image analysis unit determining whether the situation is a crisis through second behavior recognition, which recognizes human behavior in the video captured after switching to the high-resolution mode, using the human behavior recognition model.
Meanwhile, an automatic behavior recognition method according to another embodiment of the present invention includes the steps of: (a) a resolution conversion unit of a data server converting high-resolution human behavior video into a plurality of low-resolution videos; (b) a convolutional neural network learning unit of the data server receiving the plurality of low-resolution videos, separating them into a spatial stream and a temporal stream, performing convolution and pooling on each stream, and additionally applying fully connected layers to train a convolutional neural network; (c) a human behavior recognition model generation unit of the data server generating a human behavior recognition model based on the data learned by the convolutional neural network learning unit; (d) a photographing unit of a camera capturing low-resolution video; and (e) an image analysis unit of the camera recognizing human behavior in the low-resolution video captured by the photographing unit, using the human behavior recognition model received from the data server.
A camera system according to another embodiment of the present invention includes: a data server that performs learning for face recognition in images using training images and generates a face recognition model; and a camera that receives the face recognition model from the data server, captures a high-resolution image and performs low-resolution conversion on it, recognizes faces in the converted image using the face recognition model, and, according to the face recognition result, re-converts the detected face region and the region other than the face region so that they have different resolutions, thereby outputting an image in which the face region has been anonymized.
A camera according to another embodiment of the present invention includes: a communication unit that receives, from an external data server, a face recognition model for recognizing faces in images; a photographing unit that captures high-resolution images; a low-resolution conversion module that receives the high-resolution image captured by the photographing unit and converts it using the position information of the face region and the target resolution value received from a processor; the processor, which recognizes faces in the image converted by the low-resolution conversion module using the face recognition model received from the communication unit and updates the position of the face region and the target resolution value according to the face recognition result; and an output unit that outputs the image converted by the low-resolution conversion module.
At this time, when a face is detected through face recognition in the image converted by the low-resolution conversion module, the processor may record the detected face region and store its position information, increase the target resolution value for the region other than the stored face region, and control the low-resolution conversion module to re-convert the region other than the stored face region to the increased target resolution.
Meanwhile, a camera according to another embodiment of the present invention includes: a communication unit that receives, from an external data server, a face recognition model for recognizing faces in images; a photographing unit that captures high-resolution images; a low-resolution conversion module that receives the high-resolution image captured by the photographing unit and converts it using a target resolution value received from a processor; the processor, which recognizes faces in the image converted by the low-resolution conversion module using the face recognition model received from the communication unit and updates the target resolution value according to the face recognition result; and an output unit that outputs the image converted by the low-resolution conversion module.
In the camera, when a face is detected in the image converted by the low-resolution conversion module, the processor may control the output unit to output the low-resolution image converted at the time of detection.
Meanwhile, an image anonymization method according to another embodiment of the present invention includes the steps of: (a) a photographing unit of a camera capturing a high-resolution image; (b) a communication unit of the camera receiving, from an external data server, a face recognition model for recognizing faces in images; (c) a low-resolution conversion module of the camera receiving the high-resolution image captured by the photographing unit and converting it using the position of the face region and the target resolution value received from a processor; (d) the processor of the camera recognizing faces in the image converted by the low-resolution conversion module using the face recognition model received from the communication unit, and updating the position of the face region and the target resolution value according to the face recognition result; and (e) an output unit of the camera outputting the image converted by the low-resolution conversion module.
Meanwhile, an image anonymization method according to yet another embodiment of the present invention includes the steps of: (a) a photographing unit of a camera capturing a high-resolution image; (b) a communication unit of the camera receiving, from an external data server, a face recognition model for recognizing faces in images; (c) a low-resolution conversion module of the camera receiving the high-resolution image captured by the photographing unit and converting it using a target resolution value received from a processor; (d) the processor of the camera recognizing faces in the image converted by the low-resolution conversion module using the face recognition model received from the communication unit, and updating the target resolution value according to the face recognition result; and (e) an output unit of the camera outputting the image converted by the low-resolution conversion module.
According to the present invention, low-resolution video is normally captured, and high-resolution video is captured only when a crisis situation is determined through behavior recognition on the low-resolution video, thereby protecting the privacy of subjects.
In addition, according to the present invention, the shooting resolution of the camera is automatically adjusted according to the importance of the situation through low-resolution video behavior recognition, reducing camera battery consumption and enabling efficient use of storage space.
In addition, according to the present invention, a convolutional neural network is trained by mapping a plurality of low-resolution videos converted from a high-resolution video into an embedding space, in consideration of the characteristics of low-resolution image conversion, which increases the human behavior recognition rate in captured low-resolution video.
In addition, according to the present invention, anonymization processing is performed automatically on subjects while video is being captured, protecting their privacy.
In addition, according to the present invention, anonymization of face regions in an image is processed in real time without storing them in CPU/GPU memory, providing protection against hacking.
In addition, according to the present invention, privacy can be protected by distinguishing the face region from the rest of a high-resolution image and applying ultra-low-resolution processing only to the face region.
FIG. 1 is a diagram showing the configuration of a camera system according to a first embodiment of the present invention.
FIG. 2 is a diagram explaining the operating concept of the camera system according to the first embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of a camera according to the first embodiment of the present invention.
FIG. 4 is a flowchart explaining the operation flow of the camera according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the detailed configuration of the data server in the camera system according to the first embodiment of the present invention.
FIG. 6 is a diagram explaining that, when a high-resolution image is converted to low resolution, even low-resolution images derived from the same high-resolution image can have different values.
FIG. 7 is a diagram structurally showing how the convolutional neural network learning unit according to the first embodiment of the present invention separates an input image into a spatial stream and a temporal stream.
FIG. 8 is a diagram showing the structure of a two-stream convolutional neural network applied to convolutional neural network learning according to the first embodiment of the present invention.
FIG. 9 is a diagram explaining how the convolutional neural network learning unit according to the first embodiment of the present invention performs learning for behavior recognition in video using an embedding space.
FIG. 10 is a diagram showing the structure of a multi-siamese convolutional neural network applied to convolutional neural network learning according to the first embodiment of the present invention.
FIG. 11 is a flowchart showing the convolutional neural network training step of the method by which the camera system according to the first embodiment of the present invention automatically recognizes behavior in video.
FIG. 12 is a diagram comparing the accuracy of behavior recognition when the automatic in-video behavior recognition method of the camera system according to the first embodiment of the present invention is applied with that of the prior art.
FIG. 13 is a block diagram showing the configuration of a camera system according to a second embodiment of the present invention.
FIG. 14 is a diagram showing the structure of the convolutional neural network (CNN) used by the data server of the camera system according to the second embodiment of the present invention to generate the face recognition model.
FIG. 15 is an illustration showing an example of an image processed by the image anonymization method according to the second embodiment of the present invention.
FIG. 16 is a block diagram showing the configuration of a camera according to the second embodiment of the present invention.
FIG. 17 is a diagram simply showing how the camera according to the second embodiment of the present invention anonymizes captured video in real time.
FIG. 18 is a flowchart explaining the process by which the image anonymization method according to the second embodiment of the present invention is performed.
FIG. 19 is a flowchart explaining the process by which the image anonymization method according to a third embodiment of the present invention is performed.
Details of the objects and technical configuration of the present invention, and of the resulting operational effects, will be understood more clearly from the following detailed description based on the drawings attached to this specification. Embodiments according to the present invention are described in detail below with reference to the accompanying drawings.
The embodiments disclosed herein should not be construed or used as limiting the scope of the present invention. It is natural to a person of ordinary skill in the art that the description, including the embodiments of this specification, has various applications. Accordingly, any embodiments set forth in the detailed description are illustrative, intended to better explain the invention, and are not intended to limit the scope of the invention to those embodiments.
The functional blocks shown in the drawings and described below are merely examples of possible implementations. Other functional blocks may be used in other implementations without departing from the spirit and scope of the detailed description. Also, although one or more functional blocks of the present invention are shown as individual blocks, one or more of them may be combinations of various hardware and software components that perform the same function.
In addition, the statement that something includes certain elements is an open-ended expression that merely indicates the presence of those elements, and should not be understood as excluding additional elements.
Furthermore, when an element is referred to as being connected or coupled to another element, it may be directly connected or coupled to that other element, but it should be understood that other elements may exist in between.
Also, expressions such as 'first' and 'second' are used only to distinguish a plurality of elements, and do not limit the order of, or other relationships between, the elements.
Hereinafter, various embodiments according to the present invention are described with reference to the drawings. For reference, FIGS. 1 to 12 relate to the first embodiment, a camera system that automatically adjusts resolution based on behavior recognition, and FIGS. 13 to 19 relate to the second embodiment, a camera system that recognizes face regions in video and anonymizes those regions.
<First Embodiment>
FIG. 1 is a diagram showing the configuration of a camera system according to an embodiment of the present invention, and FIG. 2 is a diagram explaining the operating concept of the camera system according to an embodiment of the present invention.
Referring to FIG. 1, the camera system 1 of the present invention consists of a behavior-recognition-based automatic resolution adjustment camera 10 and a data server 20. The behavior-recognition-based automatic resolution adjustment camera 10 photographs subjects, and the data server 20 receives high-resolution video, performs learning for human behavior recognition, and generates, based on the learned data, a human behavior recognition model for recognizing human behavior in low-resolution video. The behavior-recognition-based resolution adjustment camera 10 of the present invention may include various camera devices, for example, a smart home camera, a robot camera, a security CCTV, a wearable camera, or a vehicle black-box camera.
The data server 20 can perform learning for recognizing human behavior in low-resolution video from previously stored high-resolution video or high-resolution video available online. The data server 20 can generate learning data by training a convolutional neural network, and can generate a human behavior recognition model based on the generated learning data. The detailed configuration of the data server 20 and the technology for recognizing human behavior in video are described below with reference to FIGS. 5 to 10.
Referring to FIG. 2, the camera 10 (hereinafter 'camera') normally captures low-resolution video to prevent invasion of the subject's privacy. The camera 10 of the present invention applies behavior recognition technology to the captured video to recognize human behavior in the low-resolution video (for example, 16x12 pixels). According to the behavior recognition result, the camera 10 selects the optimal camera resolution, captures high-resolution video (for example, 4K), and can store the captured video in memory or transmit it to the data server 20. The camera 10 of the present invention can therefore solve the problem that conventional cameras may invade the privacy of subjects by continuously capturing high-resolution video even in ordinary situations. The present invention also reduces the battery consumption of the camera 10 and enables efficient use of storage space. The specific operation of the camera 10 of the present invention is described below with reference to FIGS. 3 and 4.
FIG. 3 is a diagram showing the configuration of the camera according to the first embodiment of the present invention.
Referring to FIG. 3, the camera 10 may include a photographing unit 100, a communication unit 200, an image analysis unit 300, a storage unit 400, and a control unit 500.
The photographing unit 100 normally photographs subjects at low resolution, and can photograph them at high resolution when it receives a control signal from the control unit 500 to switch to the high-resolution shooting mode.
The communication unit 200 receives, from the data server 20, the human behavior recognition model used to recognize human behavior in low-resolution video, and transmits it to the image analysis unit 300. The data server 20 can perform learning for recognizing human behavior in low-resolution video from previously stored high-resolution video or high-resolution video available online. The high-resolution video is used to train a convolutional neural network for recognizing human behavior in the low-resolution video captured by the photographing unit 100. The high-resolution training video may come from publicly available sources, for example, open sources such as YouTube, or from other sources that publicly provide video online. The communication unit 200 may also transmit video stored in the storage unit 400 to the data server 20.
The communication unit 200 may include at least one of a wireless communication module and a wired communication module. The wireless communication module may include at least one of a wireless network communication module, a wireless LAN (WLAN, Wi-Fi (Wireless Fidelity), or WiMAX (Worldwide Interoperability for Microwave Access)) communication module, and a WPAN (Wireless Personal Area Network) communication module.
The image analysis unit 300 can receive the human behavior recognition model from the communication unit 200 and recognize human behavior in the low-resolution video captured by the photographing unit 100. Using the human behavior recognition model, the image analysis unit 300 recognizes human behavior in the low-resolution video captured by the photographing unit 100 and determines whether the recognized behavior corresponds to a crisis situation (first behavior recognition).
The storage unit 400 can store the low-resolution or high-resolution video captured by the photographing unit 100. The storage unit 400 can store video captured by the photographing unit 100, images that the image analysis unit 300 recognizes or analyzes as dangerous (weapons, nude images, drugs, blood, etc.), and a large number of human behavior videos, and may take the form of flash memory for providing video in a real-time streaming manner.
To this end, the storage unit 400 includes a main storage device and an auxiliary storage device, and stores the application programs necessary for the functional operation of the camera 10. The storage unit 400 may largely comprise a program area and a data area. The program area stores the operating system for booting the camera 10 and the like, and the data area stores the video captured by the photographing unit 100.
The control unit 500 can control the photographing unit 100, the communication unit 200, the image analysis unit 300, and the storage unit 400. When the image analysis unit 300 determines a crisis situation, the control unit 500 can control the photographing unit 100 to switch to a higher-resolution mode than the existing resolution. This is to protect the privacy of photographed subjects as much as possible, minimize the battery consumption of the camera 10, and maximize the use of storage space. That is, the photographing unit 100 normally captures low-resolution video (e.g., 16x12 pixels) and, upon receiving the control signal, can capture high-resolution video (e.g., 720x480, 1280x720, 1920x1080, 4K, etc.). To minimize battery consumption, the control unit 500 may also control the photographing unit 100 to operate at a predetermined cycle.
FIG. 4 is a flowchart explaining the operation flow of the camera according to the first embodiment of the present invention.
Referring to FIG. 4, the photographing unit 100 normally captures low-resolution video and transmits it to the image analysis unit 300 (S210). The image analysis unit 300 determines whether the situation is a crisis through first behavior recognition, which recognizes human behavior in the low-resolution video using the human behavior recognition model received from the communication unit 200 (S220). When the image analysis unit 300 determines that the human behavior in the low-resolution video does not correspond to a crisis situation, the control unit 500 transmits a control signal so that the low-resolution video is stored in the storage unit 400.
When the image analysis unit 300 determines that the human behavior in the low-resolution video is a crisis situation, the control unit 500 transmits a control signal so that the photographing unit 100 captures the subject in high-resolution video (S240). The image analysis unit 300 then determines whether the situation is a crisis through second behavior recognition, which recognizes human behavior in the high-resolution video captured by the photographing unit 100 using the human behavior recognition model (S250). When the image analysis unit 300 determines that the human behavior in the high-resolution video is a crisis situation, the control unit 500 transmits a control signal so that the video captured at high resolution is stored in the storage unit 400; when it is determined not to be a crisis situation, the control unit 500 transmits a control signal so that video captured at low resolution is stored in the storage unit 400 (S260).
In this way, the camera 10 of the present invention performs human behavior recognition twice on the captured video, confirms that the behavior recognition was correct, and only then stores the final video. This reduces behavior recognition errors, protects the privacy of subjects, and makes appropriate use of storage space.
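As a rough illustration of this two-stage flow, the following Python sketch mirrors steps S210 to S260. All callables (capture, recognize, store, is_crisis) are hypothetical stand-ins for the photographing unit 100, the recognition model, the storage unit 400, and the crisis decision of the image analysis unit 300; the resolutions are the example values from the text.

```python
def camera_loop(capture, recognize, store, is_crisis):
    # Sketch of FIG. 4: shoot low-res by default and store a high-res clip
    # only when a crisis is recognized twice (first on low-res, then on
    # high-res video).
    LOW, HIGH = (16, 12), (1920, 1080)        # example resolutions
    while True:
        clip = capture(LOW)                   # S210: normal low-res shooting
        if not is_crisis(recognize(clip)):    # S220: first behavior recognition
            store(clip)                       # not a crisis: keep low-res clip
            continue
        hi_clip = capture(HIGH)               # S240: switch to high-res mode
        if is_crisis(recognize(hi_clip)):     # S250: second behavior recognition
            store(hi_clip)                    # S260: confirmed crisis, keep high-res
        else:
            store(clip)                       # false alarm: keep low-res clip
```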
FIG. 5 is a diagram showing the detailed configuration of the data server in the camera system according to the first embodiment of the present invention.
Referring to FIG. 5, the data server 20 may include a resolution conversion unit 21, a convolutional neural network learning unit 22, and a human behavior recognition model generation unit 23.
The resolution conversion unit 21 can convert the high-resolution human behavior video received from the communication unit 200 into a plurality of low-resolution videos. The plurality of low-resolution videos can be obtained using a technique called inverse super resolution (ISR). The main motivation of ISR is that one high-resolution video contains an amount of information corresponding to a set of low-resolution videos, and that behavior recognition can therefore be performed by applying different low-resolution transformations to a single high-resolution video. (M. S. Ryoo, B. Rothrock, C. Fleming, and H. J. Yang. Privacy-preserving human activity recognition from extreme low resolution. In AAAI, 2017.)
The resolution conversion unit 21 generates n low-resolution videos (i.e., V_ik) for each high-resolution training video X_i by applying the transform sets F_k and D_k, as in [Equation 1] below.
$$V_{ik} = D_k\big(F_k(X_i)\big), \qquad k = 1, \ldots, n \qquad \text{[Equation 1]}$$
Here, F_k is a camera motion transformation and D_k is a downsampling operator. F_k can be any affine transformation, but in the present invention a combination of translation, scaling, and rotation is regarded as the motion transformation, and standard average downsampling is used for D_k. Efficient learning can be achieved by providing the convolutional neural network (CNN) and the classifier 330 with a sufficient number of low-resolution transformed videos V_ik generated from each training sample X_i.
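A minimal sketch of Equation (1) in Python with OpenCV follows, assuming illustrative parameter ranges for the motion transform (the actual ranges are not specified here): each pass applies a random translation, scaling, and rotation F_k, followed by area (average) downsampling D_k.

```python
import cv2
import numpy as np

def inverse_super_resolution(frame_hr, n=8, lr_wh=(16, 12), seed=0):
    # Sketch of Equation (1): V_ik = D_k(F_k(X_i)). Generates n low-res
    # variants of one high-res frame via random motion transforms.
    rng = np.random.default_rng(seed)
    h, w = frame_hr.shape[:2]
    variants = []
    for _ in range(n):
        angle = rng.uniform(-10.0, 10.0)       # rotation in degrees (example)
        scale = rng.uniform(0.9, 1.1)          # scaling factor (example)
        tx, ty = rng.uniform(-3.0, 3.0, 2)     # translation in pixels (example)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        M[:, 2] += (tx, ty)                    # F_k: translation + scale + rotation
        moved = cv2.warpAffine(frame_hr, M, (w, h))
        lr = cv2.resize(moved, lr_wh,          # D_k: average downsampling
                        interpolation=cv2.INTER_AREA)
        variants.append(lr)
    return variants
```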
The convolutional neural network learning unit 22 receives the plurality of low-resolution videos generated by the resolution conversion unit 21, separates them into a spatial stream and a temporal stream (optical flow), performs convolution and pooling on each stream, and additionally applies fully connected layers to train the convolutional neural network.
Referring to FIG. 7, the convolutional neural network learning unit 22 uses the spatial stream and the temporal stream derived from the low-resolution images as its inputs. Here, 16x12 pixels is used as the spatial resolution of the low-resolution video. More specifically, the spatial stream takes the RGB pixel values of each frame as input (i.e., an input dimension of 16x12x3), and the temporal stream takes a 10-frame concatenation of X and Y optical flow images (i.e., 16x12x20). The X and Y optical flow images are constructed by computing the x (and y) optical flow magnitude per pixel.
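A sketch of how the two input tensors could be assembled, assuming frames are (12, 16, 3) arrays for a 16x12 image and that flows_xy[t] holds the (12, 16, 2) optical flow field between frames t and t+1; taking the absolute flow value as the per-pixel magnitude is one interpretation of the text.

```python
import numpy as np

def two_stream_inputs(frames_rgb, flows_xy, t):
    # Spatial stream input: one 16x12 RGB frame, shape (12, 16, 3).
    spatial = frames_rgb[t]
    # Temporal stream input: 10 consecutive x/y flow-magnitude images
    # stacked on the channel axis, shape (12, 16, 20).
    stack = [np.abs(flows_xy[t + i]) for i in range(10)]
    temporal = np.concatenate(stack, axis=-1)
    return spatial, temporal
```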
The human behavior recognition model generation unit 23 can generate, based on the data learned by the convolutional neural network learning unit 22, a human behavior recognition model for recognizing human behavior in the video captured by the photographing unit 100. The human behavior recognition model may correspond to a classifier. When a new video is input, the classifier summarizes the image information and motion information contained in the video on the basis of the learned data, and uses this to recognize the behavior appearing in the video. The video analysis technique of the present invention can be applied not only to a convolutional neural network (CNN) but also to a support vector machine (SVM) and the like.
FIG. 8 is a diagram showing the structure of the two-stream convolutional neural network applied to convolutional neural network learning according to an embodiment of the present invention.
Referring to FIG. 8, the convolutional neural network learning unit 22 can perform learning for behavior recognition using a two-stream convolutional neural network. The two-stream convolutional neural network is applied to each frame of the video; the per-frame outputs are summarized using a temporal pyramid to produce a single video representation. Let h(V_t) be the two-stream network applied to each frame V_t of video V at time t. The representation f(V;θ) is then computed as in [Equation 2] below.
$$f(V;\theta) = fc\!\left(\left[\,\max_{t \in T_1} h(V_t),\; \max_{t \in T_2} h(V_t),\; \ldots,\; \max_{t \in T_{15}} h(V_t)\,\right]\right) \qquad \text{[Equation 2]}$$
Here, "," denotes the vector concatenation operator, T is the number of frames in video V, and fc denotes the set of fully connected layers applied on top of the concatenation. The size of h(V_t) is 512-D (256 x 2). θ is the set of parameters in the CNN that must be learned from the training data, and max is the temporal max pooling operator, which computes the element-wise maximum over an interval. In the embodiment, a 4-level temporal pyramid was used, giving a total of 15 max poolings over the intervals T_1, ..., T_15.
Applying additional fully connected layers and a softmax layer to f(V) makes it possible to train a classifier. Letting g denote those layers, y = g(f(V;θ)), where y is the activity class label. Training g(f(V;θ)) with a classification loss on the low-resolution videos generated by the resolution conversion unit 21 provides the basic video classification model. FIG. 8 shows the overall structure of the two-stream CNN model using this temporal pyramid.
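A sketch of the temporal pyramid of Equation (2) follows, assuming the 15 poolings come from a 4-level pyramid that splits the video into 1, 2, 4, and 8 intervals (the exact interval layout is not spelled out here). The per-frame features h(V_t) are taken as a (T, 512) array.

```python
import numpy as np

def temporal_pyramid(features):
    # features: (T, 512) array of per-frame two-stream outputs h(V_t).
    # Returns the concatenation of 15 element-wise max poolings, a fixed
    # 15 * 512 = 7680-D video representation regardless of length T.
    T = features.shape[0]
    pooled = []
    for level in range(4):                    # 1 + 2 + 4 + 8 = 15 intervals
        parts = 2 ** level
        for p in range(parts):
            lo = (T * p) // parts
            hi = max((T * (p + 1)) // parts, lo + 1)
            pooled.append(features[lo:hi].max(axis=0))
    return np.concatenate(pooled)
```

The fully connected layers g applied on top of this fixed-size vector then produce the activity class label y.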
FIG. 9 is a diagram explaining how the convolutional neural network learning unit 22 according to the first embodiment of the present invention performs learning for behavior recognition in video using an embedding space.
Although the two-stream network design of FIG. 8 can classify behavior videos by learning model parameters optimized for classification, it does not consider the characteristic of ultra-low-resolution video that different transformations applied to the same scene produce different low-resolution data. For the classifier to better reflect this characteristic, an embedding space must be learned that maps different low-resolution videos with the same semantic content to the same embedding location, whatever transformations were applied to them. Embedding learning is jointly optimized for both embedding and classification, using embeddings learned in an end-to-end manner, and enables the training of a more generalized (i.e., less overfit) classifier.
Referring to FIG. 9, embedding learning is performed by minimizing the embedding distance between positive pairs while maximizing the embedding distance between negative pairs.
FIG. 10 is a diagram showing the structure of the multi-siamese convolutional neural network applied to convolutional neural network learning according to an embodiment of the present invention.
Referring to FIG. 10, the resolution conversion unit 21 can generate, as training data, a first batch consisting of n low-resolution videos with differing resolutions converted from the same high-resolution video, and a second batch consisting of n low-resolution videos with differing resolutions converted from a plurality of different high-resolution videos, and transmit them to the convolutional neural network learning unit 22.
The convolutional neural network learning unit 22 receives the first and second batches, separates each video into a spatial stream and a temporal stream, performs convolution and pooling on each stream, applies fully connected layers, and then maps the results into an embedding space and adjusts the embedding distances, thereby performing learning for activity recognition in video. A code sketch of this data flow is given below.
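As an illustration only, the following is a minimal sketch, assuming PyTorch, of the two-stream feature extractor f(V;θ) described above. The layer sizes, the use of optical flow for the temporal stream, and the omission of the temporal pyramid are simplifying assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class TwoStreamEmbeddingNet(nn.Module):
    """Sketch of f(V; theta): spatial + temporal streams -> embedding."""
    def __init__(self, embed_dim=8192):
        super().__init__()
        def stream(in_ch):
            # per-stream convolution and pooling, as described above
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.spatial = stream(3)    # RGB frame stream
        self.temporal = stream(2)   # optical-flow (x, y) stream
        self.fc = nn.Linear(128 * 2, embed_dim)

    def forward(self, rgb, flow):
        # concatenate the two streams, then apply a fully connected layer
        z = torch.cat([self.spatial(rgb), self.temporal(flow)], dim=1)
        return self.fc(z)           # x = f(V; theta), used for embedding learning
```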
A Siamese neural network, often used to learn a similarity measure between two inputs, is conceptually two networks sharing the same parameters. The goal of a Siamese network is to learn an embedding space in which similar items (low-resolution videos, in the case of the present invention) are placed near one another. More specifically, samples corresponding to positive pairs, which should lie close together in the embedding space, and samples corresponding to negative pairs, which should lie far apart, are used for training.
Let x = f(V;θ) be the CNN used in the present invention. During training, applying the same network f(V;θ) twice to arbitrary low-resolution videos V_i and V_j yields x_i = f(V_i;θ) and x_j = f(V_j;θ), where (x_i, x_j) may be a positive or a negative pair. The contrastive loss for learning the network parameters θ is given by Equation 3 below.
$$L_{\text{contrastive}}(\theta) = \sum_{(i,j) \in B} \left[ y'_{(i,j)} \, \lVert x_i - x_j \rVert^2 + \left(1 - y'_{(i,j)}\right) \max\!\left(0,\; m - \lVert x_i - x_j \rVert\right)^2 \right] \tag{3}$$
where m is a predetermined margin, B is the batch of low-resolution training examples used, i and j are indices of a training pair in the batch, and y'_(i,j) is a binary value equal to 1 for a positive pair and 0 for a negative pair.
In embedding learning for activity recognition in low-resolution video, a positive pair consists of two low-resolution videos derived from the same high-resolution video, and a negative pair consists of two low-resolution videos derived from different high-resolution videos. FIG. 9 illustrates embedding learning with the contrastive loss. Furthermore, since the goal is to learn y = g(f(V;θ)) and ultimately classify low-resolution videos, the network must be trained with a combined loss function, given by Equation 4 below.
$$L(\theta) = \lambda_1 L_{\text{class}}(\theta) + \lambda_2 L_{\text{contrastive}}(\theta) \tag{4}$$
where L_class(θ) is the standard classification loss of the network y = g(f(V;θ)), and λ₁ and λ₂ are weights.
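The following is a minimal sketch, assuming PyTorch and the reconstructions of Equations 3 and 4 above, of the contrastive loss over a batch of embedding pairs and the combined training objective. The margin and weight values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(x_i, x_j, y, margin=1.0):
    # x_i, x_j: (B, D) embeddings; y: (B,) with 1 for positive, 0 for negative pairs
    d = torch.norm(x_i - x_j, dim=1)
    pos = y * d.pow(2)                          # pull positive pairs together
    neg = (1 - y) * F.relu(margin - d).pow(2)   # push negatives beyond the margin m
    return (pos + neg).sum()

def combined_loss(logits, labels, x_i, x_j, y, lam1=1.0, lam2=0.5):
    # Equation 4: weighted sum of classification loss and contrastive loss
    return lam1 * F.cross_entropy(logits, labels) + lam2 * contrastive_loss(x_i, x_j, y)
```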
Unlike a standard Siamese network, which has only two parameter-sharing subnetworks, the multi-Siamese convolutional neural network of the present invention has 2n copies of the network sharing the same parameters θ for f(V;θ). The multi-Siamese convolutional neural network assigns each of the first n copies to one of the n different low-resolution transformations (i.e., F_k), so the contrastive loss can keep the embedding distances among them small. It then has n further network copies that take videos not matching the scene of the first n branches, in order to form negative training pairs.
Let x_ik = f(V_ik;θ), where V_ik is obtained by applying the transformation F_k to X_i. Based on a batch B of original high-resolution training videos, two kinds of batches are generated at random: B1 is a batch of low-resolution videos generated from a single high-resolution video X_i, and B2 is a batch of randomly selected low-resolution videos. B1 produces positive pairs and B2 produces negative pairs; both B1 and B2 have size n. B1 is obtained by applying the n different low-resolution transformations to each example X_i of B, and each result V_ik = D_k F_k X_i is fed to the first n branches of the multi-Siamese network. The low-resolution examples V_j of B2 are fed directly to the remaining n branches. The new loss function is then given by Equation 5 below.
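A minimal sketch of this batch construction and of a loss consistent with the description follows, assuming PyTorch. `multi_siamese_batches`, `random_lowres_clip`, and the exact pairing of negatives are assumptions for illustration, not names or details from the patent.

```python
import itertools
import torch
import torch.nn.functional as F

def multi_siamese_batches(X_batch, transforms, random_lowres_clip):
    # For each high-resolution clip X_i, batch B1 holds V_ik = D_k F_k X_i for the
    # n transforms, and batch B2 holds n low-resolution clips of mismatched scenes.
    for X_i in X_batch:
        B1 = [t(X_i) for t in transforms]
        B2 = [random_lowres_clip() for _ in transforms]
        yield B1, B2

def multi_siamese_loss(f, B1, B2, margin=1.0):
    # Equation 5 as reconstructed above: all pairs within B1 are positive pairs,
    # and each B1 branch is paired with one B2 branch as a negative pair.
    xs = [f(v) for v in B1]
    zs = [f(v) for v in B2]
    pos = sum(torch.norm(a - b).pow(2) for a, b in itertools.combinations(xs, 2))
    neg = sum(F.relu(margin - torch.norm(a - z)).pow(2) for a, z in zip(xs, zs))
    return pos + neg
```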
$$L_{\text{multi}}(\theta) = \sum_{X_i \in B} \left[ \sum_{k < l} \lVert x_{ik} - x_{il} \rVert^2 + \sum_{k=1}^{n} \max\!\left(0,\; m - \lVert x_{ik} - x_{jk} \rVert\right)^2 \right] \tag{5}$$

where $x_{jk}$ denotes the embedding of the $k$-th example of $B_2$.
That is, the multi-Siamese convolutional neural network model of the present invention considers a plurality of low-resolution transformations simultaneously for embedding learning. The new loss function essentially takes all pairs among the n low-resolution transformations as positive pairs and considers the same number of negative pairs drawn from the separate batch.
The final loss function is computed by combining this multi-Siamese contrastive loss with the standard classification loss, as in Equation 4. FIG. 10 shows the overall process of training the multi-Siamese embedding and the classifier. The model can be viewed as a Siamese CNN that mixes multiple contrastive losses over different low-resolution pairs.
The multi-Siamese convolutional neural network model of the present invention uses three fully connected layers for embedding learning and classification. After applying the temporal pyramid, an intermediate representation of 7680-D (i.e., 15×256×2) is obtained per video, followed by a fully connected layer of size 8192. Embedding learning takes place after the second fully connected layer, so x has 8192 dimensions. Classification is performed by adding one more fully connected layer and a softmax layer on top.
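A minimal sketch of this head, assuming PyTorch, follows. The dimensions (7680, 8192) are taken from the description; the ReLU activations between layers are an assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingClassifierHead(nn.Module):
    """Three fully connected layers: embedding after the second, classes after the third."""
    def __init__(self, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(7680, 8192)   # 7680-D = 15 x 256 x 2 temporal-pyramid feature
        self.fc2 = nn.Linear(8192, 8192)   # the embedding x is taken after this layer
        self.fc3 = nn.Linear(8192, num_classes)

    def forward(self, feats):
        x = self.fc2(F.relu(self.fc1(feats)))   # 8192-D embedding used in Eqs. 3 and 5
        logits = self.fc3(F.relu(x))            # softmax is applied via the loss
        return x, logits
```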
FIG. 11 is a flowchart showing the step of training the convolutional neural network in the method by which the camera system according to an embodiment of the present invention automatically recognizes activity in video.
Referring to FIG. 11, the data server 20 receives high-resolution training videos (S1101). The resolution conversion unit 21 converts each received high-resolution human activity video into a plurality of low-resolution videos (S1102). Here, the resolution conversion unit 21 generates as training data a first batch of n low-resolution videos with different resolutions converted from the same high-resolution video and a second batch of n low-resolution videos with different resolutions converted from a plurality of different high-resolution videos (S1103), and transmits them to the convolutional neural network learning unit 22. The convolutional neural network learning unit 22 receives the plurality of low-resolution videos, separates them into a spatial stream and a temporal stream, performs convolution and pooling on each stream, and applies fully connected layers to train the convolutional neural network (S1104). In doing so, the convolutional neural network learning unit 22 receives the first and second batches, separates each video into a spatial stream and a temporal stream, performs convolution and pooling on each stream, applies fully connected layers, and then maps the results into the embedding space and adjusts the embedding distances, thereby performing learning for activity recognition in video (S1105).
FIG. 12 compares the accuracy of activity recognition when the automatic in-video activity recognition method of the camera system according to an embodiment of the present invention is applied against that of the prior art.
FIG. 12 is a set of tables comparing classification accuracy on the 16x12 HMDB dataset or the DogCentric activity dataset for (1) a basic one-stream CNN and (2) a two-stream CNN. For each of the two CNNs, results were compared for (i) training without multiple low-resolution transformations, (ii) training with multiple low-resolution transformations but without embedding learning, and (iii) training with Siamese embedding learning.
The number of low-resolution transformations used in the experiments was 15, selected at random from a uniform pool. Translations of {-5, -2.5, 0, +2.5, +5}% in the X and Y directions provided a total of 25 motion transformations F_k. Combined with three different rotations at angles of {-5, 0, 5} degrees, a total of 75 transformations were provided, from which 27 were randomly selected.
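A minimal sketch of generating this transform pool follows. Each (dx, dy, angle) tuple parameterizes an assumed warp-and-downsample operation; note the description reports both 15 and 27 randomly selected transforms, and 15 is used here as an assumption.

```python
import itertools
import random

shifts = [-5.0, -2.5, 0.0, 2.5, 5.0]   # translation, percent of frame size, X and Y
angles = [-5.0, 0.0, 5.0]              # rotation, degrees

pool = list(itertools.product(shifts, shifts, angles))  # 5 x 5 x 3 = 75 transforms
selected = random.sample(pool, 15)     # random subset used for training
```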
Referring to FIG. 12A, the two-stream CNN with Siamese learning achieves the highest accuracy. That is, learning the embedding space with the Siamese network structure has a large effect on activity classification. Using a contrastive loss based on multiple low-resolution transformations makes classifier training more robust and reduces overfitting to the training data.
FIG. 12B is a table comparing the present invention with the prior art. As the table shows, the classification accuracy of the present invention is higher by 8% or more. That is, applying embedding learning to the two-stream multi-Siamese CNN, one embodiment of the present invention, yields the best results for activity recognition at 16x12 resolution.
FIGS. 12C and 12D are tables showing experimental results on the DogCentric activity dataset. In sum, the automatic in-video activity recognition method of the camera system according to an embodiment of the present invention shows the best activity recognition accuracy and is therefore the most effective for privacy protection.
<Second Embodiment>
FIG. 13 is a block diagram showing the configuration of a camera system according to a second embodiment of the present invention.
Referring to FIG. 13, the privacy-protecting camera system 5 may include a camera 50 and a data server 60.
The camera 50 receives the face recognition model from the data server 60, captures high-resolution video and performs low-resolution conversion, recognizes faces in the converted video using the face recognition model received from the data server 60, and, according to the face recognition result, reconverts the detected face regions and the remaining regions to different resolutions, thereby outputting video in which the face regions are anonymized. The camera 50 of the present invention may be any of various camera devices, for example a smart home camera, a robot camera, a security CCTV camera, a wearable camera, or a vehicle dashboard (black box) camera. The specific configuration and functions of the camera 50 are described below with reference to FIGS. 15 to 19.
The data server 60 can perform learning for face recognition in video using training images and generate a face recognition model. The training images may include both high-resolution and low-resolution images. The data server 60 uses a convolutional neural network (CNN) to generate the face recognition model; a CNN is an algorithm with a structure that learns to analyze and classify input images (image frames).
In the present invention, a fully convolutional network (FCN) is newly adapted to ultra-low-resolution face recognition. This FCN model can analyze and classify whether a specific pixel of an input image (image frame) is a face pixel, using training data consisting of a plurality of images containing faces of size 10x10.
The structure of the fully convolutional network (FCN) that performs face recognition in low-resolution images is shown in FIG. 14.
Referring to FIG. 14A, the fully convolutional network (FCN) consists of a total of 19 convolution layers and 3 deconvolution layers, and the number of channels (depth) of each layer is given in the table of FIG. 14B.
This FCN is trained to be specialized for recognizing (detecting) faces of size 10x10, and training can be performed using a face image database, following the standard backpropagation method for convolutional neural networks. Once training is complete, the FCN judges, pixel by pixel, whether a new input image corresponds to a face, producing the probability that each pixel is a face pixel. The face size may be 10x10 or larger.
When the FCN determines that pixels correspond to a face, a sliding window method is applied to find the position and size of the face. A sliding window applies a bounding box of fixed size at every position in the image and counts how many face pixels lie inside the box. Boxes containing many pixels with high face probability are found, and non-maximum suppression is applied to those boxes to obtain the final bounding box. Non-maximum suppression refers to thinning the edges obtained through image processing: because edges found using a Gaussian mask and a Sobel mask are blurred, in other words smeared, non-maximum suppression is performed to find sharper lines. That is, non-maximum suppression compares the pixel values in eight directions around a center pixel, keeps the center pixel if it is the largest, and removes it otherwise.
Using an integral image here saves computation time. An integral image is, simply put, an image in which each pixel holds the sum of all preceding pixels; with an integral image, the sum of pixel values over a specific region can be obtained very easily.
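A minimal sketch of this box-counting step follows, assuming NumPy: an integral image over the per-pixel face-probability map makes each sliding-window box sum an O(1) operation, and a simplified greedy suppression keeps the strongest non-overlapping boxes. The thresholds and the suppression rule are illustrative assumptions.

```python
import numpy as np

def integral_image(p):
    # S[y, x] holds the sum of p over the rectangle [0, y) x [0, x)
    return np.pad(p, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def box_sum(S, x, y, w, h):
    # O(1) sum of the w x h box with top-left corner (x, y)
    return S[y + h, x + w] - S[y, x + w] - S[y + h, x] + S[y, x]

def detect_faces(prob_map, box=10, score_thr=60.0, min_dist=5):
    S = integral_image(prob_map)
    H, W = prob_map.shape
    boxes = [(x, y, box_sum(S, x, y, box, box))
             for y in range(H - box + 1) for x in range(W - box + 1)]
    boxes = [b for b in boxes if b[2] >= score_thr]   # boxes rich in face pixels
    boxes.sort(key=lambda b: -b[2])
    kept = []
    for b in boxes:   # simplified suppression: keep only the strongest nearby box
        if all(max(abs(b[0] - k[0]), abs(b[1] - k[1])) >= min_dist for k in kept):
            kept.append(b)
    return kept
```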
FIG. 15 shows an example of video anonymized by the face-recognition-based real-time automatic video anonymization method of the present invention.
Referring to FIG. 15, the leftmost image is the high-resolution image, the middle image has been converted to ultra-low resolution, and the rightmost image is the anonymized result. As FIG. 15 shows, the camera and video anonymization method according to the second embodiment of the present invention can capture only the face regions at ultra-low resolution while keeping the regions outside the face at high resolution.
FIG. 16 is a block diagram showing the configuration of the camera 50 according to the second embodiment of the present invention.
Referring to FIG. 16, the camera 50 of the present invention may include a communication unit 150, a photographing unit 250, a low-resolution conversion module 350, a processor 450, an output unit 550, and a storage unit 650.
The communication unit 150 may receive, from the data server 60, the face recognition model for performing face recognition in video and transmit it to the processor 450. As described above, the face recognition model used to recognize faces in the video captured by the camera 50 is generated by the data server 60. In addition, the communication unit 150 may transmit video captured and stored by the camera 50 to the data server 60.
The communication unit 150 may include at least one of a wireless communication module and a wired communication module. The wireless communication module may include at least one of a wireless network communication module, a wireless LAN (WLAN; e.g., Wi-Fi, Wireless Fidelity, or WiMAX, Worldwide Interoperability for Microwave Access) communication module, and a wireless PAN (WPAN, Wireless Personal Area Network) communication module.
The photographing unit 250 can capture a high-resolution video of a subject and transmits the captured high-resolution video to the low-resolution conversion module 350.
The low-resolution conversion module 350 receives the high-resolution video from the photographing unit 250 and receives the positions of face regions and a target resolution value from the processor 450 to convert the high-resolution video. Initially, the face region positions are unset (empty), and the initial target resolution is a preset ultra-low resolution, which may be 16x12 pixels. That is, the low-resolution conversion module 350 initially performs low-resolution conversion on the entire image without distinguishing face regions from the remaining regions.
The low-resolution conversion module 350 receives the stored face region positions from the processor 450 and converts the face regions and the remaining regions of the video separately: the face regions are converted to the preset ultra-low resolution, while the regions outside the face regions are converted to the given target resolution value. The low-resolution conversion module 350 may be implemented as a circuit; it only receives face region position information and the target resolution value from the processor 450 and converts the image frames continuously supplied by the photographing unit 250, and it has no storage space or memory of its own.
The low-resolution conversion module 350 may also convert the high-resolution video using only a target resolution value received from the processor 450; that is, it may perform low-resolution conversion on the entire image without treating face regions separately.
The processor 450 can recognize faces in the video converted by the low-resolution conversion module 350 using the face recognition model received through the communication unit 150. The processor 450 may be a processing device such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The processor 450 updates the face region positions and the target resolution value according to the face recognition result: when a face is detected in the video converted by the low-resolution conversion module 350, the processor 450 records the detected face region, stores its position information, and increases the target resolution value for the regions other than the detected face regions.
That is, the processor 450 causes the low-resolution conversion module 350 to reconvert the regions other than the detected face regions at the increased resolution. The processor 450 then performs face recognition again on the video converted at the increased resolution, and the above process repeats.
When the resolution of the video reconverted by the low-resolution conversion module 350 at the increased target resolution value matches the resolution of the original high-resolution video captured by the photographing unit 250, the processor 450 stops increasing the resolution value and controls the video to be output.
Following this process, if the size of a detected face is NxN and the resolution is increased by a factor of s at each step, no face image of size (s*N)x(s*N) or larger is ever recorded on the processor 450. For example, if s is 1.5 and N is 10, anonymized video capture can be performed without a face image larger than 15x15 ever being written to CPU/GPU memory.
That is, the camera 50 of the present invention repeatedly performs face recognition/detection and resolution conversion, keeping the resolution of the face regions in the video at NxN while increasing the resolution of the remaining regions up to the final target resolution. Video output through this process is called an "anonymized video". A sketch of this loop is given below.
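The following is a minimal sketch of the iterative anonymization loop described above. `detect_faces_lowres` and `render_at` are assumed helpers standing in for the face recognition model and the low-resolution conversion module; the (width, height) resolution bookkeeping is an illustrative simplification.

```python
def anonymize_frame(hires, detect_faces_lowres, render_at, start=(16, 12), s=1.5):
    """Raise resolution stepwise while keeping detected face regions ultra-low-res."""
    H, W = hires.shape[:2]
    faces = []                      # face boxes found so far, frozen at N x N
    res = start                     # (width, height), starting at ultra-low resolution
    while True:
        frame = render_at(hires, res, faces)        # face regions stay ultra-low-res
        faces = faces + detect_faces_lowres(frame)  # accumulate newly detected faces
        if res == (W, H):
            return frame            # non-face regions now at the original resolution
        res = (min(int(res[0] * s), W), min(int(res[1] * s), H))
```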
Alternatively, when a face is detected in the video, instead of increasing the resolution only of the regions outside the detected face region, the processor 450 may stop raising the resolution at the moment a face is detected and control the low-resolution video at that point to be output directly. In this case, computation is faster than when resolution conversion and face recognition are repeated with increasing resolution outside the face regions, but the anonymization is less precise.
The output unit 550 outputs the video converted by the low-resolution conversion module 350, that is, the anonymized video.
The storage unit 650 can store the video output by the output unit 550. The storage unit 650 includes a main memory and an auxiliary memory, and stores the application programs necessary for the functional operation of the camera 50. The storage unit 650 may broadly include a program area and a data area.
FIG. 17 is a simplified illustration of the camera 50 according to the second embodiment of the present invention anonymizing captured video in real time.
Referring to FIG. 17, the photographing unit 250 transmits the captured high-resolution video to the low-resolution conversion module 350, which converts the received high-resolution video into a low-resolution video and transmits it to the processor 450. The processor 450 performs face recognition on the low-resolution video and passes the face region position information and the target resolution value according to the recognition result back to the low-resolution conversion module 350 for reconversion. When the target resolution value reaches the final target resolution, the video converted by the low-resolution conversion module 350 is produced as output.
FIG. 18 is a flowchart explaining the process of the video anonymization method according to the second embodiment of the present invention.
Referring to FIG. 18, the photographing unit 250 transmits the captured high-resolution video to the low-resolution conversion module 350 (S1810). The low-resolution conversion module 350 receives the face region position information and the target resolution value from the processor 450, converts the face regions of the high-resolution video to ultra-low resolution (S1820), and converts the regions outside the face regions to the target resolution value (S1830). In the initial conversion, however, there is no face region position information, so the entire image is converted to ultra-low resolution. The processor 450 performs face recognition on the video converted by the low-resolution conversion module 350 using the face recognition model received through the communication unit 150 (S1840). When a new face is recognized in the video, the processor 450 adds the position information of the detected face region and transmits it to the low-resolution conversion module 350 (S1850). When no new face is found in the video, the processor 450 increases only the target resolution value and transmits it to the low-resolution conversion module 350 (S1860). When the resolution of the regions outside the face regions matches the resolution of the original high-resolution video, the processor 450 outputs the video at that point and ends the process; otherwise, face recognition is performed again (S1870).
FIG. 19 is a flowchart explaining the process of a face-recognition-based real-time automatic video anonymization method according to another embodiment of the present invention.
Referring to FIG. 19, the photographing unit 250 transmits the captured high-resolution video to the low-resolution conversion module 350 (S1910). The low-resolution conversion module 350 receives the target resolution value from the processor 450 and performs low-resolution conversion on the entire image of the high-resolution video (S1920); in the initial conversion, the entire image is converted to ultra-low resolution. The processor 450 performs face recognition on the video converted by the low-resolution conversion module 350 using the face recognition model received through the communication unit 150 (S1930). When no new face is found in the video, the processor 450 increases the target resolution value and transmits it to the low-resolution conversion module 350 (S1940). When a new face is recognized in the video, the processor 450 controls the output unit 550 to output the low-resolution video converted at the time of face detection, and ends the process (S1950). When the resolution of the video reconverted by the low-resolution conversion module 350 at the increased target resolution value matches the resolution of the original high-resolution video, the processor 450 outputs the video at that point and ends the process.
The camera systems according to the first and second embodiments, and the methods by which they operate, have been described above. The activity-recognition-based automatic resolution adjustment method, the automatic activity recognition method, and the face-recognition-based video anonymization method according to the present invention may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code executable by a computer using an interpreter or the like.

Claims (15)

  1. A camera system comprising: a data server configured to generate, based on high-resolution training videos, training data serving as the basis for recognizing human activity in low-resolution videos, and to generate a human activity recognition model from the training data; and
    a camera configured to capture low-resolution video,
    and, when a crisis situation is determined through first activity recognition, in which human activity in the low-resolution video is recognized using the human activity recognition model received from the data server, to switch to a higher-resolution mode than the existing resolution and capture video.
  2. The camera system of claim 1,
    wherein the camera
    determines whether a crisis situation exists through second activity recognition, in which human activity in the video captured after switching to the high-resolution mode is recognized using the human activity recognition model,
    stores the video captured at high resolution in the case of a crisis situation,
    and stores the video captured at low resolution otherwise.
  3. The camera system of claim 1,
    wherein the camera comprises:
    a photographing unit configured to capture low-resolution video;
    a communication unit configured to receive, from the data server, the human activity recognition model used to recognize human activity in low-resolution video, and to transmit it to an image analysis unit;
    the image analysis unit, configured to determine whether a crisis situation exists through the first activity recognition, in which human activity in the low-resolution video is recognized using the human activity recognition model received from the communication unit;
    a control unit configured to control the photographing unit to switch to a higher-resolution mode than the existing resolution and capture video when the image analysis unit determines a crisis situation; and
    a storage unit configured to store the low-resolution or high-resolution video captured by the photographing unit.
  4. The camera system of claim 3,
    wherein the image analysis unit
    determines whether a crisis situation exists through second activity recognition, in which human activity in the video captured after switching to the high-resolution mode is recognized using the human activity recognition model.
  5. The camera system of claim 1,
    wherein the data server comprises:
    a resolution conversion unit configured to convert high-resolution human activity videos into a plurality of low-resolution videos;
    a convolutional neural network learning unit configured to receive the plurality of low-resolution videos generated by the resolution conversion unit, separate them into a spatial stream and a temporal stream, perform convolution and pooling on each stream, and apply fully connected layers to train a convolutional neural network; and
    a human activity recognition model generation unit configured to generate the human activity recognition model based on the data learned by the convolutional neural network learning unit.
  6. A method for a camera to automatically adjust resolution based on activity recognition, the method comprising:
    (a) capturing, by a photographing unit of the camera, video at low resolution;
    (b) receiving, by a communication unit of the camera, a human activity recognition model used to recognize human activity in low-resolution video from an external data server, and transmitting it to an image analysis unit of the camera;
    (c) determining, by the image analysis unit, whether a crisis situation exists through first activity recognition, in which human activity in the low-resolution video is recognized using the human activity recognition model received from the communication unit of the camera;
    (d) controlling, by a control unit of the camera, the photographing unit to switch to a higher-resolution mode than the existing resolution and capture video when the image analysis unit determines a crisis situation; and
    (e) storing, by a storage unit of the camera, the low-resolution or high-resolution video captured by the photographing unit.
  7. The method of claim 6, further comprising,
    between steps (d) and (e),
    determining, by the image analysis unit, whether a crisis situation exists through second activity recognition, in which human activity in the video captured after switching to the high-resolution mode is recognized using the human activity recognition model.
  8. A method for a camera system to automatically recognize activity in video, the method comprising:
    (a) converting, by a resolution conversion unit of a data server, a high-resolution human activity video into a plurality of low-resolution videos;
    (b) receiving, by a convolutional neural network learning unit of the data server, the plurality of low-resolution videos, separating them into a spatial stream and a temporal stream, performing convolution and pooling on each stream, and applying fully connected layers to train a convolutional neural network;
    (c) generating, by a human activity recognition model generation unit of the data server, a human activity recognition model based on the data learned by the convolutional neural network learning unit;
    (d) capturing, by a photographing unit of a camera, a low-resolution video; and
    (e) recognizing, by an image analysis unit of the camera, human activity in the low-resolution video captured by the photographing unit, using the human activity recognition model received from the data server.
  9. A camera system comprising: a data server configured to perform learning for face recognition in video using training images and to generate a face recognition model; and
    a camera configured to receive the face recognition model from the data server,
    capture high-resolution video and perform low-resolution conversion,
    recognize faces in the converted video using the face recognition model received from the data server,
    and output video anonymized in the face regions by reconverting, according to the face recognition result, the detected face regions and the regions other than the face regions to have different resolutions.
  10. A camera comprising: a communication unit configured to receive a face recognition model for face recognition in video from an external data server;
    a photographing unit configured to capture high-resolution video;
    a low-resolution conversion module configured to receive the high-resolution video captured by the photographing unit, and to convert the high-resolution video upon receiving face region position information and a target resolution value from a processor;
    the processor, configured to recognize faces in the video converted by the low-resolution conversion module using the face recognition model received from the communication unit, and to update the face region positions and the target resolution value according to the face recognition result; and
    an output unit configured to output the video converted by the low-resolution conversion module.
  11. The camera of claim 10,
    wherein the processor,
    when a face is detected through face recognition in the video converted by the low-resolution conversion module,
    records the detected face region and stores its position information,
    increases the target resolution value for the regions other than the stored face regions,
    and controls the low-resolution conversion module to reconvert the regions other than the stored face regions at the increased target resolution.
  12. A camera comprising: a communication unit configured to receive a face recognition model for face recognition in video from an external data server;
    a photographing unit configured to capture high-resolution video;
    a low-resolution conversion module configured to receive the high-resolution video captured by the photographing unit, and to convert the high-resolution video upon receiving a target resolution value from a processor;
    the processor, configured to recognize faces in the video converted by the low-resolution conversion module using the face recognition model received from the communication unit, and to update the target resolution value according to the face recognition result; and
    an output unit configured to output the video converted by the low-resolution conversion module.
  13. The camera of claim 12,
    wherein the processor,
    when a face is detected in the video converted by the low-resolution conversion module,
    controls the low-resolution video converted at the time of detection to be output.
  14. A method for a camera to anonymize video, the method comprising:
    (a) capturing, by a photographing unit of the camera, high-resolution video;
    (b) receiving, by a communication unit of the camera, a face recognition model for face recognition in video from an external data server;
    (c) receiving, by a low-resolution conversion module of the camera, the high-resolution video captured by the photographing unit, and converting the high-resolution video upon receiving face region positions and a target resolution value from a processor;
    (d) recognizing, by the processor of the camera, faces in the video converted by the low-resolution conversion module using the face recognition model received from the communication unit, and updating the face region positions and the target resolution value according to the face recognition result; and
    (e) outputting, by an output unit of the camera, the video converted by the low-resolution conversion module.
  15. A method for a camera to anonymize video, the method comprising:
    (a) capturing, by a photographing unit of the camera, high-resolution video;
    (b) receiving, by a communication unit of the camera, a face recognition model for face recognition in video from an external data server;
    (c) receiving, by a low-resolution conversion module of the camera, the high-resolution video captured by the photographing unit, and converting the high-resolution video upon receiving a target resolution value from a processor;
    (d) recognizing, by the processor of the camera, faces in the video converted by the low-resolution conversion module using the face recognition model received from the communication unit, and updating the target resolution value according to the face recognition result; and
    (e) outputting, by an output unit of the camera, the video converted by the low-resolution conversion module.
PCT/KR2018/008196 2017-07-20 2018-07-20 Camera system for protecting privacy and method therefor WO2019017720A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR1020170092204A KR101911900B1 (en) 2017-07-20 2017-07-20 Privacy-preserving camera, system the same and real-time automated video anonymization method based on face detection
KR10-2017-0092203 2017-07-20
KR10-2017-0092204 2017-07-20
KR1020170092203A KR101876433B1 (en) 2017-07-20 2017-07-20 Activity recognition-based automatic resolution adjustment camera system, activity recognition-based automatic resolution adjustment method and automatic activity recognition method of camera system

Publications (1)

Publication Number Publication Date
WO2019017720A1 true WO2019017720A1 (en) 2019-01-24

Family

ID=65015145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/008196 WO2019017720A1 (en) 2017-07-20 2018-07-20 Camera system for protecting privacy and method therefor

Country Status (1)

Country Link
WO (1) WO2019017720A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178319A (en) * 2020-01-06 2020-05-19 山西大学 Video behavior identification method based on compression reward and punishment mechanism
CN111970509A (en) * 2020-08-10 2020-11-20 杭州海康威视数字技术股份有限公司 Video image processing method, device and system
DE102020115697A1 (en) 2020-06-15 2021-12-16 Iav Gmbh Ingenieurgesellschaft Auto Und Verkehr Method and device for natural facial anonymization in real time

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110033680A (en) * 2009-09-25 2011-03-31 삼성전자주식회사 Apparatus for processing image in robot system and method thereof
KR101458136B1 (en) * 2014-02-28 2014-11-05 김호 Video processing method, video processing server performing the same, monitoring server performing the same, system performing the same and storage medium storing the same
KR20140131188A (en) * 2013-05-03 2014-11-12 딕스비전 주식회사 Integrated monitoring and controlling system and method including privacy protecion and interest emphasis function
KR20160088224A (en) * 2015-01-15 2016-07-25 삼성전자주식회사 Method for recognizing an object and apparatus thereof
KR101722664B1 (en) * 2015-10-27 2017-04-18 울산과학기술원 Multi-viewpoint System, Wearable Camera, CCTV, Control Server And Method For Active Situation Recognition



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18834584

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18834584

Country of ref document: EP

Kind code of ref document: A1