WO2020130309A1

WO2020130309A1 - Image masking device and image masking method

Info

Publication number: WO2020130309A1
Application number: PCT/KR2019/013173
Authority: WO
Inventors: 강지홍; 임비
Original assignee: 주식회사 로민
Priority date: 2018-12-20
Filing date: 2019-10-08
Publication date: 2020-06-25
Also published as: KR101972918B1

Abstract

The present application relates to an image masking device and an image masking method. An image masking method, according to one embodiment of the present invention, may comprise the steps of: decoding a received target video so as to generate frame images on a frame by frame basis; extracting a frame feature vector corresponding to each frame image from the frame images, and detecting candidate objects from the frame feature vector; extracting an object feature vector corresponding to each candidate object from the frame feature vector by using location information of the detected candidate objects; setting identification information for the candidate objects by using the object feature vector; tracking candidate objects having the same identification information in continuous frame images, and generating, for each identification information, tracking information for the candidate objects; and when receiving a masking input, extracting, by using the tracking information, candidate objects identical to the identification information included in the masking input, and performing masking on the extracted candidate objects.

Description

Video masking device and video masking method

The present application relates to a video masking device and a video masking method capable of masking a video to protect personal privacy.

With the recent spread of networks such as the Internet and the increasing capability of the devices used by ordinary users (personal computers, smartphones, cameras, etc.), individuals can easily publish the videos they shoot online or share them with others. However, as the number of videos shared on networks increases, problems of personal information exposure and invasion of privacy have become serious, for example when videos showing the faces of arbitrary persons are released on a network without permission. To address this problem, the related art has proposed de-identification techniques in which the face of a person included in a video is mosaicked to prevent exposure of personal information.

However, with conventional de-identification technology the user has to perform the de-identification processing on the video directly. In particular, when only a specific person among the various persons included in the video is to be de-identified, every person appearing throughout the video has to be checked manually.

Meanwhile, as a method for handling a specific person as an exception to masking, Registration No. 10-1215948, "a video information masking method of a surveillance system based on body information and face recognition," has been proposed. In that method, if a person's face can be recognized from the image information, the distance between the outer corners of the eyes and the philtrum is detected, the face of a specific person stored in a database is identified on that basis, and the masking of that person is handled separately.

However, in that technology, detecting a person in the image, recognizing the face, and tracking are each performed independently, so a very large amount of computation is required. In addition, because the performance of the entire system is bounded by the weakest of the individual technologies, the detection rate and recognition rate for persons are lowered. Furthermore, because the distance between the outer corners of both eyes and the philtrum is used as the feature for face recognition, the face recognition rate drops significantly in a real system.

This application is intended to provide a video masking device and a video masking method capable of masking a video to protect the privacy of an individual.

The present application is intended to provide an image masking apparatus and a method for masking images that can selectively mask objects included in a video.

The present application is intended to provide an image masking apparatus and an image masking method capable of efficiently masking by tracking objects included in a video.

The present application is intended to provide an image masking apparatus and a method for masking an image having an excellent detection rate and recognition rate in detecting, tracking, and recognizing an object included in a video.

An image masking method according to an embodiment of the present invention may include: decoding an input target video to generate frame images frame by frame; extracting a frame feature vector corresponding to each frame image from the frame image, and detecting candidate objects from the frame feature vector; extracting an object feature vector corresponding to each candidate object from the frame feature vector by using the location information of the detected candidate objects; setting identification information for the candidate objects using the object feature vector; tracking candidate objects having the same identification information in consecutive frame images to generate tracking information for the candidate objects for each piece of identification information; and, upon receiving a masking input, extracting the candidate objects matching the identification information included in the masking input by using the tracking information, and performing masking on the extracted candidate objects.

Here, the setting of the identification information may include: searching the identification information database for a registered feature vector corresponding to the object feature vector; when a registered feature vector corresponding to the object feature vector is found, extracting the identification information matched with the registered feature vector from the identification information database and setting it as the identification information of the candidate object; and when no registered feature vector corresponding to the object feature vector is found, generating identification information for the candidate object and registering the object feature vector and the identification information in the identification information database.

Here, in the step of searching in the identification information database, if the object feature vector matches the registered feature vector within a predetermined error range, it may be determined that the object feature vector corresponds to the registered feature vector.

Here, the generating of the tracking information may track the same candidate object by using the difference between the object feature vectors of the candidate objects included in the consecutive frame images and the changes in the position and size of the candidate objects.

Here, the step of generating the tracking information may use Error = (V1-V2) + a × (d1-d2) + b × (s1-s2), where Error is a tracking error value, V1 is the object feature vector of the first candidate object included in the first frame image, V2 is the object feature vector of the second candidate object included in the second frame image, d1 is the distance from the reference point to the center point of the first candidate object, d2 is the distance from the reference point to the center point of the second candidate object, s1 is the area of the first candidate object, s2 is the area of the second candidate object, and a and b are weights. The second candidate object having the minimum tracking error value may be determined to be the candidate object having the same identification information as the first candidate object.

Here, the image masking method according to an embodiment of the present invention may further include an error detection step of determining, when a candidate object having the same identification information is missing from some of the consecutive frame images, that an error has occurred in setting the identification information for that candidate object.

Here, the tracking information may include at least one of identification information of the candidate object, frame information where the candidate object appears, and location information and size information of the candidate object included in the frame image.

Here, the image masking method according to an embodiment of the present invention may further include generating, by using the tracking information, an object tracking image in which the candidate objects and the identification information of each candidate object are overlaid on the target video.

Here, the image masking method according to an embodiment of the present invention may further include displaying a masking selection interface that includes an identification information list in which the identification information corresponding to the candidate objects included in the target video is arranged, appearance section information indicating where in the target video the candidate objects having the same identification information appear, and a frame image in which the candidate object appears.

Here, the masking input may be generated by using identification information input from a user or by extracting the identification information according to a preset selection algorithm.

Here, the performing of the masking may use the tracking information to extract the frame images in which the candidate objects corresponding to the identification information appear, and may set a masking area corresponding to the location of the candidate objects in the extracted frame images.

Here, in the step of performing the masking, the masking area may be masked by using blurring, mosaic processing, or image replacement.
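As a rough illustration of mosaic processing over a masking area, the following minimal sketch replaces each block of pixels inside a rectangular area with the block's mean value. It assumes a grayscale frame stored as a list of rows and a masking area given as (x, y, w, h). The function name `mosaic` and the `block` parameter are hypothetical and do not come from this application.

```python
from typing import List, Tuple


def mosaic(image: List[List[int]], area: Tuple[int, int, int, int],
           block: int = 2) -> List[List[int]]:
    """Return a copy of `image` with the area (x, y, w, h) pixelated.

    Each block x block tile inside the area is replaced by its integer mean.
    The input image is left unchanged.
    """
    x, y, w, h = area
    out = [row[:] for row in image]
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            rows = range(by, min(by + block, y + h))
            cols = range(bx, min(bx + block, x + w))
            total = sum(image[r][c] for r in rows for c in cols)
            mean = total // (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


frame = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
masked = mosaic(frame, area=(0, 0, 2, 2), block=2)
```

A larger `block` value removes more detail; blurring or image replacement could be substituted inside the same loop structure.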

Here, the step of performing the masking may store the target video together with masking information on the masking area set in the target video, and, when the target video is played, may mask the target video using the masking information and play it.

Here, the step of performing the masking may require an access right when the target video is played, and if the access right is not available, the target video may be masked and played.

According to an embodiment of the present invention, a computer program stored in a medium may exist in combination with hardware to execute the image masking method described above.

An image masking apparatus according to an embodiment of the present invention may include: a frame input unit that decodes an input target video to generate frame images frame by frame; a feature vector extraction unit that extracts a frame feature vector corresponding to each frame image from the frame image, detects candidate objects from the frame feature vector, and extracts an object feature vector corresponding to each candidate object from the frame feature vector by using the location information of the detected candidate objects; an identification information setting unit that sets identification information for the candidate objects using the object feature vector; a tracking information generation unit that tracks candidate objects having the same identification information in consecutive frame images and generates tracking information for the candidate objects for each piece of identification information; and a masking unit that, upon receiving a masking input, extracts the candidate objects matching the identification information included in the masking input by using the tracking information and performs masking on the extracted candidate objects.

An image masking apparatus according to an embodiment of the present invention may include: a processor; and a memory coupled to the processor, wherein the memory includes one or more modules configured to be executed by the processor, and the one or more modules include instructions for: decoding a received target video to generate frame images frame by frame; extracting a frame feature vector corresponding to each frame image from the frame image and detecting candidate objects from the frame feature vector; extracting an object feature vector corresponding to each candidate object from the frame feature vector by using the location information of the detected candidate objects; setting identification information for the candidate objects using the object feature vector; tracking candidate objects having the same identification information in consecutive frame images to generate tracking information for the candidate objects for each piece of identification information; and, upon receiving a masking input, extracting the candidate objects matching the identification information included in the masking input by using the tracking information and performing masking on the extracted candidate objects.

The means for solving the problem described above do not enumerate all features of the present invention. The various features of the present invention, and the advantages and effects thereof, may be understood in more detail with reference to the specific embodiments below.

According to the image masking apparatus and the image masking method according to an embodiment of the present invention, since each object included in a video can be distinguished, a user-selected object can be selectively masked to be de-identified. That is, rather than collectively de-identifying the objects included in the video, it is possible to selectively de-identify the selected specific object, thereby improving user convenience.

According to the image masking apparatus and the image masking method according to an embodiment of the present invention, since the position of each object included in the video can be tracked, a specific object selected by the user can be easily extracted and masked throughout the entire video.

According to the image masking apparatus and the image masking method according to an embodiment of the present invention, detection, tracking, and recognition of candidate objects can all be performed from a single frame feature vector, which is efficient. In addition, the machine learning algorithms for detection, tracking, and recognition can be trained together, which can significantly improve performance and computation speed.

According to the image masking apparatus and the image masking method according to an embodiment of the present invention, a masked video is provided when an unspecified person without access permission views the video, so that personal information exposure and privacy infringement can be prevented.

However, the effects achievable by the image masking apparatus and the image masking method according to the embodiments of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention belongs.

FIG. 1 is a schematic diagram showing an image masking system according to an embodiment of the present invention.

FIGS. 2 and 3 are block diagrams showing an image masking apparatus according to an embodiment of the present invention.

FIG. 4 is a schematic diagram showing candidate object extraction and identification information setting according to an embodiment of the present invention.

FIG. 5 is a schematic diagram showing a masking selection interface according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating an image masking method according to an embodiment of the present invention.

Hereinafter, exemplary embodiments disclosed in the present specification will be described in detail with reference to the accompanying drawings; the same or similar elements are given the same reference numerals regardless of the figure, and redundant descriptions thereof are omitted. The suffixes "module" and "unit" for the components used in the following description are given or used interchangeably only for ease of drafting the specification, and do not by themselves have distinct meanings or roles. The term 'unit' used in the present invention means a software component or a hardware component such as an FPGA or ASIC, and a 'unit' performs certain roles. However, a 'unit' is not limited to software or hardware. A 'unit' may be configured to reside in an addressable storage medium or may be configured to run on one or more processors. Thus, as an example, a 'unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and 'units' may be combined into a smaller number of components and 'units' or further separated into additional components and 'units'.

In addition, in describing the embodiments disclosed in this specification, detailed descriptions of related well-known technologies are omitted when it is determined that they may obscure the subject matter of the embodiments. The accompanying drawings are provided only to facilitate understanding of the embodiments disclosed herein; the technical spirit disclosed in the specification is not limited by the accompanying drawings and should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Referring to FIG. 1, an image masking system according to an embodiment of the present invention may include an image photographing apparatus 1 and an image masking apparatus 100.

Hereinafter, an image masking system according to an embodiment of the present invention will be described with reference to FIG. 1.

The image photographing apparatus 1 may photograph its surroundings to generate a video. Here, the image photographing apparatus 1 may be any device capable of shooting video, such as a video camera or a camcorder. The image photographing apparatus 1 may stream the captured video in real time or store it as a file, and may support wired or wireless communication for transmitting the captured video.

The image masking apparatus 100 may receive a target video to be masked, and may perform masking on objects included in the received target video. Here, the image masking apparatus 100 may receive the target video shot by the image photographing apparatus 1 in a file or data format, and, depending on the embodiment, may also receive the target video streamed in real time from the image photographing apparatus 1.

Although FIG. 1 shows the image masking apparatus 100 as provided separately from the image photographing apparatus 1, according to an embodiment the image masking apparatus 100 may be included in the image photographing apparatus 1, such as a CCTV, a vehicle black box, or a camera, or may be provided in a separate computer or smartphone.

Meanwhile, the image masking apparatus 100 may distinguish the objects included in the target video, and may perform masking on the distinguished objects. That is, in order to prevent exposure of personal information or invasion of privacy, some objects included in the video may be masked and de-identified. For example, an image photographing apparatus 1 such as a CCTV or a vehicle black box records everything within its set shooting area indiscriminately, so another person's face or body, a vehicle license plate, and the like may be captured. If such a video is released through the Internet or the like, personal information such as another person's face may be exposed, causing problems such as invasion of privacy. Therefore, to prevent such problems, the faces or license plates of other persons included in the target video need to be masked so that they cannot be identified.

Conventionally, the user had to perform masking on the target video directly, and even when automatic masking was provided, it was common to mask all objects included in the video collectively. Therefore, to mask only a specific person among the various persons included in a video, or to mask everyone except a specific person, the user had to manually check every person appearing throughout the video against the specific person before performing masking.

In contrast, the image masking apparatus 100 according to an embodiment of the present invention can identify each object appearing in the video and can automatically perform masking by tracking only the objects selected from among those included in the video.

Hereinafter, an image masking apparatus according to an embodiment of the present invention will be described.

FIG. 2 is a block diagram showing an image masking apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the image masking apparatus 100 according to an embodiment of the present invention includes a frame input unit 110, a feature vector extraction unit 120, an identification information setting unit 130, a tracking information generation unit 140, an error detection unit 150, a masking unit 160, an object tracking image generation unit 170, and a masking selection interface display unit 180.

The frame input unit 110 may decode an input target video to generate frame images frame by frame. Specifically, the frame input unit 110 may receive a target video encoded in a Moving Picture Experts Group (MPEG) format from the image photographing apparatus 1, decode the compressed MPEG file, and capture each frame of the target video to generate a frame image. Here, the target video may include N frame images, where N is a natural number, and frame numbers from 1 to N may be assigned to the frame images.

The feature vector extraction unit 120 may extract a frame feature vector corresponding to each frame image from the frame image. The frame feature vector generated by the feature vector extraction unit 120 may be used to detect candidate objects included in the frame image or to distinguish and recognize each candidate object, and may also be used to track objects included in consecutive frame images.

First, the feature vector extraction unit 120 may generate a frame feature vector using pixel information of the frame image, such as the location information and pixel value information of each pixel included in the frame image. Here, to extract a frame feature vector from the pixel information of each frame image, a machine learning algorithm such as a convolutional neural network (CNN) or a recurrent neural network (RNN) may be used, or a method that extracts statistical characteristics of the image, such as histogram of oriented gradients (HOG) or local binary pattern (LBP), may be used. The frame feature vector can also be generated in various other ways, and the present invention is not limited to the methods described above.
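As one illustration of the statistical features mentioned above, the following minimal sketch computes a basic 3×3 local binary pattern histogram for a grayscale image given as a list of rows. The function names `lbp_code` and `lbp_histogram`, the clockwise neighbor order, and the `>=` comparison are illustrative assumptions rather than details taken from this application.

```python
from typing import List

# Clockwise neighbor offsets (row, col), starting at the top-left pixel.
# The ordering is an arbitrary convention chosen for this sketch.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of the interior pixel at row r, column c."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        # A neighbor at least as bright as the center sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """Count LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


frame = [
    [10, 10, 10],
    [10, 50, 10],
    [90, 90, 90],
]
feature = lbp_histogram(frame)
```

The resulting 256-bin histogram can serve as a simple frame-level feature; a CNN-based extractor would replace this function entirely.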

Thereafter, the feature vector extraction unit 120 may detect the candidate objects included in each frame image by using the generated frame feature vectors. Here, a candidate object is an object that can be a target of masking, and candidate objects can be defined differently according to embodiments. For example, a closed-circuit television (CCTV) can capture various kinds of objects, such as park benches and sports equipment, as well as people, animals, and vehicles passing through the shooting area. However, with respect to the protection of personal information, the objects requiring masking may be persons, vehicle license plates, and the like, so the feature vector extraction unit 120 may extract persons, vehicle license plates, and the like from a target video captured by the CCTV and set them as candidate objects.

Here, features such as the shape and luminance of the candidate objects to be extracted can be learned in advance using a machine learning algorithm, and the feature vector extraction unit 120 can use them to detect the regions corresponding to candidate objects in each frame image. For example, the shapes of various people can be learned repeatedly so that the shape of a person included in a frame image is extracted as a candidate object. Here, CNN, RNN, principal component analysis (PCA), logistic regression, decision trees, and the like can be used as the machine learning algorithm.

Meanwhile, the feature vector extraction unit 120 may specify and display each extracted candidate object on the frame image as a bounding box or a segmentation mask. Here, the bounding box may be displayed as a rectangle around a pedestrian, which is a candidate object, as shown in FIG. 4(a), and the user can easily distinguish candidate objects by their bounding boxes. According to an embodiment, each bounding box can be specified by four numbers (x, y, w, h): the position coordinates (x, y) of the upper-left corner of the bounding box and its width and height (w, h).
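The (x, y, w, h) representation can be sketched as a small data type. The class name `BoundingBox` and its helper methods are hypothetical and introduced only for illustration; the center point and area computed here are the quantities later used when tracking candidate objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box: (x, y) is the upper-left corner, (w, h) the size."""
    x: float
    y: float
    w: float
    h: float

    def center(self) -> Tuple[float, float]:
        """Center point of the box in image coordinates."""
        return (self.x + self.w / 2, self.y + self.h / 2)

    def area(self) -> float:
        return self.w * self.h

    def contains(self, px: float, py: float) -> bool:
        """True if the pixel (px, py) lies inside the box."""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


box = BoundingBox(x=40, y=20, w=30, h=80)
```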

In addition, when a segmentation mask is used, the foreground corresponding to a candidate object in the frame image may be distinguished pixel by pixel from the remaining background. That is, as illustrated in FIG. 4(b), the pixel values of pixels corresponding to the background may be set to 0, and the pixel values of pixels of candidate objects corresponding to the foreground may be set to 1.
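To relate the two representations, the following minimal sketch takes a binary mask (0 for background, 1 for foreground) and derives the tightest (x, y, w, h) bounding box around the foreground pixels. The function name `mask_to_box` and the list-of-rows mask layout are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def mask_to_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return the tightest (x, y, w, h) box around pixels equal to 1.

    Returns None when the mask contains no foreground pixel.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x, y = cols[0], rows[0]
    return (x, y, cols[-1] - x + 1, rows[-1] - y + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
foreground_pixels = sum(sum(row) for row in mask)
box = mask_to_box(mask)
```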

After detecting a candidate object, the feature vector extraction unit 120 may generate an object feature vector corresponding to the candidate object. The feature vector extraction unit 120 may specify the region of the frame feature vector that corresponds to the candidate object, extract the feature vector values in the specified region, and set them as the object feature vector.
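The region extraction described above can be sketched as a crop of the frame-level features. This example assumes, purely for illustration, that the frame feature vector is a single-channel 2D grid whose cells each cover `stride` × `stride` image pixels; a real CNN feature map would typically have many channels. The function name `crop_object_features` is hypothetical.

```python
from typing import List, Tuple


def crop_object_features(feature_map: List[List[float]],
                         box: Tuple[int, int, int, int],
                         stride: int = 1) -> List[float]:
    """Flatten the feature-map cells covered by the image-space box (x, y, w, h)."""
    x, y, w, h = box
    # Convert image coordinates to feature-map cell indices.
    c0, r0 = x // stride, y // stride
    # Round the far edge up so partially covered cells are included.
    c1 = max(c0 + 1, (x + w + stride - 1) // stride)
    r1 = max(r0 + 1, (y + h + stride - 1) // stride)
    return [value for row in feature_map[r0:r1] for value in row[c0:c1]]


# A 4x4 feature map for an 8x8 image (stride 2), filled with 0..15.
feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
object_vector = crop_object_features(feature_map, box=(2, 2, 4, 2), stride=2)
```

Because the crop reuses the already computed frame features, no second feature extraction pass over the pixels is needed for each candidate object.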

Here, since the object feature vector differs for each candidate object, candidate objects can be distinguished using the object feature vector. For example, when the same candidate object A appears consecutively in a plurality of frame images, the object feature vectors of candidate object A may be identical or very similar across those frame images. On the other hand, for different candidate objects, the object feature vectors differ by at least a set value. Accordingly, candidate objects having the same object feature vector may be determined to correspond to the same object, and candidate objects having different object feature vectors may be determined to correspond to different objects.

On the other hand, the size of the candidate object may be changed for each frame image according to the position or moving direction of the candidate object, and accordingly, the size of the object feature vector may be set differently for each frame image. Therefore, the feature vector extracting unit 120 may uniformly transform the size of each object feature vector to a predetermined size using interpolation.
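One simple way to bring vectors of different lengths to a common size is linear interpolation, sketched below for a one-dimensional vector. The function name `resize_vector` is hypothetical, and the application does not state which interpolation scheme is used, so linear interpolation is an assumption here.

```python
from typing import List, Sequence


def resize_vector(vec: Sequence[float], size: int) -> List[float]:
    """Resample `vec` to `size` elements using linear interpolation."""
    if not vec:
        raise ValueError("cannot resize an empty vector")
    if size == 1 or len(vec) == 1:
        return [vec[0]] * size
    scale = (len(vec) - 1) / (size - 1)
    out = []
    for i in range(size):
        pos = i * scale                  # fractional position in the source vector
        lo = int(pos)
        hi = min(lo + 1, len(vec) - 1)
        t = pos - lo                     # blend weight between the two neighbors
        out.append(vec[lo] * (1 - t) + vec[hi] * t)
    return out


upsampled = resize_vector([0.0, 10.0], 5)
downsampled = resize_vector([1, 2, 3, 4, 5], 3)
```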

As described above, the feature vector extracting unit 120 according to an embodiment of the present invention may detect a candidate object and generate an object feature vector of a candidate object using the frame feature vector. That is, since candidate object detection and object feature vector generation can be performed using the extracted frame feature vector, efficient operation is possible and the operation speed can be improved.

The identification information setting unit 130 may set identification information for candidate objects using the object feature vector. That is, the identification information setting unit 130 may distinguish each candidate object by using the object feature vectors of the candidate objects, and may display identification information for each distinguished candidate object. For example, FIG. 4(a) includes a plurality of candidate objects, and each candidate object corresponds to a different object. Accordingly, the identification information setting unit 130 can distinguish the candidate objects and set "ID" values of "138", "147", "128", and "153", respectively, as their identification information. In this case, the identification information setting unit 130 may set the identification information for each candidate object using an identification information database.

Specifically, the identification information setting unit 130 may search the identification information database d for a registration feature vector corresponding to the object feature vector of candidate objects extracted from each frame image. Here, in the identification information database (d), the registration feature vector and identification information corresponding to each registration feature vector may be matched and stored. Accordingly, the identification information setting unit 130 may search for identification information corresponding to the object feature vector of the candidate object in the identification information database d.

Here, when a registered feature vector corresponding to the object feature vector is found, the identification information matched with the registered feature vector can be extracted from the identification information database d and set as the identification information of the corresponding candidate object. On the other hand, a registered feature vector corresponding to the object feature vector may not be found, which corresponds to the case where the object feature vector appears in the target video for the first time. In that case, the identification information setting unit 130 may newly generate identification information corresponding to the object feature vector, and may update the identification information database d by registering the newly generated identification information and the object feature vector in it.

Here, the identification information setting unit 130 may determine that the object feature vector corresponds to the registered feature vector when the object feature vector matches the registered feature vector within a preset error range. That is, even in the case of the same candidate object, since the object feature vector may include some errors for each frame image, it is possible to determine the identity in consideration of the error range.
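The lookup-or-register behavior of the identification information database can be sketched as follows. This is a minimal illustration that assumes Euclidean distance as the error measure and sequentially numbered identifiers; the class name `IdentificationDatabase`, the `max_error` parameter, and the `identify` method are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


class IdentificationDatabase:
    """Maps registered feature vectors to identification information."""

    def __init__(self, max_error: float) -> None:
        self.max_error = max_error            # preset error range (assumed Euclidean)
        self._registered: Dict[int, List[float]] = {}
        self._next_id = 1

    def identify(self, object_vector: Sequence[float]) -> int:
        """Return the ID of the closest registered vector within the error
        range, or register the vector under a newly generated ID."""
        best_id: Optional[int] = None
        best_dist = self.max_error
        for ident, registered in self._registered.items():
            dist = math.dist(object_vector, registered)
            if dist <= best_dist:
                best_id, best_dist = ident, dist
        if best_id is not None:
            return best_id
        new_id = self._next_id
        self._next_id += 1
        self._registered[new_id] = list(object_vector)
        return new_id


db = IdentificationDatabase(max_error=0.5)
first = db.identify([1.0, 0.0])    # first appearance: a new ID is registered
again = db.identify([1.1, 0.0])    # within the error range: same ID is reused
other = db.identify([5.0, 5.0])    # outside the error range: another new ID
```

The choice of `max_error` trades off merging different objects under one ID against splitting one object across several IDs.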

Since the identification information setting unit 130 sets the identification information of each candidate object with reference to the identification information database d, candidate objects having the same object feature vector among the candidate objects included in the target video can be set to have the same identification information. For example, if the candidate object having object feature vector A appears in frame images 3 to 10 and then appears again in frame images 20 to 26, the candidate object may be set to have the same identification information b in frame images 3 to 10 and in frame images 20 to 26.

Additionally, the identification information database d may be provided for each target video, but according to an embodiment, the image masking apparatus 100 may include a single identification information database d for all target videos. In this case, if an object feature vector included in a newly provided target video is the same as an object feature vector included in a previous target video, it may be set to the same identification information as that set in the previous video. That is, the same identification information may be set for the same candidate object across different target videos.

The tracking information generating unit 140 may track candidate objects having the same identification information in continuous frame images, and may generate tracking information for the candidate objects for each identification information. That is, the tracking information can be generated by combining the location information and the identification information of the candidate object extracted from each frame image, and the tracking information can be used to follow the positional change of the candidate object as it continues naturally through the continuous frame images. Here, the tracking information may include identification information of the candidate object, frame information such as the frame number in which the candidate object appears, and location information and size information of the candidate object within the frame image. Further, depending on the embodiment, the tracking information may also include information on the pose of the candidate object, facial feature points, clothing feature points, and the like. A single target video may include a plurality of pieces of tracking information, and each piece of tracking information may be generated for the identification information of one candidate object.
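As a minimal sketch of how such tracking information might be organized, the following groups per-frame detections by identification information. The Detection record and the function name build_tracking_info are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One candidate object in one frame image (hypothetical record layout)."""
    frame: int                       # frame number in which the object appears
    ident: int                       # identification information
    box: Tuple[int, int, int, int]   # location and size as (x, y, width, height)


def build_tracking_info(detections: List[Detection]) -> Dict[int, List[Detection]]:
    """Group detections by identification information, ordered by frame number."""
    tracks: Dict[int, List[Detection]] = defaultdict(list)
    for det in detections:
        tracks[det.ident].append(det)
    for track in tracks.values():
        track.sort(key=lambda d: d.frame)
    return dict(tracks)


detections = [
    Detection(frame=4, ident=0, box=(12, 10, 30, 40)),
    Detection(frame=3, ident=0, box=(10, 10, 30, 40)),
    Detection(frame=3, ident=1, box=(80, 20, 25, 35)),
]
tracking_info = build_tracking_info(detections)
```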

According to an embodiment, the tracking information generator 140 may also track the same candidate object by using the difference between the object feature vectors of the candidate objects included in continuous frame images together with the changes in position and size of the candidate objects. For example, when tracking a first candidate object included in a first frame image, a tracking error value with respect to the first candidate object can be calculated for each of a plurality of candidate objects included in a second frame image subsequent to the first frame image. Thereafter, the candidate object having the smallest tracking error value may be determined to be the same candidate object as the first candidate object.

Specifically, the tracking error value can be calculated using Error = (V1-V2) + a × (d1-d2) + b × (s1-s2). Here, Error is the tracking error value, V1 is the object feature vector of the first candidate object included in the first frame image, V2 is the object feature vector of the second candidate object included in the second frame image, d1 is the distance from the reference point to the center point of the first candidate object, d2 is the distance from the reference point to the center point of the second candidate object, s1 is the area of the first candidate object, s2 is the area of the second candidate object, and a and b are weights, each of which can be set to an arbitrary constant.

In general, since the time difference between successive frame images is very short, it is difficult for the same candidate object to move a large distance between successive frame images or to rapidly increase or decrease the area. Therefore, the smaller the difference between the object feature vectors and the smaller the change in distance and area, the more likely it is to be the same candidate object. Therefore, the candidate object can be tracked in a continuous frame image using the tracking error value described above. In addition, according to an embodiment, if V1-V2 is greater than or equal to a set limit error, it is possible to determine the second candidate object as a different candidate object without calculating the tracking error value. That is, since the object feature vector is outside the set error range, the second candidate object may be determined to be different from the first candidate object.
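The tracking error calculation and the minimum-error selection described above can be sketched as follows. The description writes the terms as plain differences; this sketch assumes that (V1-V2) is reduced to a scalar by the Euclidean norm and that the distance and area terms are taken as absolute values, so that a smaller value always means a closer match. The names Candidate, tracking_error and match_candidate are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Candidate:
    vector: Sequence[float]  # object feature vector V
    distance: float          # d: distance from the reference point to the center point
    area: float              # s: area of the candidate object


def tracking_error(first: Candidate, second: Candidate, a: float, b: float) -> float:
    """Error = |V1-V2| + a*|d1-d2| + b*|s1-s2| (norm and absolute values are assumptions)."""
    return (math.dist(first.vector, second.vector)
            + a * abs(first.distance - second.distance)
            + b * abs(first.area - second.area))


def match_candidate(first: Candidate, candidates: List[Candidate],
                    a: float, b: float, limit_error: float) -> Optional[int]:
    """Return the index of the candidate with the smallest tracking error, or None."""
    best_index: Optional[int] = None
    best_error = math.inf
    for index, cand in enumerate(candidates):
        # Candidates whose feature difference reaches the limit error are rejected
        # without computing the full tracking error.
        if math.dist(first.vector, cand.vector) >= limit_error:
            continue
        err = tracking_error(first, cand, a, b)
        if err < best_error:
            best_index, best_error = index, err
    return best_index


first = Candidate(vector=[1.0, 0.0], distance=10.0, area=100.0)
second_frame = [
    Candidate(vector=[0.0, 1.0], distance=10.0, area=100.0),  # very different features
    Candidate(vector=[1.0, 0.1], distance=12.0, area=110.0),  # small change
    Candidate(vector=[1.0, 0.0], distance=50.0, area=400.0),  # large jump in position and size
]
matched = match_candidate(first, second_frame, a=0.1, b=0.01, limit_error=1.0)
```

The weights a and b here are arbitrary illustrative constants; in practice they would need to be chosen to balance the scale of the feature, distance and area terms.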

The error detection unit 150 may detect an error in setting identification information for a candidate object. For example, when a candidate object having the same identification information is partially missing in a continuous frame image, the error detection unit 150 may determine that an error has occurred in setting identification information for the candidate object.

In general, since the frame images of the target video are captured at very short time intervals, candidate objects that exist in adjacent frame images at positions close to each other can be expected to have the same identification information. Therefore, when an existing candidate object suddenly disappears in an adjacent frame image, it can be considered that an error has occurred, for example in setting the identification information for the candidate object, rather than that the candidate object has actually moved.

For example, suppose there are a first frame image at time t-1, a second frame image at time t, and a third frame image at time t+1, and a candidate object with {id=0} exists at (x1, y1) in both the first frame image and the third frame image. In this case, if no candidate object with {id=0} exists at (x1, y1) in the second frame image, or an object with {id=1} is suddenly located at (x1, y1), it can be determined that an error has occurred, for example in setting the identification information. The error detection unit 150 may then notify the user by displaying that the error has occurred.
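The three-frame check in this example can be sketched as follows. Each frame is represented as a mapping from identification information to a position, and the tolerance parameter and the function name find_identification_errors are assumptions of this sketch rather than part of the description.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]
FrameObjects = Dict[int, Position]  # identification information -> (x, y)


def _near(p: Position, q: Position, tolerance: int) -> bool:
    """True when two positions differ by at most `tolerance` on each axis."""
    return abs(p[0] - q[0]) <= tolerance and abs(p[1] - q[1]) <= tolerance


def find_identification_errors(frames: List[FrameObjects],
                               tolerance: int = 0) -> List[Tuple[int, int]]:
    """Return (frame index, id) pairs where an id present at about the same position
    in frames t-1 and t+1 is missing from that position in frame t."""
    errors: List[Tuple[int, int]] = []
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        for ident, pos in prev.items():
            if ident not in nxt or not _near(pos, nxt[ident], tolerance):
                continue  # the object is not stable across t-1 and t+1
            if ident not in cur or not _near(pos, cur[ident], tolerance):
                errors.append((t, ident))
    return errors


frames = [
    {0: (5, 5)},   # time t-1: id 0 at (x1, y1)
    {1: (5, 5)},   # time t: id 1 suddenly located at (x1, y1)
    {0: (5, 5)},   # time t+1: id 0 at (x1, y1) again
]
errors = find_identification_errors(frames)
```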

When receiving the masking input, the masking unit 160 may use the tracking information to extract the candidate objects whose identification information matches the identification information included in the masking input, and may perform masking on the extracted candidate objects. That is, the masking unit 160 may selectively mask the selected candidate objects, and the candidate objects to be masked may be specified using identification information.

Here, the masking input may be generated using identification information input from a user, or may be generated using identification information extracted according to a preset selection algorithm. For example, the image masking apparatus 100 may provide a separate masking selection interface to the user, and the user may select identification information corresponding to a candidate object to be masked using the masking selection interface. In this case, the masking unit 160 may receive the masking input through the masking selection interface, and may mask candidate objects corresponding to the received masking input.

In addition, in an embodiment in which candidate objects to be masked are automatically extracted using a separate selection algorithm set in the image masking apparatus 100, identification information may be extracted so that all candidate objects included in a specific section of the target video are masked, or identification information corresponding to a specific candidate object included in the target video may be extracted and masked. For example, using a selection algorithm, identification information corresponding to a specific gender or age group may be extracted from the candidate objects included in the target video.
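One simple selection algorithm of this kind, selecting every candidate object that appears within a given section of the target video, might look like the following sketch. The representation of tracking information as a mapping from identification information to frame numbers and the name select_ids_in_section are assumptions for illustration.

```python
from typing import Dict, List, Set


def select_ids_in_section(tracking_info: Dict[int, List[int]],
                          start: int, end: int) -> Set[int]:
    """Return the identification information of every candidate object that
    appears in at least one frame with start <= frame number <= end."""
    return {
        ident
        for ident, frame_numbers in tracking_info.items()
        if any(start <= f <= end for f in frame_numbers)
    }


# Hypothetical tracking information: identification information -> frame numbers.
tracking_info = {0: [3, 4, 5], 1: [20, 21], 2: [5, 6, 7]}
masking_input = select_ids_in_section(tracking_info, start=5, end=10)
```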

Meanwhile, the masking unit 160 may use the tracking information to extract the frame images in which the candidate object corresponding to the selected identification information appears, and may set a masking area corresponding to the position of the candidate object in each extracted frame image. Thereafter, the set masking area may be masked so that it is de-identified and users cannot identify the candidate object. Here, the masking area may be limited to the region corresponding to the face of the candidate object.

In this case, the masking unit 160 may mask the masking area by blurring or mosaicing. Here, blurring may be implemented using a low-pass filter, and depending on the embodiment, masking may also be performed using image substitution that covers the masking area with a single color or a specific pattern, a separate image or animation, or a character.
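As a minimal sketch of mosaic processing, the following replaces each block inside the masking area with its mean value, which acts as a crude low-pass filter. The image is modeled as a grayscale list of rows, and the function name mosaic and the block size are assumptions of this sketch; a practical implementation would typically operate on color frames with an image library.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def mosaic(image: Image, box: Tuple[int, int, int, int], block: int) -> Image:
    """Return a copy of `image` in which the masking area `box` (x, y, width, height)
    is divided into block x block tiles, each filled with its mean pixel value."""
    x0, y0, w, h = box
    out = [row[:] for row in image]  # leave the original frame untouched
    for by in range(y0, y0 + h, block):
        for bx in range(x0, x0 + w, block):
            ys = range(by, min(by + block, y0 + h))
            xs = range(bx, min(bx + block, x0 + w))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


frame = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
masked = mosaic(frame, box=(0, 0, 2, 2), block=2)
```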

Additionally, the masking unit 160 may encode an image synthesized by masking, and allow a masked image to be output through an image output unit (not shown). Meanwhile, according to an embodiment, the masking unit 160 may separately store masking information and a target video for a masking area set in the target video. That is, after storing the original file of the target video separately, the target video may be masked using masking information when the target video is played.

In addition, according to an embodiment, when the original file of the target video is to be reproduced, the access right to the original file may be requested. In the absence of the access right, masking of the target video may be applied and provided. Here, the access right can be checked through various types of authentication such as password, fingerprint recognition, and iris recognition.
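The idea of storing the original target video and the masking information separately, and applying the masking at playback unless the access right is confirmed, can be sketched as follows. The function name play_frame, the per-frame dictionary of masking areas, and the single-color fill are assumptions of this sketch; the authentication step itself (password, fingerprint, iris) is represented only by the boolean has_access.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # masking area as (x, y, width, height)
Frame = List[List[int]]           # grayscale frame as rows of pixel values


def play_frame(frame: Frame, frame_number: int,
               masking_info: Dict[int, List[Box]],
               has_access: bool, fill: int = 0) -> Frame:
    """Return the original frame when access is granted; otherwise return a copy
    with every stored masking area for this frame covered by a single color."""
    if has_access:
        return frame
    out = [row[:] for row in frame]
    for x, y, w, h in masking_info.get(frame_number, []):
        for yy in range(y, y + h):
            for xx in range(x, x + w):
                out[yy][xx] = fill
    return out


original = [[1, 1, 1], [1, 1, 1]]
masking_info = {0: [(0, 0, 2, 1)]}  # stored separately from the original video
shown_without_access = play_frame(original, 0, masking_info, has_access=False)
shown_with_access = play_frame(original, 0, masking_info, has_access=True)
```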

Additionally, the image masking apparatus 100 according to an embodiment of the present invention may allow a user to select candidate objects to be masked, and in this case, may further include components for enhancing the user's convenience of selection.

Specifically, the object tracking image generator 170 may generate an object tracking image by overlaying a candidate object and identification information for each candidate object on the target video using the tracking information. That is, as shown in FIG. 4(a), an object tracking image may be generated by displaying a boundary box representing the candidate object and identification information together with the target video.

In this case, the user can check the object tracking image to see which candidate objects are being tracked and the identification information of each candidate object. Accordingly, the user may refer to the object tracking image to select, from among a plurality of candidate objects, the objects on which masking is to be performed.

In addition, the masking selection interface display unit 160 may provide a masking selection interface so that a user can select candidate objects to be masked. Specifically, the masking selection interface can be implemented as shown in FIG. 5. That is, the masking selection interface may display a list of identification information in which the identification information corresponding to the candidate objects included in the target video is sorted, and may display, for candidate objects having the same identification information, the appearance interval information in the target video, the number of frame images in which they appear, and the like. In addition, the masking selection interface may further include a frame image in which the candidate object appears, so that the user can identify the candidate object corresponding to each identification information.

Accordingly, the user can select candidate objects to be masked in the target video through the masking selection interface, and can select candidate objects to be masked by checking the checkbox displayed next to the identification information list. That is, it is possible to selectively set candidate objects for masking, rather than collectively masking all objects included in the target video.
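The per-identification summary shown in such an interface (appearance intervals and number of frame images) could be derived from the tracking information roughly as follows. The function names and the dictionary layout of each row are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple


def appearance_intervals(frame_numbers: List[int]) -> List[Tuple[int, int]]:
    """Merge frame numbers into (first frame, last frame) runs of consecutive frames."""
    intervals: List[Tuple[int, int]] = []
    for f in sorted(set(frame_numbers)):
        if intervals and f == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], f)   # extend the current run
        else:
            intervals.append((f, f))                # start a new run
    return intervals


def summarize(tracking_info: Dict[int, List[int]]) -> List[dict]:
    """Build one list row per identification information, sorted by id."""
    return [
        {
            "id": ident,
            "intervals": appearance_intervals(frames),
            "frame_count": len(set(frames)),
        }
        for ident, frames in sorted(tracking_info.items())
    ]


# A candidate object appearing in frame images 3-10 and again in 20-26.
tracking_info = {7: list(range(3, 11)) + list(range(20, 27))}
rows = summarize(tracking_info)
```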

On the other hand, as shown in FIG. 3, the image masking apparatus 100 according to an embodiment of the present invention may include a physical configuration such as a processor 10 and a memory 40, and the memory 40 may include one or more modules configured to be executed by the processor 10. Specifically, the one or more modules may include a frame input module, a feature vector extraction module, an identification information setting module, a tracking information generation module, an error detection module, a masking module, an object tracking image generation module, and a masking selection interface display module.

The processor 10 may perform various functions and process data by executing the software programs and instruction sets stored in the memory 40. The peripheral interface unit 30 may connect input/output peripheral devices of the image masking device 100 to the processor 10 and the memory 40, and the memory controller 20 may control memory access when the processor 10 or another component of the image masking device 100 accesses the memory 40. Depending on the embodiment, the processor 10, the memory controller 20 and the peripheral interface unit 30 may be implemented on a single chip or as separate chips.

The memory 40 may include high-speed random access memory, one or more magnetic disk storage devices, and non-volatile memory such as a flash memory device. In addition, the memory 40 may further include a storage device located away from the processor 10 or a network attached storage device accessed through a communication network such as the Internet.

In addition, as shown in FIG. 3, in the image masking apparatus 100 according to an embodiment of the present invention, the memory 40 may include an operating system and, as application programs, a frame input module, a feature vector extraction module, an identification information setting module, a tracking information generation module, an error detection module, a masking module, an object tracking image generation module, and a masking selection interface display module. Here, each module is a set of instructions for performing the above-described functions, and may be stored in the memory 40.

Therefore, in the image masking apparatus 100 according to an embodiment of the present invention, the processor 10 may access the memory 40 and execute the instructions corresponding to each module. However, since the frame input module, feature vector extraction module, identification information setting module, tracking information generation module, error detection module, masking module, object tracking image generation module, and masking selection interface display module correspond respectively to the frame input unit, feature vector extraction unit, identification information setting unit, tracking information generation unit, error detection unit, masking unit, object tracking image generation unit, and masking selection interface display unit described above, a detailed description thereof will be omitted here.

Referring to FIG. 6, the image masking method according to an embodiment of the present invention may include a frame input step (S10), a candidate object detection step (S20), an object feature vector extraction step (S30), an identification information setting step (S40), a tracking information generation step (S50), and a masking step (S60). Here, each step may be performed by an image masking apparatus according to an embodiment of the present invention.

Hereinafter, an image masking method according to an embodiment of the present invention will be described with reference to FIG. 6.

In the frame input step (S10), the received target video may be decoded to generate frame images on a frame-by-frame basis. Specifically, the image masking device may receive an MPEG-encoded target video from the video photographing device, in which case it may decode the target video, which is in the form of a compressed MPEG file, and then capture each frame of the target video to generate a frame image.

In the candidate object detection step (S20), a frame feature vector corresponding to each frame image is extracted from the frame image, and a candidate object can be detected from the frame feature vector. Here, the image masking apparatus may generate the frame feature vector using pixel information of the frame image, such as the location information and pixel value information of each pixel included in the frame image. According to an embodiment, machine learning algorithms such as a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN), or methods for extracting statistical characteristics of images such as Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP), can be used.
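To make one of the statistical methods named above concrete, the following is a minimal sketch of a basic 3x3 Local Binary Pattern histogram computed over a grayscale image. The bit ordering of the neighbors is a convention chosen for this sketch, and the description does not state which LBP variant, if any, an embodiment would use.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values

# Neighbor offsets (dy, dx) in clockwise order starting at the top-left pixel.
# The order only determines which bit each neighbor sets; it is a convention.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Image, y: int, x: int) -> int:
    """8-bit code: a bit is set when the neighbor is at least as bright as the center."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels, usable as a feature vector."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
flat_code = lbp_code(flat, 1, 1)   # every neighbor >= center
peak_code = lbp_code(peak, 1, 1)   # no neighbor >= center
```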

Thereafter, the image masking apparatus may detect the candidate objects included in each frame image using the generated frame feature vector. Here, a candidate object is an object that can be subjected to masking, and features of candidate objects such as shape and luminance can be learned in advance using a machine learning algorithm. That is, a machine learning algorithm such as CNN, RNN, Principal Component Analysis (PCA), logistic regression, or a decision tree can be trained, and the trained model can be used to detect the region corresponding to a candidate object in each frame image. At this time, the detected candidate objects may be specified and displayed on the frame image with a bounding box or a segmentation mask.

In the object feature vector extraction step (S30 ), the object feature vector corresponding to each candidate object may be extracted from the frame feature vector using the detected location information of the candidate objects. That is, a region corresponding to a candidate object may be specified among frame feature vectors, and a feature vector value corresponding to the specified region may be extracted and set as an object feature vector. Here, since the object feature vector is set differently for each candidate object, candidate objects can be distinguished using the object feature vector.
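A minimal sketch of specifying the region of a candidate object within the frame features and extracting the corresponding values is shown below. It assumes, purely for illustration, that the frame feature vector is laid out as a two-dimensional grid aligned with the frame image and that the location information is a box (x, y, width, height); the function name crop_object_features is hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # frame features laid out as a 2D grid (assumption)


def crop_object_features(frame_features: FeatureMap,
                         box: Tuple[int, int, int, int]) -> List[float]:
    """Return the feature values inside `box` (x, y, width, height), row by row."""
    x, y, w, h = box
    return [value for row in frame_features[y:y + h] for value in row[x:x + w]]


frame_features = [
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3],
]
object_vector = crop_object_features(frame_features, box=(1, 0, 2, 2))
```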

On the other hand, the size of a candidate object may change from frame image to frame image according to the position or moving direction of the candidate object, and accordingly, the size of the object feature vector may differ for each frame image. Therefore, each object feature vector can be converted to a predetermined fixed size by using interpolation.
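As an illustration of bringing feature vectors of different lengths to one fixed size, the following sketch resamples a one-dimensional vector with linear interpolation. The choice of linear interpolation and the name resize_vector are assumptions; the description only states that interpolation is used.

```python
from typing import List


def resize_vector(values: List[float], size: int) -> List[float]:
    """Resample `values` to exactly `size` elements using linear interpolation,
    keeping the first and last elements fixed."""
    if size == 1 or len(values) == 1:
        return [values[0]] * size
    scale = (len(values) - 1) / (size - 1)
    out: List[float] = []
    for i in range(size):
        pos = i * scale                      # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


upsampled = resize_vector([0.0, 10.0], 3)
downsampled = resize_vector([0.0, 1.0, 2.0, 3.0, 4.0], 3)
```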

In the identification information setting step (S40 ), identification information for candidate objects may be set using an object feature vector. That is, each candidate object may be distinguished using the object feature vector of the candidate objects, and identification information may be assigned to each distinguished candidate object and displayed. At this time, identification information for each candidate object may be set using an identification information database.

Specifically, a registered feature vector corresponding to the object feature vector of a candidate object extracted from each frame image may be searched for in the identification information database. Here, the identification information database may store registered feature vectors matched with the identification information corresponding to each registered feature vector. Therefore, the identification information corresponding to the object feature vector of the candidate object can be retrieved from the identification information database.

Here, when a registered feature vector corresponding to the object feature vector is found, the identification information matched with that registered feature vector can be extracted from the identification information database and set as the identification information of the corresponding candidate object. On the other hand, a registered feature vector corresponding to the object feature vector may not be found, which corresponds to the case where the object feature vector appears in the target video for the first time. In that case, identification information corresponding to the object feature vector can be newly generated, and the identification information database can be updated by registering the newly generated identification information and the object feature vector in it.

Here, if the object feature vector matches the registered feature vector within a predetermined error range, it can be determined that the object feature vector corresponds to the registered feature vector. That is, even in the case of the same candidate object, since the object feature vector may include some errors for each frame image, it is possible to determine the identity in consideration of the error range.

In the tracking information generation step (S50), candidate objects having the same identification information may be tracked in continuous frame images, and tracking information for the candidate objects may be generated for each identification information. That is, the tracking information can be generated by combining the location information and the identification information of the candidate object extracted from each frame image, and the tracking information can be used to follow the positional change of the candidate object as it continues naturally through the continuous frame images. Here, the tracking information may include identification information of the candidate object, frame information such as the frame number in which the candidate object appears, and location information and size information of the candidate object within the frame image. Further, depending on the embodiment, the tracking information may also include information on the pose of the candidate object, facial feature points, clothing feature points, and the like. A single target video may include a plurality of pieces of tracking information, and each piece of tracking information may be generated for the identification information of one candidate object.

According to an embodiment, it is also possible to track the same candidate object by using the difference between the object feature vectors of the candidate objects included in continuous frame images together with the changes in position and size of the candidate objects. Specifically, the tracking error value can be calculated using Error = (V1-V2) + a × (d1-d2) + b × (s1-s2). Here, Error is the tracking error value, V1 is the object feature vector of the first candidate object included in the first frame image, V2 is the object feature vector of the second candidate object included in the second frame image, d1 is the distance from the reference point to the center point of the first candidate object, d2 is the distance from the reference point to the center point of the second candidate object, s1 is the area of the first candidate object, s2 is the area of the second candidate object, and a and b are weights, each of which can be set to an arbitrary constant.

Meanwhile, although not illustrated, according to an embodiment, an error detection step may be further included. That is, in the error detection step, an error in setting identification information for a candidate object may be detected, and when such an error is detected, the user may be notified of it. For example, a candidate object having the same identification information may be partially missing within continuous frame images. In this case, because the interval between frame images is very short, it can be determined that an error has occurred in setting the identification information for the candidate object.

In the masking step (S60 ), when the masking input is received, candidate objects identical to the identification information included in the masking input may be extracted using the tracking information, and masking of the extracted candidate objects may be performed. That is, the selected candidate objects can be selectively masked, and candidate objects to be masked can be specified using identification information. Here, the masking input may be generated using identification information input from a user, or may be generated using identification information extracted according to a preset selection algorithm.

Meanwhile, in the masking step (S60), the tracking information may be used to extract the frame images in which the candidate objects corresponding to the selected identification information appear. Subsequently, a masking area may be set corresponding to the position of the candidate object in each extracted frame image, and the set masking area may be masked so that it is de-identified and users cannot identify the candidate object. Here, the masking area may be limited to the region corresponding to the face of the candidate object.

At this time, the image masking device may mask the masking area by blurring or mosaicing. Here, blurring can be implemented using a low-pass filter. Also, depending on the embodiment, the masking area may be masked by image substitution that covers it with a single color or a specific pattern, a separate image or animation, a character, or the like.

In addition, the image masking device may encode the image synthesized by masking, and may output the masked image using an image output unit. On the other hand, depending on the embodiment, it is also possible to store the target video and the masking information for the masking area set in the target video separately. That is, after storing the original file of the target video separately, the target video may be masked using the masking information when the target video is played.

Additionally, the image masking method according to an embodiment of the present invention may allow a user to select candidate objects to be masked, and in this case, may further include components for increasing the convenience of user selection.

Specifically, an object tracking image generation step (not shown) may be further included, and by using the tracking information, the candidate object and identification information for each candidate object may be overlaid on the target video to generate the object tracking image. That is, an object tracking image may be generated by displaying a boundary box indicating the candidate object and identification information together with the target video.

In addition, a masking selection interface display step (not shown) may be further included to provide a masking selection interface so that a user can select candidate objects to be masked. Specifically, the masking selection interface may display a list of identification information in which the identification information corresponding to the candidate objects included in the target video is sorted, and may display, for candidate objects having the same identification information, the appearance interval information in the target video and the number of frame images in which they appear. In addition, a frame image in which the candidate object appears may be further included so that the user can identify the candidate object corresponding to each identification information.

The above-described present invention can be embodied as computer-readable code on a medium on which a program is recorded. The computer-readable medium may be one that continuously stores a computer-executable program or temporarily stores it for execution or download. In addition, the medium may be any of various recording means or storage means in the form of a single piece of hardware or a combination of several pieces of hardware, and is not limited to a medium directly connected to a computer system, but may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks and magnetic tapes, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and media configured to store program instructions, including ROM, RAM, flash memory, and the like. In addition, examples of other media include a recording medium or storage medium managed by an application store that distributes applications, by a site that supplies or distributes various software, or by a server. Accordingly, the above detailed description should not be construed as limiting in all respects, but should be considered illustrative. The scope of the invention should be determined by rational interpretation of the appended claims, and all changes within the equivalent scope of the invention are included in the scope of the invention.

The present invention is not limited by the above-described embodiments and the accompanying drawings. For those skilled in the art to which the present invention pertains, it will be apparent that components according to the present invention can be substituted, modified and changed without departing from the technical spirit of the present invention.

Claims

Decoding the input target video to generate a frame-based image;

Extracting a frame feature vector corresponding to each frame image from the frame image, and detecting a candidate object from the frame feature vector;

Extracting an object feature vector corresponding to each candidate object from the frame feature vector using the detected location information of the candidate objects;

Setting identification information for the candidate objects using the object feature vector;

Tracking candidate objects having the same identification information in a continuous frame image to generate tracking information for the candidate objects for each identification information; And

When a masking input is received, extracting, by using the tracking information, candidate objects whose identification information matches the identification information included in the masking input, and performing masking on the extracted candidate objects.
The method of claim 1, wherein the step of setting the identification information

Retrieving a registered feature vector corresponding to the object feature vector from an identification information database;

When a registered feature vector corresponding to the object feature vector is found, extracting the identification information matched with the registered feature vector from the identification information database and setting it as the identification information of the candidate object; And

If no registered feature vector corresponding to the object feature vector is found, generating identification information of the candidate object, and registering the object feature vector and the identification information in the identification information database.
The method of claim 2, wherein, in the retrieving from the identification information database,

The object feature vector is determined to correspond to the registered feature vector if the object feature vector matches the registered feature vector within a predetermined error range.
The method of claim 1, wherein generating the tracking information

An image masking method characterized in that the same candidate object is tracked using a difference value of an object feature vector of each candidate object included in the continuous frame image and a change in the position and size of the candidate objects.
The method of claim 4, wherein the step of generating the tracking information

Error = (V1-V2) + a × (d1-d2) + b × (s1-s2)

Here, Error is the tracking error value, V1 is the object feature vector of the first candidate object included in the first frame image, V2 is the object feature vector of the second candidate object included in the second frame image, d1 is the distance from the reference point to the center point of the first candidate object, d2 is the distance from the reference point to the center point of the second candidate object, s1 is the area of the first candidate object, s2 is the area of the second candidate object, and a and b are weights,

An image masking method characterized in that the second candidate object having the minimum tracking error value is determined as a candidate object having the same identification information as the first candidate object.
The method of claim 1,

Further comprising an error detection step of determining that an error has occurred in setting the identification information for a candidate object when the candidate object having the same identification information is partially missing in continuous frame images.
The method of claim 1, wherein the tracking information

The image masking method comprising at least one of identification information of the candidate object, frame information where the candidate object appears, and location information and size information of the candidate object included in the frame image.
The method of claim 1,

Further comprising generating, by using the tracking information, an object tracking image in which the candidate object and the identification information for each candidate object are overlaid on the target video.
The method of claim 8,

further comprising displaying a masking selection interface that includes an identification information list in which the identification information corresponding to the candidate objects included in the target video is sorted, section information indicating the sections of the target video in which candidate objects having the same identification information appear, and a frame image in which the candidate object appears.
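
The data behind such a selection interface might be prepared as sketched below. The sketch assumes that a "section" is a run of consecutive frame indices and that the list is sorted by ascending identification information; both are assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def appearance_sections(frames: List[int]) -> List[Tuple[int, int]]:
    # Collapse frame indices into (first_frame, last_frame) runs of
    # consecutive frames.
    sections: List[Tuple[int, int]] = []
    for f in sorted(set(frames)):
        if sections and f == sections[-1][1] + 1:
            sections[-1] = (sections[-1][0], f)  # extend the current run
        else:
            sections.append((f, f))              # start a new run
    return sections


def selection_list(frames_by_id: Dict[int, List[int]]) -> List[Tuple[int, List[Tuple[int, int]]]]:
    # Sorted list of IDs, each paired with the sections in which it appears.
    return [(oid, appearance_sections(frames_by_id[oid])) for oid in sorted(frames_by_id)]
```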
The method of claim 1, wherein the masking input

is generated using identification information input by a user, or by extracting identification information according to a preset selection algorithm.
The method of claim 1, wherein the step of performing the masking

comprises extracting, using the tracking information, the frame images in which a candidate object corresponding to the identification information appears, and setting a masking area corresponding to the position of the candidate object in each extracted frame image.
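
The lookup described in this claim can be illustrated as follows. The sketch assumes the tracking information is a nested mapping from identification information to frame index to an `(x, y, width, height)` box, and that the masking area is simply the tracked box; `masking_areas` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def masking_areas(tracking: Dict[int, Dict[int, Box]],
                  selected_ids: Iterable[int]) -> Dict[int, List[Box]]:
    # For every ID named in the masking input, collect the frames in which
    # it appears and the area to mask in each of those frames.
    areas: Dict[int, List[Box]] = {}
    for object_id in selected_ids:
        for frame_index, box in tracking.get(object_id, {}).items():
            areas.setdefault(frame_index, []).append(box)
    return areas
```

Frames in which none of the selected objects appear are absent from the result, so they can be passed through unchanged.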
The method of claim 11, wherein the step of performing the masking

masks the masking area using blurring, mosaic processing, or image replacement.
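
Of the three options named here, mosaic processing is the simplest to sketch. The example below assumes a grayscale frame stored as a list of rows, a masking area given as `(x, y, width, height)` that lies inside the frame, and a hypothetical `block` size; it replaces each block inside the area with its integer mean.

```python
from typing import List, Tuple

Frame = List[List[int]]          # assumed: grayscale pixels, one list per row
Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def mosaic(frame: Frame, box: Box, block: int = 2) -> Frame:
    # Return a copy of the frame with the masking area pixelated.
    x, y, w, h = box
    out = [row[:] for row in frame]
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            ys = range(by, min(by + block, y + h))
            xs = range(bx, min(bx + block, x + w))
            pixels = [frame[j][i] for j in ys for i in xs]
            mean = sum(pixels) // len(pixels)
            for j in ys:
                for i in xs:
                    out[j][i] = mean  # every pixel in the block gets the block mean
    return out
```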
The method of claim 12, wherein the step of performing the masking

stores the target video and the masking information for the masking area set in the target video separately, and, when the target video is played, masks the target video using the masking information and plays it.
The method of claim 13, wherein the step of performing the masking

comprises requesting access permission when the target video is played and, if the access permission is not present, masking and playing the target video.
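
Playback with separately stored masking information and a permission check might look like the sketch below. It assumes grayscale frames stored as lists of rows, masking information stored as a mapping from frame index to `(x, y, width, height)` boxes, and a boolean permission result; filling the area with zeros stands in for whichever masking operation is actually used. All names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Frame = List[List[int]]          # assumed: grayscale pixels, one list per row
Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def black_out(frame: Frame, box: Box) -> Frame:
    # Stand-in masking operation: fill the masking area with zeros.
    x, y, w, h = box
    out = [row[:] for row in frame]
    for j in range(y, y + h):
        for i in range(x, x + w):
            out[j][i] = 0
    return out


def play(frames: List[Frame], masking_info: Dict[int, List[Box]],
         has_permission: bool) -> Iterator[Frame]:
    # The stored video is never modified; masking is applied per frame at
    # playback time, and only when access permission is absent.
    for index, frame in enumerate(frames):
        if not has_permission:
            for box in masking_info.get(index, []):
                frame = black_out(frame, box)
        yield frame
```

Because the original frames are left intact, a viewer with access permission sees the unmasked video from the same stored file.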
A computer program stored in a medium to execute, in combination with hardware, the image masking method of any one of claims 1 to 14.
An image masking device comprising:

a frame input unit that decodes an input target video to generate frame images on a frame-by-frame basis;

a feature vector extraction unit that extracts a frame feature vector corresponding to each frame image from the frame images, detects candidate objects from the frame feature vector, and extracts an object feature vector corresponding to each candidate object from the frame feature vector using location information of the detected candidate objects;

an identification information setting unit that sets identification information for the candidate objects using the object feature vector;

a tracking information generation unit that tracks candidate objects having the same identification information in continuous frame images and generates tracking information for the candidate objects for each piece of identification information; and

a masking unit that, when a masking input is received, extracts candidate objects matching the identification information included in the masking input using the tracking information and performs masking on the extracted candidate objects.
An image masking device comprising:

a processor; and

a memory coupled to the processor,

wherein the memory includes one or more modules configured to be executed by the processor,

and the one or more modules include instructions for:

decoding a received target video to generate frame images on a frame-by-frame basis;

extracting a frame feature vector corresponding to each frame image from the frame images, and detecting candidate objects from the frame feature vector;

extracting an object feature vector corresponding to each candidate object from the frame feature vector using location information of the detected candidate objects;

setting identification information for the candidate objects using the object feature vector;

tracking candidate objects having the same identification information in continuous frame images, and generating tracking information for the candidate objects for each piece of identification information; and

when a masking input is received, extracting candidate objects matching the identification information included in the masking input using the tracking information, and performing masking on the extracted candidate objects.