WO2021057035A1

WO2021057035A1 - Camera system and video processing method

Info

Publication number: WO2021057035A1
Application number: PCT/CN2020/089930
Authority: WO
Inventors: 魏子昆; 张至先
Original assignee: 上海依图网络科技有限公司
Priority date: 2019-09-27
Filing date: 2020-05-13
Publication date: 2021-04-01
Also published as: CN110674765A

Abstract

Provided are a camera system and a video processing method. The method comprises: acquiring an original video image; recognizing all person regions in the original video and extracting the structured features of each person; recognizing all face regions in the original video and extracting a facial feature vector for each face; performing a change operation on the extracted facial feature vectors to form forged facial feature vectors; and forming a forged face according to both the structured features of each person and the forged facial feature vectors. In the privacy-removed video obtained in this way, the structured features of each person are still preserved, so the video data remains usable as effective commercial or security video data while the privacy of the people in the video is protected.

Description

Camera system and video processing method

Technical field

The invention relates to the field of face recognition, in particular to a camera system and a video processing method.

Background

Face recognition technology has strong development prospects and economic value in public security investigation, access control, target tracking, and other civilian security systems. However, while face recognition can be a powerful security tool, it can also undermine personal privacy. When the prior art protects the privacy of people in a video, it usually removes or alters their various characteristics, so the video data can no longer be used as commercial or security data. The prior art therefore cannot protect the privacy of people in a video while ensuring that the video data remains usable as effective commercial or security video data.

Summary of the invention

To solve the problems in the prior art, at least one embodiment of the present invention provides a camera system and a video processing method that protect the privacy of people in a video while ensuring that the video data remains usable as effective commercial or security video data.

In a first aspect, an embodiment of the present invention provides a camera system. The system includes: a video acquisition module, used to acquire original video images; a structured feature acquisition module, used to extract the structured features of each person in the original video; a face feature vector acquisition module, used to acquire all face regions in the original video and extract the feature vector of each face separately; a feature vector change module, used to change the face feature vectors extracted by the face feature vector acquisition module into forged face feature vectors; a face forgery module, used to form a forged face according to the structured features of each person and the forged face feature vector; and a video generation module, used to cover the original faces in the original video with the corresponding forged faces formed by the face forgery module, forming a privacy-removed video.

In some embodiments, the structured features include at least one of the following: gender, age, whether glasses are worn, accessories, and clothing.

In some embodiments, the camera system further includes an encryption module for encrypting the original video image or for encrypting the privacy-removed video.

In some embodiments, the camera system further includes a decryption module for decrypting the encrypted original video image or the encrypted privacy-removed video.

In some embodiments, the video acquisition module, the structured feature acquisition module, the face feature vector acquisition module, the feature vector change module, the face forgery module, and the encryption module are packaged in one camera; or, the video acquisition module is located in a camera, and the structured feature acquisition module, the face feature vector acquisition module, the feature vector change module, the face forgery module, and the encryption module are located in the back end of the system.

In a second aspect, an embodiment of the present invention also provides a video processing method, including: obtaining an original video image; identifying all person regions in the original video and extracting the structured features of each person; identifying all face regions in the original video and extracting the feature vector of each face separately; performing a change operation on the extracted face feature vectors to form forged face feature vectors; forming a forged face according to the structured features of each person and the forged face feature vector; and covering the original faces in the original video with the corresponding forged faces to form a privacy-removed video.

In some embodiments, the video processing method further includes: encrypting the original video image; or encrypting the privacy-removed video.

In some embodiments, the video processing method further includes: decrypting the encrypted original video image, or decrypting the encrypted privacy-removed video.

In some embodiments, the encryption includes: encrypting the original video or the privacy-removed video frame by frame; or encrypting the original video or the privacy-removed video in data blocks of a preset size; or encrypting the original video or the privacy-removed video as a whole.

In a third aspect, an embodiment of the present invention also provides a video processing device, including: at least one processor; and a memory coupled with the at least one processor, the memory storing executable instructions, wherein the executable instructions, when executed by the at least one processor, implement the method described in any embodiment of the second aspect.

In a fourth aspect, an embodiment of the present invention also provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method described in the second aspect.

It can be seen that, in at least one embodiment of the present invention, after the original video image is acquired, all person regions in the original video are identified and the structured features of each person are extracted; all face regions in the original video are identified and the feature vector of each face is extracted; a change operation is performed on the extracted face feature vectors to form forged face feature vectors; and a forged face is formed according to the structured features of each person and the forged face feature vector. The privacy-removed video obtained in this way retains the structured features of each person, so the privacy of the people in the video is protected while the video data remains usable as effective commercial or security video data.

Description of the drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the composition structure of an embodiment of the camera system of the present invention;

FIG. 2 is a flowchart of an embodiment of the video processing method of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the protection scope of the present invention.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.

As shown in FIG. 1, in the first aspect, this embodiment provides a camera system, which includes:

A video acquisition module 210, used to acquire original video images.

A structured feature acquisition module 220, used to extract the structured features of each person in the original video. If there are multiple people in the original video, the structured features of each person are extracted separately. Each recorded person can also be labeled; for example, the position of each person in the video can be recorded to identify that person.

A face feature vector acquisition module 230, used to acquire all face regions in the original video and extract the feature vector of each face. If there are multiple faces in the original video, the feature vector of each face is extracted separately. The position of each face in the video can also be recorded, so that each forged face can be placed over the corresponding original face position when the privacy-removed video is finally generated.

A feature vector change module 240, used to change the face feature vector extracted by the face feature vector acquisition module into a forged face feature vector. Specifically, the feature vector change module can use a one-way hash algorithm, such as the MD5 algorithm, or another mathematical method to turn the extracted vector into a new face feature vector, referred to as the forged feature vector.

A face forgery module 250, used to form a forged face according to the structured features of each person and the forged face feature vector. For example, based on the forged feature vector and the person's structured features, a face generation algorithm based on adversarial neural networks, such as a deepfake algorithm, is used to generate a new face. This face differs from the original face and is referred to as the forged face.

Because the forged face is formed from the feature vector of the original face and the structured features of the person, faces with the same feature vector yield similar forged faces on different occasions. This ensures that, within a preset time, the forged faces generated from the same original face are similar in every setting; that is, the horizontal consistency of the forged face is guaranteed. Because the forged face is also based on the structured features of the original person, the structured features remain consistent before and after forgery. Optionally, information such as gender, age, and whether glasses are worn is extracted by a face structuring algorithm. When generating faces, a conditional face generation algorithm, such as a conditional GAN or StyleGAN, is used to make the generated face conform to the structured attributes of the original person, such as age, gender, whether glasses are worn, accessories, hair length, hair color, clothing length, and color shade. The accessories can be of various kinds, such as hats and headwear.

A video generation module 260, used to cover the original faces in the original video with the corresponding forged faces formed by the face forgery module, forming a privacy-removed video.
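The covering step performed by the video generation module can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: frames are modelled as nested lists of grayscale pixel values, the names `FaceBox` and `overlay_faces` are hypothetical, and each forged face is assumed to have already been resized to the recorded face region.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[int]]  # rows of grayscale pixel values (assumed representation)


@dataclass(frozen=True)
class FaceBox:
    """Recorded position of one original face in a frame (hypothetical type)."""
    top: int
    left: int
    height: int
    width: int


def overlay_faces(frame: Frame, boxes: List[FaceBox], forged: List[Frame]) -> Frame:
    """Return a copy of `frame` with each forged face pasted over its recorded box."""
    if len(boxes) != len(forged):
        raise ValueError("need exactly one forged face per recorded box")
    rows = len(frame)
    cols = len(frame[0]) if frame else 0
    out = [row[:] for row in frame]  # copy, so the original frame stays untouched
    for box, patch in zip(boxes, forged):
        if box.top < 0 or box.left < 0 or box.top + box.height > rows or box.left + box.width > cols:
            raise ValueError("recorded box lies outside the frame")
        if len(patch) != box.height or any(len(r) != box.width for r in patch):
            raise ValueError("forged face must match the recorded box size")
        for dy in range(box.height):
            out[box.top + dy][box.left:box.left + box.width] = patch[dy]
    return out


# A 4x6 blank frame with one recorded face region and one forged face patch.
frame = [[0] * 6 for _ in range(4)]
boxes = [FaceBox(top=1, left=2, height=2, width=3)]
forged_faces = [[[9, 9, 9], [9, 9, 9]]]
privacy_removed = overlay_faces(frame, boxes, forged_faces)
```

Working on a copy keeps the original frame available, for example for the optional encryption of the original video.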

In this embodiment, a forged face is formed according to the structured features of each person and the forged face feature vector. The privacy-removed video obtained in this way protects the privacy of the people in the video while retaining their structured features, so the video data can still be used as effective commercial or security video data.
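As an illustration of the feature vector change, the sketch below derives a forged vector from an MD5 digest of the original vector. The text names MD5 only as one possible one-way hash; everything else here is an assumption made for the example: the function name `forge_feature_vector`, the rounding of the input before hashing (so that slightly different measurements of the same face map to the same digest), and the use of the digest to seed a pseudo-random generator that draws a unit-length vector.

```python
import hashlib
import math
import random
import struct
from typing import List, Sequence


def forge_feature_vector(features: Sequence[float], decimals: int = 2) -> List[float]:
    """Derive a forged feature vector from an original one via an MD5 digest.

    Assumptions (not from the source): values are rounded to `decimals`
    places before hashing, and the digest seeds a generator that draws a
    unit-length vector of the same dimension.
    """
    # Adding 0.0 turns a rounded -0.0 into 0.0 so both hash identically.
    quantized = [round(x, decimals) + 0.0 for x in features]
    payload = struct.pack(f"<{len(quantized)}d", *quantized)
    digest = hashlib.md5(payload).digest()  # the one-way step named in the text
    rng = random.Random(int.from_bytes(digest, "big"))
    raw = [rng.gauss(0.0, 1.0) for _ in quantized]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


original = [0.12, -0.48, 0.33, 0.91]
forged_a = forge_feature_vector(original)
# A slightly noisy measurement of the same face rounds to the same values.
forged_b = forge_feature_vector([0.121, -0.479, 0.331, 0.909])
```

Because the mapping is deterministic, the same quantized input always yields the same forged vector, which is one way to obtain the consistency described above. Rounding is a crude stand-in: real face feature vectors may vary more between captures than this sketch tolerates.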

In addition, the camera system of this embodiment may further include an encryption module, used to encrypt the original video image or the privacy-removed video. It may also include a decryption module, used to decrypt the encrypted original video image or the encrypted privacy-removed video. The encryption module encrypts the original video or the privacy-removed video frame by frame, in data blocks of a preset size, or as a whole.

Specifically, the encryption module can use, for example, the RSA encryption algorithm and perform one-way encryption with a public key. The video can be encrypted frame by frame, in data blocks of a certain size, or as a whole. The encrypted video is stored and cannot be decrypted without an authorized private key, which ensures its security. Decryption mirrors the encryption method: the original video is recovered by decrypting with the private key.
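The public-key and private-key roles described above can be illustrated with a toy, textbook RSA example. It is deliberately insecure (tiny primes, no padding, byte-by-byte encryption) and only shows that data encrypted with the public key is recovered with the private key; the key values and function names are chosen for this sketch, and a real system would rely on a vetted cryptographic library with proper key sizes and padding.

```python
from typing import List, Tuple

# Toy key pair built from the small primes p = 61 and q = 53 (illustration only).
P, Q = 61, 53
N = P * Q                           # modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (requires Python 3.8+)

PublicKey = Tuple[int, int]
PrivateKey = Tuple[int, int]


def encrypt_bytes(data: bytes, key: PublicKey) -> List[int]:
    """Encrypt each byte separately with the public key (e, n)."""
    e, n = key
    return [pow(b, e, n) for b in data]


def decrypt_bytes(cipher: List[int], key: PrivateKey) -> bytes:
    """Recover the bytes with the private key (d, n)."""
    d, n = key
    return bytes(pow(c, d, n) for c in cipher)


frame_bytes = b"frame-0001"
cipher = encrypt_bytes(frame_bytes, (E, N))
recovered = decrypt_bytes(cipher, (D, N))
```

Only the holder of `D` can invert the encryption, which mirrors the statement that the stored video cannot be decrypted without an authorized private key.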

In one embodiment, the video acquisition module, structured feature acquisition module, face feature vector acquisition module, feature vector change module, face forgery module, and encryption module of the camera system are all packaged in one camera, and the original video is obtained through that camera. Encryption is performed by an encryption chip or a general-purpose processor. The privacy-removal computation is performed by an AI chip, a general-purpose graphics processing unit (GPU), or a central processing unit (CPU).

In this embodiment, the chip module packaged in the camera first performs privacy-removal processing on the video data, which is then transmitted to the back-end server over a wired network, a wireless network, or a combination of the two. Because the transmitted video data has already had privacy removed, the risk of leaks is reduced and the reliability of the system is improved.

In another embodiment, the video acquisition module of the camera system is located in a camera, and the structured feature acquisition module, the face feature vector acquisition module, the feature vector change module, the face forgery module, and the encryption module are located in the back end of the system. That is, these modules are moved out of the camera and placed on the back-end server, so the front-end camera is just a general-purpose camera.

In this embodiment, when different camera systems need to be upgraded, the upgrade can be performed directly in the back end, without replacing cameras one by one or upgrading each camera separately. This improves upgrade efficiency and reduces costs.

As shown in FIG. 2, in a second aspect, this embodiment provides a video processing method, including:

310. Obtain an original video image. Specifically, the original video image may be obtained through any existing or future camera.

320. Identify all person regions in the original video, and extract the structured features of each person. If there are multiple people in the original video, the structured features of each person are extracted separately. Each recorded person can also be labeled; for example, the position of each person in the video can be recorded to identify that person.

330. Recognize all face regions in the original video, and extract the feature vector of each face. If there are multiple faces in the original video, the feature vector of each face is extracted separately. The position of each face in the video can also be recorded, so that each forged face can be placed over the corresponding original face position when the privacy-removed video is finally generated.

340. Perform a change operation on the extracted face feature vector to form a forged face feature vector. Specifically, a one-way hash algorithm, such as the MD5 algorithm, or another mathematical method can be used to turn the extracted vector into a new face feature vector, referred to as the forged feature vector.

350. Form a forged face according to the structured features of each person and the forged face feature vector. Specifically, based on the forged feature vector and the person's structured features, a face generation algorithm based on adversarial neural networks, such as a deepfake algorithm, can be used to generate a new face. This face differs from the original face and is referred to as the forged face.

Optionally, information such as gender, age, and whether glasses are worn is extracted by a face structuring algorithm. When generating faces, a conditional face generation algorithm, such as a conditional GAN or StyleGAN, is used to make the generated face conform to the structured attributes of the original person, such as age, gender, whether glasses are worn, accessories, clothing length, and color shade.

360. Cover the original faces in the original video with the corresponding forged faces to form a privacy-removed video.
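How steps 320 to 360 connect for a single frame can be sketched as below. The models themselves (person and face detection, vector change, face generation, covering) are passed in as caller-supplied functions, since the text does not fix them; the names `Stages`, `contains`, and `process_frame`, the box layout, and the rule of pairing a face with the person region that encloses it are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (top, left, height, width); assumed layout
Attrs = Dict[str, Any]            # structured features of one person
Vector = Sequence[float]


@dataclass
class Stages:
    """Caller-supplied stand-ins for the models behind steps 320 to 360."""
    find_people: Callable[[Any], List[Tuple[Box, Attrs]]]   # step 320
    find_faces: Callable[[Any], List[Tuple[Box, Vector]]]   # step 330
    change_vector: Callable[[Vector], Vector]               # step 340
    generate_face: Callable[[Vector, Attrs], Any]           # step 350
    cover: Callable[[Any, Box, Any], Any]                   # step 360


def contains(outer: Box, inner: Box) -> bool:
    """True when the `inner` box lies entirely within the `outer` box."""
    o_top, o_left, o_h, o_w = outer
    i_top, i_left, i_h, i_w = inner
    return (o_top <= i_top and o_left <= i_left
            and i_top + i_h <= o_top + o_h
            and i_left + i_w <= o_left + o_w)


def process_frame(frame: Any, stages: Stages) -> Any:
    """Run steps 320 to 360 on one frame obtained in step 310."""
    people = stages.find_people(frame)
    result = frame
    for face_box, vector in stages.find_faces(frame):
        # Pair each face with the person region that encloses it, if any.
        attrs = next((a for box, a in people if contains(box, face_box)), {})
        forged_vector = stages.change_vector(vector)
        forged_face = stages.generate_face(forged_vector, attrs)
        result = stages.cover(result, face_box, forged_face)
    return result


# Trivial stubs, only to show how the stages connect.
demo = Stages(
    find_people=lambda frame: [((0, 0, 10, 10), {"glasses": True})],
    find_faces=lambda frame: [((1, 1, 2, 2), [0.1, 0.2])],
    change_vector=lambda v: [-x for x in v],
    generate_face=lambda v, attrs: ("forged", tuple(v), attrs.get("glasses")),
    cover=lambda frame, box, face: frame + [(box, face)],
)
covered = process_frame([], demo)
```

Keeping the stages pluggable matches the two deployments described earlier: the same flow can run inside the camera or on a back-end server.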

In this embodiment, because the forged face is formed from the feature vector of the original face and the structured features of the person, faces with the same feature vector yield similar forged faces in different situations. This ensures that, within a preset time, the forged faces generated from the same original face are similar in every setting; that is, the horizontal consistency of the forged face is guaranteed. Because the forged face is also based on the structured features of the original person, the structured features remain consistent before and after forgery.
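One common way to make a conditional generator respect structured features is to append an encoded attribute vector to the generator input. The sketch below shows only that encoding step; the attribute set, the `StructuredFeatures` type, the normalization, and the concatenation are assumptions made for illustration, and the text does not specify how the conditional face generation algorithm consumes these attributes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StructuredFeatures:
    """A few of the structured features named in the text (hypothetical type)."""
    gender: str          # "female" or "male" in this sketch
    age: int             # estimated age in years
    wears_glasses: bool


def encode_condition(attrs: StructuredFeatures, max_age: int = 100) -> List[float]:
    """Encode the attributes as numbers in [0, 1] for use as a condition vector."""
    gender_bit = 1.0 if attrs.gender == "female" else 0.0
    age_norm = min(max(attrs.age, 0), max_age) / max_age
    glasses_bit = 1.0 if attrs.wears_glasses else 0.0
    return [gender_bit, age_norm, glasses_bit]


def generator_input(forged_vector: Sequence[float], attrs: StructuredFeatures) -> List[float]:
    """Concatenate the forged feature vector with the encoded condition."""
    return list(forged_vector) + encode_condition(attrs)


person = StructuredFeatures(gender="female", age=30, wears_glasses=True)
conditioned = generator_input([0.5, -0.5], person)
```

Because the condition is computed from the original person's attributes rather than from the forged vector, the generated face can keep those attributes even though its identity features have changed.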

In this embodiment, forged faces are generated from the feature vectors extracted for all faces in the original video and are then placed over the corresponding faces in the original video to form the privacy-removed video. The privacy-removed video contains none of the faces in the original video, yet it can still be viewed normally, information such as pedestrian behavior and crowd distribution can still be analyzed normally, and the structured features of each person are preserved. The video can therefore serve as effective security or commercial data while protecting privacy.

In another embodiment, the video processing method further includes: encrypting the original video image; or encrypting the privacy-removed video.

After encryption, the method may further include: decrypting the encrypted original video image, or decrypting the encrypted privacy-removed video.

Specifically, the original video or the privacy-removed video can be encrypted frame by frame; or in data blocks of a preset size; or as a whole.
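The three encryption granularities differ only in how the video bytes are split before a cipher is applied, as the following sketch shows. The function names, the mode strings, and the default block size are assumptions, and `xor_placeholder` is NOT real encryption: it is a reversible stand-in used so that the example runs without a cryptographic library.

```python
from typing import Callable, List

Cipher = Callable[[bytes], bytes]


def split_units(frames: List[bytes], mode: str, block_size: int = 4096) -> List[bytes]:
    """Split encoded frames into the units that are encrypted independently."""
    if mode == "frame":                      # one unit per frame
        return list(frames)
    stream = b"".join(frames)
    if mode == "block":                      # fixed-size data blocks
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]
    if mode == "whole":                      # the entire video as one unit
        return [stream]
    raise ValueError(f"unknown mode: {mode!r}")


def encrypt_video(frames: List[bytes], mode: str, encrypt: Cipher,
                  block_size: int = 4096) -> List[bytes]:
    """Apply `encrypt` to every unit produced by the chosen granularity."""
    return [encrypt(unit) for unit in split_units(frames, mode, block_size)]


def xor_placeholder(data: bytes) -> bytes:
    """NOT real encryption: a reversible stand-in so the sketch runs."""
    return bytes(b ^ 0x5A for b in data)


frames = [b"abcd", b"efgh", b"ij"]
per_frame = encrypt_video(frames, "frame", xor_placeholder)
per_block = encrypt_video(frames, "block", xor_placeholder, block_size=3)
whole = encrypt_video(frames, "whole", xor_placeholder)
```

Frame-level and block-level units can be decrypted independently of one another, whereas whole-video encryption yields a single unit that must be decrypted in full.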

It is understandable that decryption is a process corresponding to encryption, and the decryption method can be adapted to the encryption method.

In the third aspect, the present invention also provides a video processing device, including:

At least one processor; a memory coupled with the at least one processor, and the memory stores executable instructions, where the executable instructions, when executed by the at least one processor, enable the method of the second aspect of the present invention to be implemented.

This embodiment provides a video processing device, including: at least one processor; and a memory coupled with the at least one processor. For example, the memory may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, or registers. The processor may be a central processing unit (Central Processing Unit, CPU) or the like. The memory can store executable instructions. The processor can execute executable instructions stored in the memory to implement the various processes described herein.

It can be understood that the memory in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be ROM (Read-Only Memory), PROM (Programmable ROM, programmable read-only memory), EPROM (Erasable PROM, erasable programmable read-only memory), EEPROM (Electrically EPROM, electrically erasable programmable read-only memory), or flash memory. The volatile memory may be RAM (Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as SRAM (Static RAM, static random access memory), DRAM (Dynamic RAM, dynamic random access memory), SDRAM (Synchronous DRAM, synchronous dynamic random access memory), DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory), ESDRAM (Enhanced SDRAM, enhanced synchronous dynamic random access memory), SLDRAM (Synchlink DRAM, synchronous link dynamic random access memory), and DRRAM (Direct Rambus RAM, direct Rambus random access memory). The memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.

In some embodiments, the memory stores the following elements, upgrade packages, executable units, or data structures, or a subset of them, or an extended set of them: operating systems and applications.

The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, which implement basic services and handle hardware-based tasks. The application programs include various applications that implement application services. A program that implements the method of the embodiment of the present invention may be included in an application program.

In the embodiment of the present invention, the processor calls a program or instruction stored in the memory, specifically, a program or instruction stored in an application program, and the processor is used to execute the method steps provided in the second aspect.

In addition, in the fourth aspect, the present invention also provides a computer-readable storage medium with a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method of the second aspect of the present invention are implemented.

For example, the machine-readable storage medium may include, but is not limited to, various known and unknown types of non-volatile memory.

Those skilled in the art can understand that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different ways to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.

In the embodiments of the present application, the disclosed system, device, and method may be implemented in other ways. For example, the division of units is only a logical function division, and there may be other division methods in actual implementation. For example, multiple units or components can be combined or integrated into another system. In addition, the coupling between the various units may be direct coupling or indirect coupling. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or may be a separate physical existence, and so on.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a machine-readable storage medium. Therefore, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a machine-readable storage medium and may include a number of instructions that cause an electronic device to execute all or part of the processes of the technical solutions described in the embodiments of the present application. The foregoing storage media may include various media capable of storing program code, such as ROM, RAM, removable disks, hard disks, magnetic disks, or optical disks.

The above content is only specific implementation manners of this application, and the protection scope of this application is not limited thereto. Those skilled in the art can make changes or substitutions within the technical scope disclosed in this application, and these changes or substitutions should all fall within the protection scope of this application.

Claims

1. A camera system, characterized in that the system includes:

a video acquisition module, used to acquire original video images;

a structured feature acquisition module, used to extract the structured features of each person in the original video;

a face feature vector acquisition module, used to acquire all face regions in the original video and extract the feature vector of each face separately;

a feature vector change module, used to change the face feature vector extracted by the face feature vector acquisition module into a forged face feature vector;

a face forgery module, used to form a forged face according to the structured features of each person and the forged face feature vector; and

a video generation module, used to cover the original faces in the original video with the corresponding forged faces formed by the face forgery module, forming a privacy-removed video.
2. The camera system according to claim 1, wherein the structured features include at least one of the following: gender, age, whether glasses are worn, accessories, and clothing.
3. The camera system according to claim 1, further comprising:

an encryption module for encrypting the original video image; or

for encrypting the privacy-removed video.
4. The camera system according to claim 3, further comprising: a decryption module for decrypting the encrypted original video image, or decrypting the encrypted privacy-removed video.
5. The camera system according to claim 3, wherein the video acquisition module, the structured feature acquisition module, the face feature vector acquisition module, the feature vector change module, the face forgery module, and the encryption module are packaged in one camera; or

the video acquisition module is located in a camera, and the structured feature acquisition module, the face feature vector acquisition module, the feature vector change module, the face forgery module, and the encryption module are located in the back end of the system.
6. A video processing method, characterized in that it comprises:

obtaining an original video image;

identifying all person regions in the original video, and extracting the structured features of each person in the original video;

identifying all face regions in the original video, and extracting the feature vector of each face separately;

performing a change operation on the extracted face feature vector to form a forged face feature vector;

forming a forged face according to the structured features of each person and the forged face feature vector; and

covering the original faces in the original video with the corresponding forged faces to form a privacy-removed video.
7. The video processing method according to claim 6, further comprising:

encrypting the original video image; or

encrypting the privacy-removed video.
8. The video processing method according to claim 7, further comprising:

decrypting the encrypted original video image, or decrypting the encrypted privacy-removed video.
9. The video processing method according to claim 7 or 8, wherein the encryption is:

encrypting the original video or the privacy-removed video frame by frame; or

encrypting the original video or the privacy-removed video in data blocks of a preset size; or

encrypting the original video or the privacy-removed video as a whole.
10. A video processing device, comprising:

at least one processor; and

a memory coupled to the at least one processor, the memory storing executable instructions, wherein the executable instructions, when executed by the at least one processor, implement the method according to any one of claims 6 to 9.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method according to any one of claims 6 to 9 are implemented.