CN113139527B

CN113139527B - Video privacy protection method, device, equipment and storage medium

Info

Publication number: CN113139527B
Application number: CN202110593414.XA
Authority: CN
Inventors: 刘勃; 孙杨; 王煜龙
Original assignee: Shenzhen Wenda Zhitong Technology Co ltd
Current assignee: Shenzhen Wenda Zhitong Technology Co ltd
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2023-09-22
Anticipated expiration: 2041-05-28
Also published as: CN113139527A

Abstract

The application discloses a video privacy protection method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring an original video; extracting a reference image containing a human face from the original video; anonymizing the face in the reference image to obtain a first image; according to the reference image, extracting motion information of a face in the original video, which corresponds to the reference image, and obtaining a motion information stream of the face in the reference image; and fusing the first image and the motion information stream to obtain a target video. The application provides a video privacy protection method and device, which can solve the contradiction between privacy protection and video file application value in the existing video privacy protection.

Description

Video privacy protection method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a video privacy protection method, apparatus, device, and storage medium.

Background

The rapid development of applications such as video monitoring and short video platforms facilitates the lives of people. However, these videos contain sensitive information of many individuals, especially identity information of faces. With the increasing widespread use of computer vision technology and deep learning, the privacy and security of video data is severely compromised. Current video privacy protection techniques typically blur or delete privacy-sensitive information, which has proven vulnerable to and ineffective in utilities. The video data processed by the method loses much information irrelevant to privacy, and the practical value of the data is greatly reduced.

In summary, the ideal video privacy protection method should be able to remove the identity information of people and keep the integrity of other privacy-independent information as much as possible, such as the number of people appearing in the video, the peer relationship of people, the actions of people, etc. So that the processed video file can still be applied to other video analysis tasks.

Disclosure of Invention

In view of the above technical problems, the application provides a video privacy protection method, a device, equipment and a storage medium, so as to provide a technical scheme capable of solving the contradiction between privacy protection and video file application value in the existing video privacy protection.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to an aspect of the present application, there is provided a video privacy protection method, the method including: acquiring an original video; extracting a reference image containing a human face from the original video; anonymizing the face in the reference image to obtain a first image; according to the reference image, extracting motion information of a face in the original video, which corresponds to the reference image, and obtaining a motion information stream of the face in the reference image; and fusing the first image and the motion information stream to obtain a target video.

Further, the anonymizing the face in the reference image to obtain a first image includes: and based on the generated countermeasure network, replacing the face information in the reference image with the generated face image to obtain the first image.

Further, the extracting, according to the reference image, motion information of a face in the original video, which corresponds to the reference image, to obtain a motion information stream of the face in the reference image includes: acquiring a key frame sequence; extracting key points from faces corresponding to the reference image in each frame in the key frame sequence, and generating a first characterization matrix for characterizing the key points of the key frame sequence; extracting key points of the faces in the reference image to generate a second characterization matrix for characterizing the key points of the reference image; dividing each first characterization matrix by the second characterization matrix, and calculating to obtain a third characterization matrix for characterizing the motion of the reference image; and obtaining a motion information stream of the motion information of the face corresponding to the reference image in the original video according to the third characterization matrix.

Further, the acquiring the key frame sequence includes: and detecting the face corresponding to the reference image in the original video to obtain a key frame sequence, wherein the face corresponding to the reference image in the key frame sequence has motion change.

Further, the keypoints are determined by a pre-trained deep neural network model.

Further, the obtaining, according to the third characterization matrix, a motion information stream of motion information of the face corresponding to the reference image in the original video includes: arranging the third characterization matrix according to a time sequence; obtaining a characterization matrix of the motion of the face of the reference image corresponding to other frames except the key frame sequence in the original video through interpolation operation; and obtaining a motion information stream representing motion information of the face corresponding to the reference image in the original video.

Further, the fusing the first image and the motion information stream to obtain a target video includes: and fusing the first image into the motion information stream based on a fusion network to obtain the target video.

According to a second aspect of the present disclosure, there is provided a video privacy protection apparatus comprising: the video acquisition module acquires an original video; the reference image selecting module extracts a reference image containing a human face from the original video; the image anonymization processing module performs anonymization processing on the face in the reference image to obtain a first image; the motion information estimation module is used for extracting motion information of a face in the original video, which corresponds to the reference image, according to the reference image to obtain a motion information stream of the face in the reference image; and the video generation module fuses the first image and the motion information stream to obtain a target video.

According to a third aspect of the present disclosure, there is provided a video privacy protecting apparatus comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: acquiring an original video; extracting a reference image containing a human face from the original video; anonymizing the face in the reference image to obtain a first image; according to the reference image, extracting motion information of a face in the original video, which corresponds to the reference image, and obtaining a motion information stream of the face in the reference image; and fusing the first image and the motion information stream to obtain a target video.

According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, performs the video privacy preserving method described above.

The technical scheme of the present disclosure has the following beneficial effects:

according to the video privacy protection method, the video privacy protection device and the video privacy protection equipment based on anonymization processing, the contradiction between privacy protection and video file application value in the prior video privacy protection can be solved, and the integrity of other privacy-independent information, such as the number of people, the peer relationship of people, the actions of people and the like, in the video can be kept as much as possible while the identity information of people is removed. So that the processed video file can still be applied to other video analysis tasks.

Drawings

Fig. 1 is a workflow diagram of a video privacy protection method according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of acquiring motion information flow in an embodiment of the present disclosure;

FIG. 3 is a flow chart of acquiring motion information flow in an embodiment of the present disclosure;

fig. 4 is a block diagram of a video privacy protection apparatus in the embodiment of the present specification;

FIG. 5 is a block diagram of a motion estimation module in an embodiment of the present disclosure;

fig. 6 is a terminal device for implementing a video privacy protection method in an embodiment of the present disclosure;

fig. 7 is a computer-readable storage medium for implementing a video privacy preserving method in an embodiment of the present specification.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are only schematic illustrations of the present disclosure. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

As shown in fig. 1, an embodiment of the present disclosure provides a video privacy protection method, where an execution subject of the method may be a terminal device, and the terminal device may be a mobile phone, a tablet computer, a personal computer, or the like. The method specifically comprises the following steps S101-S105:

in step S101, an original video is acquired.

The original video may be a video directly acquired, that is, a video directly captured by a video recording device such as a camera, a video camera, a mobile phone, or the like, or may be a video stored in a memory.

In step S102, a reference image containing a face is extracted from the original video.

The reference image contains a face which needs to be anonymously protected, and the face can be one or more than one, so that the face in the reference image is clear and the five sense organs information is comprehensive for facilitating identification.

In step S103, anonymizing the face in the reference image to obtain a first image.

The face detector may be one of Haar cams, hog+linear support vector machine and face detector based on deep learning, after extracting the face region in the reference image, anonymizing the face in the reference image, and storing the anonymized face in the original image to obtain a first image, wherein the anonymizing process may be to replace or blur the face information, so long as the identity information of the face needing anonymizing can be hidden, the anonymizing process belongs to anonymizing process means pointed by the disclosure.

In step S104, motion information corresponding to a face in the reference image in the original video is extracted according to the reference image, so as to obtain a motion information stream of the face in the reference image.

According to the face in the reference image, searching the position corresponding to the face in each frame of the original video, and establishing a motion path of the face, wherein the motion path is the motion information flow of the face in the reference image.

In step S105, the first image and the motion information stream are fused to obtain a target video.

The first image is fused into the motion information stream, and contains face information incapable of identifying the original identity because the first image is an anonymized image, the anonymized face information is fused into the motion information, so that an anonymized target video is obtained, sensitive privacy information related to the personal identity such as five officials is removed from the generated target video, but the information such as the number, the gesture and the motion of other people in the video is reserved, and the method can still be used for various applications including crowd monitoring and the like. The conventional video anonymization processing means is to divide the video into frames, anonymize faces in the frames, and reconstruct the anonymized frames into video, so that the problems of low processing efficiency, hard anonymization effect and the like exist.

In an alternative embodiment, when step S103 is performed, specific steps are performed: and based on the generated countermeasure network, replacing the face information in the reference image with the generated face image to obtain the first image.

The specific extraction method is set forth in the above embodiment, and is not repeated herein, and then a countermeasure generation network (GAN) is adopted to generate a realistic but actually nonexistent face, so as to replace the real face in the reference image, and identity information contained in the face is different, namely, the face belongs to different people in judgment of a viewer or a face recognition system, so that anonymization of the reference image is realized.

In an alternative embodiment, as shown in fig. 2, motion estimation is performed on an original video, motion information of a face corresponding to the reference image in the original video is extracted, and a motion information stream of the face in the reference image is obtained, including steps S201 to S205:

in step S201, a key frame sequence is acquired.

Wherein a key frame sequence in the original video is obtained as a set of motion reference imagesThe number of key frames may be N, where N is a positive integer, and the key frames may be acquired by specifically acquiring an interval time, that is, acquiring one frame every fixed time T as a key frame. In the key frame sequence, the faces contained in each key frame correspond to the faces of the reference image, and it is noted that the faces contained in the key frames do not necessarily correspond to the faces of the reference image one by one, so long as the faces in the key frames correspond to one or more faces in the reference image.

Specifically, step S201 includes: the specific detection method for detecting the face of the corresponding reference image in the original video is already provided in the above embodiment, and will not be described herein. The method comprises the steps of obtaining a key frame sequence, wherein the faces corresponding to a reference image in the key frame sequence have motion changes, namely, the faces in each key frame are in different states, can be different in direction or/and gesture, can also be different in position of the faces, and can be considered to have motion changes as long as the same face in the key frame has changes.

In step S202, key point extraction is performed on a face corresponding to the reference image in each frame in the key frame sequence, so as to generate a first characterization matrix for characterizing key points of the key frame sequence.

Extracting key points of faces corresponding to the reference image in each frame in the key frame sequence, specifically, setting face images, positioning key area positions of faces including eyebrows, eyes, nose, mouth, face contours and the like, extracting features, and expressing a first characterization matrix as followsDimension matrix->。

In step S203, key point extraction is performed on the face in the reference image, and a second characterization matrix characterizing the key points of the reference image is generated.

Wherein the second characterization matrix is expressed asDimension matrix->。

Specifically, the key points in steps S202 and S203 are determined by the pre-trained deep neural network model, and may specifically be, but not include, one of model-based ASM and AAM, cascade shape regression CPR-based, or deep learning-based methods.

In step S204, each of the first characterization matrices is divided by the second characterization matrix, so as to calculate a third characterization matrix characterizing the motion of the reference image.

Wherein each is toDimension matrix->Divided by->Dimension matrix->Obtaining a third characterization matrixAnd obtaining the motion information of the faces in the N characterization key frame sequences relative to the faces in the reference images.

In step S205, according to the third characterization matrix, a motion information stream of motion information of the face corresponding to the reference image in the original video is obtained.

The N third characterization matrices obtained in S204 are amplified to global, and motion information of the whole original video relative to the face in the reference image is calculated.

Specifically, as shown in fig. 3, when step S205 is executed, steps S301 to S303 are executed:

in step S301, the N third characterization matrices are arranged in time order.

In step S302, a characterization matrix of the motion of the face of the reference image corresponding to the other frames except the key frame sequence in the original video is obtained through interpolation operation.

In step S303, a motion information stream representing motion information of a face of a corresponding reference image in an original video is obtained.

In an alternative embodiment, in executing step S105, specific execution is performed: and fusing the first image into the motion information stream based on the fusion network to obtain the first image.

The fusion network may be image fusion based on deep learning or other image fusion algorithms.

Based on the same concept, the exemplary embodiment of the present disclosure further provides a video privacy protecting apparatus, as shown in fig. 4, the video privacy protecting apparatus 400 includes: the video acquisition module 401 acquires an original video; a reference image selection module 402, configured to extract a reference image containing a face from the original video; an image anonymization processing module 403, configured to anonymize a face in the reference image to obtain a first image; the motion information estimation module 404 extracts motion information of a face in the original video corresponding to the reference image according to the reference image to obtain a motion information stream of the face in the reference image; and the video generating module 405 fuses the first image and the motion information stream to obtain a target video.

In an alternative embodiment, the image anonymization processing module 403 further includes: and based on the generated countermeasure network, replacing the face information in the reference image by the generated face image to obtain a first image.

In an alternative embodiment, as shown in fig. 5, the motion information estimation module 404 further includes: a key frame acquisition unit 501 that acquires a key frame sequence; the first extraction unit 502 performs key point extraction on a face corresponding to the reference image in each frame in the key frame sequence, and generates a first characterization matrix for characterizing key points of the key frame sequence; a second extraction unit 503, configured to perform key point extraction on a face in the reference image, and generate a second characterization matrix that characterizes key points of the reference image; an estimation unit 504, comparing each first characterization matrix with the second characterization matrix, and calculating a third characterization matrix for characterizing the motion of the reference image; and the output unit 505 obtains a motion information stream of the motion information of the face corresponding to the reference image in the original video according to the third characterization matrix.

In an alternative embodiment, the motion information estimation module 404 further includes: and detecting the face corresponding to the reference image in the original video to obtain a key frame sequence, wherein the face corresponding to the reference image in the key frame sequence has motion change.

In an alternative embodiment, the keypoints in the first extraction unit 502 and the second extraction unit 503 are determined by a pre-trained deep neural network model.

In an alternative embodiment, the output unit 505 specifically includes: arranging the third characterization matrix according to a time sequence; obtaining a characterization matrix of the motion of the face of the reference image corresponding to other frames except the key frame sequence in the original video through interpolation operation; and obtaining a motion information stream representing the motion information of the face of the corresponding reference image in the original video.

In an alternative embodiment, the video generation module 405 further includes: the first image is fused into the motion information stream based on the fusion network.

The embodiment of the specification provides a video privacy protection device, which can solve the contradiction between privacy protection and video file application value in the prior video privacy protection, and can keep the integrity of other privacy-independent information as much as possible while removing the identity information of people, such as the number of people, the peer relationship of people, the actions of people and the like in the video. So that the processed video file can still be applied to other video analysis tasks.

The specific details of each module/unit in the above apparatus are already described in the method section embodiments, and the details not disclosed may refer to the method section embodiments, so that they will not be described in detail.

Based on the same thought, the embodiment of the present disclosure further provides a video privacy protection device, as shown in fig. 6.

The video privacy protecting device may be a terminal device or a server provided in the above embodiments.

The video privacy preserving device may vary widely in configuration or performance, may include one or more processors 601 and memory 602, and may have one or more stored applications or data stored in memory 602. The memory 502 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) units and/or cache memory units, and may further include read-only memory units. The application programs stored in memory 602 may include one or more program modules (not shown) including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Still further, the processor 601 may be arranged to communicate with the memory 602, executing a series of computer executable instructions in the memory 602 on the video privacy preserving device. The video privacy preserving device may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more I/O interfaces (input output interfaces) 605, one or more external devices 606 (e.g., keyboard, pointing device, bluetooth device, etc.), and may also communicate with one or more devices that enable a user to interact with the device, and/or with any device that enables the device to communicate with one or more other computing devices (e.g., routers, modems, etc.). Such communication may occur through the I/O interface 605. Also, devices can communicate with one or more networks (e.g., a Local Area Network (LAN)) via a wired or wireless interface 604.

In particular, in this embodiment, the video privacy preserving apparatus includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer executable instructions for the video privacy preserving apparatus, and configured to be executed by the one or more processors, the one or more programs comprising computer executable instructions for:

the anonymizing the face in the reference image to obtain a first image includes: and based on the generated countermeasure network, replacing the face information in the reference image with the generated face image to obtain the first image.

The step of extracting the motion information of the face in the original video corresponding to the reference image according to the reference image to obtain the motion information stream of the face in the reference image comprises the following steps: acquiring a key frame sequence; extracting key points from faces corresponding to the reference image in each frame in the key frame sequence, and generating a first characterization matrix for characterizing the key points of the key frame sequence; extracting key points of the faces in the reference image to generate a second characterization matrix for characterizing the key points of the reference image; comparing each first characterization matrix with a second characterization matrix, and calculating to obtain a third characterization matrix for characterizing the motion of the reference image; and obtaining a motion information stream of the motion information of the face corresponding to the reference image in the original video according to the third characterization matrix.

The acquiring the key frame sequence comprises the following steps: and detecting the face corresponding to the reference image in the original video to obtain a key frame sequence, wherein the face corresponding to the reference image in the key frame sequence has motion change.

The keypoints are determined by a pre-trained deep neural network model.

The obtaining the motion information stream of the motion information of the face corresponding to the reference image in the original video according to the third characterization matrix comprises the following steps: arranging the third characterization matrix according to a time sequence; obtaining a characterization matrix of the motion of the face of the reference image corresponding to other frames except the key frame sequence in the original video through interpolation operation; and obtaining a motion information stream representing motion information of the face corresponding to the reference image in the original video.

The fusing the first image and the motion information stream to obtain a target video comprises the following steps: and fusing the first image into the motion information stream based on a fusion network to obtain the target video.

Based on the same idea, exemplary embodiments of the present disclosure further provide a computer readable storage medium having stored thereon a program product capable of implementing the method described in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.

Referring to fig. 7, a program product 700 for implementing the above-described method according to an exemplary embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the exemplary embodiments of the present disclosure.

Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of video privacy protection, the method comprising:

acquiring an original video;

extracting a reference image containing a human face from the original video;

anonymizing the face in the reference image to obtain a first image;

detecting faces corresponding to the reference image in the original video according to the reference image to obtain a key frame sequence, wherein motion changes exist in the faces corresponding to the reference image in the key frame sequence, key point extraction is performed on the faces corresponding to the reference image in each frame in the key frame sequence, and a first characterization matrix representing key points of the key frame sequence is generated;

extracting key points of the faces in the reference image to generate a second characterization matrix for characterizing the key points of the reference image;

dividing each first characterization matrix by the second characterization matrix, and calculating to obtain a third characterization matrix for characterizing the motion of the reference image;

arranging the third characterization matrix according to time sequence, and obtaining a characterization matrix of the motion of the face of the reference image corresponding to other frames except the key frame sequence in the original video through interpolation operation, so as to obtain a motion information stream for characterizing the motion information of the face of the reference image in the original video;

and fusing the first image and the motion information stream to obtain a target video.

2. The video privacy protection method according to claim 1, wherein the anonymizing the face in the reference image to obtain a first image includes: and based on the generated countermeasure network, replacing the face information in the reference image with the generated face image to obtain the first image.

3. The video privacy preserving method of claim 1, wherein the keypoints are determined by a pre-trained deep neural network model.

4. The method according to claim 1, wherein the fusing the first image and the motion information stream to obtain the target video comprises: and fusing the first image into the motion information stream based on a fusion network to obtain the target video.

5. A video privacy preserving apparatus, comprising:

the video acquisition module acquires an original video;

the reference image selecting module extracts a reference image containing a human face from the original video;

the image anonymization processing module performs anonymization processing on the face in the reference image to obtain a first image;

the motion information estimation module is used for detecting the face corresponding to the reference image in the original video according to the reference image to obtain a key frame sequence, wherein motion changes exist in the face corresponding to the reference image in the key frame sequence, key point extraction is carried out on the face corresponding to the reference image in each frame in the key frame sequence to generate a first characterization matrix representing the key points of the key frame sequence, key point extraction is carried out on the face in the reference image to generate a second characterization matrix representing the key points of the reference image, each first characterization matrix is divided by the second characterization matrix, a third characterization matrix representing the motion of the reference image is obtained through calculation, the third characterization matrix is arranged according to time sequence, and motion information flow representing the motion information of the face corresponding to the reference image in the original video is obtained through interpolation operation;

and the video generation module fuses the first image and the motion information stream to obtain a target video.

6. A video privacy preserving apparatus, comprising:

a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: acquiring an original video;

extracting a reference image containing a human face from the original video;

anonymizing the face in the reference image to obtain a first image;

7. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the video privacy protection method of any of claims 1 to 4.