CN113139527A - Video privacy protection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113139527A
CN113139527A (application CN202110593414.XA)
Authority
CN
China
Prior art keywords
reference image
face
motion information
video
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110593414.XA
Other languages
Chinese (zh)
Other versions
CN113139527B (en)
Inventor
刘勃
孙杨
王煜龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wenda Zhitong Technology Co ltd
Original Assignee
Shenzhen Wenda Zhitong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wenda Zhitong Technology Co ltd filed Critical Shenzhen Wenda Zhitong Technology Co ltd
Priority to CN202110593414.XA priority Critical patent/CN113139527B/en
Publication of CN113139527A publication Critical patent/CN113139527A/en
Application granted granted Critical
Publication of CN113139527B publication Critical patent/CN113139527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Molecular Biology (AREA)
  • Technology Law (AREA)
  • Image Analysis (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)

Abstract

The invention discloses a video privacy protection method, apparatus, device and storage medium. The method comprises: acquiring an original video; extracting a reference image containing a human face from the original video; anonymizing the face in the reference image to obtain a first image; extracting, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image; and fusing the first image with the motion information stream to obtain a target video. The invention thereby resolves the contradiction in existing video privacy protection between protecting privacy and preserving the application value of the video file.

Description

Video privacy protection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for protecting video privacy.
Background
The rapid development of applications such as video surveillance and short-video platforms has made people's lives more convenient. However, these videos contain much sensitive personal information, especially facial identity information, and with the increasingly widespread use of computer vision and deep learning, the privacy and security of video data are severely threatened. Current video privacy protection techniques typically obfuscate or delete privacy-sensitive information, an approach that has proven vulnerable to attack. Moreover, video data processed in this way loses much privacy-irrelevant information, which greatly reduces its practical value.
In summary, an ideal video privacy protection method should remove people's identity information while preserving, as far as possible, the integrity of privacy-irrelevant information such as the number of people appearing in the video, their companion relationships, and their actions, so that the processed video file can still be used for other video analysis tasks.
Disclosure of Invention
In view of the above technical problems, the present invention provides a method, apparatus, device and storage medium for video privacy protection, as a technical solution that resolves the contradiction between privacy protection and the application value of the video file in existing video privacy protection.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present invention, a video privacy protection method is provided, the method including: acquiring an original video; extracting a reference image containing a human face from the original video; anonymizing the face in the reference image to obtain a first image; extracting, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image; and fusing the first image with the motion information stream to obtain a target video.
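As an illustrative sketch only (not part of the claims), the five steps above can be expressed as a pipeline in which every callable is a hypothetical stand-in supplied by the reader:

```python
import numpy as np

def anonymize_video(frames, pick_reference, anonymize_face, extract_motion, fuse):
    """Sketch of the five claimed steps; every callable is an assumption."""
    reference = pick_reference(frames)                 # extract reference image
    first_image = anonymize_face(reference)            # anonymize -> "first image"
    motion_stream = extract_motion(frames, reference)  # motion information stream
    return fuse(first_image, motion_stream)            # fuse -> target video

# Toy stand-ins, for shape-checking the data flow only:
frames = np.zeros((10, 64, 64, 3), dtype=np.uint8)     # 10-frame "video"
target = anonymize_video(
    frames,
    pick_reference=lambda f: f[0],
    anonymize_face=lambda img: 255 - img,              # placeholder anonymization
    extract_motion=lambda f, r: np.zeros((len(f), 5, 2)),
    fuse=lambda img, m: np.stack([img] * len(m)),
)
```

The point of the sketch is only that the anonymized reference image and the motion stream are produced independently and combined at the end.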
Further, the anonymizing the face in the reference image to obtain the first image includes: replacing the face information in the reference image with a generated face image based on a generative adversarial network, to obtain the first image.
Further, the extracting, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image includes: acquiring a key frame sequence; extracting key points of the face corresponding to the reference image in each frame of the key frame sequence to generate a first characterization matrix characterizing the key points of the key frame sequence; extracting key points of the face in the reference image to generate a second characterization matrix characterizing the key points of the reference image; comparing each first characterization matrix with the second characterization matrix to calculate a third characterization matrix representing the motion of the reference image; and obtaining, according to the third characterization matrix, a motion information stream of the motion information of the corresponding face in the original video.
Further, the obtaining the key frame sequence includes: and detecting the face corresponding to the reference image in the original video to obtain a key frame sequence, wherein the face corresponding to the reference image in the key frame sequence has motion change.
Further, the key points are determined by a pre-trained deep neural network model.
Further, the obtaining, according to the third characterization matrix, a motion information stream of the motion information of the face corresponding to the reference image in the original video includes: arranging the third characterization matrices in time order; obtaining, through interpolation, characterization matrices of the motion of the face of the reference image for the frames of the original video outside the key frame sequence; and obtaining a motion information stream representing the motion information of the corresponding face in the original video.
Further, the fusing the first image and the motion information stream to obtain a target video includes: and fusing the first image into the motion information stream based on a fusion network to obtain the target video.
According to a second aspect of the present disclosure, there is provided a video privacy protection apparatus comprising: a video acquisition module, configured to acquire an original video; a reference image selection module, configured to extract a reference image containing a human face from the original video; an image anonymization module, configured to anonymize the face in the reference image to obtain a first image; a motion information estimation module, configured to extract, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image; and a video generation module, configured to fuse the first image with the motion information stream to obtain a target video.
According to a third aspect of the present disclosure, there is provided a video privacy protection device comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: acquire an original video; extract a reference image containing a human face from the original video; anonymize the face in the reference image to obtain a first image; extract, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image; and fuse the first image with the motion information stream to obtain a target video.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, performs the above-described video privacy protection method.
The technical scheme of the disclosure has the following beneficial effects:
the anonymization-based video privacy protection method, apparatus and device above resolve the contradiction between privacy protection and video file application value in existing video privacy protection: people's identity information is removed while the integrity of privacy-irrelevant information, such as the number of people appearing in the video, their companion relationships, and their actions, is preserved as far as possible, so that the processed video file can still be used for other video analysis tasks.
Drawings
Fig. 1 is a flowchart of a method for protecting video privacy according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of obtaining a motion information stream in an embodiment of the present description;
FIG. 3 is a flow chart of obtaining a motion information stream in an embodiment of the present description;
fig. 4 is a block diagram of a video privacy protecting apparatus in an embodiment of the present specification;
fig. 5 is a block diagram of a motion estimation module in an embodiment of the present specification;
fig. 6 is a terminal device for implementing a video privacy protection method in an embodiment of the present specification;
fig. 7 is a computer-readable storage medium for implementing a video privacy protection method in an embodiment of the present specification.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are only schematic illustrations of the present disclosure. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
As shown in fig. 1, an embodiment of the present specification provides a video privacy protection method, where an execution subject of the method may be a terminal device, where the terminal device may be a mobile phone, a tablet computer, a personal computer, and the like. The method may specifically include the following steps S101 to S105:
in step S101, an original video is acquired.
The original video may be a directly captured video, that is, one recorded by a device such as a camera, video camera, or mobile phone, or a video retrieved from storage.
In step S102, a reference image containing a human face is extracted from the original video.
The reference image contains one or more faces that require anonymization, and the facial information of these faces should be clear and complete so that they can be easily recognized.
In step S103, a first image is obtained by performing anonymization processing on the face in the reference image.
The face in the reference image may first be detected by a face detector, which may be, but is not limited to, a Haar cascade, a HOG + linear support vector machine, or a deep-learning-based face detector. After the face region in the reference image is extracted, the face is anonymized and the anonymized face is written back into the original image, yielding the first image. The anonymization may replace or blur the face information; any means that hides the identity information of the face to be anonymized belongs to the anonymization means provided by this disclosure.
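A minimal sketch of the blurring option, assuming the bounding box has already been supplied by one of the detectors mentioned above (the box coordinates here are illustrative; a pure NumPy box blur stands in for a real image-processing library):

```python
import numpy as np

def blur_region(image, box, k=9):
    """Box-blur the region box = (x, y, w, h) of image (H, W, C) uint8.
    A detector (Haar cascade, HOG+SVM, or DNN) would supply box; here it
    is assumed. Blurring is one of the anonymization options mentioned."""
    x, y, w, h = box
    face = image[y:y+h, x:x+w].astype(np.float32)
    pad = k // 2
    padded = np.pad(face, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(face)
    for dy in range(k):            # accumulate the k x k neighborhood
        for dx in range(k):
            out += padded[dy:dy+h, dx:dx+w]
    result = image.copy()
    result[y:y+h, x:x+w] = (out / (k * k)).astype(np.uint8)
    return result

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
blurred = blur_region(img, (8, 8, 16, 16))
```

Only the face region is modified; the rest of the image is preserved, matching the "store the anonymized face back into the original image" step.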
In step S104, according to the reference image, the motion information of the corresponding face in the original video is extracted to obtain a motion information stream of the face in the reference image.
According to the face in the reference image, the corresponding position of that face is located in each frame of the original video, and a motion path of the face is established; this motion path is the motion information stream of the face in the reference image.
In step S105, the first image and the motion information stream are fused to obtain a target video.
The first image is fused into the motion information stream. The first image is anonymized and contains face information that cannot identify the original person. Fusing this anonymized face information with the motion information yields an anonymized target video: sensitive privacy information tied to personal identity, such as the facial features, is removed, while other information, such as the number of people, their postures and their motions, is preserved, so the video can still be used for various applications such as crowd monitoring. In this embodiment, only the anonymized reference image, that is, the first image, is fused into the motion information stream corresponding to its face, which greatly improves processing efficiency.
In an alternative embodiment, step S103 is specifically executed as: replacing the face information in the reference image with a generated face image based on a generative adversarial network, to obtain the first image.
First, the face part of the reference image is extracted; a specific extraction method was given in the above embodiment and is not repeated here. Then a generative adversarial network (GAN) is used to generate a realistic but non-existent face, which replaces the real face in the reference image. The two faces carry different identity information, that is, they belong to different people in the judgment of a human viewer or of a face recognition system, thereby anonymizing the reference image.
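As a hedged illustration of the replacement step only (the patent does not name a specific generator, so a random-pixel stub stands in for sampling a pretrained GAN):

```python
import numpy as np

rng = np.random.default_rng(0)

def generated_face(h, w):
    # Stand-in for sampling G(z) from a pretrained generative adversarial
    # network; the architecture is an assumption not made in the text.
    return rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)

def replace_face(image, box):
    # Swap the detected face region for a generated face of the same size.
    x, y, w, h = box
    out = image.copy()
    out[y:y+h, x:x+w] = generated_face(h, w)
    return out

img = np.zeros((48, 48, 3), dtype=np.uint8)
swapped = replace_face(img, (10, 10, 20, 20))
```

In a real system the generated face would also be blended and color-matched to the surrounding image; the sketch shows only the identity-substitution idea.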
In an alternative embodiment, as shown in fig. 2, performing motion estimation on an original video, extracting motion information of a face corresponding to a reference image in the original video, and obtaining a motion information stream of the face in the reference image includes steps S201 to S205:
in step S201, a sequence of key frames is acquired.
A key frame sequence in the original video is obtained as a set of motion reference images. The number of key frames may be N, where N is a positive integer, and the key frames may be obtained at fixed time intervals, that is, one frame is taken as a key frame every fixed time T. In the key frame sequence, the face contained in each key frame corresponds to the face of the reference image. It should be noted that the faces contained in a key frame need not correspond one-to-one with the faces of the reference image; it suffices that one or more correspondences exist between the faces in the key frame and the faces in the reference image.
Specifically, step S201 includes: detecting the face corresponding to the reference image in the original video to obtain the key frame sequence; a specific detection method was given in the above embodiments and is not repeated here. The face corresponding to the reference image exhibits motion change across the key frame sequence, that is, the face is in a different state in each key frame, whether a different orientation and/or posture or a different position. In short, as long as the same face changes across the key frames, motion change can be considered to exist.
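The fixed-interval sampling described above can be sketched as follows (the frame rate and interval values are illustrative assumptions):

```python
def key_frame_indices(n_frames, fps, interval_s):
    # Fixed-interval sampling: one key frame every interval_s seconds,
    # i.e. "one frame every fixed time T" in the text.
    step = max(1, round(fps * interval_s))
    return list(range(0, n_frames, step))

# e.g. a 100-frame clip at 25 fps, sampled once per second:
idx = key_frame_indices(100, 25, 1.0)
```

The resulting N indices identify the key frames from which the characterization matrices below are built.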
In step S202, a first characterization matrix characterizing the key points of the sequence of key frames is generated by extracting the key points of the face corresponding to the reference image in each frame of the sequence of key frames.
Key points of the face corresponding to the reference image are extracted in each frame of the key frame sequence. Specifically, given a face image, the key regions of the face are located, including the eyebrows, eyes, nose, mouth and face contour, and their features are extracted. The first characterization matrix is expressed as an L × N-dimensional matrix whose N columns are the L-dimensional key-point vectors of the N key frames.
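A minimal sketch of assembling the L × N first characterization matrix, assuming a landmark detector has already produced one L-dimensional key-point vector per key frame (the value L = 68 follows a common landmarking convention and is an assumption, not the patent's):

```python
import numpy as np

def first_characterization_matrix(keypoint_vectors):
    # Stack the N key frames' L-dimensional key-point vectors as the
    # N columns of an L x N matrix; the landmark detector producing
    # each vector is assumed.
    return np.stack(keypoint_vectors, axis=1)

L, N = 68, 4
# Synthetic vectors standing in for real landmark outputs:
M_a = first_characterization_matrix(
    [np.arange(L, dtype=float) + i for i in range(N)]
)
```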
In step S203, extracting key points of the face in the reference image, and generating a second characterization matrix characterizing the key points of the reference image.
Wherein the second characterization matrix is expressed as an L-dimensional matrix Mb.
Specifically, the key points in steps S202 and S203 are determined by a pre-trained deep neural network model, which may be, but is not limited to, a model-based method such as ASM or AAM, a cascaded-shape-regression method such as CPR, or a deep-learning-based method.
In step S204, comparing each of the first characterization matrices with the second characterization matrix, and calculating a third characterization matrix representing the motion of the reference image.
Wherein each of the N L-dimensional columns of the L × N first characterization matrix is divided by the L-dimensional matrix Mb, obtaining N third characterization matrices, namely the motion information of the face in each of the N key frames relative to the face in the reference image.
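Reading the "division" above as an element-wise ratio of each key frame's column to the reference vector (an interpretation, since the text does not spell out the operation), the computation might look like:

```python
import numpy as np

def third_characterization_matrices(M_a, M_b):
    # Divide each column of the L x N first characterization matrix M_a
    # by the reference image's L-dimensional matrix M_b, element-wise,
    # giving N columns of relative motion per key frame.
    return M_a / M_b[:, None]

M_a = np.array([[2.0, 4.0], [6.0, 8.0]])  # L=2 key points, N=2 key frames
M_b = np.array([2.0, 2.0])                # reference image key points
M_c = third_characterization_matrices(M_a, M_b)
```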
In step S205, a motion information stream of the motion information of the face corresponding to the reference image in the original video is obtained according to the third characterization matrices.
The N third characterization matrices obtained in step S204 are extended to the whole video, and the motion information of the entire original video relative to the face in the reference image is calculated.
Specifically, as shown in fig. 3, when step S205 is executed, steps S301 to S303 are specifically executed:
in step S301, N third feature matrices are arranged in time order.
In step S302, a characterization matrix of the motion of the face of the reference image corresponding to other frames in the original video except the sequence of the key frames is obtained through interpolation operation.
In step S303, a motion information stream representing motion information of a face corresponding to the reference image in the original video is obtained.
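Steps S301 to S303 can be sketched together as follows; linear interpolation is an assumption, since the text says only "interpolation operation", and each key frame's characterization is flattened to an L-dimensional vector for simplicity:

```python
import numpy as np

def motion_information_stream(key_indices, key_columns, n_frames):
    # Arrange the key frames' third characterization columns in time order,
    # then linearly interpolate the remaining frames to cover the whole
    # original video (linear interpolation is an assumption).
    key_columns = np.asarray(key_columns, dtype=float)   # shape (N, L)
    frames = np.arange(n_frames)
    return np.stack(
        [np.interp(frames, key_indices, key_columns[:, j])
         for j in range(key_columns.shape[1])],
        axis=1,
    )                                                    # shape (n_frames, L)

# Two key frames at t=0 and t=4 of a 5-frame clip, L=2 components:
stream = motion_information_stream([0, 4], [[0.0, 0.0], [4.0, 8.0]], 5)
```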
In an alternative embodiment, step S105 is specifically executed as: fusing the first image into the motion information stream based on a fusion network to obtain the target video.
The fusion network may be an image fusion algorithm based on deep learning or other image fusion algorithms.
Based on the same idea, the exemplary embodiment of the present disclosure also provides a video privacy protecting apparatus, as shown in fig. 4, the video privacy protecting apparatus 400 includes: the video acquisition module 401 acquires an original video; a reference image selecting module 402, which extracts a reference image containing a human face from the original video; an image anonymization processing module 403, configured to perform anonymization processing on the face in the reference image to obtain a first image; a motion information estimation module 404, configured to extract, according to the reference image, motion information corresponding to a face in the reference image in the original video, so as to obtain a motion information stream of the face in the reference image; and a video generation module 405, which fuses the first image and the motion information stream to obtain a target video.
In an alternative embodiment, the image anonymization processing module 403 further comprises: and based on the generated countermeasure network, replacing the face information in the reference image by the generated face image to obtain a first image.
In an alternative embodiment, as shown in fig. 5, the motion information estimation module 404 further includes: a key frame obtaining unit 501, which obtains a key frame sequence; a first extraction unit 502, which extracts key points of a face corresponding to a reference image in each frame of the key frame sequence, and generates a first representation matrix representing the key points of the key frame sequence; a second extraction unit 503, configured to perform key point extraction on the face in the reference image, and generate a second representation matrix representing key points of the reference image; an estimating unit 504, configured to compare each first characterization matrix with the second characterization matrix, and calculate a third characterization matrix representing the motion of the reference image; the output unit 505 obtains a motion information stream of the motion information of the face corresponding to the reference image in the original video according to the third feature matrix.
In an alternative embodiment, the motion information estimation module 404 further includes: and detecting the face corresponding to the reference image in the original video to obtain a key frame sequence, wherein the face corresponding to the reference image in the key frame sequence has motion change.
In an alternative embodiment, the key points in the first extraction unit 502 and the second extraction unit 503 are determined by a pre-trained deep neural network model.
In an alternative embodiment, the output unit 505 specifically includes: arranging the third token matrixes according to a time sequence; obtaining a characterization matrix of the motion of the face of the reference image corresponding to other frames except the key frame sequence in the original video through interpolation operation; and obtaining a motion information flow representing the motion information of the face corresponding to the reference image in the original video.
In an alternative embodiment, the video generation module 405 further comprises: the first image is fused into the motion information stream based on a fusion network.
This embodiment of the specification provides a video privacy protection apparatus that resolves the contradiction between privacy protection and video file application value in existing video privacy protection: it removes people's identity information while preserving, as far as possible, the integrity of privacy-irrelevant information such as the number of people appearing in the video, their companion relationships, and their actions, so that the processed video file can still be used for other video analysis tasks.
The specific details of each module/unit in the above-mentioned apparatus have been described in detail in the method section, and the details that are not disclosed may refer to the contents of the method section, and thus are not described again.
Based on the same idea, embodiments of the present specification further provide a video privacy protecting device, as shown in fig. 6.
The video privacy protection device may be the terminal device or the server provided in the above embodiments.
The video privacy protection apparatus may differ greatly in configuration or performance, and may include one or more processors 601 and a memory 602, where the memory 602 may store one or more applications or data. The memory 602 may include a readable medium in the form of a volatile memory unit, such as a random access memory (RAM) unit and/or a cache memory unit, and may further include a read-only memory unit. The application programs stored in the memory 602 may include one or more program modules (not shown), including but not limited to an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Further, the processor 601 may communicate with the memory 602 to execute a series of computer-executable instructions in the memory 602 on the video privacy protection apparatus. The apparatus may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more I/O (input/output) interfaces 605, and one or more external devices 606 (e.g., keyboard, pointing device, Bluetooth device, etc.); it may also communicate with one or more devices that enable a user to interact with it, and/or with any device (e.g., router, modem, etc.) that enables it to communicate with one or more other computing devices. Such communication may occur via the I/O interface 605. The apparatus may also communicate with one or more networks (e.g., a local area network (LAN)) via the wired or wireless interface 604.
In particular, in this embodiment, the video privacy protection apparatus includes a memory and one or more programs stored in the memory. The one or more programs may include one or more modules, each of which may include a series of computer-executable instructions for the video privacy protection apparatus; the one or more programs, configured to be executed by the one or more processors, include computer-executable instructions for: acquiring an original video; extracting a reference image containing a human face from the original video; anonymizing the face in the reference image to obtain a first image; extracting, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image; and fusing the first image with the motion information stream to obtain a target video. Further:
the anonymizing the face in the reference image to obtain a first image includes: and replacing the face information in the reference image by using the generated face image based on the generated confrontation network to obtain the first image.
The extracting, according to the reference image, the motion information of the corresponding face in the original video to obtain a motion information stream of the face in the reference image includes: acquiring a key frame sequence; extracting key points of the face corresponding to the reference image in each frame of the key frame sequence to generate a first characterization matrix characterizing the key points of the key frame sequence; extracting key points of the face in the reference image to generate a second characterization matrix characterizing the key points of the reference image; comparing each first characterization matrix with the second characterization matrix to calculate a third characterization matrix representing the motion of the reference image; and obtaining, according to the third characterization matrix, a motion information stream of the motion information of the corresponding face in the original video.
The acquiring of a key frame sequence includes: detecting, in the original video, the face corresponding to the reference image to obtain a key frame sequence, where the face corresponding to the reference image exhibits motion change across the key frame sequence.
The key points are determined by a pre-trained deep neural network model.
The obtaining, according to the third characterization matrices, of a motion information stream of the motion information of the face corresponding to the reference image in the original video includes: arranging the third characterization matrices in time order; obtaining, through an interpolation operation, characterization matrices of the motion of the face of the reference image for the frames of the original video other than the key frame sequence; and obtaining a motion information stream representing the motion information of the face corresponding to the reference image in the original video.
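The interpolation step might look like the sketch below, which time-orders the per-key-frame displacement matrices and linearly interpolates a matrix for every frame of the original video. Linear interpolation and all names are assumptions; the patent says only "interpolation operation".

```python
import numpy as np

def motion_information_stream(key_frame_indices, third_mats, n_frames):
    """Arrange the third matrices in time order, then interpolate a
    displacement matrix for every frame index 0..n_frames-1."""
    order = np.argsort(key_frame_indices)
    times = np.asarray(key_frame_indices, dtype=float)[order]
    stacked = np.stack([np.asarray(third_mats[i], dtype=float) for i in order])
    flat = stacked.reshape(len(times), -1)                # (T_key, K*2)
    frames = np.arange(n_frames, dtype=float)
    # Interpolate each key-point coordinate independently over time.
    cols = [np.interp(frames, times, flat[:, j]) for j in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape(n_frames, *stacked.shape[1:])

# Key frames at indices 0 and 4: zero displacement, then displacement 4.
stream = motion_information_stream(
    [0, 4], [np.zeros((2, 2)), np.full((2, 2), 4.0)], 5)
```

The result is one displacement matrix per frame, i.e., the motion information stream that drives the anonymized image.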
The fusing the first image and the motion information stream to obtain the target video includes: and fusing the first image into the motion information stream based on a fusion network to obtain the target video.
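The fusion network itself is not specified in the patent. As a stand-in, the sketch below animates the anonymized first image by shifting it with each frame's mean key-point displacement; the shift-based warp and all names are assumptions, and a real fusion network would be a learned generator conditioned on the motion stream.

```python
import numpy as np

def fuse(first_image, motion_stream):
    """Produce the target video: one frame per displacement matrix,
    each a shifted copy of the anonymized first image."""
    frames = []
    for displacement in motion_stream:                 # displacement: (K, 2)
        dx, dy = np.mean(displacement, axis=0).round().astype(int)
        frames.append(np.roll(first_image, shift=(dy, dx), axis=(0, 1)))
    return np.stack(frames)                            # (n_frames, H, W, ...)

# One marked pixel, one frame whose key point moves by (dx=1, dy=2).
frame0 = np.zeros((4, 4))
frame0[0, 0] = 1.0
video = fuse(frame0, [np.array([[1.0, 2.0]])])
```

However the warp is realized, the key property claimed is preserved: the target video shows the anonymized face performing the original face's motion.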
Based on the same idea, the exemplary embodiments of the present disclosure also provide a computer-readable storage medium on which a program product capable of implementing the above-described method of the present specification is stored. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 7, a program product 700 for implementing the above method according to an exemplary embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although the above detailed description mentions several modules or units of the device for action execution, such a division is not mandatory. Indeed, according to exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for protecting privacy of video, the method comprising:
acquiring an original video;
extracting a reference image containing a human face from the original video;
anonymizing the face in the reference image to obtain a first image;
extracting, according to the reference image, motion information of the face corresponding to the reference image in the original video, to obtain a motion information stream of the face in the reference image;
and fusing the first image and the motion information stream to obtain a target video.
2. The method according to claim 1, wherein the anonymizing the face in the reference image to obtain the first image comprises: replacing the face in the reference image with a face image produced by a generative adversarial network, to obtain the first image.
3. The method according to claim 1, wherein the extracting, according to the reference image, motion information corresponding to a face in the reference image in the original video to obtain a motion information stream of the face in the reference image comprises:
acquiring a key frame sequence;
extracting key points of the face corresponding to the reference image in each frame of the key frame sequence, to generate first characterization matrices characterizing the key points of the key frame sequence;
extracting key points of the face in the reference image, to generate a second characterization matrix characterizing the key points of the reference image;
comparing each first characterization matrix with the second characterization matrix, and calculating a third characterization matrix representing the motion relative to the reference image;
and obtaining, according to the third characterization matrices, a motion information stream of the motion information of the face corresponding to the reference image in the original video.
4. The method according to claim 3, wherein the obtaining the key frame sequence comprises: detecting, in the original video, the face corresponding to the reference image to obtain a key frame sequence, wherein the face corresponding to the reference image exhibits motion change across the key frame sequence.
5. The method according to claim 3, wherein the key points are determined by a pre-trained deep neural network model.
6. The method of claim 3, wherein the obtaining, according to the third characterization matrices, a motion information stream of motion information of the face of the original video corresponding to the reference image comprises:
arranging the third characterization matrices in time order;
obtaining, through an interpolation operation, characterization matrices of the motion of the face of the reference image for frames of the original video other than the key frame sequence;
and obtaining a motion information stream representing the motion information of the face corresponding to the reference image in the original video.
7. The method for protecting video privacy according to claim 1, wherein the fusing the first image and the motion information stream to obtain a target video includes: and fusing the first image into the motion information stream based on a fusion network to obtain the target video.
8. A video privacy protection apparatus, comprising:
the video acquisition module is used for acquiring an original video;
the reference image selection module is used for extracting a reference image containing a human face from the original video;
the image anonymization processing module is used for carrying out anonymization processing on the face in the reference image to obtain a first image;
the motion information estimation module is used for extracting, according to the reference image, the motion information of the face corresponding to the reference image in the original video, to obtain a motion information stream of the face in the reference image;
and the video generation module is used for fusing the first image and the motion information stream to obtain a target video.
9. A video privacy protection device, comprising:
a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: acquiring an original video;
extracting a reference image containing a human face from the original video;
anonymizing the face in the reference image to obtain a first image;
extracting, according to the reference image, motion information of the face corresponding to the reference image in the original video, to obtain a motion information stream of the face in the reference image;
and fusing the first image and the motion information stream to obtain a target video.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video privacy protection method of any one of claims 1 to 7.
CN202110593414.XA 2021-05-28 2021-05-28 Video privacy protection method, device, equipment and storage medium Active CN113139527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593414.XA CN113139527B (en) 2021-05-28 2021-05-28 Video privacy protection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110593414.XA CN113139527B (en) 2021-05-28 2021-05-28 Video privacy protection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113139527A true CN113139527A (en) 2021-07-20
CN113139527B CN113139527B (en) 2023-09-22

Family

ID=76815902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110593414.XA Active CN113139527B (en) 2021-05-28 2021-05-28 Video privacy protection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113139527B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180324436A1 (en) * 2017-05-08 2018-11-08 Axis Ab Encoding a video stream having a privacy mask
CN110135195A (en) * 2019-05-21 2019-08-16 司马大大(北京)智能系统有限公司 Method for secret protection, device, equipment and storage medium
CN110610456A (en) * 2019-09-27 2019-12-24 上海依图网络科技有限公司 Imaging system and video processing method
CN110674765A (en) * 2019-09-27 2020-01-10 上海依图网络科技有限公司 Imaging system and video processing method
CN111031348A (en) * 2019-12-11 2020-04-17 浙江宇视科技有限公司 Video scrambling method, device, server and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIFAN WU, ET AL: "Privacy-Protective-GAN for Face De-identification", Computer Vision and Pattern Recognition, pages 1-11 *
ZHANG Yu: "Research on a Video Protection System Based on the Android Platform", China Masters' Theses Full-text Database, Information Science and Technology, vol. 2021, no. 02, pages 138-119 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362429A (en) * 2021-08-09 2021-09-07 景昱医疗器械(长沙)有限公司 Image processing apparatus, method, device, and readable storage medium
CN113792663A (en) * 2021-09-15 2021-12-14 东北大学 Detection method and device for drunk driving and fatigue driving of driver and storage medium
CN113792663B (en) * 2021-09-15 2024-05-14 东北大学 Method, device and storage medium for detecting drunk driving and fatigue driving of driver

Also Published As

Publication number Publication date
CN113139527B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
Meden et al. Privacy–enhancing face biometrics: A comprehensive survey
Ren et al. Learning to anonymize faces for privacy preserving action detection
Wang et al. Generative neural networks for anomaly detection in crowded scenes
CN110222573B (en) Face recognition method, device, computer equipment and storage medium
KR101385599B1 (en) Method and apparatus for interfering montage
CN112949545B (en) Method, apparatus, computing device and medium for recognizing face image
CN111274947B (en) Multi-task multi-thread face recognition method, system and storage medium
CN113139527B (en) Video privacy protection method, device, equipment and storage medium
Smith-Creasey et al. Continuous face authentication scheme for mobile devices with tracking and liveness detection
CN112001285B (en) Method, device, terminal and medium for processing beauty images
CN111931145A (en) Face encryption method, face recognition method, face encryption device, face recognition device, electronic equipment and storage medium
CN110879986A (en) Face recognition method, apparatus and computer-readable storage medium
CN113570689A (en) Portrait cartoon method, apparatus, medium and computing device
Alsaffar et al. [Retracted] Human‐Computer Interaction Using Manual Hand Gestures in Real Time
Senthilkumar et al. Suspicious human activity detection in classroom examination
CN112733635B (en) Object identification method and device and electronic equipment
US11546141B1 (en) Cryptographic protection for portions of media
Mukhiddin et al. Privacy-preserving of human identification in CCTV data using a novel deep learning-based method
KR102500252B1 (en) Machine learning database construction system using face privacy protection technology
EP4293612A1 (en) Determination method, determination program, and information processing device
CN113838159B (en) Method, computing device and storage medium for generating cartoon images
CN113160348A (en) Recoverable face image privacy protection method, device, equipment and storage medium
US20240169087A1 (en) Secure search method, system thereof, apparatus thereof, encryption apparatus, searcher terminal, and program
US11604938B1 (en) Systems for obscuring identifying information in images
Fábián et al. A Comparative Study on the Privacy Risks of Face Recognition Libraries

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant