CN113807275A - Household video falling detection method based on privacy protection type GAN feature enhancement - Google Patents
Info
- Publication number
- CN113807275A CN113807275A CN202111113708.4A CN202111113708A CN113807275A CN 113807275 A CN113807275 A CN 113807275A CN 202111113708 A CN202111113708 A CN 202111113708A CN 113807275 A CN113807275 A CN 113807275A
- Authority
- CN
- China
- Prior art keywords
- video
- privacy protection
- matrix
- compressed sensing
- detection method
- Prior art date
- Legal status
- Granted
Classifications
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F18/2135: Feature extraction based on approximation criteria, e.g. principal component analysis
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06N7/08: Computing arrangements based on specific mathematical models using chaos models or non-linear system models
Abstract
The invention relates to a household video fall detection method based on privacy-preserving GAN feature enhancement, which mainly comprises the following steps: (1) inputting a training data set of original video; (2) multilayer compressed-sensing processing; (3) extracting the foreground moving target; (4) obtaining a fall classification result: a GAN network with an auxiliary classifier performs space-time migration compensation on the foreground moving target in the visually privacy-protected video obtained in step (3), so that the features lost in that video are compensated, and the final fall classification result is obtained on this basis. The invention addresses the risk of privacy leakage in current home video monitoring aimed at health monitoring of the elderly: it realizes fall detection for the elderly at home while protecting personal and family privacy, and thus has high practical application value.
Description
Technical Field
The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a home video fall detection method for the elderly based on privacy-preserving GAN feature enhancement.
Background
With the increasingly serious aging of China's population and the growing number of empty-nest households, the safety monitoring of elderly people living alone has received wide attention. Falls are the leading cause of injury and death among the elderly, and the risk of falling increases with age; once a fall occurs, it can have a severe negative impact on an elderly person's body and mind. As society has become more concerned with healthy living for the elderly, various fall detection approaches have been developed; at present they are mainly based on wearable devices, scene sensors, or computer vision. Wearable devices must be worn constantly and charged regularly, leaving large gaps in protective coverage; scene-sensor approaches require deploying a large number of sensors of various kinds throughout the elderly person's range of activity, making them costly and difficult to maintain. By contrast, fall detection based on computer vision has attracted increasing attention from researchers thanks to its rich monitoring information, contact-free monitoring mode, and electromagnetic-interference-free monitoring environment. Computer-vision fall detection is a widely studied research direction, but it suffers from the drawback of personal privacy disclosure: the technology relies on one or more cameras filming the monitored subject in real time and uploading the video or pictures over a network or other communication channel, and in this process there is a hidden risk of privacy leakage, for example while dressing or using the toilet. Even when technical measures are taken to handle such private scenes, private information about home furnishings or the domestic environment, such as gold and silver articles or luxury artwork, remains exposed.
CN110942009A provides a fall detection method based on a spatio-temporal hybrid convolutional network. It splits detection into two parts, coordinate regression and classification, and combines 2D-CNN and 3D-CNN spatio-temporal hybrid networks to obtain high-quality location features and category features simultaneously: the coordinate-regression features are generated by the 2D CNN and the classification features by the 3D CNN, after which the location and category features are fused and a prediction model is trained; the prediction model then processes the input video stream to predict whether a person has fallen. In this process the original video data captured by the camera is processed directly, and the video surveillance runs around the clock as the monitoring means for intelligent processing such as target tracking and analysis; the price paid for timely detection of dangerous behavior by the elderly is the complete disclosure of their personal privacy. It goes without saying that any elderly person with autonomous behavioral capability will be reluctant to accept such an unattended monitoring mode as long as there is any possibility of a privacy breach at the visual or cognitive level, even when the real risk of leaving critical behavior without effective monitoring is clear.
With continued in-depth research on GAN networks, it has been shown that GANs can achieve a breakthrough "from nothing to something": clear target images can be obtained from noisy data that are visually indistinguishable from the original target images. It is natural to conjecture that, since a GAN can go from nothing to something, it should in theory also be able to go from "some" to "more". This means that the GAN network architecture can be used as a new information migration tool to explore the best balance between quality improvement and privacy protection.
Disclosure of Invention
In order to solve the problem of privacy invasion in current video-monitoring-based indoor fall detection for the elderly, a home video fall detection method for the elderly based on privacy-preserving GAN feature enhancement is provided, which uses a GAN network as a new information migration tool to realize fall detection that is as accurate as possible.
In order to achieve the purpose, the invention is realized by the following technical scheme:
the invention relates to a privacy protection type GAN feature enhancement-based household video fall detection method, which comprises the following steps:
step 1: based on a public falling detection data set, dividing training data into positive and negative sets according to the characteristics of behaviors in the data set, and manually marking corresponding class labels;
step 2: performing multilayer compressed sensing processing on an original video based on the chaotic pseudorandom Bernoulli measurement matrix to generate a video with a visual privacy protection effect on a visual layer;
step 3: extracting the foreground moving target using a low-rank sparse decomposition algorithm based on generalized non-convex robust principal component analysis, obtaining the foreground moving target of the video with the visual privacy protection effect;
step 4: performing space-time migration compensation on the foreground moving target of the video with the visual privacy protection effect obtained in step 3 using a GAN network with an auxiliary classifier, so that the features lost in that video are compensated, and obtaining the final fall classification result on this basis.
The GAN network with the auxiliary classifier in the step 4 comprises a generator G network and a discriminator D, both of which use a convolutional neural network, and the processing procedure thereof comprises the following steps:
step 4-1: a training process, wherein the foreground moving target characteristics of the original video in the step 1, corresponding category information and the foreground moving target characteristics compensated in the step 4 are input into a network end of a discriminator D, the foreground moving target characteristics of the video with the visual privacy protection effect after multilayer compressed sensing processing corresponding to the original video are input into a network end of a generator G, and a category information one-to-one corresponding relation is formed between the original video and the video with the visual privacy protection effect;
step 4-2: a testing process, wherein only the features of the video with the visual privacy protection effect after the multilayer compressed-sensing processing of step 2 are input; the discriminator D outputs the category information corresponding to those features, the loss between the output classification and the true classification is then computed, and the fall-detection accuracy is finally obtained, the loss being expressed as:

L_S = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake_cs)]

L_C = E[log P(C = c | X_real)] + E[log P(C = c | X_fake_cs)]

where L_S is the loss function for the source (real or fake) of a sample, L_C is the loss function for assigning a sample its correct class, X_real is the original video data, S is the source label of the data, c is the class label of the data after multilayer compressed-sensing processing, X_fake_cs is the data compensated by the generator G, and P(S | X) and P(C | X) are the probability distributions over the source and over the class labels, respectively.
The invention is further improved in that: the low-rank sparse decomposition algorithm of step 3 is expressed as:

min_{L,S} Σ_i g(σ_i(L); τ) + γ Σ_j g(|S_j|; τ), s.t. M = L + S

where g(·) is a non-convex, closed, proper lower semicontinuous function, σ_i(L) is the i-th singular value of the low-rank matrix L, S_j is the j-th element of the sparse matrix S, τ, γ > 0 are parameters, M ∈ R^{m×n} is the data matrix to be processed, L ∈ R^{m×n} is a low-rank matrix, and S ∈ R^{m×n} is a sparse matrix;
the invention is further improved in that: the decomposition process of the low-rank sparse decomposition in the step 3 comprises the following steps:
step 3-1: taking the noise generated in the multilayer compressed-sensing processing of step 2 as the sparse matrix, and the foreground plus background of steps 1 and 2 as the low-rank matrix, so as to remove the noise;
step 3-2: taking the background information of the data set of step 1 and the video of step 2 as the low-rank matrix, and their foreground information as the sparse matrix, and extracting the foreground moving target once more with the low-rank sparse decomposition algorithm based on generalized non-convex robust principal component analysis.
The invention is further improved in that: the multilayer compressed sensing processing of the step 2 specifically comprises the following steps:
step 2-1: pseudo-random numbers are generated using the Mersenne Twister algorithm, whose recurrence is:

α_{k+n} = α_{k+m} ⊕ ((α_k^u | α_{k+1}^l) A), k = 0, 1, 2, …

where α_{k+1}^l denotes the lowest (rightmost) r bits of α_{k+1}, α_k^u denotes the uppermost (leftmost) w - r bits of α_k, ⊕ denotes the bitwise exclusive-or operation, | denotes the concatenation operation, and A is the twist transformation matrix;
Step 2-2: generating a chaotic pseudorandom sequence: after pseudo random numbers are obtained through a Meisen rotation algorithm, the pseudo random numbers are used as original signals, the generated random numbers are substituted into a chaotic model to generate a chaotic pseudo random sequence, and the corresponding chaotic model is as follows:
wherein j represents the number of iterations;
step 2-3: constructing the chaotic pseudorandom Bernoulli measurement matrix: the chaotic pseudorandom sequence obtained in step 2-2 is nonlinearly transformed, and a sign-function mapping is then applied so that the mapped sequence {β_0, β_1, β_2, …, β_n} satisfies a Bernoulli distribution; the chaotic pseudorandom Bernoulli measurement matrix Φ is constructed by arranging this sequence column by column and multiplying by a normalization coefficient;
step 2-4: multilayer compressed-sensing sampling: the pseudorandom Bernoulli measurement matrix constructed in step 2-3 is partitioned into blocks, the measurement process is converted into inner-product operations on blocks at the same position, and the partitioning and inner-product operations are repeated to realize multilayer compressed-sensing coding and thereby privacy protection in video monitoring; the multilayer compressed-sensing coding is computed as

Y = Φ_n(⋯ Φ_2(Φ_1 X))

where Y is the video data after multilayer compressed-sensing processing, X is the original video data, and n is the number of layers of multilayer compressed-sensing coding.
The invention has the beneficial effects that:
(1) Aiming at the shortcomings of the Gaussian random mechanism in multilayer compressed-sensing visual-occlusion coding, the invention introduces a chaotic pseudorandom Bernoulli mechanism for generating the measurement matrix in the multilayer compressed-sensing process, improving the practical performance of multilayer compressed-sensing coding while retaining its visual-privacy-protection advantage;
(2) For foreground extraction from video time-series data, the invention replaces typical low-rank sparse decomposition with the generalized non-convex robust principal component analysis algorithm, improving both the quality of foreground extraction and the robustness of data processing;
(3) For the feature-enhancement requirement of data in the visual-privacy-protection state, the invention uses a GAN network architecture with an auxiliary classifier as a new information migration tool to explore the optimal balance of behavior characterization between quality improvement and privacy protection.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a diagram of the effect of using a chaotic pseudo-random Bernoulli measurement matrix after sequentially performing 1-4 layers of compressed sensing processing;
FIG. 3 is foreground information of data in a visual privacy preserving state extracted using a low-rank sparse decomposition theory based on generalized non-convex robust principal component analysis.
FIG. 4 is a schematic diagram of the classifier based on a GAN network with an auxiliary classifier.
Detailed Description
In the following description, for purposes of explanation, numerous implementation details are set forth in order to provide a thorough understanding of the embodiments of the invention. It should be understood, however, that these implementation details are not to be interpreted as limiting the invention. That is, in some embodiments of the invention, such implementation details are not necessary. In addition, some conventional structures and components are shown in simplified schematic form in the drawings.
The invention discloses a privacy protection type GAN feature enhancement-based household video fall detection method, which takes the household video of the old as an example, and comprises the following steps:
step 1: input training data set of raw video: based on a public falling detection data set, dividing training data into positive and negative sets according to the characteristics of behaviors in the data set, and manually marking corresponding class labels;
step 2: multilayer compressed sensing processing: performing multilayer compressed sensing processing on an original video based on the chaotic pseudorandom Bernoulli measurement matrix to generate a video with a visual privacy protection effect on a visual layer;
the method comprises the following three steps:
In the first step, pseudo-random numbers are generated using the Mersenne Twister algorithm, whose recurrence is:

α_{k+n} = α_{k+m} ⊕ ((α_k^u | α_{k+1}^l) A), k = 0, 1, 2, …

where α_{k+1}^l denotes the lowest (rightmost) r bits of α_{k+1}, α_k^u denotes the uppermost (leftmost) w - r bits of α_k, ⊕ denotes the bitwise exclusive-or operation, | denotes the concatenation operation, and A is the twist transformation matrix.
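As a concrete illustration, the Mersenne Twister is available directly in NumPy as the MT19937 bit generator; the minimal sketch below (seed value chosen arbitrarily for illustration) draws raw 32-bit words of the kind that would serve as the "original signal" fed into the chaotic model:

```python
import numpy as np
from numpy.random import Generator, MT19937

# MT19937 implements the Mersenne Twister recurrence described above
# (word size w = 32, r = 31, together with the twist matrix A).
rng = Generator(MT19937(seed=2021))  # the seed value is illustrative

# Draw raw 32-bit pseudo-random words; in the pipeline described here
# these would next be substituted into the chaotic model of formula (2).
alpha = rng.integers(0, 2**32, size=8, dtype=np.uint64)
print(alpha)
```

Any seeded bit generator would serve; MT19937 is used because it is the Mersenne Twister the text names.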
The second step: after the pseudo-random numbers are obtained through the Mersenne Twister algorithm, they are taken as the original signal. If they were used directly for multilayer compressed-sensing coding, the reconstruction performance of the original signal would be poor, so the pseudo-random numbers generated in the first step are substituted into the chaotic model represented by formula (2) to generate a chaotic pseudorandom sequence.
Where j represents the number of iterations.
The generated chaotic pseudorandom sequence is nonlinearly transformed so that the transformed sequence obeys a uniform distribution, and a sign-function mapping is then applied to it. The mapped sequence {β_0, β_1, β_2, …, β_n} satisfies a Bernoulli distribution, from which the proposed chaotic pseudorandom Bernoulli measurement matrix Φ is constructed by arranging the sequence column by column and multiplying by a normalization coefficient.
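Because the text does not spell out its chaotic model, the sketch below substitutes the classical logistic map for the chaotic iteration; the sign mapping and column-wise arrangement follow the construction described above. The parameter values (`x0`, `mu`, the 0.5 threshold) are illustrative assumptions:

```python
import numpy as np

def chaotic_bernoulli_matrix(m, n, x0=0.7, mu=4.0):
    """Build an m x n Bernoulli-style measurement matrix from a chaotic
    sequence. The logistic map x_{j+1} = mu * x_j * (1 - x_j) stands in
    for the unspecified chaotic model; the sign mapping sends values
    above 0.5 to +1 and the rest to -1."""
    x = x0
    seq = np.empty(m * n)
    for j in range(m * n):
        x = mu * x * (1.0 - x)      # one chaotic iteration
        seq[j] = x
    # sign-function mapping to a +/-1 Bernoulli sequence, then
    # column-wise arrangement and 1/sqrt(m) normalization
    signs = np.where(seq > 0.5, 1.0, -1.0)
    return signs.reshape(n, m).T / np.sqrt(m)

phi = chaotic_bernoulli_matrix(4, 16)
print(phi.shape)   # (4, 16): m = 4 measurements of a 16-dimensional block
```

The 1/sqrt(m) scaling mirrors the usual normalization of Bernoulli measurement matrices in compressed sensing.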
The third step: multilayer compressed-sensing sampling: to apply multilayer compressed-sensing processing to the original video, the measurement matrix is first partitioned into blocks, and the measurement process is then converted into inner-product operations on blocks at the same position. Because a single layer of visual privacy-protection processing still carries a risk of privacy disclosure, the single-layer compressed-sensing sampling is extended to multilayer sampling in order to strengthen the security of the image information; the multilayer compressed-sensing coding is computed as

Y = Φ_n(⋯ Φ_2(Φ_1 X))

where Y is the video data after multilayer compressed-sensing processing, X is the original video data, and n is the number of layers of multilayer compressed-sensing coding.
Step 3: extracting the foreground moving target using a low-rank sparse decomposition algorithm based on generalized non-convex robust principal component analysis.
After the video data undergoes multilayer compressed-sensing processing, part of the information is lost and a large amount of noise is introduced, making the monitored target hard to distinguish and increasing the difficulty of subsequent fall recognition. The background information and noise therefore need to be removed, and the low-rank sparse decomposition method can effectively recover the foreground information of the compressively sensed video data. Extracting the foreground moving target in this way both improves the accuracy of subsequent fall detection and eliminates the environmental information surrounding the monitored subject, further strengthening visual privacy protection.
For a surveillance video sequence containing n frames, each frame can be vectorized into an m-dimensional column vector, so the entire video sequence can be viewed as a data matrix M of size m × n. In a matrix formed from a video sequence, the background belongs to a low-rank matrix while the foreground information corresponds to a sparse matrix, so background and foreground can be separated by low-rank sparse decomposition. The invention therefore draws on and improves the low-rank sparse decomposition algorithm, applying it to extract the foreground moving target from the video with the visual privacy protection effect. The conventional low-rank sparse decomposition model is:
min rank(L) + γ‖S‖_0, s.t. M = L + S   (6)
where M is the data matrix to be processed, L is the low-rank matrix, S is the sparse matrix, rank(L) denotes the rank function of the low-rank matrix L, ‖S‖_0 denotes the l_0 norm of the sparse matrix S, which measures its sparsity, and γ is a trade-off parameter.
Since this model is non-convex, it is difficult to solve directly. The invention uses the generalized non-convex robust principal component analysis algorithm to extract the sparse foreground component of the video with the visual privacy protection effect, which is strongly robust to noise interference in the data processing. Generalized non-convex robust principal component analysis approximates the rank function with a non-convex generalized nuclear norm and, most importantly, to better approximate sparsity, introduces a non-convex generalized norm in place of the l_0 norm that represents the sparse prior; this approximates the sparsity of the matrix more closely than the l_1 norm does. The low-rank sparse decomposition model becomes:
where g (-) is a non-convex, closed, normal lower semicontinuous function, σi(L) is the i-th singular value, S, of the low-rank matrix LjIs the jth element, τ, γ, of the sparse matrix S>0 is a parameter, M is an element of Rm×nFor the data matrix to be processed, L ∈Rm×nFor a low rank matrix, S ∈ Rm×nIs a sparse matrix;
The video with the visual privacy protection effect obtained through the multilayer compressed-sensing process contains not only foreground and background information but also a large amount of noise introduced by the compression process, so the matrix M of formula (7) becomes:

M(n) = L(n) + (E(n) + S(n))   (8)

where E(n) denotes the noise term introduced by the n-layer compressed-sensing processing.
the decomposition process comprises the following steps:
The first step: taking the noise generated in the multilayer compressed-sensing processing of step 2 as the sparse matrix, and the foreground plus background of steps 1 and 2 as the low-rank matrix, so as to remove the noise;
The second step: taking the background information of the data set of step 1 and the video of step 2 as the low-rank matrix, and their foreground information as the sparse matrix, and extracting the foreground moving target once more with the low-rank sparse decomposition algorithm based on generalized non-convex robust principal component analysis.
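Each of the two stages above solves a robust-PCA problem. A minimal sketch using the classical convex surrogate (nuclear norm plus l_1, rather than the generalized non-convex penalty of formula (7)) illustrates the low-rank/sparse split on a toy matrix; the parameter choices follow common RPCA defaults and are assumptions, not the method's actual settings:

```python
import numpy as np

def rpca(M, lam=None, n_iter=200, mu=None):
    """Minimal RPCA sketch: alternating proximal steps on
    ||L||_* + lam * ||S||_1 s.t. M = L + S. The generalized non-convex
    penalty g(.) would replace the two thresholding steps below."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * np.abs(M).mean()
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # singular-value thresholding -> low-rank background L
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - mu, 0.0)) @ Vt
        # soft thresholding -> sparse foreground S
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam * mu, 0.0)
    return L, S

# toy "video" matrix: rank-1 background plus a few sparse foreground spikes
rng = np.random.default_rng(1)
bg = np.outer(rng.standard_normal(40), rng.standard_normal(30))
fg = np.zeros((40, 30))
fg[5, 5] = 10.0
fg[20, 7] = -8.0
L, S = rpca(bg + fg)
```

Here `L` plays the role of the background (low-rank) component and `S` the foreground/noise (sparse) component of the two-stage decomposition.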
Step 4: performing space-time migration compensation on the foreground moving target of the video with the visual privacy protection effect obtained in step 3 using a GAN network with an auxiliary classifier, so that the features lost in that video are compensated, and obtaining the final fall classification result on this basis.
The GAN network with the auxiliary classifier comprises a generator G and a discriminator D, both implemented as convolutional neural networks. The input of G is noise z together with a corresponding, manually calibrated class label c; the generator produces samples from z and c, so the generated samples carry label information. The discriminator D outputs the probability distribution over real and fake samples as well as the class to which each sample belongs; the loss functions of the GAN network with the auxiliary classifier are given in formulas (9) and (10).
L_S = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake_cs)]   (9)
L_C = E[log P(C = c | X_real)] + E[log P(C = c | X_fake_cs)]   (10)
where L_S denotes the loss function associated with the authenticity (real/fake) of the sample, L_C denotes the loss function associated with the correct class, X_real denotes the original video data, S denotes the real/fake source label of the data, C denotes the class label of the data after multi-layer compressed sensing processing, X_fake_cs denotes the data compensated by the generator G, and P(S | X) and P(C | X) denote the probability distribution over the data source and the probability distribution over the class labels, respectively. The training goal of the discriminator is to maximize L_S + L_C; the training goal of the generator is to maximize L_C − L_S.
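Formulas (9) and (10) can be checked numerically. The sketch below estimates L_S and L_C from a batch of discriminator output probabilities; the function and argument names are illustrative, not taken from the patent.

```python
import numpy as np

def acgan_losses(p_src_real, p_src_fake, p_cls_real, p_cls_fake):
    """Auxiliary-classifier GAN losses, formulas (9)-(10).

    p_src_real : D's probability that each real sample is real, P(S=real|X_real)
    p_src_fake : D's probability that each generated sample is fake, P(S=fake|X_fake_cs)
    p_cls_real : D's probability of the correct class for each real sample
    p_cls_fake : D's probability of the correct class for each generated sample
    Expectations are estimated by batch means.
    """
    L_S = np.mean(np.log(p_src_real)) + np.mean(np.log(p_src_fake))
    L_C = np.mean(np.log(p_cls_real)) + np.mean(np.log(p_cls_fake))
    # In AC-GAN, D is trained to maximize L_S + L_C and G to maximize L_C - L_S.
    return L_S, L_C
```

A perfectly confident, correct discriminator drives both terms to their maximum of 0; an uncertain one yields large negative values.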
During processing by the GAN network with the auxiliary classifier, the foreground information of the visual privacy state and its label are input to the generator. The information with the visual privacy protection effect contains state features; guided by the label information, the data generated by the generator converges purposefully toward the original video-frame data carrying the same label. In this process, the original data supplements part of the information missing from the visual privacy state, achieving compensation of the information with the visual privacy protection effect. The original data and the data with the visual privacy protection effect are then fed together into the discriminator D, which outputs the real/fake probabilities and the class probabilities. In the early iterations the generated data differs greatly from the original data features, so the corresponding losses are large. Since both the G network and the D network are convolutional neural networks, the losses are back-propagated to the D network and the G network according to the chain rule of differentiation, continuously correcting and updating parameters such as the network weights, so the G network obtains better information compensation from the losses. After many iterations the generator's output approximates the original data, while the D network outputs increasingly accurate classification information. The loss functions of the improved GAN network with the auxiliary classifier are shown in formulas (11) and (12).
L_S = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake_cs)]  (11)
L_C = E[log P(C = cs | X_real)] + E[log P(C = cs | X_fake_cs)]  (12)
The discriminator D of the GAN network with the auxiliary classifier outputs the corresponding class information; a loss term between the predicted classification and the true classification is then added, so that the generated simulated data corresponds one-to-one to the class to which it belongs. Taking the temporal information between video frames into account, per-frame weights are pre-trained on the video, and the classification probability of each frame in the video sequence is fused with the pre-trained weights to obtain the classification result of the video sequence.
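The frame-to-video fusion described above can be sketched as a weighted average of per-frame class probabilities. The weight vector here stands in for the pre-trained per-frame weights, whose training procedure the patent does not detail; the names are illustrative.

```python
import numpy as np

def fuse_video(frame_probs, frame_weights):
    """Fuse per-frame class probabilities into one video-level prediction.

    frame_probs   : (T, K) per-frame class probabilities from discriminator D
    frame_weights : (T,) pre-trained fusion weights (assumed given)
    Returns the video-level class label and the fused probability vector.
    """
    p = np.asarray(frame_probs, dtype=float)
    w = np.asarray(frame_weights, dtype=float)
    w = w / w.sum()              # normalize the weights
    video_prob = w @ p           # weighted average over frames
    return int(np.argmax(video_prob)), video_prob
```

For example, with three frames voting [0.9, 0.1], [0.2, 0.8], [0.7, 0.3] and weights [1, 1, 2], the fused distribution favors class 0 even though one frame disagrees.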
The invention realizes in-home fall detection for the elderly while protecting the privacy of the elderly and their families, and has high practical application value.
The above description is only an embodiment of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (8)
1. A household video fall detection method based on privacy protection type GAN feature enhancement, characterized in that the method comprises the following steps:
step 1: input training data set of raw video: based on a public falling detection data set, dividing training data into positive and negative sets according to the characteristics of behaviors in the data set, and manually marking corresponding class labels;
step 2: multilayer compressed sensing processing: performing multilayer compressed sensing processing on an original video based on the chaotic pseudorandom Bernoulli measurement matrix to generate a video with a visual privacy protection effect on a visual layer;
step 3: extraction of the foreground moving target: extract the foreground moving target by using a low-rank sparse decomposition algorithm based on generalized non-convex robust principal component analysis, so as to obtain the foreground moving target of the video with the visual privacy protection effect;
step 4: obtaining the fall classification result: use a GAN network with an auxiliary classifier to perform spatio-temporal migration compensation on the foreground moving target in the video with the visual privacy protection effect obtained in step 3, so that the features lost in the video with the visual privacy protection effect are compensated, and the final fall classification result is obtained on this basis.
2. The privacy protection type GAN feature enhancement based home video fall detection method as claimed in claim 1, wherein: the low-rank sparse decomposition algorithm in step 3 is represented as:
where g (-) is a non-convex, closed, normal lower semicontinuous function, σi(L) is the i-th singular value, S, of the low-rank matrix LjIs the jth element, τ, γ, of the sparse matrix S>0 is a parameter, M is an element of Rm×nFor the data matrix to be processed, L ∈ Rm×nFor a low rank matrix, S ∈ Rm×nIs a sparse matrix;
the decomposition process comprises the following steps:
step 3-1: treat the noise generated by the multi-layer compressed sensing processing of step 2 as the sparse matrix, and treat the foreground and background of step 1 and step 2 as the low-rank matrix, so as to remove the noise;
step 3-2: treat the background information of the data set in step 1 and of the video in step 2 as the low-rank matrix, treat the foreground information of the data set in step 1 and of the video in step 2 as the sparse matrix, and again apply the low-rank sparse decomposition algorithm based on generalized non-convex robust principal component analysis to extract the foreground moving target.
3. The privacy protection type GAN feature enhancement based home video fall detection method as claimed in claim 1, wherein: the GAN network with the auxiliary classifier in the step 4 comprises a generator G network and a discriminator D, wherein the generator G network and the discriminator D both use a convolutional neural network.
4. The privacy protection type GAN feature enhancement based home video fall detection method as claimed in claim 3, wherein: the processing procedure of the GAN network with the auxiliary classifier in the step 4 comprises the following steps:
step 4-1: training process: the foreground moving-target features of the original video from step 1, the corresponding class information, and the foreground moving-target features compensated in step 4 are input to the discriminator D network; the foreground moving-target features of the video with the visual privacy protection effect after multi-layer compressed sensing processing, corresponding to the original video, are input to the generator G network; a one-to-one class-information correspondence is formed between the original video and the video with the visual privacy protection effect;
step 4-2: testing process: only the features of the video with the visual privacy protection effect after the multi-layer compressed sensing processing of step 2 are input; the discriminator D outputs the corresponding class information of these features, the loss between the predicted classification and the true classification is then computed, and finally the fall detection accuracy is obtained.
5. The privacy protection type GAN feature enhancement based home video fall detection method as claimed in claim 4, wherein: the loss calculation in step 4-2 is expressed as:
L_S = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake_cs)]
L_C = E[log P(C = cs | X_real)] + E[log P(C = cs | X_fake_cs)]
where L_S denotes the loss function associated with the authenticity (real/fake) of the sample, L_C denotes the loss function associated with the correct class, X_real denotes the original video data, S denotes the real/fake source label of the data, C denotes the class label of the data after multi-layer compressed sensing processing, X_fake_cs denotes the data compensated by the generator G, and P(S | X) and P(C | X) denote the probability distribution over the data source and the probability distribution over the class labels, respectively.
6. The privacy protection type GAN feature enhancement based home video fall detection method as claimed in claim 1, wherein: the multi-layer compressed sensing processing of step 2 specifically includes the following steps:
step 2-1: pseudo-random numbers are generated using the Mersenne Twister algorithm, expressed as:
where [α_{k+1}]^r denotes the lowest (rightmost) r bits of α_{k+1}, [α_k]^{w−r} denotes the uppermost (leftmost) w − r bits of α_k (k = 0, 1, 2, …), ⊕ denotes a bitwise exclusive-OR operation, and | denotes a concatenation operation.
Step 2-2: generating a chaotic pseudorandom sequence: after the pseudo-random numbers are obtained by the Mersenne Twister algorithm, they are used as the original signal and substituted into the chaotic model to generate a chaotic pseudorandom sequence; the corresponding chaotic model is:
wherein j represents the number of iterations;
step 2-3: constructing the chaotic pseudorandom Bernoulli measurement matrix: nonlinear transformation is performed on the chaotic pseudorandom sequence obtained in step 2-2, and sign-function mapping is then applied so that the mapped sequence satisfies a Bernoulli distribution, thereby constructing the chaotic pseudorandom Bernoulli measurement matrix Φ, expressed as:
where the leading coefficient is a normalization coefficient, and the matrix elements are arranged column-wise from the chaotic pseudorandom sequence {β_0, β_1, β_2, …, β_n};
Step 2-4: multi-layer compressed sensing sampling: the chaotic pseudorandom Bernoulli measurement matrix Φ constructed in step 2-3 is partitioned into blocks, the measurement process is converted into inner-product operations between blocks at the same position, and multi-layer compressed sensing coding is realized through repeated partitioning and inner-product operations, thereby achieving privacy protection in video monitoring.
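Steps 2-1 to 2-3 of claim 6 can be sketched as follows. NumPy's legacy RandomState is an MT19937 (Mersenne Twister) generator, matching step 2-1. The logistic map is used here as a stand-in for the patent's unspecified chaotic model in step 2-2, and the 1/√m scaling in step 2-3 is a common compressed-sensing normalization convention, not a value taken from the patent.

```python
import numpy as np

def chaotic_bernoulli_matrix(m, n, seed=2021, mu=3.99):
    """Build an m x n chaotic pseudorandom Bernoulli measurement matrix."""
    # Step 2-1: Mersenne Twister pseudo-random seed value in (0, 1)
    x = np.random.RandomState(seed).uniform(0.01, 0.99)
    # Step 2-2: iterate a chaotic map (logistic map assumed here)
    seq = np.empty(m * n)
    for j in range(m * n):
        x = mu * x * (1.0 - x)
        seq[j] = x
    # Step 2-3: sign mapping -> +/-1 Bernoulli entries, arranged column-wise,
    # scaled by an assumed 1/sqrt(m) normalization coefficient
    signs = np.where(seq >= 0.5, 1.0, -1.0)
    return signs.reshape((m, n), order='F') / np.sqrt(m)
```

Because the matrix is fully determined by the seed, a receiver holding the same seed can regenerate Φ, which is what makes the measurement process reversible for authorized parties.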
8. The privacy protection type GAN feature enhancement based home video fall detection method as claimed in claim 6, wherein: the calculation process of the multi-layer compressed sensing coding in step 2-4 is:
wherein: y represents video data after multi-layer compressed sensing processing, X represents original video data, and n represents the number of layers of multi-layer compressed sensing coding processing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111113708.4A CN113807275B (en) | 2021-09-22 | 2021-09-22 | Household video fall detection method based on privacy protection GAN feature enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113807275A true CN113807275A (en) | 2021-12-17 |
CN113807275B CN113807275B (en) | 2023-10-24 |
Family
ID=78940130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111113708.4A Active CN113807275B (en) | 2021-09-22 | 2021-09-22 | Household video fall detection method based on privacy protection GAN feature enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113807275B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913467A (en) * | 2022-06-14 | 2022-08-16 | 南京邮电大学 | CRNN combined network video privacy protection degree evaluation method for monitoring violent behaviors in home scene |
CN114998805A (en) * | 2022-06-14 | 2022-09-02 | 南京邮电大学 | Visual privacy protection home health behavior video homomorphic monitoring method based on DCGAN temporal-spatial information migration compensation |
CN115238827A (en) * | 2022-09-16 | 2022-10-25 | 支付宝(杭州)信息技术有限公司 | Privacy-protecting sample detection system training method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598606A (en) * | 2019-09-02 | 2019-12-20 | 南京邮电大学 | Indoor falling behavior detection method with visual privacy protection advantage |
CN111310647A (en) * | 2020-02-12 | 2020-06-19 | 北京云住养科技有限公司 | Generation method and device for automatic identification falling model |
US20210052170A1 (en) * | 2019-08-20 | 2021-02-25 | Robin H. Stewart | Systems and methods for dynamic biometric detection and response |
CN113158882A (en) * | 2021-04-19 | 2021-07-23 | 南京邮电大学 | Bionic compound eye privacy protection intelligent binary modeling method for home video monitoring |
Also Published As
Publication number | Publication date |
---|---|
CN113807275B (en) | 2023-10-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||