CN109711277A - Behavioral feature extraction method, system, and device based on space-time-frequency-domain hybrid learning - Google Patents
Behavioral feature extraction method, system, and device based on space-time-frequency-domain hybrid learning
- Publication number
- CN109711277A CN109711277A CN201811494799.9A CN201811494799A CN109711277A CN 109711277 A CN109711277 A CN 109711277A CN 201811494799 A CN201811494799 A CN 201811494799A CN 109711277 A CN109711277 A CN 109711277A
- Authority
- CN
- China
- Prior art keywords
- time
- behavioural characteristic
- space
- domain
- space domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000003542 behavioural effect Effects 0.000 title claims abstract description 147
- 238000000034 method Methods 0.000 title claims abstract description 52
- 230000006399 behavior Effects 0.000 claims abstract description 58
- 230000001360 synchronised effect Effects 0.000 claims abstract description 21
- 230000003044 adaptive effect Effects 0.000 claims abstract description 16
- 238000000605 extraction Methods 0.000 claims abstract description 15
- 239000000284 extract Substances 0.000 claims abstract description 8
- 230000009466 transformation Effects 0.000 claims description 27
- 230000006870 function Effects 0.000 claims description 19
- 230000003416 augmentation Effects 0.000 claims description 15
- 108091006146 Channels Proteins 0.000 claims description 14
- 238000013528 artificial neural network Methods 0.000 claims description 13
- 238000010606 normalization Methods 0.000 claims description 6
- 239000011159 matrix material Substances 0.000 claims description 5
- 238000012545 processing Methods 0.000 claims description 5
- 230000008859 change Effects 0.000 claims description 3
- 238000010276 construction Methods 0.000 claims description 3
- 238000003780 insertion Methods 0.000 claims description 3
- 230000037431 insertion Effects 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 3
- 238000005070 sampling Methods 0.000 claims description 3
- 230000001131 transforming effect Effects 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 2
- 230000000694 effects Effects 0.000 abstract description 10
- 230000002123 temporal effect Effects 0.000 abstract description 3
- 238000010586 diagram Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 4
- 239000004615 ingredient Substances 0.000 description 3
- 238000009412 basement excavation Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 238000003475 lamination Methods 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000001680 brushing effect Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000007620 mathematical function Methods 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000026676 system process Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
- Complex Calculations (AREA)
Abstract
The invention belongs to the field of activity recognition, and in particular relates to a behavioral feature extraction method, system, and device based on space-time-frequency-domain hybrid learning, aimed at the problem of low accuracy in skeleton-based behavioral feature extraction. The method of the present invention comprises: obtaining a skeleton-based video behavior sequence and extracting a space-time-domain behavioral feature map through a transformation network; feeding the feature map into a frequency-domain attention network for frequency selection, inverse-transforming the result back to the space-time domain, and adding it to the space-time-domain behavioral feature map; synchronously performing local and non-local reasoning, followed by high-level local reasoning; and globally pooling the resulting space-time-domain behavioral feature map to obtain the behavioral feature vector of the video behavior sequence, which can be applied to behavior classification, behavior detection, and the like. The present invention adaptively selects effective frequency patterns in the frequency domain and, in the space-time domain, uses a network with both local and non-local affinity fields to perform spatio-temporal reasoning, so that local details and non-local semantic information can be mined synchronously, effectively improving the accuracy of activity recognition.
Description
Technical field
The invention belongs to the field of activity recognition, and in particular relates to a behavioral feature extraction method, system, and device based on space-time-frequency-domain hybrid learning.
Background art
Activity recognition is widely applied in fields such as intelligent surveillance, human-computer interaction, and autonomous driving. It comprises behavior classification and behavior detection; concretely, behavior videos based on information such as RGB, depth, or skeletons are collected with dedicated acquisition equipment and then classified, localized, and detected. Skeleton-based activity recognition has attracted broad interest from academia and industry in recent years because of its small computational overhead, concise representation, and robustness to variations in environment and appearance. Specifically, skeleton-based activity recognition performs recognition from video sequences composed of the 2D or 3D coordinates of the joints of the target objects in the scene.
Existing skeleton-based activity recognition methods mainly stack local networks that have only local affinity fields in the space-time domain to extract the spatio-temporal features of a behavior sequence layer by layer, and then recognize and detect the behavior. Behaviors such as clapping, tooth-brushing, and handshaking are rich in discriminative inherent frequency characteristics, but existing methods are confined to mining spatio-temporal patterns and ignore the frequency-domain patterns inherent in behaviors. Moreover, because local networks are stacked hierarchically in the space-time domain, semantic information can only be extracted at high layers while detailed information is mainly extracted at the bottom; detailed and semantic information cannot be extracted and fused synchronously, which is unfavorable to mining effective behavioral features, so the accuracy of skeleton-based activity recognition is low and cannot meet requirements.
Summary of the invention
To solve the above problem in the prior art, namely the low accuracy of behavioral feature extraction, the present invention provides a behavioral feature extraction method based on space-time-frequency-domain hybrid learning, comprising:
Step S1: obtaining a skeleton-based video behavior sequence as the original video behavior sequence, and performing an adaptive space-time-domain transformation to obtain a first space-time-domain behavioral feature map;
Step S2: sending the first space-time-domain behavioral feature map into the frequency domain for frequency selection, inverse-transforming it back to the space-time domain, and adding it to the first space-time-domain behavioral feature map as a residual to obtain a second space-time-domain behavioral feature map;
Step S3: synchronously performing local and non-local reasoning on the second space-time-domain behavioral feature map, and adding the result to the first space-time-domain behavioral feature map as a residual to obtain a third space-time-domain behavioral feature map;
Step S4: performing high-level local reasoning on the third space-time-domain behavioral feature map to obtain a fourth space-time-domain behavioral feature map;
Step S5: globally pooling the fourth space-time-domain behavioral feature map to obtain the behavioral feature vector.
In some preferred embodiments, the "adaptive space-time-domain transformation" of step S1 comprises:
Step S11: applying a kernel-1 convolutional network or a fully connected network to the original video behavior sequence to perform adaptive coordinate-system augmentation into K oblique coordinate systems, obtaining augmented video behavior sequences under K coordinate systems, where K is a hyperparameter.
Step S12: applying a kernel-1 convolutional network or a fully connected network to the skeletons in the augmented video behavior sequences to transform the joint count and joint ordering, obtaining the feature map of the augmented, optimized video behavior sequence containing structural information, which is the first space-time-domain behavioral feature map.
In some preferred embodiments, "sending the first space-time-domain behavioral feature map into the frequency domain for frequency selection, inverse-transforming it back to the space-time domain, and adding it to the first space-time-domain behavioral feature map as a residual" in step S2 comprises:
Step S21: transforming the feature map of each channel to the frequency domain with the two-dimensional discrete Fourier transform, yielding a sine frequency-domain feature map and a cosine frequency-domain feature map. For computational efficiency, the transform may be realized with the two-dimensional fast Fourier transform.
Step S22: passing the sine and cosine frequency-domain feature maps through an attention network to learn sine-component and cosine-component attention weights. The attention network comprises a channel-averaging layer, two fully connected layers, a softmax function, and a channel-replicating layer.
Step S23: taking the element-wise product of the learnt sine-component attention weights with the sine frequency-domain feature map, and of the cosine-component attention weights with the cosine frequency-domain feature map, obtaining the frequency-selected sine and cosine frequency-domain feature maps.
Step S24: transforming the sine and cosine frequency-domain feature maps back to the space-time domain with the two-dimensional inverse discrete Fourier transform, and adding the result to the first space-time-domain behavioral feature map as a residual to obtain the second space-time-domain behavioral feature map. For computational efficiency, the inverse transform may be realized with the two-dimensional inverse fast Fourier transform.
In some preferred embodiments, "synchronously performing local and non-local reasoning on the second space-time-domain behavioral feature map" in step S3 comprises:
Step S31: constructing a neural network submodule y_i with a local affinity field and a neural network submodule y'_i with a non-local affinity field:

y_i = (1/Z_i(X)) Σ_{j∈δ_i} A(x_i, x_j) g(x_j)

y'_i = (1/Z_i(X)) Σ_{∀j∈Ω} A(x_i, x_j) g(x_j)

where x_i represents a feature vector of the space-time-domain feature map of the current layer; y_i and y'_i represent the feature vectors of the space-time-domain feature maps of the next layer under the local and non-local affinity fields, respectively; A(x_i, x_j) is a binary transformation function computing the affinity between positions i and j; g(x_j) is a unary transformation function computing the feature embedding of x_j, realized by a convolutional layer with kernel 1 or 1×1; Z_i(X) is a normalization factor; Ω enumerates all feature positions; and δ_i is the local neighborhood.
The features extracted by the local and non-local affinity-field neural network submodules are superimposed with a learnt weight to obtain a feature map, which is batch-normalized to reduce feature drift, passed through a nonlinear unit, and then down-sampled to reduce its resolution.
Step S32: using M1 of the local and non-local affinity-field neural network submodules to compute the affinities between position i and its neighbors in the local neighborhood δ_i, and between i and all possible positions in Ω, where M1 is a natural number greater than or equal to 1.
Step S33: adding the feature map produced by the reasoning of the M1 local and non-local affinity-field neural network submodules to the first space-time-domain feature map as a residual, obtaining the third space-time-domain behavioral feature map.
In some preferred embodiments, "performing high-level local reasoning on the third space-time-domain behavioral feature map" in step S4 is as follows:
using M2 constructed local affinity-field neural network submodules to compute the affinities between position i of the third space-time-domain behavioral feature map and its neighbors in the local neighborhood δ_i, where M2 is a natural number greater than or equal to 1; the feature map after this reasoning is the fourth space-time-domain behavioral feature map.
In another aspect, the present invention proposes a behavioral feature extraction method based on space-time-frequency-domain hybrid learning, comprising:
differencing the original skeleton-based video behavior sequence along the time dimension to obtain velocity information, and constructing behavior sequences containing position and velocity;
processing the position and velocity behavior sequences respectively with steps S1–S5 according to any one of claims 1–5, obtaining a velocity feature vector and a position feature vector;
concatenating these feature vectors to obtain a concatenated feature vector; the extracted behavioral feature vectors are the velocity feature vector, the position feature vector, and the concatenated feature vector.
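The two-stream construction of this aspect can be sketched as follows. This is a minimal numpy illustration under stated assumptions: the velocity stream is taken as a first-order temporal difference (padded so the two streams keep the same length), and `build_position_velocity` and `fuse_feature_vectors` are hypothetical helper names, not from the patent.

```python
import numpy as np

def build_position_velocity(seq):
    """Given a skeleton sequence of shape (C, T, N), derive a velocity
    stream by first-order temporal differencing, padded back to T frames
    so both streams can be processed by the same steps S1-S5."""
    vel = np.diff(seq, axis=1)                       # (C, T-1, N)
    vel = np.concatenate([vel, vel[:, -1:, :]], 1)   # repeat last step -> (C, T, N)
    return seq, vel

def fuse_feature_vectors(f_pos, f_vel):
    """Concatenate the per-stream feature vectors; the method outputs the
    position vector, the velocity vector, and their concatenation."""
    return np.concatenate([f_pos, f_vel])

seq = np.random.rand(3, 16, 25)          # 3D coordinates, 16 frames, 25 joints
pos, vel = build_position_velocity(seq)
fused = fuse_feature_vectors(np.ones(4), np.zeros(4))
```

The padding choice (repeating the last difference) is an assumption; any scheme that keeps the time length aligned between the streams would serve.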
In a third aspect, the present invention proposes a behavioral feature extraction system based on space-time-frequency-domain hybrid learning, comprising a video sequence acquisition module, an adaptive transformation module, a frequency selection module, a local and non-local synchronous reasoning module, a high-level local reasoning module, a global pooling module, a concatenation module, and an output module;
the video sequence acquisition module is configured to obtain a skeleton-based video behavior sequence as the original video behavior sequence;
the adaptive transformation module is configured to extract the first space-time-domain behavioral feature map in the space-time domain by augmentation and optimization;
the frequency selection module is configured to send the first space-time-domain behavioral feature map into a frequency-domain attention network for frequency selection, transform the obtained frequency-domain behavioral feature map back to the space-time domain, and add it to the first space-time-domain behavioral feature map as a residual, obtaining the second space-time-domain behavioral feature map;
the local and non-local synchronous reasoning module is configured to synchronously perform local and non-local reasoning on the second space-time-domain behavioral feature map, and add the result to the first space-time-domain behavioral feature map as a residual to obtain the third space-time-domain behavioral feature map;
the high-level local reasoning module is configured to perform high-level local reasoning on the third space-time-domain behavioral feature map, obtaining the fourth space-time-domain behavioral feature map;
the global pooling module is configured to globally pool the fourth space-time-domain behavioral feature map into the corresponding behavioral feature vector;
the concatenation module is configured to concatenate multi-stream features, obtaining the corresponding concatenated feature vector;
the output module is configured to output the extracted behavioral feature vectors.
In a fourth aspect, the present invention proposes a storage device in which a plurality of programs are stored, the programs being suitable to be loaded and executed by a processor to realize the above behavioral feature extraction method based on space-time-frequency-domain hybrid learning.
In a fifth aspect, the present invention proposes a processing device, comprising a processor suitable for executing programs and a storage device suitable for storing a plurality of programs, the programs being suitable to be loaded and executed by the processor to realize the above behavioral feature extraction method based on space-time-frequency-domain hybrid learning.
Beneficial effects of the present invention:
(1) The present invention breaks through the limitation of previous deep networks, which mine only the spatio-temporal patterns of behavioral frame sequences, and fully mines the discriminative frequency patterns in behaviors. An attention mechanism assigns weights to the frequency-domain feature maps in the frequency domain, and through end-to-end learning the network finally learns to adaptively select effective frequency patterns.
(2) Compared with previous local networks, which can only extract detailed information and semantic information asynchronously in the low and high layers respectively, the proposed module with synchronous local and non-local affinity fields can extract and fuse local details and global semantics synchronously at every layer, and can effectively reduce the number of layers and parameters relative to traditional local networks.
(3) In the proposed adaptive transformation network, the coordinate transformation network learns to transform skeletons originally represented in a single rectangular coordinate system into multiple oblique coordinate systems, obtaining richer representations; meanwhile the skeleton transformation network can relearn the optimal joint count and joint ordering, acquiring more structured features than previous structureless representations and thereby improving feature extraction accuracy.
Detailed description of the invention
By reading a detailed description of non-restrictive embodiments in the light of the attached drawings below, the application's is other
Feature, objects and advantages will become more apparent upon:
Fig. 1 is that the present invention is based on the flow diagrams of the behavioural characteristic extracting method of space-time frequency domain blended learning;
Fig. 2 is the overall framework signal of the behavioural characteristic extracting method embodiment the present invention is based on space-time frequency domain blended learning
Figure;
Fig. 3 is that the frequency domain of the behavioural characteristic extracting method embodiment the present invention is based on space-time frequency domain blended learning pays attention to network
Structural schematic diagram;
Fig. 4 is the non-office of two-dimension time-space of the behavioural characteristic extracting method embodiment the present invention is based on space-time frequency domain blended learning
Portion's network plug-in schematic diagram;
Fig. 5 is the localized network module of the behavioural characteristic extracting method embodiment the present invention is based on space-time frequency domain blended learning
Schematic diagram;
Fig. 6 is the part of the behavioural characteristic extracting method embodiment the present invention is based on space-time frequency domain blended learning and non local
Synchronization module schematic diagram;
Fig. 7 is the part of the behavioural characteristic extracting method embodiment the present invention is based on space-time frequency domain blended learning and non local
The affine field schematic diagram of synchronization module.
Specific embodiments
The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to restrict it. It should also be noted that, for convenience of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with each other. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Existing activity recognition methods mainly stack local networks that have only local affinity fields in the space-time domain to extract the spatio-temporal features of a behavior sequence layer by layer, and then recognize and detect the behavior. They are confined to mining spatio-temporal patterns and ignore the frequency-domain patterns inherent in behaviors; moreover, by stacking local networks hierarchically in the space-time domain, they can extract semantic information only at high layers while detailed information is mainly extracted at the bottom, so that detailed and semantic information cannot be fused synchronously, which is unfavorable to mining effective behavioral features. The technical solution of the present invention uses an attention mechanism to adaptively select effective frequency patterns in the frequency domain, and in the space-time domain uses a network that has local and non-local affinity fields simultaneously to perform spatio-temporal reasoning, so that the network mines local details and non-local semantic information synchronously in each layer module, effectively improving the accuracy of skeleton-based behavioral feature extraction.
A behavioral feature extraction method based on space-time-frequency-domain hybrid learning of the present invention comprises:
Step S1: obtaining a skeleton-based video behavior sequence as the original video behavior sequence, and performing an adaptive space-time-domain transformation to obtain a first space-time-domain behavioral feature map;
Step S2: sending the first space-time-domain behavioral feature map into the frequency domain for frequency selection, inverse-transforming it back to the space-time domain, and adding it to the first space-time-domain behavioral feature map as a residual to obtain a second space-time-domain behavioral feature map;
Step S3: synchronously performing local and non-local reasoning on the second space-time-domain behavioral feature map, and adding the result to the first space-time-domain behavioral feature map as a residual to obtain a third space-time-domain behavioral feature map;
Step S4: performing high-level local reasoning on the third space-time-domain behavioral feature map to obtain a fourth space-time-domain behavioral feature map;
Step S5: globally pooling the fourth space-time-domain behavioral feature map to obtain the behavioral feature vector.
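The global pooling of step S5 can be sketched in a few lines. This assumes global *average* pooling over the time and joint dimensions (the patent says only "global pool"), with a (C, T, N) feature map collapsing to a C-dimensional behavioral feature vector:

```python
import numpy as np

def global_pool(feature_map):
    """Global average pooling of a (C, T, N) space-time feature map over
    the time and joint axes, yielding a C-dimensional feature vector."""
    return feature_map.mean(axis=(1, 2))

# Toy map: channel 0 holds 0..11 (mean 5.5), channel 1 holds 12..23 (mean 17.5).
fmap = np.arange(24, dtype=float).reshape(2, 3, 4)
vec = global_pool(fmap)
```

Max pooling would be the other common reading; only the reduction over T and N is fixed by the text.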
To illustrate the activity recognition method based on space-time-frequency-domain hybrid learning more clearly, each step of one embodiment of the method of the present invention is described in detail below with reference to Figs. 1–7.
An embodiment of the behavioral feature extraction method based on space-time-frequency-domain hybrid learning comprises steps S1–S5, each described in detail as follows:
Step S1: obtain a skeleton-based video behavior sequence as the original video behavior sequence, and perform an adaptive space-time-domain transformation to obtain the first space-time-domain behavioral feature map.
Step S11: denote the original video behavior sequence as X, with dimensions C0×T0×N0, where C0 is the number of channels, T0 the number of time steps, and N0 the number of spatial joints. Apply a kernel-1 convolutional network or a fully connected network to the original video behavior sequence to perform adaptive coordinate-system augmentation into K oblique coordinate systems, obtaining augmented video behavior sequences under K coordinate systems, where K is a hyperparameter.
Step S12: apply a multi-layer fully connected network to the skeletons in the augmented video behavior sequences to transform the joint count and joint ordering, obtaining the feature map of the augmented, optimized video behavior sequence containing structural information, namely the first space-time-domain behavioral feature map X', with dimensions C'×T'×N', where C' is the number of channels, T' the number of time steps, and N' the number of spatial joints.
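The adaptive transformation of steps S11–S12 amounts to two learned linear maps: a kernel-1 convolution over the channel (coordinate) axis, and a linear rearrangement over the joint axis. A minimal numpy sketch, with random matrices standing in for the learned weights (the helper names and shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def coordinate_augment(seq, K, weights):
    """Step S11 sketch: map the C0 input coordinates into K oblique
    coordinate systems with a kernel-1 convolution, i.e. a per-position
    linear map on the channel axis: (C0, T, N) -> (K*C0, T, N)."""
    C0, T, N = seq.shape
    assert weights.shape == (K * C0, C0)
    return np.einsum('kc,ctn->ktn', weights, seq)

def joint_transform(seq, W):
    """Step S12 sketch: relearn joint count and ordering with a linear
    map over the joint axis: (C, T, N0) -> (C, T, N1), W of shape (N1, N0)."""
    return np.einsum('mn,ctn->ctm', W, seq)

x = rng.random((3, 16, 25))                          # 3D coords, 16 frames, 25 joints
x_aug = coordinate_augment(x, K=4, weights=rng.random((12, 3)))
x1 = joint_transform(x_aug, rng.random((20, 25)))    # first feature map X'
```

In training these matrices would be network parameters optimized end to end; here they only show the shape bookkeeping.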
Step S2: send the first space-time-domain behavioral feature map into the frequency domain for frequency selection, inverse-transform it back to the space-time domain, and add it to the first space-time-domain behavioral feature map as a residual to obtain the second space-time-domain behavioral feature map.
Step S21: transform the feature map of each channel to the frequency domain with the two-dimensional discrete Fourier transform (2D-DFT, 2D Discrete Fourier Transform), denoting the result Y, as shown in formula (1):

Y(c, u, v) = Σ_{t=0}^{T−1} Σ_{n=0}^{N−1} X'(c, t, n) e^{−i2π(ut/T + vn/N)}   formula (1)

where c, u, v are the channel, temporal-frequency dimension, and spatial-frequency dimension of the frequency-domain feature map; c, t, n are the channel, time dimension, and spatial dimension of the space-time-domain feature map; T is the total number of time points of the first space-time-domain feature map; and N is the total number of spatial points of the frequency-domain feature map.
For computational efficiency, the transform may be realized with the two-dimensional fast Fourier transform (2D-FFT, 2D Fast Fourier Transform).
The resulting frequency-domain feature map Y comprises two components: a sine frequency-domain feature map F_sin and a cosine frequency-domain feature map F_cos.
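The per-channel transform of step S21 can be reproduced with `numpy.fft`. Under the usual e^{−iθ} convention, for a real-valued feature map the cosine component is the real part of the FFT and the sine component is the negative imaginary part; the sign bookkeeping here is an assumption about the patent's convention, not taken from it:

```python
import numpy as np

def channelwise_dft(x):
    """Apply a 2D DFT to each channel of a (C, T, N) feature map, using
    the FFT for efficiency as the text notes. For real input, Y =
    sum X*cos(theta) - i * sum X*sin(theta), so F_cos = Re(Y) and
    F_sin = -Im(Y)."""
    Y = np.fft.fft2(x, axes=(1, 2))
    F_cos, F_sin = Y.real, -Y.imag
    return F_sin, F_cos

x = np.random.rand(2, 8, 6)
F_sin, F_cos = channelwise_dft(x)
```

Recombining F_cos − i·F_sin and inverse-transforming recovers the original map, which is the round trip that step S24 relies on.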
Step S22: construct the frequency-domain attention network which, as shown in Fig. 3, comprises a channel-averaging layer, two fully connected layers, a softmax function, and a channel-replicating layer.
Pass the sine frequency-domain feature map F_sin and the cosine frequency-domain feature map F_cos through the attention network to learn the sine-component attention weights M_sin and the cosine-component attention weights M_cos.
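The attention network of step S22 can be sketched under one plausible reading of the layer list: average across channels, flatten, apply two fully connected layers, take a softmax over frequency positions, and replicate the weight map back across channels. The hidden size, the ReLU between the FC layers, and the random stand-in weights W1 and W2 are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def frequency_attention(F, W1, W2):
    """Fig. 3 sketch: channel average -> FC -> FC -> softmax -> channel
    replicate, mapping a (C, U, V) frequency map to (C, U, V) weights."""
    C, U, V = F.shape
    pooled = F.mean(axis=0).reshape(-1)     # channel-averaging layer, (U*V,)
    hidden = np.maximum(W1 @ pooled, 0.0)   # first FC + assumed ReLU
    logits = W2 @ hidden                    # second FC back to U*V logits
    M = softmax(logits).reshape(U, V)       # weights over frequency positions
    return np.broadcast_to(M, (C, U, V))    # channel-replicating layer

F = rng.random((2, 4, 4))
W1 = rng.random((8, 16))
W2 = rng.random((16, 8))
M = frequency_attention(F, W1, W2)
```

The softmax makes the weights a distribution over frequencies, which matches the "frequency selection" reading; whether the softmax runs over frequencies or another axis is not fixed by the text.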
Step S23: take the element-wise product of the learnt sine-component attention weights M_sin with the sine frequency-domain feature map F_sin, and of the cosine-component attention weights M_cos with the cosine frequency-domain feature map F_cos, selecting the discriminative frequency components, denoted F'_i, as shown in formula (2):

F'_i = M_i ⊙ F_i,  i ∈ {sin, cos}   formula (2)
Step S24: transform the sine and cosine frequency-domain feature maps back to the space-time domain with the two-dimensional inverse discrete Fourier transform (2D-IDFT, 2D Inverse Discrete Fourier Transform), obtaining the space-time-domain feature map X'', as shown in formula (3):

X'' = X' + ifft2(F'_sin + F'_cos),  X'' ∈ R^{C''×T''×N''}   formula (3)

where C'', T'', and N'' are the number of channels, the total number of time points, and the total number of spatial points of the space-time-domain feature map X''.
For computational efficiency, the inverse transform may be realized with the two-dimensional inverse fast Fourier transform (2D-IFFT, 2D Inverse Fast Fourier Transform).
That is, the inverse-transformed result is added to the first space-time-domain behavioral feature map as a residual, yielding the second space-time-domain behavioral feature map.
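Steps S21–S24 form one frequency-selection round trip. A minimal numpy sketch, with given rather than learnt attention maps and the same sign convention as in the DFT sketch above (an assumption); with all-ones attention the branch reproduces the input exactly, so the residual output is twice the input:

```python
import numpy as np

rng = np.random.default_rng(2)

def frequency_select(x1, M_sin, M_cos):
    """One pass of steps S21-S24: per-channel FFT, element-wise weighting
    of the sine and cosine components (formula (2)), inverse FFT, and a
    residual addition back onto the input (formula (3))."""
    Y = np.fft.fft2(x1, axes=(1, 2))
    F_cos, F_sin = Y.real, -Y.imag
    Fp_cos = M_cos * F_cos                              # cosine selection
    Fp_sin = M_sin * F_sin                              # sine selection
    back = np.fft.ifft2(Fp_cos - 1j * Fp_sin, axes=(1, 2)).real
    return x1 + back                                    # residual connection

x1 = rng.random((2, 8, 8))
ones = np.ones((2, 8, 8))
x2 = frequency_select(x1, ones, ones)                   # identity selection -> 2 * x1
```

In the patent the attention maps come from the network of Fig. 3 and are learnt end to end; fixing them to ones here just makes the round trip verifiable.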
Step S3: synchronously perform local and non-local reasoning on the second space-time-domain behavioral feature map, and add the result to the first space-time-domain behavioral feature map as a residual to obtain the third space-time-domain behavioral feature map.
Step S31: construct a neural network submodule y_i with a local affinity field and a neural network submodule y'_i with a non-local affinity field, as shown in formulas (4) and (5):

y_i = (1/Z_i(X)) Σ_{j∈δ_i} A(x_i, x_j) g(x_j)   formula (4)

y'_i = (1/Z_i(X)) Σ_{∀j∈Ω} A(x_i, x_j) g(x_j)   formula (5)

where x_i represents a feature vector of the space-time-domain feature map of the current layer; y_i and y'_i represent the feature vectors of the space-time-domain feature maps of the next layer under the local and non-local affinity fields, respectively; A(x_i, x_j) is a binary transformation function computing the affinity between positions i and j; g(x_j) is a unary transformation function computing the feature embedding of x_j, realized by a convolutional layer with kernel 1 or 1×1; Z_i(X) is a normalization factor; Ω enumerates all feature positions; and δ_i is the local neighborhood.
The features extracted by the local and non-local affine-field neural network submodules are superimposed with a weight, as shown in formula (6):

O = w·o_non-local + o_local   formula (6)

where O is the superimposed feature map; o_non-local and o_local are the outputs of the local and non-local affine-field neural network submodules of the same layer; w is a linear transformation function, realized by a convolutional layer with kernel size 1 or 1×1, which measures the importance of the non-local component relative to the local one.
The resulting feature map undergoes batch normalization to reduce feature drift, a nonlinear unit is introduced, and down-sampling is then performed to reduce the resolution of the feature map.
Step S32, using the M1 local and non-local affine-field neural network submodules, the affinity between position i and its neighbours within the local domain δ_i, and the affinity between i and all possible positions in Ω, are computed; M1 is a natural number greater than or equal to 1.
Step S33, the feature map obtained through the reasoning of the M1 local and non-local affine-field neural network submodules is added to the first time-space-domain feature map in a residual manner to obtain the third time-space-domain behavior feature map.
The local-network prototype of this embodiment consists of three convolutional neural networks, with affinity matrix A(x_i, x_j) = 1 and g(x_i) a linear transformation function. The local network module, shown in Figure 5, comprises three plug-ins: a temporal local plug-in (tLocal), a spatial local plug-in (sLocal) and a spatiotemporal local plug-in (stLocal), whose convolution kernel sizes are k×1, 1×k and k×k respectively. Similarly, the non-local network also comprises three plug-ins: a temporal non-local plug-in (tNon-Local), a spatial non-local plug-in (sNon-Local) and a spatiotemporal non-local plug-in (stNon-Local). The two-dimensional spatiotemporal non-local plug-in (stNon-Local) is implemented as shown in Figure 4, where ψ, g and w are convolutional layers with different 1×1 kernels: ψ computes the affinity, g performs the linear transformation, and w measures the relative importance of the non-local component. The one-dimensional temporal non-local plug-in (tNon-Local) and the one-dimensional spatial non-local plug-in (sNon-Local) can be implemented in a similar way. Combining the three plug-ins of the local network module with the three plug-ins of the non-local network module yields the synchronous local and non-local module (SLnL) shown in Figure 6; the corresponding affine-field diagram is shown in Figure 7.
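With the prototype simplifications stated for this embodiment (A(x_i, x_j) = 1 and g a linear transformation, here taken as the identity for brevity), the local and non-local branches of formulas (4)-(6) reduce to neighbourhood and global averages. A one-dimensional NumPy sketch, in which the scalar weight w stands in for the 1×1 convolution of formula (6):

```python
import numpy as np

def slnl_1d(x, w=1.0, radius=1):
    """1D sketch of a synchronous local and non-local (SLnL) module with
    A(x_i, x_j) = 1 and g the identity: the local branch averages over the
    neighbourhood delta_i, the non-local branch over all positions Omega,
    and the two outputs are superimposed with weight w (formula 6)."""
    n = len(x)
    o_local = np.empty(n)
    for i in range(n):                     # delta_i = [i - radius, i + radius]
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        o_local[i] = x[lo:hi].mean()       # (1/Z_i) * sum over delta_i, formula (4)
    o_nonlocal = np.full(n, x.mean())      # (1/Z_i) * sum over Omega, formula (5)
    return w * o_nonlocal + o_local        # weighted superposition, formula (6)

x = np.array([0.0, 1.0, 2.0, 3.0])
out = slnl_1d(x)
```

In the actual module the affinity ψ and the embedding g are learned 1×1 convolutions and the operation runs over the T×N positions of each channel; this sketch only shows how the two affine fields differ.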
After time-space-domain reasoning through the M1 local and non-local synchronous space-time network modules, the affine field of the local submodule keeps growing while the feature-map resolution keeps decreasing, and the semantic information has been well extracted. It then suffices to use local space-time network modules to mine high-level spatiotemporal pattern features.
Step S4, high-level local reasoning is performed on the third time-space-domain behavior feature map to obtain the fourth time-space-domain behavior feature map, as follows:
the M2 constructed local affine-field neural submodules compute the affinity between position i of the third time-space-domain behavior feature map and its neighbours within the local domain δ_i; M2 is a natural number greater than or equal to 1. The feature map after this reasoning is the fourth time-space-domain behavior feature map.
In the M1 local and non-local synchronous space-time network modules and the M2 local affine-field neural submodules, C×T×N denotes a signal dimension: the network input is a three-dimensional tensor composed of the channel dimension C, the time dimension T and the space dimension N; C×TN and TN×TN denote two-dimensional matrices of size C×TN and TN×TN; the values of C, T and N differ from submodule to submodule.
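The tensor shapes quoted here can be made concrete with a small sketch. The dot-product affinity below is only an illustrative choice, since the text leaves the affinity computation to the learned ψ convolutions:

```python
import numpy as np

# The input is a C x T x N tensor (channels, time steps, spatial points).
# For the non-local affinity computation it is flattened to a C x TN matrix,
# and the pairwise affinities over all TN time-space positions then form a
# TN x TN matrix.
C, T, N = 4, 6, 5
x = np.random.rand(C, T, N)               # C x T x N input tensor
x_flat = x.reshape(C, T * N)              # flattened to a C x TN matrix
affinity = x_flat.T @ x_flat              # TN x TN pairwise affinity matrix (illustrative)
```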
Step S5, the fourth time-space-domain behavior feature map is globally pooled to obtain the feature vector f_p.
The behavior feature extraction method based on time-space/frequency-domain hybrid learning of the second embodiment of the invention comprises:
differencing the original skeleton-based video behavior sequence along the time dimension to obtain velocity information, and constructing a behavior sequence containing position and velocity;
processing the position and velocity behavior-sequence channels separately with steps S1-S5 described above, obtaining the velocity feature vector f_p and the position feature vector f_v;
concatenating these feature vectors into a concatenated feature vector f_c; the extracted behavior feature vectors are the velocity feature vector f_p, the position feature vector f_v and the concatenated feature vector f_c.
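The construction of the velocity channel by temporal differencing can be sketched as follows. The zero-padding of the first frame is an assumption (the text does not specify how the sequence lengths are kept equal), and the T x J x D layout is a hypothetical skeleton-sequence convention:

```python
import numpy as np

def build_position_velocity(seq):
    """Sketch of the second embodiment's preprocessing: given a skeleton
    sequence of shape T x J x D (frames, joints, coordinates), velocity is
    the first-order difference along the time dimension, zero-padded so
    both channels keep the same length (padding choice is an assumption)."""
    velocity = np.diff(seq, axis=0)                       # (T-1) x J x D
    velocity = np.concatenate([np.zeros_like(seq[:1]), velocity])
    return seq, velocity                                  # position and velocity channels

seq = np.arange(12, dtype=float).reshape(3, 2, 2)         # toy 3-frame skeleton sequence
pos, vel = build_position_velocity(seq)
```

Each channel is then fed independently through steps S1-S5, and the resulting feature vectors are concatenated.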
To further illustrate the behavior feature extraction method based on time-space/frequency-domain hybrid learning of the invention, it is described below with reference to the application of the feature vectors to behavior classification.
The feature vectors f_p, f_v and f_c are passed through the velocity, position and concatenation feature branches of a virtual multitask network to obtain the predicted probabilities p_p, p_v and p_c that the behavior belongs to each class. In the training stage, the predicted probabilities and the true behavior class are used to compute the losses L_p, L_v and L_c of the three branch predictions. This embodiment uses the cross-entropy loss function, as shown in formula (7):

L_x = -Σ_{i=1}^{N_C} b_i log(p_{x,i}), x ∈ {p, v, c}   formula (7)

where b is the true one-hot class label of the behavior and N_C is the total number of behavior classes.
The total loss of the multitask network is shown in formula (8):

L = λ_p L_p + λ_v L_v + λ_c L_c   formula (8)

where λ_p, λ_v and λ_c are three hyperparameters controlling the weight of each information channel. The whole network is optimized with the total loss until convergence.
In the test (application) stage, the classification result is obtained from the predicted probability p_c of the concatenation channel alone, i.e. the class with the largest predicted probability in p_c is taken as the behavior classification result output for the video.
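The training objective of formulas (7) and (8) can be sketched directly; the default weights below are placeholders for the hyperparameters λ_p, λ_v, λ_c:

```python
import numpy as np

def cross_entropy(p, b):
    """Formula (7): cross entropy between predicted class
    probabilities p and a one-hot true label b."""
    return -np.sum(b * np.log(p))

def total_loss(p_p, p_v, p_c, b, lam=(1.0, 1.0, 1.0)):
    """Formula (8): weighted sum of the three branch losses
    (velocity, position, concatenation channels)."""
    lp, lv, lc = (cross_entropy(p, b) for p in (p_p, p_v, p_c))
    return lam[0] * lp + lam[1] * lv + lam[2] * lc

b = np.array([0.0, 1.0, 0.0])       # true one-hot label, N_C = 3
p = np.array([0.25, 0.5, 0.25])     # one branch's predicted probabilities
loss = total_loss(p, p, p, b)       # identical branches, for illustration
```

At test time only `p_c` would be used: `np.argmax(p_c)` gives the output behavior class.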
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, the specific working process and related explanation of steps S1-S5 of the behavior feature extraction method based on time-space/frequency-domain hybrid learning of the second embodiment may refer to the corresponding processes of the method steps of the aforementioned first embodiment, and are not repeated here.
The behavior feature extraction system based on time-space/frequency-domain hybrid learning of the third embodiment of the invention comprises a video sequence acquisition module, an adaptive transformation module, a frequency selection module, a local and non-local synchronous reasoning module, a high-level local reasoning module, a global pooling module, a concatenation module, a multitask network module and an output module.
The video sequence acquisition module is configured to acquire a skeleton-based video behavior sequence as the original video behavior sequence.
The adaptive transformation module is configured to extract the first time-space-domain behavior feature map in the time-space domain by way of augmentation optimization.
The frequency selection module is configured to feed the first time-space-domain behavior feature map into a frequency-domain attention network for frequency selection, transform the obtained frequency-domain behavior feature map back to the time-space domain, and add it to the first time-space-domain behavior feature map in a residual manner to obtain the second time-space-domain behavior feature map.
The local and non-local synchronous reasoning module is configured to perform local and non-local reasoning synchronously on the second time-space-domain behavior feature map and add the result to the first time-space-domain behavior feature map in a residual manner to obtain the third time-space-domain behavior feature map.
The high-level local reasoning module is configured to perform high-level local reasoning on the third time-space-domain behavior feature map to obtain the fourth time-space-domain behavior feature map.
The global pooling module is configured to globally pool the fourth time-space-domain behavior feature map to obtain the corresponding behavior feature vector.
The concatenation module is configured to concatenate the multi-channel features to obtain the corresponding concatenated feature vector.
The output module is configured to output the extracted behavior feature vectors.
It should be noted that, in the behavior feature extraction system based on time-space/frequency-domain hybrid learning provided by the above embodiment, the division into the above functional modules is only an example; in practical applications, the above functions may be assigned to different functional modules as needed, i.e. the modules or steps of the embodiments of the invention may be decomposed or recombined. For example, the modules of the above embodiment may be merged into one module or further split into multiple submodules to accomplish all or part of the functions described above. The names of the modules and steps involved in the embodiments of the invention are only for distinguishing the individual modules or steps and are not to be construed as improper limitations of the invention.
A storage device of a fourth embodiment of the invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to realize the above behavior feature extraction method based on time-space/frequency-domain hybrid learning.
A processing device of a fifth embodiment of the invention comprises a processor adapted to execute programs and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to realize the above behavior feature extraction method based on time-space/frequency-domain hybrid learning.
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, the specific working process and related explanation of the storage device and the processing device may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
Those skilled in the art should recognize that the modules and method steps described in connection with the embodiments disclosed herein can be realized in electronic hardware, computer software, or a combination of the two; the programs corresponding to the software modules and method steps can be placed in random-access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROM, or any other form of storage medium known in the technical field. To clearly demonstrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described generally in terms of function in the above description. Whether these functions are executed in electronic hardware or in software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to realize the described functions for each specific application, but such realization should not be considered beyond the scope of the invention.
The term "time-space/frequency domain" covers the "time-space domain" and the "frequency domain": the "time-space domain" is a coordinate system describing a mathematical function or physical signal with respect to pure time, pure space, or time-space relations, while the "frequency domain" is a coordinate system describing the characteristics of a signal in terms of frequency.
The terms "first", "second", etc. are used to distinguish similar objects, not to describe or indicate a particular order or precedence.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process, method, article, or device/apparatus comprising a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or device/apparatus.
The technical solution of the invention has thus been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the protection scope of the invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the invention.
Claims (9)
1. A behavior feature extraction method based on time-space/frequency-domain hybrid learning, characterized by comprising:
step S1, acquiring a skeleton-based video behavior sequence as the original video behavior sequence, and performing a time-space-domain adaptive transformation to obtain a first time-space-domain behavior feature map;
step S2, feeding the first time-space-domain behavior feature map into the frequency domain for frequency selection, inverse-transforming it back to the time-space domain, and adding it to the first time-space-domain behavior feature map in a residual manner to obtain a second time-space-domain behavior feature map;
step S3, performing local and non-local reasoning synchronously on the second time-space-domain behavior feature map, and adding the result to the first time-space-domain behavior feature map in a residual manner to obtain a third time-space-domain behavior feature map;
step S4, performing high-level local reasoning on the third time-space-domain behavior feature map to obtain a fourth time-space-domain behavior feature map;
step S5, globally pooling the fourth time-space-domain behavior feature map to obtain a behavior feature vector.
2. The behavior feature extraction method based on time-space/frequency-domain hybrid learning according to claim 1, characterized in that the "time-space-domain adaptive transformation" in step S1 comprises:
step S11, performing adaptive coordinate-system augmentation on the original video behavior sequence under K oblique coordinate systems using a convolutional network or a fully connected network to obtain augmented video behavior sequences under the K coordinate systems, K being a hyperparameter;
step S12, transforming the number and ordering of the skeleton joints in the augmented video behavior sequences using a multilayer fully connected network to obtain the feature map of the augmentation-optimized video behavior sequence containing structural information, which is the first time-space-domain behavior feature map.
3. The behavior feature extraction method based on time-space/frequency-domain hybrid learning according to claim 1, characterized in that, in step S2, "feeding the first time-space-domain behavior feature map into the frequency domain for frequency selection, inverse-transforming it back to the time-space domain, and adding it to the first time-space-domain behavior feature map in a residual manner" comprises:
step S21, transforming the feature map of each channel to the frequency domain using the two-dimensional discrete Fourier transform, comprising a sinusoidal frequency-domain feature map and a cosine frequency-domain feature map;
step S22, passing the sinusoidal frequency-domain feature map and the cosine frequency-domain feature map each through an attention network to learn a sine-component attention weight and a cosine-component attention weight;
the attention network comprising a channel average layer, two fully connected layers, a softmax function and a channel duplicating layer;
step S23, multiplying the learned sine-component attention weight element-wise with the sinusoidal frequency-domain feature map, and the cosine-component attention weight element-wise with the cosine frequency-domain feature map, to obtain the frequency-selected sinusoidal and cosine frequency-domain feature maps;
step S24, transforming the sinusoidal and cosine frequency-domain feature maps back to the time-space domain using the two-dimensional inverse discrete Fourier transform, and adding the result to the first time-space-domain behavior feature map in a residual manner to obtain the second time-space-domain behavior feature map.
4. The behavior feature extraction method based on time-space/frequency-domain hybrid learning according to claim 1, characterized in that "performing local and non-local reasoning synchronously on the second time-space-domain behavior feature map" in step S3 comprises:
step S31, constructing a neural network submodule y_i with a local affine field and a neural network submodule y'_i with a non-local affine field,
where x_i represents the feature vector of the time-space-domain feature map of the current network layer; y_i and y'_i respectively represent the feature vectors of the time-space-domain feature maps of the local and non-local affine fields of the next network layer; A(x_i, x_j) is a binary transformation matrix computing the affinity between positions i and j; g(x_j) is a unary transformation function computing the feature embedding of x_j, realized by a convolutional layer with kernel size 1 or 1×1; Z_i(X) is a normalization factor; Ω enumerates all feature positions; δ_i is the local domain;
superimposing with a weight the features extracted by the local and non-local affine-field neural network submodules to obtain a feature map, performing batch normalization on the feature map to reduce feature drift, introducing a nonlinear unit, and then down-sampling to reduce the resolution of the feature map;
step S32, computing, using the M1 local and non-local affine-field neural network submodules, the affinity between position i and its neighbours within the local domain δ_i and the affinity between i and all possible positions in Ω, M1 being a natural number greater than or equal to 1;
step S33, adding the feature map obtained through the reasoning of the M1 local and non-local affine-field neural network submodules to the first time-space-domain feature map in a residual manner to obtain the third time-space-domain behavior feature map.
5. The behavior feature extraction method based on time-space/frequency-domain hybrid learning according to claim 4, characterized in that "performing high-level local reasoning on the third time-space-domain behavior feature map" in step S4 comprises:
computing, using M2 constructed local affine-field neural submodules, the affinity between position i of the third time-space-domain behavior feature map and its neighbours within the local domain δ_i, M2 being a natural number greater than or equal to 1; the feature map after this reasoning is the fourth time-space-domain behavior feature map.
6. A behavior feature extraction method based on time-space/frequency-domain hybrid learning, characterized by comprising:
differencing the original skeleton-based video behavior sequence along the time dimension to obtain velocity information, and constructing a behavior sequence containing position and velocity;
processing the position and velocity behavior-sequence channels separately with steps S1-S5 according to any one of claims 1-5 to obtain the feature vector of the velocity and the feature vector of the position;
concatenating the feature vectors to obtain a concatenated feature vector; the extracted behavior feature vectors are the velocity feature vector, the position feature vector and the concatenated feature vector.
7. A behavior feature extraction system based on time-space/frequency-domain hybrid learning, characterized by comprising a video sequence acquisition module, an adaptive transformation module, a frequency selection module, a local and non-local synchronous reasoning module, a high-level local reasoning module, a global pooling module, a concatenation module and an output module;
the video sequence acquisition module being configured to acquire a skeleton-based video behavior sequence as the original video behavior sequence;
the adaptive transformation module being configured to extract the first time-space-domain behavior feature map in the time-space domain by way of augmentation optimization;
the frequency selection module being configured to feed the first time-space-domain behavior feature map into a frequency-domain attention network for frequency selection, transform the obtained frequency-domain behavior feature map back to the time-space domain, and add it to the first time-space-domain behavior feature map to obtain the second time-space-domain behavior feature map;
the local and non-local synchronous reasoning module being configured to perform local and non-local reasoning synchronously on the second time-space-domain behavior feature map and add the result to the first time-space-domain behavior feature map in a residual manner to obtain the third time-space-domain behavior feature map;
the high-level local reasoning module being configured to perform high-level local reasoning on the third time-space-domain behavior feature map to obtain the fourth time-space-domain behavior feature map;
the global pooling module being configured to globally pool the fourth time-space-domain behavior feature map to obtain the corresponding behavior feature vector;
the concatenation module being configured to concatenate the multi-channel features to obtain the corresponding concatenated feature vector;
the output module being configured to output the extracted behavior feature vectors.
8. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to realize the behavior feature extraction method based on time-space/frequency-domain hybrid learning according to any one of claims 1-6.
9. A processing device, comprising
a processor adapted to execute programs; and
a storage device adapted to store a plurality of programs;
characterized in that the programs are adapted to be loaded and executed by the processor to realize:
the behavior feature extraction method based on time-space/frequency-domain hybrid learning according to any one of claims 1-6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811494799.9A CN109711277B (en) | 2018-12-07 | 2018-12-07 | Behavior feature extraction method, system and device based on time-space frequency domain hybrid learning |
PCT/CN2019/083357 WO2020113886A1 (en) | 2018-12-07 | 2019-04-19 | Behavior feature extraction method, system and apparatus based on time-space/frequency domain hybrid learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811494799.9A CN109711277B (en) | 2018-12-07 | 2018-12-07 | Behavior feature extraction method, system and device based on time-space frequency domain hybrid learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109711277A true CN109711277A (en) | 2019-05-03 |
CN109711277B CN109711277B (en) | 2020-10-27 |
Family
ID=66254092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811494799.9A Active CN109711277B (en) | 2018-12-07 | 2018-12-07 | Behavior feature extraction method, system and device based on time-space frequency domain hybrid learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109711277B (en) |
WO (1) | WO2020113886A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222653A (en) * | 2019-06-11 | 2019-09-10 | 中国矿业大学(北京) | A kind of skeleton data Activity recognition method based on figure convolutional neural networks |
CN110287836A (en) * | 2019-06-14 | 2019-09-27 | 北京迈格威科技有限公司 | Image classification method, device, computer equipment and storage medium |
CN110378208A (en) * | 2019-06-11 | 2019-10-25 | 杭州电子科技大学 | A kind of Activity recognition method based on depth residual error network |
CN110516599A (en) * | 2019-08-27 | 2019-11-29 | 中国科学院自动化研究所 | Group behavior identification model and its training method based on gradual relational learning |
CN110826462A (en) * | 2019-10-31 | 2020-02-21 | 上海海事大学 | Human body behavior identification method of non-local double-current convolutional neural network model |
CN115100740A (en) * | 2022-06-15 | 2022-09-23 | 东莞理工学院 | Human body action recognition and intention understanding method, terminal device and storage medium |
JP2022542676A (en) * | 2019-08-27 | 2022-10-06 | エヌイーシー ラボラトリーズ アメリカ インク | Shuffle, Attend, and Adapt: Video Domain Adaptation with Clip Order Prediction and Clip Attention Adjustment |
CN117576467A (en) * | 2023-11-22 | 2024-02-20 | 安徽大学 | Crop disease image identification method integrating frequency domain and spatial domain information |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111260774B (en) * | 2020-01-20 | 2023-06-23 | 北京百度网讯科技有限公司 | Method and device for generating 3D joint point regression model |
CN111815604B (en) * | 2020-07-08 | 2023-07-28 | 讯飞智元信息科技有限公司 | Blast furnace tuyere monitoring method and device, electronic equipment and storage medium |
CN112653899B (en) * | 2020-12-18 | 2022-07-12 | 北京工业大学 | Network live broadcast video feature extraction method based on joint attention ResNeSt under complex scene |
CN113269218B (en) * | 2020-12-30 | 2023-06-09 | 威创集团股份有限公司 | Video classification method based on improved VLAD algorithm |
CN114913565B (en) * | 2021-01-28 | 2023-11-17 | 腾讯科技(深圳)有限公司 | Face image detection method, model training method, device and storage medium |
CN113516028B (en) * | 2021-04-28 | 2024-01-19 | 南通大学 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
CN113468954B (en) * | 2021-05-20 | 2023-04-18 | 西安电子科技大学 | Face counterfeiting detection method based on local area features under multiple channels |
CN113177528B (en) * | 2021-05-27 | 2024-05-03 | 南京昊烽信息科技有限公司 | License plate recognition method and system based on multi-task learning strategy training network model |
CN113408448A (en) * | 2021-06-25 | 2021-09-17 | 之江实验室 | Method and device for extracting local features of three-dimensional space-time object and identifying object |
CN114039871B (en) * | 2021-10-25 | 2022-11-29 | 中山大学 | Method, system, device and medium for cellular traffic prediction |
CN115375980B (en) * | 2022-06-30 | 2023-05-09 | 杭州电子科技大学 | Digital image certification system and certification method based on blockchain |
CN117176270B (en) * | 2023-09-05 | 2024-03-19 | 浙江畅能数智科技有限公司 | Indoor antenna with signal monitoring function and monitoring method thereof |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8968091B2 (en) * | 2010-09-07 | 2015-03-03 | Microsoft Technology Licensing, Llc | Scalable real-time motion recognition |
US20160042227A1 (en) * | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
CN106056135A (en) * | 2016-05-20 | 2016-10-26 | 北京九艺同兴科技有限公司 | Human body motion classification method based on compression perception |
US20170228587A1 (en) * | 2016-02-05 | 2017-08-10 | University Of Central Florida Research Foundation, Inc. | System and method for human pose estimation in unconstrained video |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | A kind of video classification methods based on space-time notice |
CN107680119A (en) * | 2017-09-05 | 2018-02-09 | 燕山大学 | A kind of track algorithm based on space-time context fusion multiple features and scale filter |
CN108022254A (en) * | 2017-11-09 | 2018-05-11 | 华南理工大学 | A kind of space-time contextual target tracking based on sign point auxiliary |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292247A (en) * | 2017-06-05 | 2017-10-24 | 浙江理工大学 | A kind of Human bodys' response method and device based on residual error network |
CN108021889A (en) * | 2017-12-05 | 2018-05-11 | 重庆邮电大学 | A kind of binary channels infrared behavior recognition methods based on posture shape and movable information |
CN108921087A (en) * | 2018-06-29 | 2018-11-30 | 国家计算机网络与信息安全管理中心 | video understanding method |
- 2018-12-07 CN CN201811494799.9A patent/CN109711277B/en active Active
- 2019-04-19 WO PCT/CN2019/083357 patent/WO2020113886A1/en active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8968091B2 (en) * | 2010-09-07 | 2015-03-03 | Microsoft Technology Licensing, Llc | Scalable real-time motion recognition |
US20160042227A1 (en) * | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
US20170228587A1 (en) * | 2016-02-05 | 2017-08-10 | University Of Central Florida Research Foundation, Inc. | System and method for human pose estimation in unconstrained video |
CN106056135A (en) * | 2016-05-20 | 2016-10-26 | Beijing Jiuyi Tongxing Technology Co., Ltd. | Human body motion classification method based on compressed sensing |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | Peking University | A video classification method based on spatio-temporal attention |
CN107680119A (en) * | 2017-09-05 | 2018-02-09 | Yanshan University | A tracking algorithm based on spatio-temporal context fusing multiple features and a scale filter |
CN108022254A (en) * | 2017-11-09 | 2018-05-11 | South China University of Technology | A spatio-temporal context target tracking method assisted by landmark points |
Non-Patent Citations (2)
Title |
---|
Cyrille Beaudry et al., "Action recognition in videos using frequency analysis of critical point trajectories", 2014 IEEE International Conference on Image Processing (ICIP) * |
Yong Du et al., "Skeleton Based Action Recognition with Convolutional Neural Network", 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222653B (en) * | 2019-06-11 | 2020-06-16 | China University of Mining and Technology (Beijing) | Skeleton data behavior recognition method based on graph convolutional neural networks |
CN110378208A (en) * | 2019-06-11 | 2019-10-25 | Hangzhou Dianzi University | A behavior recognition method based on deep residual networks |
CN110222653A (en) * | 2019-06-11 | 2019-09-10 | China University of Mining and Technology (Beijing) | A skeleton-data behavior recognition method based on graph convolutional neural networks |
CN110287836A (en) * | 2019-06-14 | 2019-09-27 | Beijing Megvii Technology Co., Ltd. | Image classification method, device, computer equipment and storage medium |
CN110287836B (en) * | 2019-06-14 | 2021-10-15 | Beijing Megvii Technology Co., Ltd. | Image classification method and device, computer equipment and storage medium |
CN110516599A (en) * | 2019-08-27 | 2019-11-29 | Institute of Automation, Chinese Academy of Sciences | Group behavior recognition model based on progressive relational learning and its training method |
JP2022542676A (en) * | 2019-08-27 | 2022-10-06 | エヌイーシー ラボラトリーズ アメリカ インク | Shuffle, Attend, and Adapt: Video Domain Adaptation with Clip Order Prediction and Clip Attention Adjustment |
JP7286003B2 (en) | 2019-08-27 | 2023-06-02 | エヌイーシー ラボラトリーズ アメリカ インク | Shuffle, Attend, and Adapt: Video Domain Adaptation with Clip Order Prediction and Clip Attention Adjustment |
CN110826462A (en) * | 2019-10-31 | 2020-02-21 | Shanghai Maritime University | Human behavior recognition method using a non-local two-stream convolutional neural network model |
CN115100740A (en) * | 2022-06-15 | 2022-09-23 | Dongguan University of Technology | Human body action recognition and intention understanding method, terminal device and storage medium |
CN115100740B (en) * | 2022-06-15 | 2024-04-05 | Dongguan University of Technology | Human motion recognition and intention understanding method, terminal device and storage medium |
CN117576467A (en) * | 2023-11-22 | 2024-02-20 | Anhui University | Crop disease image recognition method integrating frequency-domain and spatial-domain information |
CN117576467B (en) * | 2023-11-22 | 2024-04-26 | Anhui University | Crop disease image recognition method integrating frequency-domain and spatial-domain information |
Also Published As
Publication number | Publication date |
---|---|
CN109711277B (en) | 2020-10-27 |
WO2020113886A1 (en) | 2020-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109711277A (en) | Behavioural characteristic extracting method, system, device based on space-time frequency domain blended learning | |
Hazirbas et al. | Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture | |
Qiu et al. | Local climate zone-based urban land cover classification from multi-seasonal Sentinel-2 images with a recurrent residual network | |
CN109614981A (en) | Intelligent power system fault detection method and system using convolutional neural networks based on Spearman rank correlation | |
CN107037881A (en) | Interactive demonstration method and system for GIS and BIM augmented reality in utility tunnel and subway construction | |
CN109709603A (en) | Seismic horizon identification and tracking method and system | |
CN109903117A (en) | A knowledge graph processing method and device for product recommendation | |
CN113870422B (en) | Point cloud reconstruction method, device, equipment and medium | |
CN113095254B (en) | Method and system for positioning key points of human body part | |
CN108830421A (en) | Gas distribution prediction method and device for tight sandstone reservoirs | |
CN114842351A (en) | Remote sensing image semantic change detection method based on Siamese Transformers | |
Tang et al. | Wnet: W-shaped hierarchical network for remote sensing image change detection | |
Liu et al. | PISEP 2: pseudo-image sequence evolution-based 3D pose prediction | |
CN112115744A (en) | Point cloud data processing method and device, computer storage medium and electronic equipment | |
Tu et al. | Propagate and pair: A single-pass approach to critical point pairing in Reeb graphs | |
Barthakur et al. | Deep learning based semantic segmentation applied to satellite image | |
CN112529057A (en) | Graph similarity calculation method and device based on graph convolution network | |
Malekijoo et al. | Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation | |
Zhang et al. | Multiscale depthwise separable convolution based network for high-resolution image segmentation | |
Xin et al. | Digitalization system of ancient architecture decoration art based on neural network and image features | |
Turay et al. | SSP Framework: A New Approach to Designing Lightweight Convolutional Neural Networks | |
Sagar et al. | Morphing of grayscale DEMs via morphological interpolations | |
Laban et al. | Sparse Pixel Training of Convolutional Neural Networks for Land Cover Classification | |
Babaali et al. | A new approach for road extraction using data augmentation and semantic segmentation | |
Kekre et al. | Discrete Sine Transform Sectorization for Feature Vector Generation in CBIR |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||