CN113158970B - Action identification method and system based on fast and slow dual-flow graph convolutional neural network


Info

Publication number
CN113158970B
CN113158970B (application CN202110510781.9A)
Authority
CN
China
Prior art keywords
branch
fast
features
slow
convolution
Prior art date
Legal status
Active
Application number
CN202110510781.9A
Other languages
Chinese (zh)
Other versions
CN113158970A (en)
Inventor
高跃 (Gao Yue)
陈自强 (Chen Ziqiang)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110510781.9A
Publication of CN113158970A
Application granted
Publication of CN113158970B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an action recognition method and system based on a fast-slow dual-flow graph convolutional neural network, wherein the method comprises the following steps: acquiring human skeleton joint features; regularizing the human skeleton joint features, deforming the shape of one batch of the features in the process; copying the processed human skeleton joint features to generate two identical copies, which are input respectively to the fast branch and the slow branch of a fast-slow dual-flow graph convolution network for feature learning; and eliminating dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function. The method addresses the weak temporal modeling of the prior art and better captures timing information and fast and slow motion information.

Description

Action identification method and system based on fast and slow dual-flow graph convolutional neural network
Technical Field
The invention relates to the technical field of action recognition based on skeleton information, and in particular to an action recognition method and system based on a fast and slow dual-flow graph convolutional neural network.
Background
In the task of motion recognition based on skeletal information, methods based on graph convolutional neural networks are currently the mainstream. The graph convolutional neural network was designed for feature extraction on a single static graph structure and is weak at extracting temporal information. Human skeleton information is time-series, continuous graph-structured data and can also be regarded as dynamic graph data. For motion recognition, capturing only the spatial structure of the static graph (single-frame skeleton information) while ignoring temporal information cannot achieve satisfactory performance. In general, for actions that can be distinguished from a single static frame, graph-convolution-based methods perform well; other actions, however, look similar to one another in static frames and can only be distinguished with the help of temporal motion information, which requires the model to have stronger temporal modeling capability.
Many current graph-convolution-based methods focus their design on capturing spatial structure information, improving model performance by defining adaptive adjacency matrices, new graph-structure modeling methods, new node connections, and so on. Compared with ST-GCN, which first applied GCNs to the task of human skeleton action recognition, these methods achieve certain performance improvements. For modeling temporal information, however, they simply follow the two-dimensional convolution used by ST-GCN and offer little improvement.
In RGB-video-based methods, modeling temporal information and its interaction with spatial information has long been an important issue; researchers have modeled motion information using an optical-flow modality or modeled temporal and spatial information jointly with 3D convolutional networks. In recent years, the convolutional-neural-network-based method SlowFast has achieved great success in RGB-video-based action recognition.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention provides an action recognition method based on a fast and slow dual-flow graph convolutional neural network. Designed on the basis of graph convolutional neural networks, it uses a fast-slow dual-flow structure to better capture timing information and fast and slow motion information, thereby improving the accuracy of action recognition.
The second purpose of the invention is to provide an action recognition system based on a fast and slow dual-flow graph convolutional neural network.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for identifying an action based on a fast-slow dual-flow graph convolutional neural network, including the following steps:
step S10, obtaining human skeleton joint characteristics;
step S20, regularization processing is carried out on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
step S30, copying the human body skeleton joint features processed in the step S20 to generate two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
Step S40, performing dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
Optionally, in an embodiment of the present application, the step S10 includes the following steps:
human skeleton joint features are obtained from the data set, and the feature shape of each sample is as follows:
(C,T,M,V)
wherein C is the number of characteristic channels, has a value of 3 and represents the three-dimensional coordinates (x, y, z) of the joint points; t represents the number of frames of the action; m represents the number of persons performing the action; v represents the number of human joint points.
Optionally, in an embodiment of the present application, the step S20 includes the following steps:
carrying out regularization processing on data, using batch training in the training process, wherein the characteristic shape of the tensor of one batch is as follows:
(B,C,T,M,V)
firstly, the batch tensor is deformed into:
(B,M*V*C,T)
and then a one-dimensional batch regularization module is used to regularize the temporal dimension T, after which the features are deformed back into the original shape (B, C, T, M, V).
Optionally, in an embodiment of the present application, the specific steps in step S30 include:
each branch comprises a plurality of continuously superposed graph convolution blocks, and each graph convolution block comprises a space graph convolution layer and a time sequence convolution layer; the time sequence convolution layer is a two-dimensional convolution module, the size of a convolution kernel is (t, 1), and t is the time sequence receptive field of the convolution kernel; after the two convolution layers, a batch regularization layer and a ReLU activation function are attached, so that the characteristics of each channel are ensured to keep the same distribution; the computation of the convolution block is described using the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations.
Optionally, in an embodiment of the present application, the following two formulas describe the feature shapes of the input features of the graph convolution blocks at the same stage of the fast and slow branches, respectively:

f_fast^in = (B, βC, αT, V, M)

f_slow^in = (B, C, T, V, M)

The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features. In the fast branch, the number of channels β_iC_i is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1, e.g., 1/3. The V of the two branches is identical, being the number of graph nodes.
Optionally, in one embodiment of the present application, a lateral connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch. Since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V) respectively, a two-dimensional convolution layer is first used for feature shape conversion, a batch regularization layer and a ReLU function are added after it, and the two features are then fused by concatenation or addition.
Optionally, in an embodiment of the application, in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time sequence T, graph nodes V, and number of persons M; the features are then mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
In order to achieve the above object, a second aspect of the present invention provides a system for identifying actions based on a fast-slow dual-flow graph convolutional neural network, including the following modules:
the acquisition module is used for acquiring the characteristics of the human skeleton joints;
the processing module is used for carrying out regularization processing on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing a time sequence dimension, and then the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
the generating module is used for copying the human body skeleton joint features processed by the processing module, generating two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
and the determining module is used for carrying out dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
In order to achieve the above object, a third aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the action identification method based on the fast and slow dual-flow graph convolutional neural network described in the first aspect of the present application.
To achieve the above object, a fourth aspect of the present application provides a non-transitory computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements the action identification method based on the fast and slow dual-flow graph convolutional neural network described in the first aspect of the present application.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of an action identification method based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application;
FIG. 3 is a diagram illustrating how the feature shapes of the input features of the fast and slow branches change as the number of graph convolution blocks increases, according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a lateral connection module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an action recognition system based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
The following describes an action recognition method based on a fast and slow dual-flow graph convolutional neural network according to an embodiment of the present invention with reference to the accompanying drawings.
As shown in fig. 1, to achieve the above object, an embodiment of the first aspect of the present invention provides a method for identifying an action based on a fast-slow dual-flow graph convolutional neural network, including the following steps:
step S10, obtaining the characteristics of human skeleton joints;
step S20, regularization processing is carried out on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing a time sequence dimension, and then the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
step S30, copying the human body skeleton joint features processed in the step S20 to generate two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
Step S40, performing dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
In an embodiment of the present application, further, the step S10 includes the following steps:
human skeletal joint features are obtained from public data sets such as NTU RGB+D, and the feature shape of each sample is as follows:
(C,T,M,V)
wherein C is the number of characteristic channels, has a value of 3 and represents the three-dimensional coordinates (x, y, z) of the joint points; t represents the number of frames of the motion; m represents the number of persons performing the action; v represents the number of human joint points.
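For illustration, the following minimal snippet builds a placeholder sample in this (C, T, M, V) layout. PyTorch is an assumed implementation choice, not specified by the patent, and the values T = 300, M = 2, V = 25 follow common NTU RGB+D preprocessing and are given only as an example:

```python
import torch

# Hypothetical skeleton sample in (C, T, M, V) layout.
# T=300, M=2, V=25 follow common NTU RGB+D preprocessing (assumed values).
C, T, M, V = 3, 300, 2, 25
sample = torch.zeros(C, T, M, V)  # (x, y, z) joint coordinates per frame and person
print(sample.shape)               # torch.Size([3, 300, 2, 25])
```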
In an embodiment of the present application, further, the step S20 includes the following steps:
carrying out regularization processing on data, using batch training in the training process, wherein the characteristic shape of the tensor of one batch is as follows:
(B,C,T,M,V)
firstly, the batch tensor is deformed into:
(B,M*V*C,T)
and then a one-dimensional batch regularization module is used to regularize the temporal dimension T, after which the features are deformed back into the original shape (B, C, T, M, V).
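A minimal sketch of this reshape-normalize-reshape step, assuming PyTorch and an (M, V, C) flattening order; the class name InputNorm is illustrative and not taken from the patent:

```python
import torch
import torch.nn as nn

class InputNorm(nn.Module):
    """Regularize a (B, C, T, M, V) batch along the temporal dimension T by
    flattening to (B, M*V*C, T) and applying BatchNorm1d, as in step S20."""

    def __init__(self, channels: int, num_person: int, num_joint: int):
        super().__init__()
        # BatchNorm1d expects (N, C', L); here C' = M*V*C and L = T.
        self.bn = nn.BatchNorm1d(num_person * num_joint * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, T, M, V = x.shape
        x = x.permute(0, 3, 4, 1, 2).contiguous()  # (B, M, V, C, T)
        x = x.view(B, M * V * C, T)                # flatten non-temporal dims
        x = self.bn(x)                             # normalize over batch and T
        x = x.view(B, M, V, C, T).permute(0, 3, 4, 1, 2).contiguous()
        return x                                   # back to (B, C, T, M, V)
```

A call such as InputNorm(3, 2, 25)(batch) leaves the batch shape unchanged while standardizing the distribution of each channel.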
As shown in fig. 2, our network structure contains two branches, which we call fast and slow branches, respectively.
In an embodiment of the present application, further, the specific steps in step S30 include:
each branch comprises a plurality of continuously stacked graph convolution blocks, and each graph convolution block comprises a spatial graph convolution layer and a temporal convolution layer; the temporal convolution layer is a two-dimensional convolution module whose convolution kernel has size (t, 1), where t is the temporal receptive field of the convolution kernel; each of the two convolution layers is followed by a batch regularization layer and a ReLU activation function, ensuring that the features of each channel keep the same distribution; the computation of the graph convolution block is described by the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations.
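A minimal PyTorch sketch of one such graph convolution block is given below. It implements the spatial aggregation with the fixed adjacency A_k plus the learnable B_k term, followed by the (t, 1) temporal convolution; the data-dependent C_k term of 2s-AGCN is omitted for brevity, and the class name, the kernel size t = 9, and the layer choices are assumptions:

```python
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    """One graph convolution block: a spatial graph convolution over V joints
    followed by a (t, 1) temporal convolution, each with BN and ReLU."""

    def __init__(self, in_ch: int, out_ch: int, A: torch.Tensor,
                 t: int = 9, stride: int = 1):
        super().__init__()
        K, V, _ = A.shape                     # K adjacency subsets, V joints
        self.register_buffer("A", A)          # fixed skeleton adjacency A_k
        self.B = nn.Parameter(A.clone())      # adaptive B_k, initialized to A_k
        self.W = nn.Conv2d(in_ch, out_ch * K, kernel_size=1)  # per-subset W_k
        self.bn_s = nn.BatchNorm2d(out_ch)
        self.tcn = nn.Conv2d(out_ch, out_ch, kernel_size=(t, 1),
                             padding=((t - 1) // 2, 0), stride=(stride, 1))
        self.bn_t = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, T, V)
        N, C, T, V = x.shape
        K = self.A.shape[0]
        y = self.W(x).view(N, K, -1, T, V)     # (N, K, C_out, T, V)
        # Aggregate neighbours with (A_k + B_k) for each subset k.
        y = torch.einsum("nkctv,kvw->nctw", y, self.A + self.B)
        y = self.relu(self.bn_s(y))            # spatial conv + BN + ReLU
        y = self.relu(self.bn_t(self.tcn(y)))  # temporal (t, 1) conv + BN + ReLU
        return y
```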
Optionally, in an embodiment of the present application, the following two formulas describe the feature shapes of the input features of the graph convolution blocks at the same stage of the fast and slow branches, respectively:

f_fast^in = (B, βC, αT, V, M)

f_slow^in = (B, C, T, V, M)

The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features. In the fast branch, the number of channels β_iC_i is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1, e.g., 1/3. The V of the two branches is identical, being the number of graph nodes.
In one embodiment of the present application, further, assuming that there are N graph convolution blocks in the network structure: in the slow branch, the frame rate is reduced through the stride of the temporal convolution layer in each graph convolution block, so T_1 ≥ T_2 ≥ … ≥ T_N; on the other hand, the number of output channels of each block gradually increases with block depth to improve the slow branch's ability to capture graph spatial structure information, so C_1 ≤ C_2 ≤ … ≤ C_N. In the fast branch, the stride of the convolution kernel in the temporal convolution layers of all graph convolution blocks is set to 1 to ensure that the frame rate does not drop, so the temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features. In the fast branch, the number of channels β_iC_i is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1, such as 1/3. The V of the two branches is identical, being the number of graph nodes.
In one embodiment of the present application, further, as shown in FIG. 4, a lateral connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch, since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V), respectively.
We first use a two-dimensional convolutional layer for feature shape transformation, then add a batch regularization layer and a ReLU function, and then fuse the two features by concatenation or addition. The above process can be described by the following formulas:
$$\hat{f}_{fast} = \mathrm{Conv2D}(f_{fast})$$

$$\hat{f}_{fast} = \mathrm{ReLU}(\mathrm{BN}(\hat{f}_{fast}))$$

$$f_{slow} = \mathrm{Fuse}(f_{slow}, \hat{f}_{fast})$$
where Conv2D is a two-dimensional convolutional layer, BN is a batch regularization layer, ReLU is the activation function, and Fuse is the fusion function; the fusion may be performed by summation (Sum) or concatenation (Concatenation), and the two modes give similar performance.
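Under these definitions, one possible PyTorch sketch of the lateral connection follows; the (α, 1) kernel and stride used to match αT to T are an assumption, since the patent only specifies a two-dimensional convolution for the shape conversion:

```python
import torch
import torch.nn as nn

class LateralConnection(nn.Module):
    """Fast-to-slow lateral connection: Conv2D reshapes the fast feature
    (N, beta*C, alpha*T, V) to the slow feature's shape, then BN and ReLU,
    then fusion by summation or concatenation."""

    def __init__(self, fast_ch: int, slow_ch: int, alpha: int, fuse: str = "concat"):
        super().__init__()
        # Kernel and stride (alpha, 1) reduce alpha*T to T (assumed choice).
        self.conv = nn.Conv2d(fast_ch, slow_ch, kernel_size=(alpha, 1),
                              stride=(alpha, 1))
        self.bn = nn.BatchNorm2d(slow_ch)
        self.relu = nn.ReLU(inplace=True)
        self.fuse = fuse

    def forward(self, f_slow: torch.Tensor, f_fast: torch.Tensor) -> torch.Tensor:
        f = self.relu(self.bn(self.conv(f_fast)))  # shape conversion + BN + ReLU
        if self.fuse == "sum":
            return f_slow + f                      # Sum fusion
        return torch.cat([f_slow, f], dim=1)       # Concatenation fusion
```

Note that with concatenation the next slow-branch block must accept the enlarged channel count, while summation leaves it unchanged; the patent reports similar performance for the two modes.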
Further, the present embodiment inserts lateral connection modules between the two branches to share information between them. Specifically, each branch contains 10 graph convolution blocks; the channel numbers of the slow branch are 3, 128, 128, 128, 128, 256, 256, 256, 512, 512, and the channel numbers of the fast branch are 3, 32, 32, 64, 64, 64, 64, 64, 128, 128.
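Purely as an illustrative assembly, the lists below wire these channel configurations into the GraphConvBlock sketch above; the strides, the placeholder adjacency, and the reading of the ten values as the channel sizes between nine stacked blocks are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

# Channel configuration from the embodiment above; strides are assumed so
# that T_1 >= ... >= T_N holds in the slow branch only.
slow_ch = [3, 128, 128, 128, 128, 256, 256, 256, 512, 512]
fast_ch = [3, 32, 32, 64, 64, 64, 64, 64, 128, 128]
slow_stride = [1, 1, 1, 2, 1, 1, 2, 1, 1]   # hypothetical frame-rate drops
fast_stride = [1] * 9                        # fast branch keeps alpha*T_1

A = torch.eye(25).repeat(3, 1, 1)            # placeholder adjacency: K=3, V=25
slow_branch = nn.ModuleList(
    GraphConvBlock(slow_ch[i], slow_ch[i + 1], A, stride=slow_stride[i])
    for i in range(len(slow_ch) - 1))
fast_branch = nn.ModuleList(
    GraphConvBlock(fast_ch[i], fast_ch[i + 1], A, stride=fast_stride[i])
    for i in range(len(fast_ch) - 1))
```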
In an embodiment of the application, in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time sequence T, graph nodes V, and number of persons M; the features are then mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
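A minimal sketch of this classification head, assuming PyTorch and global average pooling as the pooling operation:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Step S40: global pooling removes the T, V and M dimensions, a fully
    connected layer maps to the action classes, and Softmax gives scores."""

    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, V, M); averaging over T, V, M leaves (B, C).
        x = x.mean(dim=(2, 3, 4))
        return torch.softmax(self.fc(x), dim=1)  # per-class scores
```

In training one would typically feed the pre-Softmax logits to a cross-entropy loss; the explicit Softmax here mirrors the patent's scoring step.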
To achieve the above object, as shown in FIG. 5, an embodiment of a second aspect of the present invention provides an action recognition system based on a fast-slow dual-flow graph convolutional neural network, which includes the following modules:
the acquisition module is used for acquiring the characteristics of the human skeleton joint;
the processing module is used for carrying out regularization processing on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
the generating module is used for copying the human body skeleton joint features processed by the processing module, generating two identical human body skeleton joint features, inputting the two identical human body skeleton joint features into a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
and the determining module is used for carrying out dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
In order to implement the foregoing embodiments, the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method implements the method for identifying an action based on a fast-slow dual-flow graph convolutional neural network according to the embodiments of the present application.
In order to implement the foregoing embodiments, the present invention further provides a non-transitory computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for identifying an action based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application is implemented.
Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative and not restrictive of the application of the present application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from the scope and spirit of the application.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A motion recognition method based on a fast and slow dual-flow graph convolutional neural network is characterized by comprising the following steps:
step S10, obtaining human skeleton joint characteristics;
step S20, regularization processing is carried out on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
step S30, copying the human body skeleton joint features processed in the step S20 to generate two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
step S40, performing dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function;
the specific steps in step S30 include:
each branch comprises a plurality of continuously superposed graph convolution blocks, and each graph convolution block comprises a space graph convolution layer and a time sequence convolution layer; the time sequence convolution layer is a two-dimensional convolution module, the size of a convolution kernel is (t, 1), and t is the time sequence receptive field of the convolution kernel; a batch regularization layer and a ReLU activation function are attached to the two convolution layers, so that the characteristics of each channel are kept in the same distribution; the calculation of the convolution block is described using the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations;
a lateral connection module is used to share the information learned by the fast branch and the slow branch, fusing from the fast branch to the slow branch; since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V) respectively, a two-dimensional convolution layer is first used for feature shape conversion, a batch regularization layer and a ReLU function are added after it, and the two features are then fused by concatenation or addition.
2. The method of claim 1, wherein the step S10 comprises the steps of:
human skeleton joint features are obtained from the data set, and the feature shape of each sample is as follows:
(C,T,M,V)
wherein C is the number of characteristic channels, has a value of 3 and represents the three-dimensional coordinates (x, y, z) of the joint points; t represents the number of frames of the motion; m represents the number of persons performing the action; v represents the number of human joint points.
3. The method of claim 1, wherein the step S20 comprises the steps of:
regularization processing is carried out on data, batch training is used in the training process, and the characteristic shape of the tensor of one batch is as follows:
(B,C,T,M,V)
firstly, the batch tensor is deformed into:
(B,M*V*C,T)
and then a one-dimensional batch regularization module is used to regularize the temporal dimension T, after which the features are deformed back into the original shape (B, C, T, M, V).
4. The method of claim 1, wherein the following two equations describe the characteristic shapes of the input features of the convolution blocks at the same stage respectively:
f_fast^in = (B, βC, αT, V, M)

f_slow^in = (B, C, T, V, M)

the temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features; the number of channels β_iC_i in the fast branch is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1; the V of the two branches is identical, being the number of graph nodes.
5. The method of claim 1, wherein in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate three dimensions of time sequence T, graph nodes V and number of people M, and the features are mapped to various action categories through a full connection layer, and finally, a score of each action category is obtained through a Softmax function.
6. A motion recognition system based on a fast and slow biflow graph convolutional neural network is characterized by comprising:
the acquisition module is used for acquiring the characteristics of the human skeleton joints;
the processing module is used for carrying out regularization processing on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
the generating module is used for copying the human body skeleton joint features processed by the processing module, generating two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
the determining module is used for carrying out dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function;
the generation module is further configured to:
each branch comprises a plurality of continuously superposed graph convolution blocks, and each graph convolution block comprises a space graph convolution layer and a time sequence convolution layer; the time sequence convolution layer is a two-dimensional convolution module, the size of a convolution kernel is (t, 1), and t is a time sequence receptive field of the convolution kernel; after the two convolution layers, a batch regularization layer and a ReLU activation function are attached, so that the characteristics of each channel are ensured to keep the same distribution; the calculation of the convolution block is described using the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations;
a lateral connection module is used to share the information learned by the fast branch and the slow branch, the information being fused from the fast branch to the slow branch; since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V) respectively, a two-dimensional convolution layer is first used for feature shape conversion, a batch regularization layer and a ReLU function are added after it, and the two features are then fused by concatenation or addition.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-5 when executing the computer program.
8. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method of any one of claims 1-5.
CN202110510781.9A 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network Active CN113158970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110510781.9A CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110510781.9A CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Publications (2)

Publication Number Publication Date
CN113158970A CN113158970A (en) 2021-07-23
CN113158970B 2023-02-07

Family

ID=76874442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110510781.9A Active CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Country Status (1)

Country Link
CN (1) CN113158970B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550027A (en) * 2022-01-18 2022-05-27 清华大学 Vision-based motion video fine analysis method and device
CN114201475B (en) * 2022-02-16 2022-05-03 北京市农林科学院信息技术研究中心 Dangerous behavior supervision method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN110059598A (en) * 2019-04-08 2019-07-26 南京邮电大学 The Activity recognition method of the long time-histories speed network integration based on posture artis
CN111860128A (en) * 2020-06-05 2020-10-30 南京邮电大学 Human skeleton behavior identification method based on multi-stream fast-slow graph convolution network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11544535B2 (en) * 2019-03-08 2023-01-03 Adobe Inc. Graph convolutional networks with motif-based attention
CN112131908B (en) * 2019-06-24 2024-06-11 北京眼神智能科技有限公司 Action recognition method, device, storage medium and equipment based on double-flow network
CN111325099B (en) * 2020-01-21 2022-08-26 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN112183313B (en) * 2020-09-27 2022-03-11 武汉大学 SlowFast-based power operation field action identification method
CN112381004B (en) * 2020-11-17 2023-08-08 华南理工大学 Dual-flow self-adaptive graph rolling network behavior recognition method based on framework

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN110059598A (en) * 2019-04-08 2019-07-26 南京邮电大学 The Activity recognition method of the long time-histories speed network integration based on posture artis
CN111860128A (en) * 2020-06-05 2020-10-30 南京邮电大学 Human skeleton behavior identification method based on multi-stream fast-slow graph convolution network

Also Published As

Publication number Publication date
CN113158970A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113449857B (en) Data processing method and data processing equipment
CN112308200B (en) Searching method and device for neural network
CN109558862B (en) Crowd counting method and system based on attention thinning framework of space perception
JP2018073393A (en) 3d reconstruction of real object from depth map
Zhang et al. Progressive hard-mining network for monocular depth estimation
EP3905194A1 (en) Pose estimation method and apparatus
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN111480169A (en) Method, system and apparatus for pattern recognition
CN111667459B (en) Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
CN110738650B (en) Infectious disease infection identification method, terminal device and storage medium
US20230326173A1 (en) Image processing method and apparatus, and computer-readable storage medium
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
JP2017068608A (en) Arithmetic unit, method and program
CN110781894A (en) Point cloud semantic segmentation method and device and electronic equipment
CN116071300A (en) Cell nucleus segmentation method based on context feature fusion and related equipment
CN116310219A (en) Three-dimensional foot shape generation method based on conditional diffusion model
CN113065529B (en) Motion recognition method and system based on inter-joint association modeling
CN113554656B (en) Optical remote sensing image example segmentation method and device based on graph neural network
Lv et al. Memory‐augmented neural networks based dynamic complex image segmentation in digital twins for self‐driving vehicle
CN112884702A (en) Polyp identification system and method based on endoscope image
JP2023145404A (en) System and method for using pyramid and uniqueness matching priors to identify correspondences between images
JP2021527859A (en) Irregular shape segmentation in an image using deep region expansion
CN113191367B (en) Semantic segmentation method based on dense scale dynamic network
CN113065637B (en) Sensing network and data processing method
CN113516670A (en) Non-mode image segmentation method and device with enhanced feedback attention

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant