WO2020248581A1 - 图数据识别方法、装置、计算机设备和存储介质 - Google Patents

图数据识别方法、装置、计算机设备和存储介质 Download PDF

Info

Publication number
WO2020248581A1
WO2020248581A1 PCT/CN2019/129708 CN2019129708W WO2020248581A1 WO 2020248581 A1 WO2020248581 A1 WO 2020248581A1 CN 2019129708 W CN2019129708 W CN 2019129708W WO 2020248581 A1 WO2020248581 A1 WO 2020248581A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
feature map
neural network
convolutional neural
target
Prior art date
Application number
PCT/CN2019/129708
Other languages
English (en)
French (fr)
Inventor
张一帆
史磊
Original Assignee
中国科学院自动化研究所
中国科学院自动化研究所南京人工智能芯片创新研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所, 中国科学院自动化研究所南京人工智能芯片创新研究院 filed Critical 中国科学院自动化研究所
Publication of WO2020248581A1 publication Critical patent/WO2020248581A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • This application relates to the field of computer technology, and in particular to a method, device, computer equipment and storage medium for identifying graph data.
  • the human body is represented by the coordinates of a number of pre-defined key joint points in the camera coordinate system. It can be easily obtained by depth cameras (such as Kinect) and various pose estimation algorithms (such as OpenPose).
  • Figure 1 shows the key joint points of the human body defined by the Kinect depth camera. It defines the human body as the three-dimensional coordinates of 25 key joint points. Since behaviors often exist in the form of videos, a behavior with a length of T frames can be represented by a Tx25x3 tensor.
  • Each joint point is defined as a node of the graph, and the physical connection between joint points is defined as an edge of the graph, and a time dimension edge is added between the same node in adjacent frames to obtain a space-time graph that can describe human behavior .
  • the common behavior recognition method based on bone points is graph convolution.
  • Graph convolution is different from ordinary convolution.
  • the number of neighboring nodes of each node is not fixed, and the parameters of the convolution operation are fixed.
  • a mapping function needs to be defined, and the corresponding parameters and nodes are realized through the mapping function. If the size of the convolution kernel is defined as three, as shown in Figure 3, the three parameters correspond to the point 001 far from the center of the human body, the point 002 near the center of the human body 000, and the convolution point itself 003. Then the convolution operation can be expressed by formula (1):
  • mapping function can be realized by the adjacency matrix of the graph, and the convolution operation represented by the adjacency matrix is shown in formula (2):
  • A represents the adjacency matrix of the graph
  • K is the size of the convolution kernel
  • is used to normalize A.
  • the adjacency matrix defines the topological structure of the human body graph used in the graph convolution network.
  • the fixed topology structure cannot accurately describe every posture of the human body, resulting in low recognition accuracy.
  • this application provides a method, device, computer equipment, and storage medium for identifying image data.
  • this application provides a method for identifying image data, including:
  • the input feature map is a feature map generated according to the image data
  • the first bias matrix is a matrix generated when the trained convolutional neural network is generated
  • the recognition result of the map data is recognized.
  • this application provides a special image generating device, including:
  • the data acquisition module is used to acquire the input feature map of the current convolutional layer input to the trained convolutional neural network, the input feature map is a feature map generated according to the image data, and the first bias matrix of the current convolutional layer is acquired, where The first bias matrix is the matrix generated when the trained convolutional neural network is generated;
  • the second bias matrix generating module is used to generate a second bias matrix according to the input feature map
  • the target adjacency matrix generation module is used to obtain the reference adjacency matrix, calculate the sum of the reference adjacency matrix, the first offset matrix and the second offset matrix, to obtain the target adjacency matrix;
  • the target output feature map generation module is used to obtain the convolution kernel of the current convolution layer, and generate the target output feature map according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map.
  • the recognition module is used to output the feature map according to the target and identify the recognition result of the map data.
  • a computer device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer program:
  • the input feature map is a feature map generated according to the image data
  • the first bias matrix is a matrix generated when the trained convolutional neural network is generated
  • the recognition result of the map data is recognized.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented:
  • the input feature map is a feature map generated according to the image data
  • the first bias matrix is a matrix generated when the trained convolutional neural network is generated
  • the recognition result of the map data is recognized.
  • the above-mentioned image data recognition method, device, computer equipment and storage medium includes: obtaining an input feature map of the current convolutional layer input to the trained convolutional neural network, the input feature map being a feature map generated according to the image data, Obtain the first bias matrix of the current convolutional layer, where the first bias matrix is the matrix generated when the trained convolutional neural network is generated, the second bias matrix is generated according to the input feature map, the reference adjacency matrix is obtained, and the reference is calculated The sum of the adjacency matrix, the first bias matrix and the second bias matrix to obtain the target adjacency matrix, obtain the convolution kernel of the current convolution layer, and generate it according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map The target output feature map, according to the target output feature map, the recognition result of the map data is recognized.
  • the first bias matrix in the bias matrix is the matrix determined according to the demand, and the second bias is generated according to the input feature map.
  • the setting matrix is a matrix generated according to the input data. The addition of a bias matrix can characterize the required features and the sample features of the input data, improve the accuracy of generating feature maps, and thereby improve the recognition accuracy of the trained convolutional neural network.
  • FIG. 1 is a schematic diagram of the key joint points of the human body defined by the Kinect depth camera in an embodiment
  • Figure 2 is a time-space diagram describing human behavior in an embodiment
  • FIG. 3 is a schematic diagram of nodes defined in graph convolution in an embodiment
  • FIG. 4 is an application environment diagram of a graph data recognition method in one embodiment in one embodiment
  • FIG. 5 is a schematic flowchart of a method for generating a feature map in an embodiment
  • Figure 6 is a schematic diagram of a data processing flow of a convolutional layer in an embodiment
  • Fig. 7 is a structural block diagram of a feature map generating device in an embodiment
  • Figure 8 is an internal structure diagram of a computer device in an embodiment.
  • Fig. 4 is a diagram of an application environment of a graph data recognition method in an embodiment.
  • the map data recognition method is applied to a feature map generation system.
  • the feature map generation system includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network.
  • the terminal or server obtains the input feature map of the current convolutional layer of the trained convolutional neural network.
  • the input feature map is a feature map generated based on image data, and the input feature map is a feature map obtained by extracting image data.
  • the first bias matrix of the convolutional layer where the first bias matrix is the matrix generated when the trained convolutional neural network is generated, the second bias matrix is generated according to the input feature map, the reference adjacency matrix is obtained, and the reference adjacency matrix is calculated ,
  • the sum of the first bias matrix and the second bias matrix to obtain the target adjacency matrix obtain the convolution kernel of the current convolution layer, and generate the target output according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map Feature map, output the feature map according to the target, and identify the recognition result of the map data.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer.
  • the server 120 may be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for identifying image data is provided.
  • the method is mainly applied to the terminal 110 (or server 120) in FIG. 4 as an example.
  • the image data recognition method specifically includes the following steps:
  • Step S201 Obtain an input feature map of the current convolutional layer input to the trained convolutional neural network.
  • the input feature map is a feature map generated based on image data.
  • the trained convolutional neural network refers to the training obtained through a large amount of labeled graph data, where the graph data is a spatiotemporal graph of human behavior, and the spatiotemporal graph is shown in FIG. 2.
  • the tags carried by the image data include human behaviors, such as individual behaviors or multi-person behaviors such as clapping, jumping, pulling hands, and fighting.
  • the trained convolutional neural network contains multiple convolutional layers, and the current convolutional layer can be any convolutional layer in the convolutional neural network.
  • the output data of the previous convolutional layer is the input data of the current convolutional layer, the output data of the previous convolutional layer is obtained as the input data of the current convolutional layer, and the input data is the input feature map.
  • Step S202 Obtain the first bias matrix of the current convolutional layer.
  • the first bias matrix is a matrix generated when the trained convolutional neural network is generated.
  • the first bias matrix is the bias matrix obtained according to the training requirements. Different training requirements refer to the requirements of what the convolutional neural network is used for after training. For example, it is used to recognize clapping hands and recognize fights. A bias matrix is not the same.
  • Step S203 Generate a second bias matrix according to the input feature map.
  • the second bias matrix is a matrix generated according to the input feature map.
  • the second bias matrix is obtained by performing dimensionality reduction and normalization operations on the input matrix.
  • the second bias matrix is related to the input feature map.
  • the input feature map is mapped according to the function in the current convolutional layer to obtain the mapped matrix.
  • the mapped matrix will be multiplied and the product operation result will be normalized. Wait for processing to obtain the second bias matrix.
  • step S203 includes: using the dimensionality reduction function in the trained convolutional neural network to reduce the dimensionality of the input feature map to obtain the dimensionality reduction matrix, normalize the dimensionality reduction matrix, and obtain the normalization matrix,
  • the normalized matrix is the second bias matrix.
  • the dimensionality reduction function includes a first dimensionality reduction function and a second dimensionality reduction function.
  • the input feature map pair is reduced according to the first dimensionality reduction function to obtain the first feature map, and the input feature map is processed according to the second dimensionality reduction function. Reduce the dimension to obtain the second feature map. Calculate the product of the first feature map and the second feature map to obtain the first product matrix.
  • Different dimensionality reduction functions are used to reduce the dimensionality of the input feature map, and different dimensionality reduction matrices are obtained, and the parameters of the dimensionality reduction function are obtained by training according to requirements. Calculate the product of the two reduced-dimensional matrices, that is, the first product matrix.
  • Each point in the first product matrix represents the feature similarity between the point corresponding to the abscissa and the point corresponding to the ordinate.
  • 1 represents joint point 1
  • 2 represents joint point 2
  • the matrix element in the coordinates (1, 2) in the first product matrix represents the feature similarity between joint point 1 and joint point 2.
  • Step S204 Obtain the reference adjacency matrix, calculate the sum of the reference adjacency matrix, the first offset matrix, and the second offset matrix, to obtain the target adjacency matrix.
  • the first offset matrix, the second offset matrix, and the reference adjacency matrix have the same dimensional information, and the sum of the reference adjacency matrix, the first offset matrix, and the second offset matrix is calculated, that is, the matrix elements at the same position are calculated. Add directly to get the target adjacency matrix.
  • Step S205 Obtain the convolution kernel of the current convolution layer.
  • each convolution layer includes multiple convolution kernels, the number of convolution kernels corresponding to each convolution layer may be the same or different, and each convolution kernel may be the same or different.
  • the convolution kernel is used to perform convolution operations on the image, and different convolution kernels can extract different image features.
  • the target adjacency matrix is used to perform feature extraction on the input feature map to obtain the feature map.
  • Step S205 generating a target output feature map according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map.
  • the target adjacency matrix is used for feature extraction corresponding to the input feature map to obtain the corresponding feature map
  • the convolution kernel is used to perform the convolution operation on the feature map extracted from the target path matrix
  • the feature map obtained by the convolution operation is used as the target output feature Figure.
  • the input feature map includes at least three dimensions
  • the target adjacency matrix includes at least three dimensions.
  • Step S205 includes:
  • Step S2051 Reshape the input feature map to obtain a reshaped feature map.
  • the first dimension of the reshaped feature map is the product of the first dimension and the second dimension of the input feature map.
  • reshaping refers to adjusting the input feature map so that the product of the first dimension and the second dimension is the first dimension of the reshaping feature map, such as adjusting an input feature map containing three dimensions to two dimensions Reshape the feature map, assuming that the input feature map is C ⁇ M ⁇ N, where the first dimension is C, the second dimension is M, and the third dimension is N, then the image can be reshaped to C ⁇ M ⁇ N, the first dimension It is the product CM of C and M.
  • the second dimension is the same as the third dimension of the input feature map, keeping the elements of the entire input feature map unchanged, and the total element is the product CMN of C, M and N.
  • the first dimension C is the channel element
  • the second dimension M is the number of rows of the input feature map
  • the third dimension N is the number of columns of the input feature map.
  • N represents the number of human joint points
  • N is defined as 25 in Kinect. Reshaping the matrix is to facilitate calculations.
  • Step S2052 Calculate the product of the matrix of each channel of the reshaped feature map and the target adjacency matrix to obtain a second product matrix of each channel.
  • the second dimension of the reshaped feature map is the same as the first dimension of the matrix of each channel of the target adjacency matrix.
  • the reshaped feature map is C ⁇ M ⁇ N
  • the target adjacency matrix is C ⁇ N ⁇ N
  • each channel The matrix is N ⁇ N
  • the product matrix of the reshaped feature map and the matrix of each channel is CMN.
  • Step S2053 De-reshape the second product matrix of each channel to obtain the anti-reshape feature map of each channel.
  • de-reshaping is the inverse process of reshaping. If reshaping is to convert a three-dimensional matrix into a two-dimensional matrix, then anti-reshaping is to convert a two-dimensional matrix into a three-dimensional matrix.
  • the product matrix of each channel is CM ⁇ N
  • the anti-reshaping feature map obtained after anti-reshaping is a C ⁇ M ⁇ N matrix.
  • Step S2054 Perform a convolution operation on the de-reshaping feature map according to the convolution kernel of each channel and the convolution kernel of each channel to obtain the target feature map of each channel of the current convolution layer.
  • Step S2055 the target feature maps of each channel are summed to obtain the output feature map of the current convolutional layer, and the output feature map of the current convolutional layer is taken as the target output feature map.
  • feature extraction is performed on the anti-reshaping feature map through the convolution kernel corresponding to each channel to obtain the feature corresponding to each convolution kernel, and the features extracted by each convolution kernel form the target feature map of each channel.
  • step S205 further includes:
  • Step S2056 Determine whether the number of channels of the output feature map is consistent with the number of channels of the input feature map.
  • step S2057 when they are consistent, the sum of the input feature map and the output feature map is used as the target output feature map of the current convolutional layer.
  • Step S2058 when inconsistent, perform convolution operation on the input feature map to obtain a convolution feature map consistent with the number of channels of the output feature map, and use the sum of the convolution feature map and the output feature map as the target output feature map.
  • each channel matrix of the output feature map generated by the convolution kernel of each channel and the corresponding reverse reshaping matrix it is determined whether the output feature map is the same as the channel number of the input feature map.
  • the input The feature map and the elements at the position corresponding to the output feature are added to obtain the target output feature map.
  • they are inconsistent perform a convolution operation on the input feature map to obtain a convolution feature map with the same channel as the output feature map, and calculate the sum of elements at the same position of the convolution feature map and the output feature map to obtain the target output feature map.
  • Step S206 output the feature map according to the target, and identify the recognition result of the map data.
  • the target output feature map is input to the recognition layer in the trained convolutional neural network, the candidate recognition result corresponding to the target output feature map is identified through the recognition layer, and the candidate recognition result with the highest recognition probability is selected from the candidate recognition results, as Target recognition result, using the target recognition result as the recognition result corresponding to the image data.
  • the recognition types include clapping, jumping, and holding hands
  • the recognition probability corresponding to clapping is 0.89
  • the recognition probability corresponding to jumping is 0.01
  • the recognition probability corresponding to holding hands is 0.1
  • the recognition result corresponding to the map data is clapping.
  • the target output feature map is used as the input feature map of the next convolutional layer
  • Use the next convolutional layer as the current convolutional layer enter to obtain the input feature map of the current convolutional layer of the trained convolutional neural network, until each convolutional layer in the trained convolutional neural network is completed, Output the target output feature map of the last convolutional layer, and input the target output feature map of the last convolutional layer into the recognition layer to obtain the recognition result corresponding to the book.
  • the data processing flow of the convolutional layer with the same network structure is the same.
  • the above image data recognition method obtains the input feature map of the current convolutional layer input to the trained convolutional neural network.
  • the input feature map is the feature map generated according to the image data, and obtains the first bias matrix of the current convolutional layer, where
  • the first bias matrix is the matrix generated when the trained convolutional neural network is generated, the second bias matrix is generated according to the input feature map, the reference adjacency matrix is obtained, and the reference adjacency matrix, the first bias matrix and the second bias are calculated
  • the sum of the matrices, the target adjacency matrix is obtained, the convolution kernel of the current convolution layer is obtained, and the target output feature map is generated according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map, and the target output feature map is identified Recognition result of graph data.
  • step S206 includes:
  • Step S2061 When the current convolutional layer is the last convolutional layer in the trained convolutional neural network, it is judged whether there are target feature maps that need to be merged in the multiple target output feature maps.
  • Step S2062 when it exists, merge the target output feature maps that need to be merged to obtain a merged feature map.
  • Step S2063 When the combined feature map includes all target output feature maps, the combined feature map is recognized, and the recognition result corresponding to the combined feature map is obtained.
  • Step S2064 When the merged feature map contains all the target output feature maps, the merged feature map is recognized to obtain the recognition result corresponding to the merged feature map, and the unmerged target output feature map is recognized to obtain the unmerged target output feature map The corresponding recognition result.
  • Step S2065 when it does not exist, identify each target output feature map, and obtain a recognition result corresponding to each target output feature map.
  • the current convolutional layer is the last convolutional layer in the trained convolutional neural network, it means that the convolution operation has been explained, and the final target output feature map of the graph data is extracted, and the final target output feature is based on the final target output feature.
  • the need to merge is because some behaviors require multiple people to complete, such as holding hands, fighting and other behaviors require at least two talents to complete.
  • the recognition result of the target output feature map is used as the recognition result of the map data.
  • the recognition result of the graph data may be one or more sub-recognition results.
  • the obtained recognition result includes multiple behaviors, and each behavior serves as a sub-recognition result.
  • the specific merged result is related to the human behavior in the input data.
  • the data includes multiple types of behaviors such as clapping, holding hands, fighting, etc., holding hands and fighting need to be merged
  • the merged feature map includes two. Recognize the merged feature map directly to obtain the recognition result of the merged feature map.
  • the unmerged target output feature map is directly identified, and the recognition result corresponding to the unmerged target output feature map is obtained.
  • FIG. 6 is a schematic diagram of a data processing flow of the convolutional layer in an embodiment.
  • fin is the input feature map of the current convolutional layer
  • fout is the output feature map of the current convolutional layer.
  • the output feature map adopts the specific representation of the input feature map as shown in formula (3):
  • Ak is the k-th adjacency matrix in the reference adjacency matrix
  • Bk is the k-th adjacency matrix in the offset matrix
  • Ck is the k-th adjacency matrix in the offset matrix
  • softmax(S) represents the pair matrix S Perform a normalization calculation
  • Wk is the k-th parameter of the convolution kernel
  • Kv is the size of the convolution kernel
  • the bias matrix Bk is the matrix obtained after training the convolutional neural network. Bk and Ak have the same size information, which is N ⁇ N.
  • the residual network res is used, where the size of the convolution kernel in the residual network is 1 ⁇ 1.
  • the input feature map is adjusted to a matrix consistent with the number of channels of the output feature map, and the sum of the adjusted input feature map and the output feature map is calculated to obtain the target output feature map.
  • the sum of the input feature map and the output feature map is calculated to obtain the target output feature map. According to the target output feature map, the behavior of each map data is recognized, and the corresponding recognition result is obtained.
  • the above-mentioned feature map generation process is the data processing process of training a convolutional neural network
  • the recognition corresponding to each map data is inconsistent with the category in the label of the map data
  • the loss value corresponding to each map data according to the preset loss function
  • Return the loss value according to the gradient return algorithm to obtain the return value of the convolutional layer
  • the recognition result of the feature map is output according to the target as the recognition result corresponding to the map data.
  • the convolutional neural network contains multiple convolutional layers and recognition layers. Each convolutional layer includes a convolution kernel and a target adjacency matrix.
  • the target adjacency matrix pair of each convolutional layer Perform feature extraction on the image data to obtain the corresponding image feature map set, perform convolution operation on the image feature set through the convolution kernel, obtain the target output feature map of each convolution layer, and identify the target output feature map of the previous convolution layer of the layer As the input data of the recognition layer, according to the target output feature map of each map data, the corresponding human behavior type is recognized.
  • Fig. 5 is a schematic flowchart of a method for identifying graph data in an embodiment. It should be understood that, although the various steps in the flowchart of FIG. 5 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in sequence in the order indicated by the arrows. Unless specifically stated in this article, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least part of the steps in FIG. 5 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times. The execution of these sub-steps or stages The sequence is not necessarily performed sequentially, but may be performed alternately or alternately with at least a part of other steps or sub-steps or stages of other steps.
  • a special image generating apparatus 200 including:
  • the data acquisition module 201 is used to acquire the input feature map of the current convolutional layer input to the trained convolutional neural network.
  • the input feature map is a feature map generated according to the image data to acquire the first bias matrix of the current convolutional layer,
  • the first bias matrix is the matrix generated when the trained convolutional neural network is generated.
  • the second bias matrix generating module 202 is configured to generate a second bias matrix according to the input feature map.
  • the target adjacency matrix generation module 203 is configured to obtain a reference adjacency matrix, calculate the sum of the reference adjacency matrix, the first offset matrix, and the second offset matrix, to obtain the target adjacency matrix.
  • the target output feature map generating module 204 is configured to obtain the convolution kernel of the current convolution layer, and generate the target output feature map according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map.
  • the recognition module 205 is used to output the feature map according to the target and recognize the recognition result of the map data.
  • the second bias matrix generation module is specifically configured to use the dimensionality reduction function in the trained convolutional neural network to reduce the dimensionality of the input feature map to obtain the dimensionality reduction matrix, and to normalize the dimensionality reduction matrix to obtain The normalized matrix is the second offset matrix.
  • the second bias matrix generation module is specifically configured to reduce the dimensionality of the matrix of each channel of the input feature map according to the first dimensionality reduction function in the dimensionality reduction function to obtain the first dimensionality reduction matrix of each channel,
  • the matrix of each channel of the input feature map is reduced in dimension to obtain the second dimension reduction matrix of each channel, where the dimensionality reduction function includes two, and the input feature map includes at least three Dimension, the first dimension is the number of channels.
  • the above-mentioned image data recognition device further includes:
  • the network generation module is used to generate the trained convolutional neural network.
  • the network generation module includes:
  • the data acquisition unit is used to acquire a training set containing multiple training image data, and the training image data carries label information.
  • the feature extraction unit is used to input training image data and label information into the initial convolutional neural network, and extract the features of each training image data through the initial convolutional neural network.
  • the recognition unit is used to identify the recognition result corresponding to each training image data according to the characteristics of each training image data.
  • the loss value calculation unit is used to calculate the recognition result of each training image data and the loss value of the label according to the preset loss function.
  • the network determining unit is used to obtain the trained convolutional neural network when the loss value is less than or equal to the preset loss value.
  • the network determining unit further includes:
  • the parameter update subunit is used to update the network parameters of the initial convolutional neural network through the gradient backhaul algorithm according to the loss value when the loss value is greater than the preset loss value.
  • the network determination subunit is used to use the initial convolutional neural network with updated network parameters as the initial convolutional neural network, enter the training map data and label information into the initial convolutional neural network, until each training map is calculated according to the preset loss function When the data recognition result and the label loss value are less than or equal to the preset loss value, the trained convolutional neural network is obtained.
  • the network determining subunit when the network determining subunit specifically returns the loss value to any convolutional layer through the gradient return algorithm, the return value of each convolutional layer is obtained, and the volume is updated according to the return value of each convolutional layer.
  • Fig. 8 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may specifically be the terminal 110 (or the server 120) in FIG. 4.
  • the computer equipment includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and may also store a computer program.
  • the processor can realize the image data identification method.
  • a computer program can also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the graph data identification method.
  • the display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer equipment can be a touch layer covered on the display screen, or a button, trackball or touch pad set on the housing of the computer equipment. It can be an external keyboard, touchpad, or mouse.
  • FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may Including more or fewer parts than shown in the figure, or combining some parts, or having a different arrangement of parts.
  • the special image generating apparatus may be implemented in the form of a computer program, and the computer program may run on the computer device as shown in FIG. 8.
  • the memory of the computer device can store various program modules that make up the special map generation device, such as the data acquisition module 201, the second bias matrix generation module 202, the target adjacency matrix generation module 203, and the target output characteristic map shown in FIG. A generation module 204 and an identification module 205.
  • the computer program composed of each program module causes the processor to execute the steps in the graph data identification method of each embodiment of the application described in this specification.
  • the computer device shown in FIG. 8 can obtain the input feature map of the current convolutional layer of the trained convolutional neural network through the data acquisition module 201 in the special map generating device shown in FIG.
  • the first bias matrix is the matrix generated when the trained convolutional neural network is generated.
  • the computer device may generate the second bias matrix according to the input feature map through the second bias matrix generation module 202.
  • the computer device can obtain the reference adjacency matrix through the target adjacency matrix generation module 203, calculate the sum of the reference adjacency matrix, the first offset matrix, and the second offset matrix, to obtain the target adjacency matrix.
  • the computer device can obtain the convolution kernel of the current convolution layer through the target output feature map generation module 204, and generate the target output feature map according to the convolution kernel of the current convolution layer, the target adjacency matrix, and the input feature map.
  • the computer device can output the feature map according to the target through the recognition module 205, and recognize the recognition result of the map data.
  • a computer device including a memory, a processor, and a computer program stored in the memory and running on the processor.
  • the processor executes the computer program, the following steps are implemented: Obtain the input trained volume The input feature map of the current convolutional layer of the convolutional neural network.
  • the input feature map is a feature map generated from image data to obtain the first bias matrix of the current convolution layer, where the first bias matrix is to generate the trained convolution
  • the matrix generated in the neural network generates the second bias matrix according to the input feature map, obtains the reference adjacency matrix, calculates the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix, obtains the target adjacency matrix, and obtains the current volume
  • the convolution kernel of the buildup layer generates a target output feature map according to the convolution kernel of the current convolution layer, the target adjacency matrix and the input feature map, and recognizes the recognition result of the map data according to the target output feature map.
  • the second bias matrix generated according to the input feature map includes: using the dimensionality reduction function in the trained convolutional neural network to reduce the dimensionality of the input feature map to obtain the dimensionality reduction matrix, and the normalization reduction Dimensional matrix, the normalized matrix is obtained, and the normalized matrix is the second offset matrix.
  • the dimensionality reduction function includes two, and the input feature map includes at least three dimensions, where the first dimension is the number of channels, including: according to the first dimensionality reduction function in the dimensionality reduction function, each of the input feature maps The matrix of the channel is reduced in dimensionality to obtain the first dimensionality reduction matrix of each channel, and the dimensionality of the matrix of each channel of the input feature map is reduced according to the second dimensionality reduction function in the dimensionality reduction function to obtain the second dimensionality reduction matrix of each channel Calculate the product of the first reduced-dimensional matrix and the second reduced-dimensional matrix of each channel to obtain the first product matrix of each channel, normalize the first product matrix of each channel, and obtain the matrix of the channel corresponding to the normalized matrix.
  • the processor further implements the following steps when executing the computer program: the step of generating a trained convolutional neural network includes: obtaining a training set containing multiple training graph data, the training graph data carrying label information, and the training The graph data and label information are input into the initial convolutional neural network, and the characteristics of each training graph data are extracted through the initial convolutional neural network. According to the characteristics of each training graph data, the recognition result corresponding to each training graph data is identified, and the preset loss function is used Calculate the recognition result of each training image data and the loss value of the label. When the loss value is less than or equal to the preset loss value, the trained convolutional neural network is obtained.
  • the processor further implements the following steps when executing the computer program: when the loss value is greater than the preset loss value, the network parameters of the initial convolutional neural network are updated through the gradient backhaul algorithm according to the loss value, and the network parameters are updated by using The initial convolutional neural network as the initial convolutional neural network, enter the training map data and label information into the initial convolutional neural network, until the recognition result of each training map data and the loss value of the label are calculated according to the preset loss function, which is less than or When it is equal to the preset loss value, a trained convolutional neural network is obtained.
  • the initial convolutional neural network model includes at least one convolutional layer
  • the convolutional layer includes an initial bias matrix and an initial dimensionality reduction function
  • the network of the initial convolutional neural network is updated through a gradient backhaul algorithm according to the loss value
  • Parameters include: when the loss value is returned to any convolutional layer through the gradient return algorithm, the return value of each convolutional layer is obtained, and the parameters of the initial dimensionality reduction function and the initial value are updated according to the return value of each convolutional layer The parameters of the bias matrix.
  • identifying the recognition result corresponding to the map data includes: when the current convolutional layer is the last convolutional layer in the trained convolutional neural network, judging multiple target outputs Whether there is a target feature map that needs to be merged in the feature map, if it exists, merge the target output feature maps that need to be merged to obtain a merged feature map.
  • the merged feature map contains all target output feature maps, identify the merged feature map to get The recognition result corresponding to the merged feature map.
  • the merged feature map contains all target output feature maps
  • the merged feature map is recognized to obtain the recognition result corresponding to the merged feature map
  • the unmerged target output feature map is identified to obtain the unmerged
  • the target output feature map is recognized, and the recognition result corresponding to each target output feature map is obtained.
  • a computer-readable storage medium on which a computer program is stored.
  • the computer program is executed by a processor, the following steps are implemented:
  • Feature map the input feature map is a feature map generated from image data to obtain the first bias matrix of the current convolutional layer, where the first bias matrix is the matrix generated when the trained convolutional neural network is generated, according to the input features
  • the graph generates the second offset matrix, obtains the reference adjacency matrix, calculates the reference adjacency matrix, the sum of the first offset matrix and the second offset matrix, obtains the target adjacency matrix, and obtains the convolution kernel of the current convolution layer, according to the current volume
  • the convolution kernel, the target adjacency matrix and the input feature map of the buildup layer generate the target output feature map, and the recognition result of the map data is recognized according to the target output feature map.
  • the second bias matrix generated according to the input feature map includes: using the dimensionality reduction function in the trained convolutional neural network to reduce the dimensionality of the input feature map to obtain the dimensionality reduction matrix, and the normalization reduction Dimensional matrix, the normalized matrix is obtained, and the normalized matrix is the second offset matrix.
  • the dimensionality reduction function includes two, and the input feature map includes at least three dimensions, where the first dimension is the number of channels, including: according to the first dimensionality reduction function in the dimensionality reduction function, each of the input feature maps The matrix of the channel is reduced in dimensionality to obtain the first dimensionality reduction matrix of each channel, and the dimensionality of the matrix of each channel of the input feature map is reduced according to the second dimensionality reduction function in the dimensionality reduction function to obtain the second dimensionality reduction matrix of each channel Calculate the product of the first reduced-dimensional matrix and the second reduced-dimensional matrix of each channel to obtain the first product matrix of each channel, normalize the first product matrix of each channel, and obtain the matrix of the channel corresponding to the normalized matrix.
  • the step of generating a trained convolutional neural network includes: obtaining a training set containing multiple training graph data, the training graph data carrying label information, and The training map data and label information are input into the initial convolutional neural network, and the characteristics of each training map data are extracted through the initial convolutional neural network. According to the characteristics of each training map data, the recognition results corresponding to each training map data are identified, and the preset loss is The function calculates the recognition result of each training image data and the loss value of the label. When the loss value is less than or equal to the preset loss value, the trained convolutional neural network is obtained.
  • the following steps are also implemented: when the loss value is greater than the preset loss value, the network parameters of the initial convolutional neural network are updated through the gradient backhaul algorithm according to the loss value, and the updated network
  • the initial convolutional neural network with parameters is used as the initial convolutional neural network.
  • the training map data and label information are input into the initial convolutional neural network until the recognition result of each training map data and the loss value of the label are calculated according to the preset loss function. When it is equal to the preset loss value, the trained convolutional neural network is obtained.
  • the initial convolutional neural network model includes at least one convolutional layer
  • the convolutional layer includes an initial bias matrix and an initial dimensionality reduction function
  • the network of the initial convolutional neural network is updated through a gradient backhaul algorithm according to the loss value
  • Parameters include: when the loss value is returned to any convolutional layer through the gradient return algorithm, the return value of each convolutional layer is obtained, and the parameters of the initial dimensionality reduction function and the initial value are updated according to the return value of each convolutional layer The parameters of the bias matrix.
  • identifying the recognition result corresponding to the map data includes: when the current convolutional layer is the last convolutional layer in the trained convolutional neural network, judging multiple target outputs Whether there is a target feature map that needs to be merged in the feature map, if it exists, merge the target output feature maps that need to be merged to obtain a merged feature map.
  • the merged feature map contains all target output feature maps, identify the merged feature map to get The recognition result corresponding to the merged feature map.
  • the merged feature map contains all target output feature maps
  • the merged feature map is recognized to obtain the recognition result corresponding to the merged feature map
  • the unmerged target output feature map is identified to obtain the unmerged
  • the target output feature map is recognized, and the recognition result corresponding to each target output feature map is obtained.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain Channel (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
  • SRAM static RAM
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM synchronous chain Channel
  • memory bus Radbus direct RAM
  • RDRAM direct memory bus dynamic RAM
  • RDRAM memory bus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

一种图数据识别方法、装置、计算机设备和存储介质。所述方法包括:获取输入已训练的卷积神经网络的当前卷积层的输入特征图(S201),输入特征图为根据图像数据生成的特征图,获取当前卷积层的第一偏置矩阵(S202),其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵,根据输入特征图生成第二偏置矩阵(S203),获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵(S204),获取当前卷积层的卷积核(S205),根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图(S206),根据目标输出特征图,识别出图数据的识别结果。所述方法对现有的固定的邻接矩阵基础上增加可调整的偏置矩阵,提高已训练的卷积神经网络的识别准确率。

Description

图数据识别方法、装置、计算机设备和存储介质 技术领域
本申请涉及计算机技术领域,尤其涉及一种图数据识别方法、装置、计算机设备和存储介质。
背景技术
在骨骼点数据中,人体是由若干预先定义好的关键关节点在相机坐标系中的坐标来表示的。它可以很方便地通过深度摄像头(例如Kinect)以及各种姿态估计算法(例如OpenPose)获得。图1为Kinect深度摄像机所定义的人体的关键关节点。它将人体定义为25个关键关节点的三维坐标。由于行为往往是以视频的形式存在的,所以一个长度为T帧的行为可以用Tx25x3的张量来表示。
参照图2,图2为一个实施例中的时空图。每个关节点定义为图的节点,关节点之间的物理连接定义为图的边,并且在相邻帧的同一个节点间加上时间维度的边,得到一张可以描述人体行为的时空图。
目前常见的基于骨骼点的行为识别方法为图卷积。图卷积和普通卷积操作不同,在图上做卷积时,每一个节点的邻节点数是不固定的,而卷积操作的参数是固定的,为了将固定数量的参数和不定数量的临节点数对应起来,需要定义映射函数,通过映射函数实现参数和节点的对应。如定义卷积核大小为三,如图3所示,三个参数分别对应于远离人体中心的点001,靠近人体中心点000的点002和卷积点本身003。则卷积操作可以用公式(1)表示:
Figure PCTCN2019129708-appb-000001
其中f是输入输出特征张量,w是卷积参数,v是图中节点,l代表节点与参数间的映射函数,Z是归一化函数。在具体实现时,映射函数可以通过图的邻接矩阵来实现,通过邻接矩阵表示的卷积操作如公式(2)所示:
Figure PCTCN2019129708-appb-000002
其中A代表图的邻接矩阵,K是卷积核大小,Λ用于对A进行归一化处理。通过与邻接矩阵A相乘,从特征张量中“筛选”出所需要的节点并与对应的参数相乘。
上述通过邻接矩阵表示的卷积操作时,邻接矩阵定义了用于图卷积网络中的人体图的拓扑结构。人体姿态多种多样,固定的拓扑结构无法准确地描述人体的每一种姿态,从而导致识别准确率低下。
技术问题
为了解决上述技术问题,本申请提供了一种图数据识别方法、装置、计算机设备和存储介质。
技术解决方案
第一方面,本申请提供了一种图数据识别方法,包括:
获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图为根据图像数据生成的特征图;
获取当前卷积层的第一偏置矩阵,其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵;
根据输入特征图生成第二偏置矩阵;
获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵;
获取当前卷积层的卷积核;
根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图;
根据目标输出特征图,识别出图数据的识别结果。
第二方面,本申请提供了一种特图生成装置,包括:
数据获取模块,用于获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图为根据图像数据生成的特征图,获取当前卷积层的第一偏置矩阵,其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵;
第二偏置矩阵生成模块,用于根据输入特征图生成第二偏置矩阵;
目标邻接矩阵生成模块,用于获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵;
目标输出特征图生成模块,用于获取当前卷积层的卷积核,根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图。
识别模块,用于根据目标输出特征图,识别出图数据的识别结果。
一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现以下步骤:
获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图为根据图像数据生成的特征图;
获取当前卷积层的第一偏置矩阵,其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵;
根据输入特征图生成第二偏置矩阵;
获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵;
获取当前卷积层的卷积核;
根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图;
根据目标输出特征图,识别出图数据的识别结果。
一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现以下步骤:
获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图为根据图像数据生成的特征图;
获取当前卷积层的第一偏置矩阵,其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵;
根据输入特征图生成第二偏置矩阵;
获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵;
获取当前卷积层的卷积核;
根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图;
根据目标输出特征图,识别出图数据的识别结果。
有益效果
上述图数据识别方法、装置、计算机设备和存储介质,所述方法包括:获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图为 根据图像数据生成的特征图,获取当前卷积层的第一偏置矩阵,其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵,根据输入特征图生成第二偏置矩阵,获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵,获取当前卷积层的卷积核,根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图,根据目标输出特征图,识别出图数据的识别结果。对已训练的卷积神经网络中各个卷积层中的邻接矩阵增加偏置矩阵,偏置矩阵中的第一偏置矩阵为根据需求确定的已矩阵,和根据输入特征图生成的第二偏置矩阵为根据输入数据生成的矩阵,增加偏置矩阵能够表征需求所需的特征和输入数据的样本特征,提高生成特征图的准确性,进而提高已训练的卷积神经网络的识别准确率。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本发明的实施例,并与说明书一起用于解释本发明的原理。
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中Kinect深度摄像机所定义的人体的关键关节点的示意图;
图2为一个实施例中描述人体行为的时空图;
图3为一个实施例中图卷积中定义的节点示意图;
图4为一个实施例中一个实施例中图数据识别方法的应用环境图;
图5为一个实施例中特征图生成方法的流程示意图;
图6为一个实施例中卷积层的数据处理流程示意图;
图7为一个实施例中特征图生成装置的结构框图;
图8为一个实施例中计算机设备的内部结构图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述, 显然,所描述的实施例是本申请的一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本申请保护的范围。
图4为一个实施例中图数据识别方法的应用环境图。参照图4,该图数据识别方法应用于特征图生成系统。该特征图生成系统包括终端110和服务器120。终端110和服务器120通过网络连接。终端或服务器获取输入已训练的卷积神经网络的当前卷积层的输入特征图,输入特征图为根据图像数据生成的特征图,输入特征图为是通过提取图数据得到的特征图,获取当前卷积层的第一偏置矩阵,其中第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵,根据输入特征图生成第二偏置矩阵,获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵,获取当前卷积层的卷积核,根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图,根据目标输出特征图,识别出图数据的识别结果。终端110具体可以是台式终端或移动终端,移动终端具体可以手机、平板电脑、笔记本电脑等中的至少一种。服务器120可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
如图5所示,在一个实施例中,提供了一种图数据识别方法。本实施例主要以该方法应用于上述图4中的终端110(或服务器120)来举例说明。参照图5,该图数据识别方法具体包括如下步骤:
步骤S201,获取输入已训练的卷积神经网络的当前卷积层的输入特征图。
在本具体实施例中,输入特征图为根据图像数据生成的特征图。
具体地,已训练的卷积神经网络是指通过大量的携带标签的图数据训练得到的,其中图数据为人体行为的时空图,时空图如图2所示。图数据携带的标签包括人体的行为,如拍手、跳跃、拉手和打架等个人行为或多人行为。已训练的卷积神经网络包含多个卷积层,当前卷积层可以为卷积神经网络中的任意一个卷积层。上一个卷积层的输出数据为当前卷积层的输入数据,获取上一个卷积层中的输出数据作为当前卷积层的输入数据,输入数据为输入特征图。
步骤S202,获取当前卷积层的第一偏置矩阵。
具体地,第一偏置矩阵为生成已训练的卷积神经网络时生成的矩阵。 第一偏置矩阵是根据训练需求得到的偏置矩阵,不同的训练需求,是指卷积神经网络训练后用于做什么的需求,如用于识别拍手和用于识别打架时,得到的第一偏置矩阵不相同。
步骤S203,根据输入特征图生成第二偏置矩阵。
具体地,第二偏置矩阵为根据输入特征图生成的矩阵。通过对输入矩阵进行降维、归一化等操作得到第二偏置矩阵。第二偏置矩阵与输入特征图相关,根据当前卷积层中的函数对输入特征图进行映射,得到映射后的矩阵,将对映射后的矩阵进行乘积运算,对乘积运算结果进行归一化等处理,得到第二偏置矩阵。
在一个实施例中,步骤S203,包括:采用已训练的卷积神经网络中的降维函数对输入特征图进行降维,得到降维矩阵,归一化降维矩阵,得到归一化矩阵,归一化矩阵为第二偏置矩阵。
具体地,降维函数包括第一降维函数和第二降维函数,根据第一降维函数对输入特征图对进行降维得到第一特征图,根据第二降维函数对输入特征图进行降维得到第二特征图。计算第一特征图和第二特征图的乘积,得到第一乘积矩阵。对输入特征图采用不同的降维函数进行降维,得到不同的降维矩阵,其中降维函数的参数是根据需求进行训练得到的。计算两个降维后的矩阵的乘积,即第一乘积矩阵,第一乘积矩阵中的各个点表示横坐标对应的点和纵坐标对应的点之间的特征相似度。如1表示关节点1,2表示关节点2,则第一乘积矩阵中的坐标(1,2)中的矩阵元素,表示关节点1和关节点2之间的特征相似度。对第一乘积矩阵进行归一化操作得到归一化矩阵,归一化操作,对数据进行归一化操作可以提高数据计算的精度,加快数据的收敛速度。将归一化后的第一乘积矩阵作为第二偏置矩阵。
步骤S204,获取参考邻接矩阵,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,得到目标邻接矩阵。
具体地,第一偏置矩阵、第二偏置矩阵和参考邻接矩阵具有相同的维度信息,计算参考邻接矩阵、第一偏置矩阵和第二偏置矩阵的和,即对相同位置的矩阵元素直接进行相加,得到目标邻接矩阵。
步骤S205,获取当前卷积层的卷积核。
具体地,每个卷积层包含多个卷积核,各个卷积层对应的卷积核数量可以相同或不同,各个卷积核可以相同也可以不相同。卷积核是用于对图像进行卷积运算,不同的卷积核可以提取不同的图像特征。采用目标邻接矩阵对输入特征图进行特征提取,得到特征图。
步骤S205,根据当前卷积层的卷积核、目标邻接矩阵和输入特征图生成目标输出特征图。
具体地,采用目标邻接矩阵对应输入特征图进行特征提取,得到对应的特征图,采用卷积核对根据目标路径矩阵提取的特征图进行卷积运算,将卷积运算得到的特征图作为目标输出特征图。采用目标邻接矩阵对输入特征图进行特征提取,能够提取到更为准确的特征。
在一个实施例中,输入特征图至少包括三个维度,目标邻接矩阵包括至少三个维度,步骤S205,包括:
步骤S2051,重塑输入特征图,得到重塑特征图。
在本具体实施例中,重塑特征图的第一维度为输入特征图的第一维度和第二维度的乘积。
具体地,重塑是指对输入特征图进行调整,使得第一维度和第二维度的乘积为重塑特征图的第一维度,如将包含三个维度的输入特征图调整成2个维度的重塑特征图,假设输入特征图为C×M×N,其中第一维度为C,第二维度为M,第三维度为N,则可以重塑图为C×M×N,第一维度为C和M的乘积CM,第二维度与输入特征图的第三维图相同,保持整个输入特征图的元素不变,总元素为C、M和N的乘积CMN。其中第一维度C为通道素,第二维度M为输入特征图的行数,第三维度N为输入特征图的列数。其中N代表的是人体关节点的数量,在Kinect中N定义为25。对矩阵进行重塑是为了方便运算。
步骤S2052,计算重塑特征图和目标邻接矩阵的各个通道的矩阵的乘积,得到各个通道的第二乘积矩阵。
具体地,重塑特征图的第二维度与目标邻接矩阵的各个通道的矩阵的第一维度相同,如重塑特征图为C×M×N,目标邻接矩阵为C×N×N,各个通道矩阵为N×N,则重塑特征图与各个通道的矩阵的乘积矩阵为CMN。
步骤S2053,反重塑各个通道的第二乘积矩阵,得到各个通道的反 重塑特征图。
具体地,反重塑是重塑的逆过程,如重塑是将三维矩阵转换为二位矩阵,则反重塑为将二维矩阵转换为三维矩阵,如各个通道的乘积矩阵为CM×N,则反重塑后得到的反重塑特征图为C×M×N的矩阵。
Step S2054: perform convolution on the inversely reshaped feature map with the convolution kernel of each channel to obtain a target feature map of each channel of the current convolutional layer.
Step S2055: sum the target feature maps of the channels to obtain an output feature map of the current convolutional layer, and take the output feature map of the current convolutional layer as the target output feature map.
Specifically, features are extracted from the inversely reshaped feature map with the convolution kernel corresponding to each channel to obtain the features corresponding to each convolution kernel, and the features extracted by the convolution kernels constitute the target feature map of each channel. The sum of the target feature maps of the channels is computed, i.e. the matrix elements at corresponding positions are added, to obtain the output feature map, which is the output feature map of the current convolutional layer.
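The following sketch, again assuming PyTorch, walks through steps S2052 to S2055 for one convolutional layer: the reshaped feature map is multiplied by the N×N matrix of each channel of the target adjacency matrix, the product is inversely reshaped, convolved with that channel's kernel, and the per-channel results are summed. The channel count K and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

C_in, C_out, M, N, K = 64, 128, 300, 25, 3
f_in = torch.randn(1, C_in, M, N)
adjacency = torch.softmax(torch.randn(K, N, N), dim=-1)              # target adjacency, one N x N matrix per channel
convs = nn.ModuleList([nn.Conv2d(C_in, C_out, kernel_size=1) for _ in range(K)])

out = None
for k in range(K):
    reshaped = f_in.reshape(1, C_in * M, N)                          # step S2051: reshape to (C*M) x N
    product = torch.matmul(reshaped, adjacency[k])                   # step S2052: second product matrix
    unreshaped = product.reshape(1, C_in, M, N)                      # step S2053: inverse reshape
    feat = convs[k](unreshaped)                                      # step S2054: per-channel convolution
    out = feat if out is None else out + feat                        # step S2055: sum over channels

output_feature_map = out                                             # (1, C_out, M, N)
```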
In one embodiment, step S205 further includes:
Step S2056: determine whether the number of channels of the output feature map is the same as the number of channels of the input feature map.
Step S2057: when they are the same, take the sum of the input feature map and the output feature map as the target output feature map of the current convolutional layer.
Step S2058: when they are not the same, perform convolution on the input feature map to obtain a convolved feature map whose number of channels is the same as that of the output feature map, and take the sum of the convolved feature map and the output feature map as the target output feature map.
Specifically, based on the channel matrices of the output feature map generated from the convolution kernels of the channels and the corresponding inversely reshaped matrices, it is determined whether the number of channels of the output feature map is the same as that of the input feature map. When they are the same, the elements at corresponding positions of the input feature map and the output feature map are added to obtain the target output feature map. When they are not the same, convolution is performed on the input feature map to obtain a convolved feature map with the same number of channels as the output feature map, and the sum of the elements at the same positions of the convolved feature map and the output feature map is computed to obtain the target output feature map. Determining the target output feature map according to the channel numbers of the input feature map and the output feature map improves the accuracy of the target output feature map.
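A hedged sketch of steps S2056 to S2058: when the channel counts match, the input and output feature maps are added directly; otherwise the input is first passed through a 1×1 convolution (the residual branch described later with FIG. 6) so that the channel counts match. The helper name residual_add and the untrained convolution created inside it are assumptions of the sketch.

```python
import torch
import torch.nn as nn

def residual_add(f_in, f_out):
    c_in, c_out = f_in.shape[1], f_out.shape[1]
    if c_in == c_out:
        return f_in + f_out                       # step S2057: channels match, add directly
    res = nn.Conv2d(c_in, c_out, kernel_size=1)   # step S2058: 1x1 conv; in practice this layer is trained
    return res(f_in) + f_out

target = residual_add(torch.randn(1, 64, 300, 25), torch.randn(1, 128, 300, 25))
```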
Step S206: recognize a recognition result of the graph data according to the target output feature map.
Specifically, the target output feature map is input to the recognition layer of the trained convolutional neural network, the candidate recognition results corresponding to the target output feature map are recognized by the recognition layer, the candidate recognition result with the highest recognition probability is selected from the candidate recognition results as the target recognition result, and the target recognition result is taken as the recognition result corresponding to the graph data. For example, if the recognition types include three types, clapping, jumping and holding hands, with recognition probabilities of 0.89 for clapping, 0.01 for jumping and 0.1 for holding hands, the recognition result corresponding to the graph data is clapping.
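A minimal sketch of the selection performed by the recognition layer in step S206, using the example probabilities from the text; how the recognition layer produces these probabilities is not prescribed here.

```python
# Candidate recognition results and their probabilities (example values from the text).
probabilities = {"clapping": 0.89, "jumping": 0.01, "holding hands": 0.10}

# The candidate with the highest recognition probability is the target recognition result.
recognition_result = max(probabilities, key=probabilities.get)   # -> "clapping"
```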
In one embodiment, when the trained convolutional neural network includes the current convolutional layer and a next convolutional layer after the current convolutional layer, the target output feature map is taken as the input feature map of the next convolutional layer, the next convolutional layer is taken as the current convolutional layer, and the process returns to obtaining the input feature map input to the current convolutional layer of the trained convolutional neural network, until all convolutional layers of the trained convolutional neural network have been processed; the target output feature map of the last convolutional layer is then output and input to the recognition layer to obtain the recognition result corresponding to the graph data. Convolutional layers with the same network structure follow the same data processing flow.
In the above graph data recognition method, an input feature map input to the current convolutional layer of a trained convolutional neural network is obtained, the input feature map being a feature map generated from graph data; a first bias matrix of the current convolutional layer is obtained, the first bias matrix being a matrix generated when the trained convolutional neural network was generated; a second bias matrix is generated according to the input feature map; a reference adjacency matrix is obtained, and the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix is computed to obtain a target adjacency matrix; the convolution kernels of the current convolutional layer are obtained; a target output feature map is generated according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map; and a recognition result of the graph data is recognized according to the target output feature map. Adjustable bias matrices are added on top of the existing fixed adjacency matrix: the first bias matrix is obtained according to the training requirement and can therefore well represent the features corresponding to that requirement, while the second bias matrix has its parameters determined according to the task requirement and is generated from the input graph data, so it can represent the features of each graph data sample. A bias determined jointly from the graph data itself and from the requirement can better characterize the features of each graph data sample, thereby improving the recognition accuracy of the trained convolutional neural network. In one embodiment, step S206 includes:
Step S2061: when the current convolutional layer is the last convolutional layer of the trained convolutional neural network, determine whether there are target feature maps that need to be merged among the multiple target output feature maps.
Step S2062: when there are, merge the target output feature maps that need to be merged to obtain a merged feature map.
Step S2063: when the merged feature map contains all the target output feature maps, recognize the merged feature map to obtain a recognition result corresponding to the merged feature map.
Step S2064: when the merged feature map does not contain all the target output feature maps, recognize the merged feature map to obtain a recognition result corresponding to the merged feature map, and recognize the unmerged target output feature maps to obtain recognition results corresponding to the unmerged target output feature maps.
Step S2065: when there are none, recognize each target output feature map to obtain a recognition result corresponding to each target output feature map.
Specifically, when the current convolutional layer is the last convolutional layer of the trained convolutional neural network, this indicates that the convolution computation has finished and the final target output feature maps of the graph data have been extracted. Before recognition based on the final target output feature maps, it is determined whether the final target output feature maps need to be merged. Merging is needed because some behaviors require multiple persons to complete; for example, holding hands, fighting and similar behaviors require at least two persons. For behaviors that an individual can complete, there is no need to merge target output feature maps: the target output feature map is recognized directly, and its recognition result is taken as the recognition result of the graph data. The recognition result of the graph data may consist of one or more sub-results: when the input graph data contain multiple behaviors, the obtained recognition result contains multiple behaviors, each behavior being one sub-result.
When there are target output feature maps that need to be merged, which target output feature maps need to be merged and how they are merged are determined, the target output feature maps that need to be merged are merged, and the merged feature maps are recognized directly. There may be one or more merged feature maps, and the specific merging result is related to the human behaviors in the input data: for example, when the graph data include multiple types of behaviors such as clapping, holding hands and fighting, holding hands and fighting need to be merged, so there are two merged feature maps. The merged feature maps are recognized directly to obtain their recognition results. When some feature maps are merged and some are not, the unmerged target output feature maps are recognized directly to obtain the recognition results corresponding to the unmerged target output feature maps.
In a specific embodiment, referring to FIG. 6, FIG. 6 is a schematic diagram of the data processing flow of a convolutional layer in one embodiment. In the figure, fin is the input feature map of the current convolutional layer and fout is the output feature map of the current convolutional layer. The output feature map is expressed in terms of the input feature map as shown in formula (3):
f_{out} = \sum_{k=1}^{K_v} W_k\, f_{in}\,\bigl(A_k + B_k + C_k\bigr), \qquad C_k = \mathrm{softmax}\bigl(\theta_k(f_{in})^{\top}\, \phi_k(f_{in})\bigr) \tag{3}
In the formula, A_k is the k-th adjacency matrix of the reference adjacency matrices, B_k is the k-th matrix of the first bias matrices, C_k is the k-th matrix of the second bias matrices, softmax(S) denotes the normalization operation on a matrix S, W_k is the k-th parameter of the convolution kernels, K_v is the size of the convolution kernel, which can be defined freely, e.g. K_v = 3 or 5, θ_k denotes the first dimension-reduction function and φ_k denotes the second dimension-reduction function. Assume the size of the input feature map is Cin×T×N, where Cin is the number of input channels, T is the number of frames of the graph data, and N is the number of joint nodes defined by Kinect, with N = 25. The input feature map is reshaped to obtain a CinT×N reshaped feature map. The bias matrix B_k is a matrix obtained after training the convolutional neural network and has the same size as A_k, i.e. N×N. The sum of the first bias matrix B_k, the second bias matrix C_k and the reference adjacency matrix A_k is computed to obtain the matrix of each channel of the target adjacency matrix; the product of the matrix of each channel of the target adjacency matrix and the reshaped feature map is computed to obtain the second product matrix of each channel, and the second product matrix of each channel is inversely reshaped to obtain the inversely reshaped matrices. The convolution kernels W_k are obtained, and convolution is performed on the inversely reshaped matrices with the convolution kernel of each channel to obtain the output feature map corresponding to each channel. It is then determined whether the number of channels of the output feature map is the same as that of the input feature map. When they are not the same, the input feature map is adjusted, through a residual network res whose convolution kernel size is 1×1, into a matrix with the same number of channels as the output feature map, and the sum of the adjusted input feature map and the output feature map is computed to obtain the target output feature map. When they are the same, the sum of the input feature map and the output feature map is computed to obtain the target output feature map. The behavior of each graph data sample is recognized according to the target output feature map to obtain the corresponding recognition result.
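For illustration, the following is one possible realization of the layer of FIG. 6 and formula (3), assuming PyTorch; the class and parameter names (AdaptiveGraphConv, theta, phi, res) and the concrete sizes are assumptions of the sketch rather than details of the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    def __init__(self, c_in, c_out, A, c_embed=16):
        super().__init__()
        self.Kv = A.shape[0]
        self.register_buffer("A", A)                      # reference adjacency matrices A_k, (Kv, N, N)
        self.B = nn.Parameter(torch.zeros_like(A))        # first bias matrices B_k, learned during training
        self.theta = nn.ModuleList([nn.Conv2d(c_in, c_embed, 1) for _ in range(self.Kv)])
        self.phi   = nn.ModuleList([nn.Conv2d(c_in, c_embed, 1) for _ in range(self.Kv)])
        self.W     = nn.ModuleList([nn.Conv2d(c_in, c_out, 1) for _ in range(self.Kv)])
        self.res = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, f_in):
        batch, c_in, T, N = f_in.shape
        out = 0
        for k in range(self.Kv):
            a = self.theta[k](f_in).permute(0, 3, 1, 2).reshape(batch, N, -1)
            b = self.phi[k](f_in).reshape(batch, -1, N)
            C_k = F.softmax(torch.matmul(a, b), dim=-1)   # second bias matrix
            adj = self.A[k] + self.B[k] + C_k             # target adjacency matrix
            x = f_in.reshape(batch, c_in * T, N)          # reshape
            x = torch.matmul(x, adj).reshape(batch, c_in, T, N)   # multiply and inversely reshape
            out = out + self.W[k](x)                      # convolution with W_k and sum over k
        return out + self.res(f_in)                       # residual branch res

A = torch.stack([torch.eye(25)] * 3)                      # placeholder reference adjacency matrices
layer = AdaptiveGraphConv(64, 128, A)
f_out = layer(torch.randn(8, 64, 300, 25))                # (8, 128, 300, 25)
```

When the input and output channel counts already match, the residual branch res reduces to an identity mapping, matching step S2057.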
When the above feature map generation process is the data processing process of training the convolutional neural network, and the recognition result corresponding to a graph data sample is inconsistent with the class in the label of the graph data, the loss value corresponding to each graph data sample is computed according to a preset loss function, the loss value is back-propagated according to a gradient back-propagation algorithm to obtain the back-propagated value of each convolutional layer, and the parameters of the convolution kernels W of the channels of the corresponding convolutional layer, the first bias matrix B and the parameters of the mapping functions used to compute the second bias matrix C are updated according to the back-propagated values.
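A hedged sketch of this training step, reusing the AdaptiveGraphConv layer from the sketch above; the cross-entropy loss, the SGD optimizer, the class count and the batch shapes are assumptions, since the application only specifies a preset loss function and a gradient back-propagation algorithm.

```python
import torch
import torch.nn as nn

# A tiny classification model built on the adaptive graph convolution layer defined above.
model = nn.Sequential(layer, nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 60))
criterion = nn.CrossEntropyLoss()                      # preset loss function (assumed)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

graphs = torch.randn(8, 64, 300, 25)                   # batch of training graph data (placeholder)
labels = torch.randint(0, 60, (8,))                    # behavior labels (placeholder)

loss = criterion(model(graphs), labels)                # loss between recognition results and labels
optimizer.zero_grad()
loss.backward()                                        # gradient back-propagation to every convolutional layer
optimizer.step()                                       # updates W_k, B_k and the mapping-function parameters for C_k
```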
When the above feature map generation process is the data processing process of the trained convolutional neural network after training, the result recognized according to the target output feature map is taken as the recognition result corresponding to the graph data. The graph data are input to the trained convolutional neural network, which contains multiple convolutional layers and a recognition layer; each convolutional layer includes convolution kernels and a target adjacency matrix; features are extracted from the graph data by the target adjacency matrix of each convolutional layer to obtain the corresponding set of image feature maps; convolution is performed on the set of image feature maps with the convolution kernels to obtain the target output feature map of each convolutional layer; the target output feature map of the convolutional layer preceding the recognition layer serves as the input data of the recognition layer; and the corresponding human behavior type is recognized according to the target output feature map of each graph data sample.
FIG. 5 is a schematic flowchart of a graph data recognition method in one embodiment. It should be understood that although the steps in the flowchart of FIG. 5 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 5 may include multiple sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments, and the execution order of these sub-steps or stages is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 7, a feature map generation apparatus 200 is provided, including:
a data obtaining module 201, configured to obtain an input feature map input to a current convolutional layer of a trained convolutional neural network, the input feature map being a feature map generated from graph data, and to obtain a first bias matrix of the current convolutional layer, the first bias matrix being a matrix generated when the trained convolutional neural network was generated;
a second bias matrix generation module 202, configured to generate a second bias matrix according to the input feature map;
a target adjacency matrix generation module 203, configured to obtain a reference adjacency matrix, and compute the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix to obtain a target adjacency matrix;
a target output feature map generation module 204, configured to obtain the convolution kernels of the current convolutional layer, and generate a target output feature map according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map;
a recognition module 205, configured to recognize a recognition result of the graph data according to the target output feature map.
In one embodiment, the second bias matrix generation module is specifically configured to perform dimension reduction on the input feature map with the dimension-reduction functions in the trained convolutional neural network to obtain dimension-reduced matrices, and to normalize the dimension-reduced matrices to obtain a normalized matrix, the normalized matrix being the second bias matrix.
In one embodiment, the second bias matrix generation module is specifically configured to perform dimension reduction on the matrix of each channel of the input feature map with the first dimension-reduction function of the dimension-reduction functions to obtain a first dimension-reduced matrix of each channel, and to perform dimension reduction on the matrix of each channel of the input feature map with the second dimension-reduction function of the dimension-reduction functions to obtain a second dimension-reduced matrix of each channel, where there are two dimension-reduction functions, the input feature map includes at least three dimensions and the first dimension is the number of channels; to compute the product of the first dimension-reduced matrix and the second dimension-reduced matrix of each channel to obtain a first product matrix of each channel; and to normalize the first product matrix of each channel to obtain the matrix of the corresponding channel of the normalized matrix.
In one embodiment, the above graph data recognition apparatus further includes:
a network generation module, configured to generate the trained convolutional neural network, where the network generation module includes:
a data obtaining unit, configured to obtain a training set containing multiple training graph data, the training graph data carrying label information;
a feature extraction unit, configured to input the training graph data and the label information into an initial convolutional neural network, and extract the features of each of the training graph data through the initial convolutional neural network;
a recognition unit, configured to recognize, according to the features of each of the training graph data, a recognition result corresponding to each of the training graph data;
a loss value computation unit, configured to compute, according to a preset loss function, a loss value between the recognition result of each of the training graph data and the label;
a network determination unit, configured to obtain the trained convolutional neural network when the loss value is less than or equal to a preset loss value.
In one embodiment, the network determination unit further includes:
a parameter update subunit, configured to update, when the loss value is greater than the preset loss value, the network parameters of the initial convolutional neural network through a gradient back-propagation algorithm according to the loss value;
a network determination subunit, configured to take the initial convolutional neural network with updated network parameters as the initial convolutional neural network, and return to inputting the training graph data and the label information into the initial convolutional neural network, until the loss value between the recognition result of each of the training graph data and the label, computed according to the preset loss function, is less than or equal to the preset loss value, thereby obtaining the trained convolutional neural network.
In one embodiment, the network determination subunit is specifically configured to obtain the back-propagated value of each convolutional layer when the loss value is back-propagated to any convolutional layer through the gradient back-propagation algorithm, and to update the network parameters of the convolutional layer according to the back-propagated value of each convolutional layer, where the initial convolutional neural network model includes at least one convolutional layer, and the network parameters include the parameters of the initial dimension-reduction functions and the parameters of the initial bias matrix.
FIG. 8 shows a diagram of the internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in FIG. 4. As shown in FIG. 8, the computer device includes a processor, a memory, a network interface, an input device and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may further store a computer program which, when executed by the processor, causes the processor to implement the graph data recognition method. A computer program may also be stored in the internal memory which, when executed by the processor, causes the processor to perform the graph data recognition method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.
A person skilled in the art may understand that the structure shown in FIG. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
In one embodiment, the feature map generation apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 8. The memory of the computer device may store the program modules constituting the feature map generation apparatus, such as the data obtaining module 201, the second bias matrix generation module 202, the target adjacency matrix generation module 203, the target output feature map generation module 204 and the recognition module 205 shown in FIG. 7. The computer program composed of the program modules causes the processor to execute the steps in the graph data recognition methods of the embodiments of the present application described in this specification.
For example, the computer device shown in FIG. 8 may, through the data obtaining module 201 of the feature map generation apparatus shown in FIG. 7, obtain an input feature map input to a current convolutional layer of a trained convolutional neural network, the input feature map being a feature map generated from graph data, and obtain a first bias matrix of the current convolutional layer, the first bias matrix being a matrix generated when the trained convolutional neural network was generated. The computer device may generate a second bias matrix according to the input feature map through the second bias matrix generation module 202. The computer device may, through the target adjacency matrix generation module 203, obtain a reference adjacency matrix, and compute the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix to obtain a target adjacency matrix. The computer device may, through the target output feature map generation module 204, obtain the convolution kernels of the current convolutional layer and generate a target output feature map according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map. The computer device may, through the recognition module 205, recognize a recognition result of the graph data according to the target output feature map.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the following steps: obtaining an input feature map input to a current convolutional layer of a trained convolutional neural network, the input feature map being a feature map generated from graph data; obtaining a first bias matrix of the current convolutional layer, the first bias matrix being a matrix generated when the trained convolutional neural network was generated; generating a second bias matrix according to the input feature map; obtaining a reference adjacency matrix, and computing the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix to obtain a target adjacency matrix; obtaining the convolution kernels of the current convolutional layer; generating a target output feature map according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map; and recognizing a recognition result of the graph data according to the target output feature map.
In one embodiment, generating the second bias matrix according to the input feature map includes: performing dimension reduction on the input feature map with the dimension-reduction functions in the trained convolutional neural network to obtain dimension-reduced matrices, and normalizing the dimension-reduced matrices to obtain a normalized matrix, the normalized matrix being the second bias matrix.
In one embodiment, there are two dimension-reduction functions, the input feature map includes at least three dimensions, and the first dimension is the number of channels, including: performing dimension reduction on the matrix of each channel of the input feature map with the first dimension-reduction function of the dimension-reduction functions to obtain a first dimension-reduced matrix of each channel; performing dimension reduction on the matrix of each channel of the input feature map with the second dimension-reduction function of the dimension-reduction functions to obtain a second dimension-reduced matrix of each channel; computing the product of the first dimension-reduced matrix and the second dimension-reduced matrix of each channel to obtain a first product matrix of each channel; and normalizing the first product matrix of each channel to obtain the matrix of the corresponding channel of the normalized matrix.
In one embodiment, the processor, when executing the computer program, further implements the following steps. The step of generating the trained convolutional neural network includes: obtaining a training set containing multiple training graph data, the training graph data carrying label information; inputting the training graph data and the label information into an initial convolutional neural network, and extracting the features of each of the training graph data through the initial convolutional neural network; recognizing, according to the features of each of the training graph data, a recognition result corresponding to each of the training graph data; computing, according to a preset loss function, a loss value between the recognition result of each of the training graph data and the label; and obtaining the trained convolutional neural network when the loss value is less than or equal to a preset loss value.
In one embodiment, the processor, when executing the computer program, further implements the following steps: when the loss value is greater than the preset loss value, updating the network parameters of the initial convolutional neural network through a gradient back-propagation algorithm according to the loss value; taking the initial convolutional neural network with updated network parameters as the initial convolutional neural network, and returning to inputting the training graph data and the label information into the initial convolutional neural network, until the loss value between the recognition result of each of the training graph data and the label, computed according to the preset loss function, is less than or equal to the preset loss value, thereby obtaining the trained convolutional neural network.
In one embodiment, the initial convolutional neural network model includes at least one convolutional layer, the convolutional layer includes an initial bias matrix and initial dimension-reduction functions, and updating the network parameters of the initial convolutional neural network through the gradient back-propagation algorithm according to the loss value includes: obtaining the back-propagated value of each convolutional layer when the loss value is back-propagated to any convolutional layer through the gradient back-propagation algorithm, and updating the parameters of the initial dimension-reduction functions and the parameters of the initial bias matrix according to the back-propagated value of each convolutional layer.
In one embodiment, recognizing the recognition result corresponding to the graph data according to the target output feature map includes: when the current convolutional layer is the last convolutional layer of the trained convolutional neural network, determining whether there are target feature maps that need to be merged among the multiple target output feature maps; when there are, merging the target output feature maps that need to be merged to obtain a merged feature map; when the merged feature map contains all the target output feature maps, recognizing the merged feature map to obtain a recognition result corresponding to the merged feature map; when the merged feature map does not contain all the target output feature maps, recognizing the merged feature map to obtain a recognition result corresponding to the merged feature map, and recognizing the unmerged target output feature maps to obtain recognition results corresponding to the unmerged target output feature maps; and when there are none, recognizing each target output feature map to obtain a recognition result corresponding to each target output feature map.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps: obtaining an input feature map input to a current convolutional layer of a trained convolutional neural network, the input feature map being a feature map generated from graph data; obtaining a first bias matrix of the current convolutional layer, the first bias matrix being a matrix generated when the trained convolutional neural network was generated; generating a second bias matrix according to the input feature map; obtaining a reference adjacency matrix, and computing the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix to obtain a target adjacency matrix; obtaining the convolution kernels of the current convolutional layer; generating a target output feature map according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map; and recognizing a recognition result of the graph data according to the target output feature map.
In one embodiment, generating the second bias matrix according to the input feature map includes: performing dimension reduction on the input feature map with the dimension-reduction functions in the trained convolutional neural network to obtain dimension-reduced matrices, and normalizing the dimension-reduced matrices to obtain a normalized matrix, the normalized matrix being the second bias matrix.
In one embodiment, there are two dimension-reduction functions, the input feature map includes at least three dimensions, and the first dimension is the number of channels, including: performing dimension reduction on the matrix of each channel of the input feature map with the first dimension-reduction function of the dimension-reduction functions to obtain a first dimension-reduced matrix of each channel; performing dimension reduction on the matrix of each channel of the input feature map with the second dimension-reduction function of the dimension-reduction functions to obtain a second dimension-reduced matrix of each channel; computing the product of the first dimension-reduced matrix and the second dimension-reduced matrix of each channel to obtain a first product matrix of each channel; and normalizing the first product matrix of each channel to obtain the matrix of the corresponding channel of the normalized matrix.
In one embodiment, the computer program, when executed by the processor, further implements the following steps. The step of generating the trained convolutional neural network includes: obtaining a training set containing multiple training graph data, the training graph data carrying label information; inputting the training graph data and the label information into an initial convolutional neural network, and extracting the features of each of the training graph data through the initial convolutional neural network; recognizing, according to the features of each of the training graph data, a recognition result corresponding to each of the training graph data; computing, according to a preset loss function, a loss value between the recognition result of each of the training graph data and the label; and obtaining the trained convolutional neural network when the loss value is less than or equal to a preset loss value.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: when the loss value is greater than the preset loss value, updating the network parameters of the initial convolutional neural network through a gradient back-propagation algorithm according to the loss value; taking the initial convolutional neural network with updated network parameters as the initial convolutional neural network, and returning to inputting the training graph data and the label information into the initial convolutional neural network, until the loss value between the recognition result of each of the training graph data and the label, computed according to the preset loss function, is less than or equal to the preset loss value, thereby obtaining the trained convolutional neural network.
In one embodiment, the initial convolutional neural network model includes at least one convolutional layer, the convolutional layer includes an initial bias matrix and initial dimension-reduction functions, and updating the network parameters of the initial convolutional neural network through the gradient back-propagation algorithm according to the loss value includes: obtaining the back-propagated value of each convolutional layer when the loss value is back-propagated to any convolutional layer through the gradient back-propagation algorithm, and updating the parameters of the initial dimension-reduction functions and the parameters of the initial bias matrix according to the back-propagated value of each convolutional layer.
In one embodiment, recognizing the recognition result corresponding to the graph data according to the target output feature map includes: when the current convolutional layer is the last convolutional layer of the trained convolutional neural network, determining whether there are target feature maps that need to be merged among the multiple target output feature maps; when there are, merging the target output feature maps that need to be merged to obtain a merged feature map; when the merged feature map contains all the target output feature maps, recognizing the merged feature map to obtain a recognition result corresponding to the merged feature map; when the merged feature map does not contain all the target output feature maps, recognizing the merged feature map to obtain a recognition result corresponding to the merged feature map, and recognizing the unmerged target output feature maps to obtain recognition results corresponding to the unmerged target output feature maps; and when there are none, recognizing each target output feature map to obtain a recognition result corresponding to each target output feature map.
A person of ordinary skill in the art may understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to a memory, storage, database or other medium used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The above are only specific implementations of the present invention, enabling those skilled in the art to understand or implement the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims (10)

  1. A graph data recognition method, characterized in that the method comprises: obtaining an input feature map input to a current convolutional layer of a trained convolutional neural network, the input feature map being a feature map generated from graph data;
    obtaining a first bias matrix of the current convolutional layer, wherein the first bias matrix is a matrix generated when the trained convolutional neural network was generated;
    generating a second bias matrix according to the input feature map;
    obtaining a reference adjacency matrix, and computing the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix to obtain a target adjacency matrix;
    obtaining convolution kernels of the current convolutional layer;
    generating a target output feature map according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map;
    recognizing, according to the target output feature map, a recognition result corresponding to the graph data.
  2. The method according to claim 1, characterized in that generating the second bias matrix according to the input feature map comprises: performing dimension reduction on the input feature map with dimension-reduction functions in the trained convolutional neural network to obtain dimension-reduced matrices;
    normalizing the dimension-reduced matrices to obtain a normalized matrix, the normalized matrix being the second bias matrix.
  3. The method according to claim 2, characterized in that there are two dimension-reduction functions, the input feature map comprises at least three dimensions, wherein the first dimension is the number of channels, and the method comprises: performing dimension reduction on the matrix of each channel of the input feature map with a first dimension-reduction function of the dimension-reduction functions to obtain a first dimension-reduced matrix of each channel;
    performing dimension reduction on the matrix of each channel of the input feature map with a second dimension-reduction function of the dimension-reduction functions to obtain a second dimension-reduced matrix of each channel;
    computing the product of the first dimension-reduced matrix and the second dimension-reduced matrix of each channel to obtain a first product matrix of each channel;
    normalizing the first product matrix of each channel to obtain the matrix of the corresponding channel of the normalized matrix.
  4. The method according to claim 1, characterized in that the step of generating the trained convolutional neural network comprises: obtaining a training set containing multiple training graph data, the training graph data carrying label information;
    inputting the training graph data and the label information into an initial convolutional neural network, and extracting features of each of the training graph data through the initial convolutional neural network;
    recognizing, according to the features of each of the training graph data, a recognition result corresponding to each of the training graph data;
    computing, according to a preset loss function, a loss value between the recognition result of each of the training graph data and the label;
    obtaining the trained convolutional neural network when the loss value is less than or equal to a preset loss value.
  5. The method according to claim 4, characterized in that the method further comprises: when the loss value is greater than the preset loss value, updating network parameters of the initial convolutional neural network through a gradient back-propagation algorithm according to the loss value;
    taking the initial convolutional neural network with updated network parameters as the initial convolutional neural network, and returning to inputting the training graph data and the label information into the initial convolutional neural network, until the loss value between the recognition result of each of the training graph data and the label, computed according to the preset loss function, is less than or equal to the preset loss value, thereby obtaining the trained convolutional neural network.
  6. The method according to claim 5, characterized in that the initial convolutional neural network model comprises at least one convolutional layer, the convolutional layer comprises an initial bias matrix and initial dimension-reduction functions, and updating the network parameters of the initial convolutional neural network through the gradient back-propagation algorithm according to the loss value comprises: obtaining a back-propagated value of each convolutional layer when the loss value is back-propagated to any one of the convolutional layers through the gradient back-propagation algorithm;
    updating the network parameters of the convolutional layer according to the back-propagated value of each convolutional layer, the network parameters comprising parameters of the initial dimension-reduction functions and parameters of the initial bias matrix.
  7. The method according to any one of claims 1 to 6, characterized in that recognizing, according to the target output feature map, the recognition result corresponding to the graph data comprises: when the current convolutional layer is the last convolutional layer of the trained convolutional neural network, determining whether there are target feature maps that need to be merged among the multiple target output feature maps;
    when there are, merging the target output feature maps that need to be merged to obtain a merged feature map;
    when the merged feature map contains all of the target output feature maps, recognizing the merged feature map to obtain a recognition result corresponding to the merged feature map;
    when the merged feature map does not contain all of the target output feature maps, recognizing the merged feature map to obtain a recognition result corresponding to the merged feature map, and recognizing the unmerged target output feature maps to obtain recognition results corresponding to the unmerged target output feature maps;
    when there are none, recognizing each of the target output feature maps to obtain a recognition result corresponding to each of the target output feature maps.
  8. A graph data recognition apparatus, characterized in that the apparatus comprises: a data obtaining module, configured to obtain an input feature map input to a current convolutional layer of a trained convolutional neural network, the input feature map being a feature map generated from graph data, and to obtain a first bias matrix of the current convolutional layer, wherein the first bias matrix is a matrix generated when the trained convolutional neural network was generated;
    a second bias matrix generation module, configured to generate a second bias matrix according to the input feature map;
    a target adjacency matrix generation module, configured to obtain a reference adjacency matrix, and compute the sum of the reference adjacency matrix, the first bias matrix and the second bias matrix to obtain a target adjacency matrix;
    a target output feature map generation module, configured to obtain convolution kernels of the current convolutional layer, and generate a target output feature map according to the convolution kernels of the current convolutional layer, the target adjacency matrix and the input feature map;
    a recognition module, configured to recognize a recognition result of the graph data according to the target output feature map.
  9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2019/129708 2019-06-11 2019-12-30 图数据识别方法、装置、计算机设备和存储介质 WO2020248581A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910503194.XA CN110378372A (zh) 2019-06-11 2019-06-11 图数据识别方法、装置、计算机设备和存储介质
CN201910503194.X 2019-06-11

Publications (1)

Publication Number Publication Date
WO2020248581A1 true WO2020248581A1 (zh) 2020-12-17

Family

ID=68250141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/129708 WO2020248581A1 (zh) 2019-06-11 2019-12-30 图数据识别方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN110378372A (zh)
WO (1) WO2020248581A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560712A (zh) * 2020-12-18 2021-03-26 西安电子科技大学 基于时间增强图卷积网络的行为识别方法、装置及介质
CN112633224A (zh) * 2020-12-30 2021-04-09 深圳云天励飞技术股份有限公司 一种社交关系识别方法、装置、电子设备及存储介质
CN112965062A (zh) * 2021-02-09 2021-06-15 西安电子科技大学 基于lstm-dam网络的雷达距离像目标识别方法
CN113239875A (zh) * 2021-06-01 2021-08-10 恒睿(重庆)人工智能技术研究院有限公司 人脸特征的获取方法、系统、装置及计算机可读存储介质
CN113269239A (zh) * 2021-05-13 2021-08-17 河南大学 一种基于多通道卷积神经网络的关系网络节点分类方法
CN113468980A (zh) * 2021-06-11 2021-10-01 浙江大华技术股份有限公司 一种人体行为识别方法及相关装置
CN113741459A (zh) * 2021-09-03 2021-12-03 阿波罗智能技术(北京)有限公司 确定训练样本的方法和自动驾驶模型的训练方法、装置
CN113761771A (zh) * 2021-09-16 2021-12-07 中国人民解放军国防科技大学 多孔材料吸声性能预测方法、装置、电子设备和存储介质
CN113887575A (zh) * 2021-09-13 2022-01-04 华南理工大学 一种基于自适应图卷积网络的图数据集增强方法
CN113971319A (zh) * 2021-10-12 2022-01-25 浙江腾腾电气有限公司 配置有精度补偿的稳压器及其补偿方法
CN114090651A (zh) * 2021-11-10 2022-02-25 哈尔滨工业大学(深圳) 基于双通道图神经网络自编码器的交通流异常数据判断方法
CN115564630A (zh) * 2022-09-28 2023-01-03 华能伊敏煤电有限责任公司 轮斗挖掘机的挖掘流量自动控制方法及其系统
CN116935363A (zh) * 2023-07-04 2023-10-24 东莞市微振科技有限公司 刀具识别方法、装置、电子设备及可读存储介质
CN117235584A (zh) * 2023-11-15 2023-12-15 之江实验室 图数据分类方法、装置、电子装置和存储介质
CN117251715A (zh) * 2023-11-17 2023-12-19 华芯程(杭州)科技有限公司 版图量测区域筛选方法、装置、电子设备及存储介质
CN117557244A (zh) * 2023-09-27 2024-02-13 国网江苏省电力有限公司信息通信分公司 基于知识图谱的电力运维警戒系统
CN113468980B (zh) * 2021-06-11 2024-05-31 浙江大华技术股份有限公司 一种人体行为识别方法及相关装置

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378372A (zh) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据识别方法、装置、计算机设备和存储介质
CN111967479A (zh) * 2020-07-27 2020-11-20 广东工业大学 基于卷积神经网络思想的图像目标识别方法
CN114707641A (zh) * 2022-03-23 2022-07-05 平安科技(深圳)有限公司 双视角图神经网络模型的训练方法、装置、设备及介质
CN115601617A (zh) * 2022-11-25 2023-01-13 安徽数智建造研究院有限公司(Cn) 基于半监督学习的带状脱空识别模型的训练方法和装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372656A (zh) * 2016-08-30 2017-02-01 同观科技(深圳)有限公司 获取深度一次性学习模型的方法、图像识别方法及装置
CN108256544A (zh) * 2016-12-29 2018-07-06 深圳光启合众科技有限公司 图片分类方法和装置、机器人
US20190019056A1 (en) * 2015-11-30 2019-01-17 Pilot Ai Labs, Inc. System and method for improved general object detection using neural networks
CN110363086A (zh) * 2019-06-11 2019-10-22 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据识别方法、装置、计算机设备和存储介质
CN110378372A (zh) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据识别方法、装置、计算机设备和存储介质
CN110390259A (zh) * 2019-06-11 2019-10-29 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据的识别方法、装置、计算机设备和存储介质

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452108B2 (en) * 2008-06-25 2013-05-28 Gannon Technologies Group Llc Systems and methods for image recognition using graph-based pattern matching
KR101199492B1 (ko) * 2008-12-22 2012-11-09 한국전자통신연구원 광역 이동을 고려한 실시간 카메라 트래킹 장치 및 방법
CN101571948B (zh) * 2009-06-11 2011-10-19 西安电子科技大学 基于整体变分模型和神经网络的运动模糊图像恢复方法
CN105095833B (zh) * 2014-05-08 2019-03-15 中国科学院声学研究所 用于人脸识别的网络构建方法、识别方法及系统
US9619862B2 (en) * 2014-05-30 2017-04-11 Apple Inc. Raw camera noise reduction using alignment mapping
CN104036474B (zh) * 2014-06-12 2017-12-19 厦门美图之家科技有限公司 一种图像亮度和对比度的自动调节方法
CN106228142B (zh) * 2016-07-29 2019-02-15 西安电子科技大学 基于卷积神经网络和贝叶斯决策的人脸验证方法
CN106339753A (zh) * 2016-08-17 2017-01-18 中国科学技术大学 一种有效提升卷积神经网络稳健性的方法
US9943225B1 (en) * 2016-09-23 2018-04-17 International Business Machines Corporation Early prediction of age related macular degeneration by image reconstruction
CN107122396B (zh) * 2017-03-13 2019-10-29 西北大学 基于深度卷积神经网络的三维模型检索方法
CN109086652A (zh) * 2018-06-04 2018-12-25 平安科技(深圳)有限公司 手写字模型训练方法、汉字识别方法、装置、设备及介质
CN109063706A (zh) * 2018-06-04 2018-12-21 平安科技(深圳)有限公司 文字模型训练方法、文字识别方法、装置、设备及介质
CN108875827B (zh) * 2018-06-15 2022-04-12 拓元(广州)智慧科技有限公司 一种细粒度图像分类的方法及系统
CN108985366A (zh) * 2018-07-06 2018-12-11 武汉兰丁医学高科技有限公司 基于卷积深度网络的b超图像识别算法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190019056A1 (en) * 2015-11-30 2019-01-17 Pilot Ai Labs, Inc. System and method for improved general object detection using neural networks
CN106372656A (zh) * 2016-08-30 2017-02-01 同观科技(深圳)有限公司 获取深度一次性学习模型的方法、图像识别方法及装置
CN108256544A (zh) * 2016-12-29 2018-07-06 深圳光启合众科技有限公司 图片分类方法和装置、机器人
CN110363086A (zh) * 2019-06-11 2019-10-22 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据识别方法、装置、计算机设备和存储介质
CN110378372A (zh) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据识别方法、装置、计算机设备和存储介质
CN110390259A (zh) * 2019-06-11 2019-10-29 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据的识别方法、装置、计算机设备和存储介质

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560712B (zh) * 2020-12-18 2023-05-26 西安电子科技大学 基于时间增强图卷积网络的行为识别方法、装置及介质
CN112560712A (zh) * 2020-12-18 2021-03-26 西安电子科技大学 基于时间增强图卷积网络的行为识别方法、装置及介质
CN112633224A (zh) * 2020-12-30 2021-04-09 深圳云天励飞技术股份有限公司 一种社交关系识别方法、装置、电子设备及存储介质
CN112633224B (zh) * 2020-12-30 2024-03-26 深圳云天励飞技术股份有限公司 一种社交关系识别方法、装置、电子设备及存储介质
CN112965062A (zh) * 2021-02-09 2021-06-15 西安电子科技大学 基于lstm-dam网络的雷达距离像目标识别方法
CN112965062B (zh) * 2021-02-09 2024-02-20 西安电子科技大学 基于lstm-dam网络的雷达距离像目标识别方法
CN113269239B (zh) * 2021-05-13 2024-04-19 河南大学 一种基于多通道卷积神经网络的关系网络节点分类方法
CN113269239A (zh) * 2021-05-13 2021-08-17 河南大学 一种基于多通道卷积神经网络的关系网络节点分类方法
CN113239875A (zh) * 2021-06-01 2021-08-10 恒睿(重庆)人工智能技术研究院有限公司 人脸特征的获取方法、系统、装置及计算机可读存储介质
CN113239875B (zh) * 2021-06-01 2023-10-17 恒睿(重庆)人工智能技术研究院有限公司 人脸特征的获取方法、系统、装置及计算机可读存储介质
CN113468980B (zh) * 2021-06-11 2024-05-31 浙江大华技术股份有限公司 一种人体行为识别方法及相关装置
CN113468980A (zh) * 2021-06-11 2021-10-01 浙江大华技术股份有限公司 一种人体行为识别方法及相关装置
CN113741459A (zh) * 2021-09-03 2021-12-03 阿波罗智能技术(北京)有限公司 确定训练样本的方法和自动驾驶模型的训练方法、装置
CN113887575B (zh) * 2021-09-13 2024-04-05 华南理工大学 一种基于自适应图卷积网络的图数据集增强方法
CN113887575A (zh) * 2021-09-13 2022-01-04 华南理工大学 一种基于自适应图卷积网络的图数据集增强方法
CN113761771A (zh) * 2021-09-16 2021-12-07 中国人民解放军国防科技大学 多孔材料吸声性能预测方法、装置、电子设备和存储介质
CN113761771B (zh) * 2021-09-16 2024-05-28 中国人民解放军国防科技大学 多孔材料吸声性能预测方法、装置、电子设备和存储介质
CN113971319B (zh) * 2021-10-12 2023-04-18 浙江腾腾电气有限公司 配置有精度补偿的稳压器及其补偿方法
CN113971319A (zh) * 2021-10-12 2022-01-25 浙江腾腾电气有限公司 配置有精度补偿的稳压器及其补偿方法
CN114090651A (zh) * 2021-11-10 2022-02-25 哈尔滨工业大学(深圳) 基于双通道图神经网络自编码器的交通流异常数据判断方法
CN115564630A (zh) * 2022-09-28 2023-01-03 华能伊敏煤电有限责任公司 轮斗挖掘机的挖掘流量自动控制方法及其系统
CN116935363A (zh) * 2023-07-04 2023-10-24 东莞市微振科技有限公司 刀具识别方法、装置、电子设备及可读存储介质
CN116935363B (zh) * 2023-07-04 2024-02-23 东莞市微振科技有限公司 刀具识别方法、装置、电子设备及可读存储介质
CN117557244A (zh) * 2023-09-27 2024-02-13 国网江苏省电力有限公司信息通信分公司 基于知识图谱的电力运维警戒系统
CN117235584A (zh) * 2023-11-15 2023-12-15 之江实验室 图数据分类方法、装置、电子装置和存储介质
CN117235584B (zh) * 2023-11-15 2024-04-02 之江实验室 图数据分类方法、装置、电子装置和存储介质
CN117251715A (zh) * 2023-11-17 2023-12-19 华芯程(杭州)科技有限公司 版图量测区域筛选方法、装置、电子设备及存储介质
CN117251715B (zh) * 2023-11-17 2024-03-19 华芯程(杭州)科技有限公司 版图量测区域筛选方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN110378372A (zh) 2019-10-25

Similar Documents

Publication Publication Date Title
WO2020248581A1 (zh) 图数据识别方法、装置、计算机设备和存储介质
CN110580482B (zh) 图像分类模型训练、图像分类、个性化推荐方法及装置
CN110889325B (zh) 多任务面部动作识别模型训练和多任务面部动作识别方法
WO2017088432A1 (zh) 图像识别方法和装置
CN111126339A (zh) 手势识别方法、装置、计算机设备和存储介质
CN113343982B (zh) 多模态特征融合的实体关系提取方法、装置和设备
CN111160288A (zh) 手势关键点检测方法、装置、计算机设备和存储介质
CN110738650B (zh) 一种传染病感染识别方法、终端设备及存储介质
CN113486708A (zh) 人体姿态预估方法、模型训练方法、电子设备和存储介质
WO2021031704A1 (zh) 对象追踪方法、装置、计算机设备和存储介质
WO2023151237A1 (zh) 人脸位姿估计方法、装置、电子设备及存储介质
CN111062324A (zh) 人脸检测方法、装置、计算机设备和存储介质
JP2022542199A (ja) キーポイントの検出方法、装置、電子機器および記憶媒体
CN111507285A (zh) 人脸属性识别方法、装置、计算机设备和存储介质
Jiang et al. Consensus style centralizing auto-encoder for weak style classification
CN111709268A (zh) 一种深度图像中的基于人手结构指导的人手姿态估计方法和装置
CN113343981A (zh) 一种视觉特征增强的字符识别方法、装置和设备
US20220284990A1 (en) Method and system for predicting affinity between drug and target
CN112749723A (zh) 样本标注方法、装置、计算机设备和存储介质
CN109710924B (zh) 文本模型训练方法、文本识别方法、装置、设备及介质
CN109934926B (zh) 模型数据处理方法、装置、可读存储介质和设备
CN106778579A (zh) 一种基于累计属性的头部姿态估计方法
WO2022257433A1 (zh) 图像的特征图的处理方法及装置、存储介质、终端
CN115713769A (zh) 文本检测模型的训练方法、装置、计算机设备和存储介质
Piras et al. Transporting deformations of face emotions in the shape spaces: A comparison of different approaches

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932932

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932932

Country of ref document: EP

Kind code of ref document: A1