CN111709497B - Information processing method and device and computer readable storage medium


Info

Publication number
CN111709497B
CN202010840896.XA · CN202010840896A · CN111709497B
Authority
CN
China
Prior art keywords
target
information
feature map
detection model
object identification
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number
CN202010840896.XA
Other languages
Chinese (zh)
Other versions
CN111709497A (en)
Inventor
苗书宇
杜俊珑
彭湃
孙星
郭晓威
黄飞跃
吴永坚
黄小明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010840896.XA priority Critical patent/CN111709497B/en
Publication of CN111709497A publication Critical patent/CN111709497A/en
Application granted granted Critical
Publication of CN111709497B publication Critical patent/CN111709497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F 18/214 Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 Pattern recognition; Analysing; Validation; Performance evaluation; Active pattern learning techniques
    • G06N 3/08 Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V 2201/07 Indexing scheme relating to image or video recognition or understanding; Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an information processing method, an information processing device and a computer-readable storage medium. A target training sample is input into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information; graph convolution calculation is performed on the first and second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects; a distillation loss function corresponding to the first relation feature vector and the second relation feature vector is constructed; the distillation loss function is added into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model; and object recognition is performed on an image to be recognized based on the jointly trained second target detection model. In this way, the first target detection model's learned knowledge of the relations between target object identification frames is transferred to the second target detection model, improving information processing efficiency.

Description

Information processing method and device and computer readable storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to an information processing method and apparatus, and a computer-readable storage medium.
Background
With the development of computer vision, target detection (Object Detection) technology is increasingly widely applied in real-world scenarios such as autonomous driving, intelligent transportation, and smart cities. The core of target detection is to quickly detect the target objects contained in a video or image through a target detection model.
In the prior art, a target detection model with a complex network structure can be selected for target object detection, which has the advantage of a more accurate detection result; alternatively, a target detection model with a simple network structure can be selected, which detects target objects faster.
In the research and practice of the prior art, the inventor of the present application found that although the target detection model with a complex network structure gives accurate detection results, its detection speed is slow and it occupies computing resources; and although the target detection model with a simple network structure is fast, its detection accuracy is poor.
Disclosure of Invention
The embodiment of the application provides an information processing method, an information processing device and a computer-readable storage medium, which can improve the accuracy of target detection and further improve the information processing efficiency on the premise of ensuring the target detection speed.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
an information processing method comprising:
respectively inputting a target training sample into a first target detection model and a second target detection model to obtain first target feature map information and second target feature map information;
performing graph convolution calculation on the first target feature map information and the second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects;
constructing a distillation loss function corresponding to the first relation feature vector and the second relation feature vector;
adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model;
and performing object recognition on an image to be recognized based on the jointly trained second target detection model.
An information processing apparatus comprising:
an input unit, configured to input target training samples into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information;
a graph convolution unit, configured to perform graph convolution calculation on the first target feature map information and the second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects;
a construction unit, configured to construct a distillation loss function corresponding to the first relation feature vector and the second relation feature vector;
a training unit, configured to add the distillation loss function into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model;
and a recognition unit, configured to perform object recognition on an image to be recognized based on the jointly trained second target detection model.
In some embodiments, the obtaining subunit is further configured to:
calculating the generalized intersection ratio between first target object identification frame features in the first target feature map information to generate first distance matrix information;
and calculating the generalized intersection ratio between second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the obtaining subunit is further configured to:
acquiring the intersection ratio between the first target object identification frame features in the first target feature map information;
acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information;
calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information;
obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box;
calculating the difference between the intersection ratio and the ratio to obtain the generalized intersection ratio between the first target object identification frame features in the first target feature map information, and generating the first distance matrix information;
and calculating the generalized intersection ratio between the second target object identification frame features in the second target feature map information to generate the second distance matrix information.
In some embodiments, the input unit includes:
a first input subunit, configured to acquire a target training sample and input the target training sample into the first target detection model to obtain first feature map information;
a second input subunit, configured to input the target training sample into the second target detection model to obtain second feature map information;
and a conversion subunit, configured to perform dimension conversion on the first feature map information and the second feature map information to generate the first target feature map information and the second target feature map information.
In some embodiments, the conversion subunit is to:
carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension;
performing dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame;
and performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
In some embodiments, the training unit is to:
multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
and performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain the second target detection model after joint training.
In some embodiments, the building unit is configured to:
calculating a loss value between the first relation feature vector and the second relation feature vector through a mean square error function, and constructing the corresponding distillation loss function.
According to the method, a target training sample is input into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information; graph convolution calculation is performed on the first and second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects; a distillation loss function corresponding to the two relation feature vectors is constructed; the distillation loss function is added into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model; and object recognition is performed on an image to be recognized based on the jointly trained second target detection model. In this way, the first target detection model's learned knowledge of the relations between target object identification frames is transferred to the second target detection model for supplementary learning, improving information processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic diagram of a scenario of an information processing system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of an information processing method provided in an embodiment of the present application;
FIG. 3 is another schematic flow chart of an information processing method provided in an embodiment of the present application;
fig. 4 is a schematic view of a scene of an information processing method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an information processing apparatus provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an information processing method, an information processing device and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of an information processing system according to an embodiment of the present application, including a terminal A and a server (the information processing system may also include terminals other than terminal A; the specific number of terminals is not limited here). The terminal A and the server may be connected through a communication network, which may include a wireless network and a wired network; the wireless network includes one or more of a wireless wide area network, a wireless local area network, a wireless metropolitan area network, and a wireless personal area network. The network includes network entities such as routers and gateways, which are not shown in the figure. The terminal A may exchange information with the server through the communication network; for example, the terminal A may send an image to be recognized, on which object recognition is to be performed, to the server.
The information processing system may include an information processing apparatus, which may be integrated in a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. As shown in fig. 1, the server inputs a target training sample into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information; performs graph convolution calculation on the first and second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects; constructs a distillation loss function corresponding to the two relation feature vectors; adds the distillation loss function into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model; receives an image to be recognized sent by the terminal A; and performs object recognition on the image to be recognized based on the jointly trained second target detection model.
The terminal A in the information processing system can be provided with various applications required by users, such as an instant messaging application, a media application, a browser application and the like, and the terminal A can send an image to be recognized to a server for object recognition, automatic classification, image reading and the like based on the applications.
It should be noted that the scenario diagram of the information processing system shown in fig. 1 is only an example, and the information processing system and the scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application.
The following are detailed below. The numbers in the following examples are not intended to limit the order of preference of the examples.
In the present embodiment, the description will be made from the perspective of an information processing apparatus, which may be specifically integrated in a computer device having a storage unit and a microprocessor mounted thereon and having an arithmetic capability, the computer device may be a server or a terminal, and the computer device is exemplified as a server in the present embodiment.
Referring to fig. 2, fig. 2 is a schematic flow chart of an information processing method according to an embodiment of the present disclosure. The information processing method includes:
in step 101, a target training sample is respectively input into a first target detection model and a second target detection model, so as to obtain first target feature map information and second target feature map information.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer vision technology (Computer Vision, CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as identification, tracking, and measurement on a target, and further performs image processing so that the processed image is more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include information processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
The scheme provided by the embodiment of the application relates to the technologies such as the computer vision technology of artificial intelligence and the like, and is specifically explained by the following embodiment:
it should be noted that the object detection model is capable of implementing object detection (object detection), and completes the framing of the identification detection area containing the object, and the object detection model can be mainly divided into 4 parts:
the basic convolutional networks (Conv layers) part is a convolutional neural network, such as 13 convolutional (Conv) layers +13 linear rectification function (relu) layers +4 pooling layer (Pooling) layers, and is mainly used for extracting feature map information (feature maps) in the image to be processed.
The method includes the steps of generating a Region generation network (RPN) for generating a mark candidate Region (regions), specifically, obtaining positive classification (positive) information and negative classification (negative) information through anchors (anchors) in feature map information classified by a normalization function (softmax), determining the positive classification information as the mark candidate Region, calculating border regression (bounding box regression) offset of the anchors, adjusting the mark candidate Region according to the border regression offset, obtaining a final target mark candidate Region (regions), and simultaneously removing target mark candidate regions which are too small and exceed the border, thereby realizing location frame selection of preset marks. In one embodiment, the target identification candidate region may be directly determined as the target object identification box.
And an interest pooling layer (ROI Pooling) which is used for collecting target identification candidate areas and feature map information, calculating out feature map information (generic features maps) with sizes meeting the conditions, and sending the feature map information to a subsequent layer for processing.
And the Classifier (Classifier) can comprise a full connection layer and a normalization processing layer, combines the regional feature map information through the full connection layer and the normalization processing layer, calculates a corresponding identification classification result of the regional feature map, finely adjusts the target identification candidate region according to the identification classification result, and determines the target identification candidate region after fine adjustment as an identification detection region.
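For illustration only, the following is a minimal sketch, in PyTorch, of how such a four-part detector can be loaded and its basic convolutional network used as a feature extractor. The use of torchvision's Faster R-CNN here is an assumption for demonstration; the application does not specify a particular framework or model.

```python
import torch
import torchvision

# A minimal sketch (assumes torchvision >= 0.13): torchvision's Faster R-CNN
# follows the four-part structure described above, and its backbone can serve
# as the feature extractor that produces feature map information.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

image = torch.rand(1, 3, 512, 512)        # placeholder input image
with torch.no_grad():
    features = detector.backbone(image)   # OrderedDict of FPN feature maps
    for name, fmap in features.items():
        print(name, tuple(fmap.shape))    # e.g. '0' -> (1, 256, 128, 128)
```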
The embodiment of the application incorporates the Knowledge Distillation technique, through which the dark knowledge in a complex model (Teacher model) can be transferred to a simple model (Student model). The first target detection model can be the Teacher target detection model: a model with a complex network structure and a large number of parameters, so its recognition speed is low but its accuracy is high. The second target detection model can be the Student model: a model with a simple network structure and a small number of parameters, so its recognition speed is high but its accuracy is lower.
Accordingly, the first target detection model and the second target detection model in the embodiment of the application are both pre-trained target detection models with the capability of identifying target object identification frames in images. The target training sample may be composed of a large number of images, whose format may be the BitMaP (BMP) format, the Graphics Interchange Format (GIF), or the like.
In this embodiment of the application, the target training sample may be input into a feature extractor of a first target detection model and a feature extractor of a second target detection model to obtain first target feature map information of a Teacher model and second target feature map information of a Student model.
In one implementation, the target feature map information may represent the target object identification frames in the image, and each target object identification frame can be understood as a node of a graph. The target feature map information may be composed of two-dimensional vectors, for example $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
In some embodiments, the step of inputting the target training samples into the first target detection model and the second target detection model respectively to obtain the first target feature map information and the second target feature map information may include:
(1) acquiring a target training sample, and inputting the target training sample into the first target detection model to obtain first feature map information;
(2) inputting the target training sample into the second target detection model to obtain second feature map information;
(3) performing dimension conversion on the first feature map information and the second feature map information to generate the first target feature map information and the second target feature map information.
Specifically, a target training sample is first obtained and input into the first target detection model to obtain the first feature map information of the Teacher model; the target training sample is also input into the second target detection model to obtain the second feature map information of the Student model. The target training sample may comprise, for example, 1000 images.
Because the feature dimensions of the feature map information are high and inconsistent, in order to facilitate subsequent knowledge distillation processing, dimension conversion needs to be performed on the first feature map information and the second feature map information, and the dimension conversion may include dimension reduction and dimension unification processing to generate the first target feature map information and the second target feature map information after the dimension conversion.
In some embodiments, the step of generating the first target feature map information and the second target feature map information by performing dimension conversion on the first feature map information and the second feature map information may include:
(1.1) carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension;
(1.2) carrying out dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame;
and (1.3) performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
Because the feature dimensions of the first feature map information and the second feature map information are high, computing on them directly would involve too large a calculation amount. The first and second feature map information can therefore be input into two adaptive convolution network layers for dimension unification and respective non-linear transformation, which reduces their data dimensions and increases their non-linear information, yielding first and second feature map information of the same dimension and facilitating the subsequent construction of knowledge distillation.
Further, in order to construct a graph structure and calculate the object relations between different target object identification frames, dimension conversion needs to be performed on the first feature map information after the unified processing, converting it into first target feature map information comprising the first target object identification frame quantity dimension and the first target object identification frame feature dimension. Dimension conversion is likewise performed on the second feature map information after dimension unification to generate second target feature map information comprising the second target object identification frame quantity dimension and feature dimension. The first target feature map information and the second target feature map information may each be represented as $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
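For illustration, the following is a minimal sketch of this dimension unification and conversion, assuming PyTorch; it assumes per-frame features have already been pooled from the feature maps, and it uses a simple linear projection in place of the adaptive convolution layers described above. All tensor sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A minimal sketch (assumed shapes): project teacher and student per-frame
# features to a shared dimension C, yielding the (B, C) target feature map
# information -- B identification frames, C features per frame.
class Adapter(nn.Module):
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Non-linear transform: reduces dimension, adds non-linear information.
        return torch.relu(self.fc(x))

teacher_raw = torch.randn(100, 1024)   # B=100 frames, teacher feature dim (assumed)
student_raw = torch.randn(100, 512)    # same frames, student feature dim (assumed)
F_t = Adapter(1024)(teacher_raw)       # first target feature map information (B, C)
F_s = Adapter(512)(student_raw)        # second target feature map information (B, C)
```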
In step 102, graph convolution calculation is performed on the first target characteristic graph information and the second target characteristic graph information respectively, and a first relation characteristic vector and a second relation characteristic vector between the target objects are obtained.
Graph convolution calculation is implemented by a graph convolutional neural network (GCN). Like a convolutional neural network, it is a feature extractor, except that it extracts features from graph data (i.e., the first target feature map information and the second target feature map information), so that the machine can use the extracted features to perform node classification, link prediction, and the like on the graph data.
in the embodiment of the present application, since the recognition accuracy of the first target detection model and the recognition accuracy of the second target detection model for different target object identification frames are different, the accuracy of the object relationship between the target object identification frames in the first target feature map information is better than the accuracy of the object relationship between the target object identification frames in the second target feature map information.
Further, the first target feature map information can be input into a pre-trained graph convolutional neural network to obtain a first relation feature vector that represents the relations between the target objects in the first target feature map, and the second target feature map information can be input into the pre-trained graph convolutional neural network to obtain a second relation feature vector that represents the relations between the target objects in the second target feature map. The relation feature vector encodes the identified relations between target object identification frames, and the accuracy of the first relation feature vector on these relations is better than that of the second relation feature vector.
In some embodiments, the step of performing graph convolution calculation on the first target feature map information and the second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between the target objects may include:
(1) acquiring first distance matrix information between the first target object identification frame features in the first target feature map information, and second distance matrix information between the second target object identification frame features in the second target feature map information;
(2) performing graph convolution processing based on the first target feature map information and the first distance matrix information to obtain the first relation feature vector;
(3) performing graph convolution processing based on the second target feature map information and the second distance matrix information to obtain the second relation feature vector.
In this case, each target object identification frame in the first target feature map information, together with its corresponding identification frame feature, may be used as a node; the distance between each pair of nodes in the first target feature map information is then calculated to generate the first distance matrix information, for example $A \in \mathbb{R}^{B \times B}$; that is, the first distance matrix information describes the distance information between the nodes. Likewise, each target object identification frame in the second target feature map information and its corresponding identification frame feature are used as a node, and the distances between the nodes in the second target feature map information are calculated to generate the second distance matrix information.
Further, the first target feature map information (characterizing the nodes) and the first distance matrix information (characterizing the distances between nodes) may be input into a graph convolution model for graph convolution processing to obtain the first relation feature vector, which can represent the positional relations among the target object identification frames extracted by the first target detection model.
Likewise, the second target feature map information and the second distance matrix information can be input into the graph convolution model for graph convolution processing to obtain the second relation feature vector, which can represent the positional relations among the target object identification frames extracted by the second target detection model.
In some embodiments, the step of obtaining first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information may include:
(1.1) calculating the Euclidean distances between the first target object identification frame features in the first target feature map information to generate the first distance matrix information;
(1.2) calculating the Euclidean distances between the second target object identification frame features in the second target feature map information to generate the second distance matrix information.
The calculation can be based on the following formula:

$$D_{ij} = \left\| F_i - F_j \right\|_2$$

where $D_{ij}$ is the Euclidean distance between the i-th target object identification frame and the j-th target object identification frame, $F_i$ is the feature information of the i-th target object identification frame, and $F_j$ is the feature information of the j-th target object identification frame. Through this formula, the Euclidean distance between each pair of first target object identification frame features in the first target feature map information can be calculated, generating the first distance matrix information.
Further, the Euclidean distance between each pair of second target object identification frame features in the second target feature map information can be calculated through the same formula, generating the second distance matrix information.
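A minimal sketch of this distance computation, assuming PyTorch (tensor names are illustrative), follows:

```python
import torch

# Pairwise Euclidean distances between the B node features (one row per
# target object identification frame) give the B x B distance matrix that
# serves as the edge information of the graph.
def distance_matrix(F: torch.Tensor) -> torch.Tensor:
    # F: (B, C) node features; returns D with D[i, j] = ||F_i - F_j||_2
    return torch.cdist(F, F, p=2)

F_t = torch.randn(100, 256)   # assumed first target feature map information
A_t = distance_matrix(F_t)    # first distance matrix information, (100, 100)
```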
In step 103, a distillation loss function corresponding to the first relational feature vector and the second relational feature vector is constructed.
The recognition accuracy of the first relation feature vector on the relations between target object identification frames is better than that of the second relation feature vector. The dark knowledge carried by the first relation feature vector of the Teacher model therefore needs to be transferred to the Student model, so that the Student model, with its simple network, learns the Teacher model's ability to recognize the relations between target object identification frames in images; that is, the distance between the first relation feature vector and the second relation feature vector should be made as small as possible. To this end, a distillation loss function corresponding to the first relation feature vector and the second relation feature vector can be constructed.
In some embodiments, the step of constructing the distillation loss function corresponding to the first relation feature vector and the second relation feature vector may include: calculating the loss value between the first relation feature vector and the second relation feature vector through a mean square error function, and constructing the corresponding distillation loss function.
The mean square error function calculates the expected value of the square of the difference between an estimate and the true value, and may be denoted MSE. It can be expressed by the following formula:

$$L_{graph} = \mathrm{MSE}\left(R_T, R_S\right) = \frac{1}{n} \sum_{i=1}^{n} \left( R_{T,i} - R_{S,i} \right)^2$$

where $L_{graph}$ represents the distillation loss function, $R_T$ is the first relation feature vector, and $R_S$ is the second relation feature vector. The loss value between the first relation feature vector and the second relation feature vector is calculated through this formula, constructing the corresponding distillation loss function.
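A minimal sketch of this loss, assuming PyTorch (the variable names are assumptions), follows:

```python
import torch
import torch.nn.functional as F

# Mean square error between the teacher's and student's relation feature
# vectors; the teacher output is detached so that gradients only update
# the second (Student) target detection model.
def distillation_loss(r_teacher: torch.Tensor, r_student: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(r_student, r_teacher.detach())
```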
In step 104, the distillation loss function is added to the loss function of the second target detection model for joint training, so as to obtain the second target detection model after joint training.
In order to solve the above technical problems, in the embodiment of the present application, the distillation loss function may be added to the loss function of the second target detection model, migrating the ability of the first target detection model, with its complex network structure, to recognize object relations into the second target detection model for joint training. In this way the second target detection model, with its simple network, can recognize objects in combination with the relations between objects learned by the complex network model, which to a certain extent better conforms to the relational inference ability of the human brain.
For example, the probability that the object beside a notebook computer is a mouse is greatly increased by the presence of the notebook computer. By having the second target detection model, with its simple network, learn such relations between potential objects, its performance can be greatly improved, so that the second target detection model retains its speed advantage while achieving a better detection effect, improving information processing efficiency.
In some embodiments, the step of adding the distillation loss function into the loss function of the second target detection model for joint training to obtain the jointly trained second target detection model may include:
(1) multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
(2) and performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain the second target detection model after joint training.
Please refer to the following formula:

$$L = L_{det} + \alpha \, L_{graph}$$

where $L_{det}$ is the loss function of the original second target detection model, $L_{graph}$ represents the distillation loss function, and $\alpha$ is a preset weight, for example 0.1. The distillation loss function is multiplied by the preset weight 0.1 to obtain the target distillation loss function, and the target distillation loss function is jointly trained with the loss function of the second target detection model to obtain the jointly trained second target detection model, which combines the complex network model's learned ability to recognize the relations between objects and can achieve a better detection effect while retaining its speed advantage.
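A minimal sketch of this joint objective, assuming PyTorch and the example weight of 0.1 (names are assumptions), follows:

```python
import torch

# Joint objective for step 104: the student's original detection loss plus
# the distillation loss scaled by the preset weight alpha.
def joint_loss(det_loss: torch.Tensor, graph_loss: torch.Tensor,
               alpha: float = 0.1) -> torch.Tensor:
    # Multiplying the distillation loss by alpha gives the target
    # distillation loss, which is added to the detection loss.
    return det_loss + alpha * graph_loss
```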
In step 105, object recognition is performed on the image to be recognized based on the jointly trained second target detection model.
The jointly trained second target detection model has absorbed the complex network model's learned ability to recognize the relations between objects; therefore, when object recognition is performed on the image to be recognized based on the jointly trained second target detection model, it can achieve a better target object detection effect while retaining the advantage of fast recognition.
As can be seen from the above, in the embodiment of the present application, target training samples are input into the first target detection model and the second target detection model respectively to obtain the first target feature map information and the second target feature map information; graph convolution calculation is performed on the first and second target feature map information respectively to obtain the first relation feature vector and the second relation feature vector between target objects; a distillation loss function corresponding to the two relation feature vectors is constructed; the distillation loss function is added into the loss function of the second target detection model for joint training to obtain the jointly trained second target detection model; and object recognition is performed on the image to be recognized based on the jointly trained second target detection model. In this way, the first target detection model's learned knowledge of the relations between target object identification frames is transferred to the second target detection model for supplementary learning, improving information processing efficiency.
The method described in connection with the above embodiments will be described in further detail below by way of example.
In the present embodiment, an example will be described in which the information processing apparatus is specifically integrated in a server.
Referring to fig. 3 and fig. 4 together, fig. 3 is another schematic flow chart of the information processing method according to the embodiment of the present application, and fig. 4 is a schematic scene diagram of the information processing method according to the embodiment of the present application. The method flow can comprise the following steps:
in step 201, the server obtains a training sample, and performs cutting, expanding, and turning processing on the training sample to generate a corresponding target training sample.
The training sample obtained by the server may be a PASCAL VOC2012 training val data set and an MS COCO training val35k data set, and the training sample is composed of a plurality of images.
In order to enrich the training samples, the training samples can be cut, enlarged and turned, the training samples are expanded, more target training samples 1 than the training samples are generated, the target training samples 1 carry label information, and the label information can be used for labeling a target object identification frame.
In step 202, the server obtains a target training sample, inputs the target training sample into a first target detection model to obtain first feature map information, and inputs the target training sample into a second target detection model to obtain second feature map information.
The server obtains a target training sample 1 and inputs it into the first target detection model (Teacher model) 2 and the second target detection model (Student model) 3 to obtain first feature map information $F_T$ and second feature map information $F_S$. The network complexity of the first target detection model is higher than that of the second target detection model, so the first model identifies the target objects in target training sample 1 more accurately than the second; the feature description of the target objects in the first feature map information $F_T$ is therefore better than that in the second feature map information $F_S$.
In step 203, the server performs dimension unification processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension, performs dimension conversion on the first feature map information subjected to the dimension unification processing, and generates first target feature map information including a first target object identification frame number dimension and a first target object identification frame feature dimension.
Because the feature dimensions of the first feature map information $F_T$ and the second feature map information $F_S$ are high, directly computing on them would involve an excessive calculation amount. $F_T$ and $F_S$ can therefore first be input into two adaptive convolution network layers and transformed non-linearly, which reduces the data dimension while increasing the non-linear information of the feature map information, and yields first and second feature map information of the same data dimension, which is convenient for constructing the subsequent graph model.
It should be noted that the graph model G can be expressed as $G = (V, E)$, where V is the node information; in this embodiment, the nodes may be the features of the target object identification frames in the feature map information. E is the set of edges of the graph structure, i.e., the distance information between the target object identification frame features: the closer the distance, the more similar two nodes are, and the farther the distance, the more dissimilar they are.
Therefore, dimension transformation can be performed on the first feature map information after dimension unification, converting the current multi-dimensional information into two-dimensional information to obtain the first target feature map information. This is the set of nodes of the graph structure identified by the first target detection model and can be represented as $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
In step 204, the server performs dimension conversion on the second feature map information after the dimension unification processing, and generates second target feature map information including a second target object identification frame number dimension and a second target object identification frame feature dimension.
The server may perform dimension transformation on the second feature map information after dimension unification, converting the current multi-dimensional information into two-dimensional information to obtain the second target feature map information. This is the set of nodes of the graph structure identified by the second target detection model, likewise represented as $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
In step 205, the server calculates a generalized intersection ratio between the first target object identification frame features in the first target feature map information to generate first distance matrix information, and calculates a generalized intersection ratio between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the step of calculating a generalized intersection ratio between the first target object identification frame features in the first target feature map information and generating first distance matrix information may include:
(1) acquiring the intersection ratio between the first target object identification frame features in the first target feature map information;
(2) acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information;
(3) calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information;
(4) obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box;
(5) calculating the difference between the intersection ratio and the ratio to obtain the generalized intersection ratio between the first target object identification frame features in the first target feature map information, and generating the first distance matrix information.
Please refer to the following formulas:

$$\mathrm{IoU} = \frac{|M \cap N|}{|M \cup N|}$$

where IoU, the intersection-over-union, is the most common metric in target detection; M and N are two different target object identification frame features; $M \cap N$ represents the intersection of the two boxes, and $M \cup N$ represents their union. Please continue to refer to the following formula:

$$\mathrm{GIoU} = \mathrm{IoU} - \frac{\left| C \setminus (M \cup N) \right|}{|C|}$$

where GIoU (Generalized Intersection over Union) is the generalized intersection ratio and C is the minimum enclosing bounding box formed by the two boxes. Through the above calculation, a GIoU value between every two target object identification boxes can be obtained, and the GIoU value can represent the distance between the two boxes. That is, the intersection ratio IoU between the first target object identification frame features in the first target feature map information is acquired first; the minimum enclosing bounding box C of the first target object identification frame features is acquired; the absolute value of the difference between the minimum enclosing bounding box C and the union of the first target object identification frame features is calculated; the ratio of this absolute value to the absolute value of the minimum enclosing bounding box is obtained; and finally the difference between the intersection ratio and this ratio is calculated to obtain the generalized intersection ratio between the first target object identification frame features in the first target feature map information, generating the first distance matrix information.
Similarly, the generalized intersection ratio between the second target object identification frame features in the second target feature map information is calculated by the same method to generate the second distance matrix information. The distance matrix information can be represented as $A \in \mathbb{R}^{B \times B}$, describing the distance information between each target object identification frame and the other target object identification frames, i.e., the edge information of the graph structure.
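A minimal sketch of the GIoU computation for a pair of boxes, assuming PyTorch and (x1, y1, x2, y2) box coordinates (an assumption; the application does not fix a box format), follows:

```python
import torch

# Generalized intersection-over-union between two axis-aligned boxes M and N.
def giou(m: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    # Intersection of the two boxes
    lt = torch.max(m[:2], n[:2])
    rb = torch.min(m[2:], n[2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[0] * wh[1]
    # Union of the two boxes
    area_m = (m[2] - m[0]) * (m[3] - m[1])
    area_n = (n[2] - n[0]) * (n[3] - n[1])
    union = area_m + area_n - inter
    iou = inter / union
    # Minimum enclosing bounding box C
    lt_c = torch.min(m[:2], n[:2])
    rb_c = torch.max(m[2:], n[2:])
    wh_c = (rb_c - lt_c).clamp(min=0)
    area_c = wh_c[0] * wh_c[1]
    # GIoU = IoU - |C \ (M u N)| / |C|
    return iou - (area_c - union) / area_c

m = torch.tensor([0.0, 0.0, 2.0, 2.0])
n = torch.tensor([1.0, 1.0, 3.0, 3.0])
print(giou(m, n))   # IoU = 1/7, GIoU = 1/7 - 2/9
```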
In step 206, the server performs graph convolution processing based on the first target feature graph information and the first distance matrix information to obtain a first relationship feature vector.
In some embodiments, the step of performing graph convolution processing based on the first target feature map information and the first distance matrix information to obtain the first relation feature vector may include:
(1) acquiring the first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information;
(2) obtaining a diagonal node degree matrix of the first matrix information;
(3) performing symmetric normalization on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information;
(4) and calculating the first target matrix information, the first parameter matrix, and the first target feature map information through the activation function to obtain the first relation feature vector.
Please refer to the following formula:

$$V = P\left(\hat{D}^{-\frac{1}{2}}\,\hat{A}\,\hat{D}^{-\frac{1}{2}}\,F\,W\right), \qquad \hat{A} = A + I$$

where P is an activation function, such as a Sigmoid or ReLU function; $\hat{A}$ is the matrix information, expressed as $\hat{A} = A + I$, where A is the first distance matrix information or the second distance matrix information, and I is an identity matrix, that is, a square matrix whose elements on the diagonal from the top left to the bottom right (the main diagonal) are all 1 and whose other elements are all 0; $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$, whose diagonal entries are the row sums of $\hat{A}$ ($\hat{D}_{ii} = \sum_{j}\hat{A}_{ij}$), and the regularization it provides prevents the output from growing too large after multiplication by $\hat{A}$; F is the first target feature map information or the second target feature map information; and W is the first parameter matrix or the second parameter matrix, which, like a network parameter, can be adjusted. Through the above formula, the first distance matrix information A is obtained and summed with the identity matrix I to obtain the first matrix information $\hat{A}$; the diagonal node degree matrix $\hat{D}$ of the first matrix information is obtained; the first matrix information is symmetrically normalized via $\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}$ to obtain the first target matrix information; and finally the first target matrix information, the first parameter matrix W, and the first target feature map information F are combined through the activation function P to obtain the first relation feature vector (namely, the Teacher relation feature vector).
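As an illustration only, a minimal NumPy sketch of this graph-convolution step might look as follows; the choice of ReLU for the activation function P, the array shapes, and the function name are assumptions for the example:

```python
import numpy as np

def relation_feature_vector(a, f, w):
    """One graph-convolution step: P(D^{-1/2} (A + I) D^{-1/2} F W).

    a: (n, n) distance matrix A between identification-box features
       (entries assumed non-negative so the node degrees are positive)
    f: (n, d) target feature map information F, one row per box
    w: (d, k) adjustable parameter matrix W
    """
    n = a.shape[0]
    a_hat = a + np.eye(n)                     # first matrix information: A + I
    deg = a_hat.sum(axis=1)                   # row sums -> diagonal node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(a_norm @ f @ w, 0.0)    # activation function P (ReLU here)
```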
In step 207, the server performs graph convolution processing based on the second target feature graph information and the second distance matrix information to obtain a second relationship feature vector.
With continuing reference to the formula in step 206, the second target feature map information and the second distance matrix information may be input into that formula for graph convolution processing to obtain a second relation feature vector (i.e., the Student relation feature vector). The first relation feature vector captures the relationships between target object identification frames more accurately than the second relation feature vector does.
In step 208, the server calculates a loss value between the first relational feature vector and the second relational feature vector by a mean square error function, and constructs a corresponding distillation loss function.
The learning capability of the Teacher model for the relationships between target object identification frames, carried by the first relation feature vector, needs to be transferred to the Student model; that is, the distance between the first relation feature vector and the second relation feature vector needs to be as small as possible, so that the two vectors are as close as possible. Please refer to the following formula:

$$L_{graph} = \mathrm{MSE}\left(V^{T}, V^{S}\right)$$

where $L_{graph}$ represents the distillation loss function, $V^{T}$ is the first relation feature vector, and $V^{S}$ is the second relation feature vector. The loss value between the first relation feature vector and the second relation feature vector is calculated by this formula to construct the corresponding distillation loss function 4, which makes the relation expression of the first relation feature vector and the relation expression of the second relation feature vector as close as possible.
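A sketch of this mean-square-error distillation loss, with illustrative names, might be:

```python
import numpy as np

def graph_distillation_loss(v_teacher, v_student):
    """L_graph: mean squared error between the Teacher relation feature
    vector and the Student relation feature vector (same shape)."""
    return float(np.mean((np.asarray(v_teacher) - np.asarray(v_student)) ** 2))
```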
In step 209, the server multiplies the distillation loss function by a preset weight to obtain a target distillation loss function, and performs joint training on the target distillation loss function and a loss function of the second target detection model to obtain a second target detection model after the joint training.
Please refer to the following formula:

$$L = L_{det} + \alpha L_{graph}$$

where $L_{det}$ is the loss function of the original second target detection model, $L_{graph}$ represents the distillation loss function 4, and $\alpha$ is a preset weight, for example 0.1. The distillation loss function is multiplied by the preset weight 0.1 to obtain the target distillation loss function, and the target distillation loss function and the loss function of the second target detection model 3 are jointly trained to obtain the second target detection model after joint training. The jointly trained Student model 3 thereby absorbs the Teacher model 2's learned ability to recognize relationships between objects, and can achieve a better detection effect while keeping its advantage of speed.
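Assuming the models are trained in an autograd framework such as PyTorch (the embodiment does not prescribe one), a single joint-training step could be sketched as follows; the function name and the optimizer interface are illustrative:

```python
import torch
import torch.nn.functional as F

def joint_training_step(optimizer, detection_loss, v_teacher, v_student, alpha=0.1):
    """One joint-training step: L = L_det + alpha * L_graph."""
    l_graph = F.mse_loss(v_student, v_teacher.detach())  # Teacher side is not updated
    loss = detection_loss + alpha * l_graph              # add the target distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```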
In an embodiment, the jointly trained second target detection model may be tested: a test image is input into the jointly trained second target detection model, and the joint training is stopped when the accuracy of the object recognition result is higher than a preset threshold.
In step 210, the server performs object recognition on the image to be recognized based on the jointly trained second target detection model.
Because the jointly trained Student model 3 incorporates the Teacher model 2's learned ability to recognize relationships between objects, performing object recognition on the image to be recognized with the jointly trained Student model 3 retains the advantage of rapid recognition while also achieving a better target object detection effect.
As can be seen from the above, in the embodiment of the present application, the first target feature map information and the second target feature map information are obtained by inputting the target training samples into the first target detection model and the second target detection model, respectively; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training. Therefore, the learning capability of the first target detection model on the relation between the target object identification frames is transferred to the second target detection model for supplementary learning, and the information processing efficiency is improved.
In order to better implement the information processing method provided by the embodiment of the present application, the embodiment of the present application further provides a device based on the information processing method. The terms are the same as those in the above-described information processing method, and details of implementation may refer to the description in the method embodiment.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present disclosure, where the information processing apparatus may include an input unit 301, a graph convolution unit 302, a construction unit 303, a training unit 304, a recognition unit 305, and the like.
The input unit 301 is configured to input the target training sample into the first target detection model and the second target detection model respectively to obtain first target feature map information and second target feature map information.
In some embodiments, the input unit 301 may include:
the first input subunit is used for acquiring a target training sample, and inputting the target training sample into a first target detection model to obtain first characteristic diagram information;
the second input subunit is used for inputting the target training sample into a second target detection model to obtain second characteristic diagram information;
and the conversion subunit is used for performing dimension conversion on the first feature map information and the second feature map information to generate first target feature map information and second target feature map information.
In some embodiments, the conversion subunit is configured to: carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension; performing dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame; and performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
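As a rough sketch of the conversion subunit's final reshape (the exact mapping from spatial feature maps to per-box features is not detailed here, so the even-split scheme and names below are purely assumptions):

```python
import numpy as np

def to_box_features(feature_map, num_boxes):
    """Reshape a (C, H, W) feature map into (num_boxes, feature_dim) so
    that each row holds one target object identification frame feature."""
    flat = feature_map.reshape(-1)           # flatten all C*H*W activations
    feature_dim = flat.size // num_boxes     # assumes an even split per box
    return flat[:num_boxes * feature_dim].reshape(num_boxes, feature_dim)
```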
And a graph convolution unit 302, configured to perform graph convolution calculation on the first target feature map information and the second target feature map information respectively, so as to obtain a first relationship feature vector and a second relationship feature vector between the target objects.
In some embodiments, the graph convolution unit includes:
an obtaining subunit, configured to obtain first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information;
the first graph convolution subunit is used for performing graph convolution processing on the basis of the first target characteristic graph information and the first distance matrix information to obtain a first relation characteristic vector;
and the second graph convolution subunit is used for performing graph convolution processing on the basis of the second target characteristic graph information and the second distance matrix information to obtain a second relation characteristic vector.
In some embodiments, the obtaining subunit is configured to: calculating Euclidean distance between the first target object identification frame characteristics in the first target characteristic diagram information to generate first distance matrix information; and calculating Euclidean distance between the second target object identification frame characteristics in the second target characteristic diagram information to generate second distance matrix information.
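A vectorised NumPy sketch of this Euclidean variant, with illustrative names, might be:

```python
import numpy as np

def euclidean_distance_matrix(box_features):
    """Pairwise Euclidean distances between the rows of a
    (num_boxes, feature_dim) array of identification-box features."""
    diff = box_features[:, None, :] - box_features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```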
In some embodiments, the obtaining subunit is further configured to: calculating the generalized intersection over union between the first target object identification frame features in the first target feature map information to generate first distance matrix information; and calculating the generalized intersection over union between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the obtaining subunit is further configured to: acquiring the intersection over union between the first target object identification frame features in the first target feature map information; acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information; calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information; obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box; calculating the difference between the intersection over union and the ratio to obtain the generalized intersection over union between the first target object identification frame features in the first target feature map information, and generating first distance matrix information; and calculating the generalized intersection over union between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the first graph convolution subunit is to: acquiring first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information; obtaining a diagonal node degree matrix of the first matrix information; performing symmetrical normalization processing on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information; and calculating the first target matrix information, the first parameter matrix and the first target characteristic graph information through the activation function to obtain a first relation characteristic vector.
A constructing unit 303, configured to construct a distillation loss function corresponding to the first relation feature vector and the second relation feature vector.
In some embodiments, the constructing unit 303 is configured to: and calculating the loss value between the first relation characteristic vector and the second relation characteristic vector through a mean square error function, and constructing a corresponding distillation loss function.
A training unit 304, configured to add the distillation loss function to the loss function of the second target detection model for joint training, so as to obtain a second target detection model after joint training.
In some embodiments, the training unit 304 is configured to: multiplying the distillation loss function by a preset weight to obtain a target distillation loss function; and performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain the second target detection model after joint training.
And the identifying unit 305 is configured to perform object identification on the image to be identified based on the jointly trained second target detection model.
In some embodiments, the information processing apparatus further includes an expansion unit configured to: obtaining a training sample; and cropping, expanding, and flipping the training sample to generate a corresponding target training sample.
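A rough sketch of such an expansion unit acting on a single (H, W, C) image follows; the crop ratio, zero padding, and flip probability are assumptions, and in a real detection pipeline the identification boxes would need to be transformed along with the image:

```python
import numpy as np

def expand_sample(image, rng=None):
    """Crop, expand (pad back to the original size), and randomly flip
    one (H, W, C) training sample."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = image.shape
    ch, cw = int(h * 0.9), int(w * 0.9)        # crop to 90% of each side
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    out = image[y0:y0 + ch, x0:x0 + cw]
    out = np.pad(out, ((0, h - ch), (0, w - cw), (0, 0)))  # expand by zero padding
    if rng.random() < 0.5:                     # horizontal flip
        out = out[:, ::-1, :]
    return out
```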
The specific implementation of each unit can refer to the previous embodiment, and is not described herein again.
As can be seen from the above, in the embodiment of the present application, the input unit 301 inputs the target training samples into the first target detection model and the second target detection model respectively, so as to obtain the first target feature map information and the second target feature map information; the graph convolution unit 302 performs graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; the construction unit 303 constructs a distillation loss function corresponding to the first relation feature vector and the second relation feature vector; the training unit 304 adds the distillation loss function to the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; the recognition unit 305 performs object recognition on the image to be recognized based on the jointly trained second target detection model. Therefore, the learning capability of the first target detection model on the relation between the target object identification frames is transferred to the second target detection model for supplementary learning, and the information processing efficiency is improved.
The embodiment of the present application further provides a computer device, as shown in fig. 6, which shows a schematic structural diagram of a server according to the embodiment of the present application, specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 6 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; optionally, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the respective components, and optionally, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, so as to implement the various method steps provided by the foregoing embodiments, as follows:
respectively inputting the target training sample into a first target detection model and a second target detection model to obtain first target characteristic graph information and second target characteristic graph information; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
In the above embodiments, the descriptions of the embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the information processing method, and are not described herein again.
As can be seen from the above, the computer device according to the embodiment of the present application may obtain first target feature map information and second target feature map information by inputting the target training samples into the first target detection model and the second target detection model, respectively; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training. Therefore, the learning capability of the first target detection model on the relation between the target object identification frames is transferred to the second target detection model for supplementary learning, and the information processing efficiency is improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the information processing methods provided in the embodiments of the present application. For example, the instructions may perform the steps of:
respectively inputting the target training sample into a first target detection model and a second target detection model to obtain first target characteristic graph information and second target characteristic graph information; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations provided by the embodiments described above.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any information processing method provided in the embodiments of the present application, the beneficial effects that can be achieved by any information processing method provided in the embodiments of the present application can be achieved, and detailed descriptions are omitted here for the details, see the foregoing embodiments.
The foregoing detailed description is directed to an information processing method, an information processing apparatus, and a computer-readable storage medium provided in the embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. An information processing method characterized by comprising:
respectively inputting the target training sample into a first target detection model and a second target detection model to obtain first target characteristic graph information and second target characteristic graph information;
acquiring first distance matrix information between first target object identification frame characteristics in the first target characteristic diagram information and second distance matrix information between second target object identification frame characteristics in the second target characteristic diagram information;
performing graph convolution processing on the first target characteristic graph information and the first distance matrix information to obtain a first relation characteristic vector;
performing graph convolution processing based on the second target characteristic graph information and second distance matrix information to obtain a second relation characteristic vector;
constructing distillation loss functions corresponding to the first relation characteristic vector and the second relation characteristic vector;
multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain a second target detection model after joint training;
and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
2. The information processing method according to claim 1, wherein the step of acquiring first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information includes:
calculating Euclidean distance between the first target object identification frame characteristics in the first target characteristic diagram information to generate first distance matrix information;
and calculating Euclidean distance between the characteristics of the second target object identification frame in the second target characteristic diagram information to generate second distance matrix information.
3. The information processing method according to claim 1, wherein the step of performing graph convolution processing based on the first target feature graph information and first distance matrix information to obtain a first relational feature vector includes:
acquiring first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information;
obtaining a diagonal node degree matrix of the first matrix information;
performing symmetrical normalization processing on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information;
and calculating the first target matrix information, the first parameter matrix and the first target characteristic graph information through the activation function to obtain a first relation characteristic vector.
4. The information processing method according to claim 1, wherein the step of acquiring first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information includes:
calculating a generalized intersection over union between the first target object identification frame features in the first target feature map information to generate first distance matrix information;
and calculating a generalized intersection over union between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
5. The information processing method according to claim 4, wherein the step of calculating the generalized intersection over union between the first target object identification frame features in the first target feature map information and generating first distance matrix information includes:
acquiring the intersection over union between the first target object identification frame features in the first target feature map information;
acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information;
calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information;
obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box;
and calculating the difference between the intersection over union and the ratio to obtain the generalized intersection over union between the first target object identification frame features in the first target feature map information, and generating first distance matrix information.
6. The information processing method according to any one of claims 1 to 5, wherein the step of inputting the target training samples into the first target detection model and the second target detection model, respectively, to obtain the first target feature map information and the second target feature map information includes:
acquiring a target training sample, and inputting the target training sample into a first target detection model to obtain first characteristic diagram information;
inputting the target training sample into a second target detection model to obtain second characteristic diagram information;
and performing dimension conversion on the first feature map information and the second feature map information to generate first target feature map information and second target feature map information.
7. The information processing method according to claim 6, wherein the step of performing dimension conversion on the first feature map information and the second feature map information to generate first target feature map information and second target feature map information includes:
carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension;
performing dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame;
and performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
8. The information processing method according to any one of claims 1 to 5, wherein the step of constructing the distillation loss function corresponding to the first relational feature vector and the second relational feature vector includes:
and calculating a loss value between the first relation characteristic vector and the second relation characteristic vector through a mean square error function, and constructing a corresponding distillation loss function.
9. An information processing apparatus characterized by comprising:
the input unit is used for inputting the target training samples into the first target detection model and the second target detection model respectively to obtain first target characteristic graph information and second target characteristic graph information;
the graph convolution unit is used for respectively carrying out graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information to obtain a first relation characteristic vector and a second relation characteristic vector between target objects;
the graph convolution unit includes:
an obtaining subunit, configured to obtain first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information;
the first graph convolution subunit is used for performing graph convolution processing on the basis of the first target characteristic graph information and the first distance matrix information to obtain a first relation characteristic vector;
the second graph convolution subunit is used for performing graph convolution processing on the basis of the second target characteristic graph information and the second distance matrix information to obtain a second relation characteristic vector;
the construction unit is used for constructing distillation loss functions corresponding to the first relation characteristic vector and the second relation characteristic vector;
the training unit is used for adding the distillation loss function into the loss function of the second target detection model to perform joint training to obtain a second target detection model after the joint training;
the training unit is configured to:
multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain a second target detection model after joint training;
and the recognition unit is used for carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
10. The processing apparatus as claimed in claim 9, wherein the obtaining subunit is configured to:
calculating Euclidean distance between the first target object identification frame characteristics in the first target characteristic diagram information to generate first distance matrix information;
and calculating Euclidean distance between the characteristics of the second target object identification frame in the second target characteristic diagram information to generate second distance matrix information.
11. The processing apparatus as claimed in claim 9, wherein the first graph convolution subunit is configured to:
acquiring first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information;
obtaining a diagonal node degree matrix of the first matrix information;
performing symmetrical normalization processing on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information;
and calculating the first target matrix information, the first parameter matrix and the first target characteristic graph information through the activation function to obtain a first relation characteristic vector.
12. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the information processing method according to any one of claims 1 to 8.