CN111709497B - Information processing method and device and computer readable storage medium


Info

Publication number
CN111709497B
CN202010840896.XA · CN202010840896A · CN111709497B
Authority
CN
China
Prior art keywords
target
information
feature map
detection model
object identification
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number
CN202010840896.XA
Other languages
Chinese (zh)
Other versions
CN111709497A (en)
Inventor
苗书宇
杜俊珑
彭湃
孙星
郭晓威
黄飞跃
吴永坚
黄小明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010840896.XA priority Critical patent/CN111709497B/en
Publication of CN111709497A publication Critical patent/CN111709497A/en
Application granted granted Critical
Publication of CN111709497B publication Critical patent/CN111709497B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F 18/214 Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 Pattern recognition; Analysing; Validation; Performance evaluation; Active pattern learning techniques
    • G06N 3/08 Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V 2201/07 Indexing scheme relating to image or video recognition or understanding; Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an information processing method, an information processing device and a computer-readable storage medium. A target training sample is input into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information; graph convolution calculation is performed on the first and second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects; a distillation loss function corresponding to the first relation feature vector and the second relation feature vector is constructed; the distillation loss function is added into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model; and object recognition is performed on an image to be recognized based on the jointly trained second target detection model. In this way, the first target detection model's learned knowledge of the relations between target object identification frames is transferred to the second target detection model, improving information processing efficiency.

Description

Information processing method and device and computer readable storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to an information processing method and apparatus, and a computer-readable storage medium.
Background
With the development of computer vision, target detection (Object Detection) technology is increasingly widely applied in real-world scenarios such as autonomous driving, intelligent transportation, and smart cities. The core of target detection is to quickly detect the target objects contained in a video or image through a target detection model.
In the prior art, a target detection model with a complex network structure can be selected for target object detection, which has the advantage of a more accurate detection result; alternatively, a target detection model with a simple network structure can be selected, which detects target objects faster.
In the research and practice of the prior art, the inventor of the present application found that although the target detection model with a complex network structure gives accurate detection results, its detection speed is slow and it occupies computing resources; and although the target detection model with a simple network structure is fast, its detection accuracy is poor.
Disclosure of Invention
The embodiment of the application provides an information processing method, an information processing device and a computer-readable storage medium, which can improve the accuracy of target detection and further improve the information processing efficiency on the premise of ensuring the target detection speed.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
an information processing method comprising:
respectively inputting a target training sample into a first target detection model and a second target detection model to obtain first target feature map information and second target feature map information;
performing graph convolution calculation on the first target feature map information and the second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects;
constructing a distillation loss function corresponding to the first relation feature vector and the second relation feature vector;
adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model;
and performing object recognition on an image to be recognized based on the jointly trained second target detection model.
An information processing apparatus comprising:
an input unit, configured to input target training samples into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information;
a graph convolution unit, configured to perform graph convolution calculation on the first target feature map information and the second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects;
a construction unit, configured to construct a distillation loss function corresponding to the first relation feature vector and the second relation feature vector;
a training unit, configured to add the distillation loss function into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model;
and a recognition unit, configured to perform object recognition on an image to be recognized based on the jointly trained second target detection model.
In some embodiments, the obtaining subunit is further configured to:
calculating the generalized intersection ratio between first target object identification frame features in the first target feature map information to generate first distance matrix information;
and calculating the generalized intersection ratio between second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the obtaining subunit is further configured to:
acquiring the intersection ratio between the first target object identification frame features in the first target feature map information;
acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information;
calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information;
obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box;
calculating the difference between the intersection ratio and the ratio to obtain the generalized intersection ratio between the first target object identification frame features in the first target feature map information, and generating the first distance matrix information;
and calculating the generalized intersection ratio between the second target object identification frame features in the second target feature map information to generate the second distance matrix information.
In some embodiments, the input unit includes:
a first input subunit, configured to acquire a target training sample and input the target training sample into the first target detection model to obtain first feature map information;
a second input subunit, configured to input the target training sample into the second target detection model to obtain second feature map information;
and a conversion subunit, configured to perform dimension conversion on the first feature map information and the second feature map information to generate the first target feature map information and the second target feature map information.
In some embodiments, the conversion subunit is to:
carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension;
performing dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame;
and performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
In some embodiments, the training unit is to:
multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
and performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain the second target detection model after joint training.
In some embodiments, the building unit is configured to:
calculating a loss value between the first relation feature vector and the second relation feature vector through a mean square error function, and constructing the corresponding distillation loss function.
According to the method, a target training sample is input into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information; graph convolution calculation is performed on the first and second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects; a distillation loss function corresponding to the two relation feature vectors is constructed; the distillation loss function is added into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model; and object recognition is performed on an image to be recognized based on the jointly trained second target detection model. In this way, the first target detection model's learned knowledge of the relations between target object identification frames is transferred to the second target detection model for supplementary learning, improving information processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic diagram of a scenario of an information processing system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of an information processing method provided in an embodiment of the present application;
FIG. 3 is another schematic flow chart of an information processing method provided in an embodiment of the present application;
fig. 4 is a schematic view of a scene of an information processing method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an information processing apparatus provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an information processing method, an information processing device and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of an information processing system according to an embodiment of the present application, including a terminal A and a server (the information processing system may also include terminals other than terminal A; the specific number of terminals is not limited here). The terminal A and the server may be connected through a communication network, which may include a wireless network and a wired network; the wireless network includes one or more of a wireless wide area network, a wireless local area network, a wireless metropolitan area network, and a wireless personal area network. The network includes network entities such as routers and gateways, which are not shown in the figure. The terminal A may exchange information with the server through the communication network; for example, the terminal A may send an image to be recognized, on which object recognition is to be performed, to the server.
The information processing system may include an information processing apparatus, which may be integrated in a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. As shown in fig. 1, the server inputs a target training sample into a first target detection model and a second target detection model respectively to obtain first target feature map information and second target feature map information; performs graph convolution calculation on the first and second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between target objects; constructs a distillation loss function corresponding to the two relation feature vectors; adds the distillation loss function into the loss function of the second target detection model for joint training to obtain a jointly trained second target detection model; receives an image to be recognized sent by the terminal A; and performs object recognition on the image to be recognized based on the jointly trained second target detection model.
The terminal A in the information processing system can be provided with various applications required by users, such as an instant messaging application, a media application, a browser application and the like, and the terminal A can send an image to be recognized to a server for object recognition, automatic classification, image reading and the like based on the applications.
It should be noted that the scenario diagram of the information processing system shown in fig. 1 is only an example, and the information processing system and the scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application.
The following are detailed below. The numbers in the following examples are not intended to limit the order of preference of the examples.
In the present embodiment, the description will be made from the perspective of an information processing apparatus, which may be specifically integrated in a computer device having a storage unit and a microprocessor mounted thereon and having an arithmetic capability, the computer device may be a server or a terminal, and the computer device is exemplified as a server in the present embodiment.
Referring to fig. 2, fig. 2 is a schematic flow chart of an information processing method according to an embodiment of the present disclosure. The information processing method includes:
in step 101, a target training sample is respectively input into a first target detection model and a second target detection model, so as to obtain first target feature map information and second target feature map information.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer vision technology (Computer Vision, CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as identification, tracking, and measurement on a target, and further performs image processing so that the processed image is more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include information processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
The scheme provided by the embodiment of the application relates to the technologies such as the computer vision technology of artificial intelligence and the like, and is specifically explained by the following embodiment:
it should be noted that the object detection model is capable of implementing object detection (object detection), and completes the framing of the identification detection area containing the object, and the object detection model can be mainly divided into 4 parts:
the basic convolutional networks (Conv layers) part is a convolutional neural network, such as 13 convolutional (Conv) layers +13 linear rectification function (relu) layers +4 pooling layer (Pooling) layers, and is mainly used for extracting feature map information (feature maps) in the image to be processed.
The method includes the steps of generating a Region generation network (RPN) for generating a mark candidate Region (regions), specifically, obtaining positive classification (positive) information and negative classification (negative) information through anchors (anchors) in feature map information classified by a normalization function (softmax), determining the positive classification information as the mark candidate Region, calculating border regression (bounding box regression) offset of the anchors, adjusting the mark candidate Region according to the border regression offset, obtaining a final target mark candidate Region (regions), and simultaneously removing target mark candidate regions which are too small and exceed the border, thereby realizing location frame selection of preset marks. In one embodiment, the target identification candidate region may be directly determined as the target object identification box.
And an interest pooling layer (ROI Pooling) which is used for collecting target identification candidate areas and feature map information, calculating out feature map information (generic features maps) with sizes meeting the conditions, and sending the feature map information to a subsequent layer for processing.
And the Classifier (Classifier) can comprise a full connection layer and a normalization processing layer, combines the regional feature map information through the full connection layer and the normalization processing layer, calculates a corresponding identification classification result of the regional feature map, finely adjusts the target identification candidate region according to the identification classification result, and determines the target identification candidate region after fine adjustment as an identification detection region.
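For illustration only, the following is a minimal sketch, in PyTorch, of how such a four-part detector can be loaded and its basic convolutional network used as a feature extractor. The use of torchvision's Faster R-CNN here is an assumption for demonstration; the application does not specify a particular framework or model.

```python
import torch
import torchvision

# A minimal sketch (assumes torchvision >= 0.13): torchvision's Faster R-CNN
# follows the four-part structure described above, and its backbone can serve
# as the feature extractor that produces feature map information.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

image = torch.rand(1, 3, 512, 512)        # placeholder input image
with torch.no_grad():
    features = detector.backbone(image)   # OrderedDict of FPN feature maps
    for name, fmap in features.items():
        print(name, tuple(fmap.shape))    # e.g. '0' -> (1, 256, 128, 128)
```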
The embodiment of the application incorporates the Knowledge Distillation technique, through which the dark knowledge in a complex model (Teacher model) can be transferred to a simple model (Student model). The first target detection model can be the Teacher target detection model: a model with a complex network structure and a large number of parameters, so its recognition speed is low but its accuracy is high. The second target detection model can be the Student model: a model with a simple network structure and a small number of parameters, so its recognition speed is high but its accuracy is lower.
Accordingly, the first target detection model and the second target detection model in the embodiment of the application are both pre-trained target detection models with the capability of identifying target object identification frames in images. The target training sample may be composed of a large number of images, whose format may be the BitMaP (BMP) format, the Graphics Interchange Format (GIF), or the like.
In this embodiment of the application, the target training sample may be input into a feature extractor of a first target detection model and a feature extractor of a second target detection model to obtain first target feature map information of a Teacher model and second target feature map information of a Student model.
In one implementation, the target feature map information may represent the target object identification frames in the image, and each target object identification frame can be understood as a node of a graph. The target feature map information may be composed of two-dimensional vectors, for example $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
In some embodiments, the step of inputting the target training samples into the first target detection model and the second target detection model respectively to obtain the first target feature map information and the second target feature map information may include:
(1) acquiring a target training sample, and inputting the target training sample into the first target detection model to obtain first feature map information;
(2) inputting the target training sample into the second target detection model to obtain second feature map information;
(3) performing dimension conversion on the first feature map information and the second feature map information to generate the first target feature map information and the second target feature map information.
Specifically, a target training sample is first obtained and input into the first target detection model to obtain the first feature map information of the Teacher model; the target training sample is also input into the second target detection model to obtain the second feature map information of the Student model. The target training sample may comprise, for example, 1000 images.
Because the feature dimensions of the feature map information are high and inconsistent, in order to facilitate subsequent knowledge distillation processing, dimension conversion needs to be performed on the first feature map information and the second feature map information, and the dimension conversion may include dimension reduction and dimension unification processing to generate the first target feature map information and the second target feature map information after the dimension conversion.
In some embodiments, the step of generating the first target feature map information and the second target feature map information by performing dimension conversion on the first feature map information and the second feature map information may include:
(1.1) carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension;
(1.2) carrying out dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame;
and (1.3) performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
Because the feature dimensions of the first feature map information and the second feature map information are high, computing on them directly would involve too large a calculation amount. The first and second feature map information can therefore be input into two adaptive convolution network layers for dimension unification and respective non-linear transformation, which reduces their data dimensions and increases their non-linear information, yielding first and second feature map information of the same dimension and facilitating the subsequent construction of knowledge distillation.
Further, in order to construct a graph structure and calculate the object relations between different target object identification frames, dimension conversion needs to be performed on the first feature map information after the unified processing, converting it into first target feature map information comprising the first target object identification frame quantity dimension and the first target object identification frame feature dimension. Dimension conversion is likewise performed on the second feature map information after dimension unification to generate second target feature map information comprising the second target object identification frame quantity dimension and feature dimension. The first target feature map information and the second target feature map information may each be represented as $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
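For illustration, the following is a minimal sketch of this dimension unification and conversion, assuming PyTorch; it assumes per-frame features have already been pooled from the feature maps, and it uses a simple linear projection in place of the adaptive convolution layers described above. All tensor sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A minimal sketch (assumed shapes): project teacher and student per-frame
# features to a shared dimension C, yielding the (B, C) target feature map
# information -- B identification frames, C features per frame.
class Adapter(nn.Module):
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Non-linear transform: reduces dimension, adds non-linear information.
        return torch.relu(self.fc(x))

teacher_raw = torch.randn(100, 1024)   # B=100 frames, teacher feature dim (assumed)
student_raw = torch.randn(100, 512)    # same frames, student feature dim (assumed)
F_t = Adapter(1024)(teacher_raw)       # first target feature map information (B, C)
F_s = Adapter(512)(student_raw)        # second target feature map information (B, C)
```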
In step 102, graph convolution calculation is performed on the first target characteristic graph information and the second target characteristic graph information respectively, and a first relation characteristic vector and a second relation characteristic vector between the target objects are obtained.
Graph convolution calculation is implemented by a graph convolutional neural network (GCN). Like a convolutional neural network, it is a feature extractor, except that it extracts features from graph data (i.e., the first target feature map information and the second target feature map information), so that the machine can use the extracted features to perform node classification, link prediction, and the like on the graph data.
in the embodiment of the present application, since the recognition accuracy of the first target detection model and the recognition accuracy of the second target detection model for different target object identification frames are different, the accuracy of the object relationship between the target object identification frames in the first target feature map information is better than the accuracy of the object relationship between the target object identification frames in the second target feature map information.
Further, the first target feature map information can be input into a pre-trained graph convolutional neural network to obtain a first relation feature vector that represents the relations between the target objects in the first target feature map, and the second target feature map information can be input into the pre-trained graph convolutional neural network to obtain a second relation feature vector that represents the relations between the target objects in the second target feature map. The relation feature vector encodes the identified relations between target object identification frames, and the accuracy of the first relation feature vector on these relations is better than that of the second relation feature vector.
In some embodiments, the step of performing graph convolution calculation on the first target feature map information and the second target feature map information respectively to obtain a first relation feature vector and a second relation feature vector between the target objects may include:
(1) acquiring first distance matrix information between the first target object identification frame features in the first target feature map information, and second distance matrix information between the second target object identification frame features in the second target feature map information;
(2) performing graph convolution processing based on the first target feature map information and the first distance matrix information to obtain the first relation feature vector;
(3) performing graph convolution processing based on the second target feature map information and the second distance matrix information to obtain the second relation feature vector.
In this case, each target object identification frame in the first target feature map information, together with its corresponding identification frame feature, may be used as a node; the distance between each pair of nodes in the first target feature map information is then calculated to generate the first distance matrix information, for example $A \in \mathbb{R}^{B \times B}$; that is, the first distance matrix information describes the distance information between the nodes. Likewise, each target object identification frame in the second target feature map information and its corresponding identification frame feature are used as a node, and the distances between the nodes in the second target feature map information are calculated to generate the second distance matrix information.
Further, the first target feature map information (characterizing the nodes) and the first distance matrix information (characterizing the distances between nodes) may be input into a graph convolution model for graph convolution processing to obtain the first relation feature vector, which can represent the positional relations among the target object identification frames extracted by the first target detection model.
Likewise, the second target feature map information and the second distance matrix information can be input into the graph convolution model for graph convolution processing to obtain the second relation feature vector, which can represent the positional relations among the target object identification frames extracted by the second target detection model.
In some embodiments, the step of obtaining first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information may include:
(1.1) calculating the Euclidean distances between the first target object identification frame features in the first target feature map information to generate the first distance matrix information;
(1.2) calculating the Euclidean distances between the second target object identification frame features in the second target feature map information to generate the second distance matrix information.
The calculation can be based on the following formula:

$$D_{ij} = \left\| F_i - F_j \right\|_2$$

where $D_{ij}$ is the Euclidean distance between the i-th target object identification frame and the j-th target object identification frame, $F_i$ is the feature information of the i-th target object identification frame, and $F_j$ is the feature information of the j-th target object identification frame. Through this formula, the Euclidean distance between each pair of first target object identification frame features in the first target feature map information can be calculated, generating the first distance matrix information.
Further, the Euclidean distance between each pair of second target object identification frame features in the second target feature map information can be calculated through the same formula, generating the second distance matrix information.
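A minimal sketch of this distance computation, assuming PyTorch (tensor names are illustrative), follows:

```python
import torch

# Pairwise Euclidean distances between the B node features (one row per
# target object identification frame) give the B x B distance matrix that
# serves as the edge information of the graph.
def distance_matrix(F: torch.Tensor) -> torch.Tensor:
    # F: (B, C) node features; returns D with D[i, j] = ||F_i - F_j||_2
    return torch.cdist(F, F, p=2)

F_t = torch.randn(100, 256)   # assumed first target feature map information
A_t = distance_matrix(F_t)    # first distance matrix information, (100, 100)
```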
In step 103, a distillation loss function corresponding to the first relational feature vector and the second relational feature vector is constructed.
The recognition accuracy of the first relation feature vector on the relations between target object identification frames is better than that of the second relation feature vector. The dark knowledge carried by the first relation feature vector of the Teacher model therefore needs to be transferred to the Student model, so that the Student model, with its simple network, learns the Teacher model's ability to recognize the relations between target object identification frames in images; that is, the distance between the first relation feature vector and the second relation feature vector should be made as small as possible. To this end, a distillation loss function corresponding to the first relation feature vector and the second relation feature vector can be constructed.
In some embodiments, the step of constructing the distillation loss function corresponding to the first relation feature vector and the second relation feature vector may include: calculating the loss value between the first relation feature vector and the second relation feature vector through a mean square error function, and constructing the corresponding distillation loss function.
The mean square error function calculates the expected value of the square of the difference between an estimate and the true value, and may be denoted MSE. It can be expressed by the following formula:

$$L_{graph} = \mathrm{MSE}\left(R_T, R_S\right) = \frac{1}{n} \sum_{i=1}^{n} \left( R_{T,i} - R_{S,i} \right)^2$$

where $L_{graph}$ represents the distillation loss function, $R_T$ is the first relation feature vector, and $R_S$ is the second relation feature vector. The loss value between the first relation feature vector and the second relation feature vector is calculated through this formula, constructing the corresponding distillation loss function.
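A minimal sketch of this loss, assuming PyTorch (the variable names are assumptions), follows:

```python
import torch
import torch.nn.functional as F

# Mean square error between the teacher's and student's relation feature
# vectors; the teacher output is detached so that gradients only update
# the second (Student) target detection model.
def distillation_loss(r_teacher: torch.Tensor, r_student: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(r_student, r_teacher.detach())
```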
In step 104, the distillation loss function is added to the loss function of the second target detection model for joint training, so as to obtain the second target detection model after joint training.
In order to solve the above technical problems, in the embodiment of the present application, the distillation loss function may be added to the loss function of the second target detection model, migrating the ability of the first target detection model, with its complex network structure, to recognize object relations into the second target detection model for joint training. In this way the second target detection model, with its simple network, can recognize objects in combination with the relations between objects learned by the complex network model, which to a certain extent better conforms to the relational inference ability of the human brain.
For example, the probability that the object beside a notebook computer is a mouse is greatly increased by the presence of the notebook computer. By having the second target detection model, with its simple network, learn such relations between potential objects, its performance can be greatly improved, so that the second target detection model retains its speed advantage while achieving a better detection effect, improving information processing efficiency.
In some embodiments, the step of adding the distillation loss function into the loss function of the second target detection model for joint training to obtain the jointly trained second target detection model may include:
(1) multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
(2) and performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain the second target detection model after joint training.
Please refer to the following formula:

$$L = L_{det} + \alpha \, L_{graph}$$

where $L_{det}$ is the loss function of the original second target detection model, $L_{graph}$ represents the distillation loss function, and $\alpha$ is a preset weight, for example 0.1. The distillation loss function is multiplied by the preset weight 0.1 to obtain the target distillation loss function, and the target distillation loss function is jointly trained with the loss function of the second target detection model to obtain the jointly trained second target detection model, which combines the complex network model's learned ability to recognize the relations between objects and can achieve a better detection effect while retaining its speed advantage.
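A minimal sketch of this joint objective, assuming PyTorch and the example weight of 0.1 (names are assumptions), follows:

```python
import torch

# Joint objective for step 104: the student's original detection loss plus
# the distillation loss scaled by the preset weight alpha.
def joint_loss(det_loss: torch.Tensor, graph_loss: torch.Tensor,
               alpha: float = 0.1) -> torch.Tensor:
    # Multiplying the distillation loss by alpha gives the target
    # distillation loss, which is added to the detection loss.
    return det_loss + alpha * graph_loss
```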
In step 105, object recognition is performed on the image to be recognized based on the jointly trained second target detection model.
The jointly trained second target detection model has absorbed the complex network model's learned ability to recognize the relations between objects; therefore, when object recognition is performed on the image to be recognized based on the jointly trained second target detection model, it can achieve a better target object detection effect while retaining the advantage of fast recognition.
As can be seen from the above, in the embodiment of the present application, target training samples are input into the first target detection model and the second target detection model respectively to obtain the first target feature map information and the second target feature map information; graph convolution calculation is performed on the first and second target feature map information respectively to obtain the first relation feature vector and the second relation feature vector between target objects; a distillation loss function corresponding to the two relation feature vectors is constructed; the distillation loss function is added into the loss function of the second target detection model for joint training to obtain the jointly trained second target detection model; and object recognition is performed on the image to be recognized based on the jointly trained second target detection model. In this way, the first target detection model's learned knowledge of the relations between target object identification frames is transferred to the second target detection model for supplementary learning, improving information processing efficiency.
The method described in connection with the above embodiments will be described in further detail below by way of example.
In the present embodiment, an example will be described in which the information processing apparatus is specifically integrated in a server.
Referring to fig. 3 and fig. 4 together, fig. 3 is another schematic flow chart of the information processing method according to the embodiment of the present application, and fig. 4 is a schematic scene diagram of the information processing method according to the embodiment of the present application. The method flow can comprise the following steps:
in step 201, the server obtains a training sample, and performs cutting, expanding, and turning processing on the training sample to generate a corresponding target training sample.
The training sample obtained by the server may be a PASCAL VOC2012 training val data set and an MS COCO training val35k data set, and the training sample is composed of a plurality of images.
In order to enrich the training samples, the training samples can be cut, enlarged and turned, the training samples are expanded, more target training samples 1 than the training samples are generated, the target training samples 1 carry label information, and the label information can be used for labeling a target object identification frame.
In step 202, the server obtains a target training sample, inputs the target training sample into a first target detection model to obtain first feature map information, and inputs the target training sample into a second target detection model to obtain second feature map information.
The server obtains a target training sample 1 and inputs it into the first target detection model (Teacher model) 2 and the second target detection model (Student model) 3 to obtain first feature map information $F_T$ and second feature map information $F_S$. The network complexity of the first target detection model is higher than that of the second target detection model, so the first model identifies the target objects in target training sample 1 more accurately than the second; the feature description of the target objects in the first feature map information $F_T$ is therefore better than that in the second feature map information $F_S$.
In step 203, the server performs dimension unification processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension, performs dimension conversion on the first feature map information subjected to the dimension unification processing, and generates first target feature map information including a first target object identification frame number dimension and a first target object identification frame feature dimension.
Because the feature dimensions of the first feature map information $F_T$ and the second feature map information $F_S$ are high, directly computing on them would involve an excessive calculation amount. $F_T$ and $F_S$ can therefore first be input into two adaptive convolution network layers and transformed non-linearly, which reduces the data dimension while increasing the non-linear information of the feature map information, and yields first and second feature map information of the same data dimension, which is convenient for constructing the subsequent graph model.
It should be noted that the graph model G can be expressed as $G = (V, E)$, where V is the node information; in this embodiment, the nodes may be the features of the target object identification frames in the feature map information. E is the set of edges of the graph structure, i.e., the distance information between the target object identification frame features: the closer the distance, the more similar two nodes are, and the farther the distance, the more dissimilar they are.
Therefore, dimension transformation can be performed on the first feature map information after dimension unification, converting the current multi-dimensional information into two-dimensional information to obtain the first target feature map information. This is the set of nodes of the graph structure identified by the first target detection model and can be represented as $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
In step 204, the server performs dimension conversion on the second feature map information after the dimension unification processing, and generates second target feature map information including a second target object identification frame number dimension and a second target object identification frame feature dimension.
The server may perform dimension transformation on the second feature map information after dimension unification, converting the current multi-dimensional information into two-dimensional information to obtain the second target feature map information. This is the set of nodes of the graph structure identified by the second target detection model, likewise represented as $F \in \mathbb{R}^{B \times C}$, where B is the number of target object identification frames and C is the feature of each target object identification frame.
In step 205, the server calculates a generalized intersection ratio between the first target object identification frame features in the first target feature map information to generate first distance matrix information, and calculates a generalized intersection ratio between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the step of calculating a generalized intersection ratio between the first target object identification frame features in the first target feature map information and generating first distance matrix information may include:
(1) acquiring the intersection ratio between the first target object identification frame features in the first target feature map information;
(2) acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information;
(3) calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information;
(4) obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box;
(5) calculating the difference between the intersection ratio and the ratio to obtain the generalized intersection ratio between the first target object identification frame features in the first target feature map information, and generating the first distance matrix information.
Please refer to the following formulas:

$$\mathrm{IoU} = \frac{|M \cap N|}{|M \cup N|}$$

where IoU, the intersection-over-union, is the most common metric in target detection; M and N are two different target object identification frame features; $M \cap N$ represents the intersection of the two boxes, and $M \cup N$ represents their union. Please continue to refer to the following formula:

$$\mathrm{GIoU} = \mathrm{IoU} - \frac{\left| C \setminus (M \cup N) \right|}{|C|}$$

where GIoU (Generalized Intersection over Union) is the generalized intersection ratio and C is the minimum enclosing bounding box formed by the two boxes. Through the above calculation, a GIoU value between every two target object identification boxes can be obtained, and the GIoU value can represent the distance between the two boxes. That is, the intersection ratio IoU between the first target object identification frame features in the first target feature map information is acquired first; the minimum enclosing bounding box C of the first target object identification frame features is acquired; the absolute value of the difference between the minimum enclosing bounding box C and the union of the first target object identification frame features is calculated; the ratio of this absolute value to the absolute value of the minimum enclosing bounding box is obtained; and finally the difference between the intersection ratio and this ratio is calculated to obtain the generalized intersection ratio between the first target object identification frame features in the first target feature map information, generating the first distance matrix information.
Similarly, the generalized intersection ratio between the second target object identification frame features in the second target feature map information is calculated by the same method to generate the second distance matrix information. The distance matrix information can be represented as $A \in \mathbb{R}^{B \times B}$, describing the distance information between each target object identification frame and the other target object identification frames, i.e., the edge information of the graph structure.
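A minimal sketch of the GIoU computation for a pair of boxes, assuming PyTorch and (x1, y1, x2, y2) box coordinates (an assumption; the application does not fix a box format), follows:

```python
import torch

# Generalized intersection-over-union between two axis-aligned boxes M and N.
def giou(m: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    # Intersection of the two boxes
    lt = torch.max(m[:2], n[:2])
    rb = torch.min(m[2:], n[2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[0] * wh[1]
    # Union of the two boxes
    area_m = (m[2] - m[0]) * (m[3] - m[1])
    area_n = (n[2] - n[0]) * (n[3] - n[1])
    union = area_m + area_n - inter
    iou = inter / union
    # Minimum enclosing bounding box C
    lt_c = torch.min(m[:2], n[:2])
    rb_c = torch.max(m[2:], n[2:])
    wh_c = (rb_c - lt_c).clamp(min=0)
    area_c = wh_c[0] * wh_c[1]
    # GIoU = IoU - |C \ (M u N)| / |C|
    return iou - (area_c - union) / area_c

m = torch.tensor([0.0, 0.0, 2.0, 2.0])
n = torch.tensor([1.0, 1.0, 3.0, 3.0])
print(giou(m, n))   # IoU = 1/7, GIoU = 1/7 - 2/9
```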
In step 206, the server performs graph convolution processing based on the first target feature graph information and the first distance matrix information to obtain a first relationship feature vector.
In some embodiments, the step of performing graph convolution processing based on the first target feature map information and the first distance matrix information to obtain the first relation feature vector may include:
(1) acquiring the first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information;
(2) obtaining a diagonal node degree matrix of the first matrix information;
(3) performing symmetric normalization on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information;
(4) and calculating the first target matrix information, the first parameter matrix, and the first target feature map information through the activation function to obtain the first relation feature vector.
Please refer to the following formula:

$$V = P\left(\hat{D}^{-\frac{1}{2}}\,\hat{A}\,\hat{D}^{-\frac{1}{2}}\,F\,W\right), \qquad \hat{A} = A + I$$

where P is an activation function, such as a Sigmoid or ReLU function; $\hat{A}$ is the matrix information, expressed as $\hat{A} = A + I$, where A is the first distance matrix information or the second distance matrix information, and I is an identity matrix, that is, a square matrix whose elements on the diagonal from the top left to the bottom right (the main diagonal) are all 1 and whose other elements are all 0; $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$, whose diagonal entries are the row sums of $\hat{A}$ ($\hat{D}_{ii} = \sum_{j}\hat{A}_{ij}$), and the regularization it provides prevents the output from growing too large after multiplication by $\hat{A}$; F is the first target feature map information or the second target feature map information; and W is the first parameter matrix or the second parameter matrix, which, like a network parameter, can be adjusted. Through the above formula, the first distance matrix information A is obtained and summed with the identity matrix I to obtain the first matrix information $\hat{A}$; the diagonal node degree matrix $\hat{D}$ of the first matrix information is obtained; the first matrix information is symmetrically normalized via $\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}$ to obtain the first target matrix information; and finally the first target matrix information, the first parameter matrix W, and the first target feature map information F are combined through the activation function P to obtain the first relation feature vector (namely, the Teacher relation feature vector).
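As an illustration only, a minimal NumPy sketch of this graph-convolution step might look as follows; the choice of ReLU for the activation function P, the array shapes, and the function name are assumptions for the example:

```python
import numpy as np

def relation_feature_vector(a, f, w):
    """One graph-convolution step: P(D^{-1/2} (A + I) D^{-1/2} F W).

    a: (n, n) distance matrix A between identification-box features
       (entries assumed non-negative so the node degrees are positive)
    f: (n, d) target feature map information F, one row per box
    w: (d, k) adjustable parameter matrix W
    """
    n = a.shape[0]
    a_hat = a + np.eye(n)                     # first matrix information: A + I
    deg = a_hat.sum(axis=1)                   # row sums -> diagonal node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(a_norm @ f @ w, 0.0)    # activation function P (ReLU here)
```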
In step 207, the server performs graph convolution processing based on the second target feature graph information and the second distance matrix information to obtain a second relationship feature vector.
With continuing reference to the formula in step 206, the second target feature map information and the second distance matrix information may be input into that formula for graph convolution processing to obtain a second relation feature vector (i.e., the Student relation feature vector). The first relation feature vector captures the relationships between target object identification frames more accurately than the second relation feature vector does.
In step 208, the server calculates a loss value between the first relational feature vector and the second relational feature vector by a mean square error function, and constructs a corresponding distillation loss function.
The learning capability of the Teacher model for the relationships between target object identification frames, carried by the first relation feature vector, needs to be transferred to the Student model; that is, the distance between the first relation feature vector and the second relation feature vector needs to be as small as possible, so that the two vectors are as close as possible. Please refer to the following formula:

$$L_{graph} = \mathrm{MSE}\left(V^{T}, V^{S}\right)$$

where $L_{graph}$ represents the distillation loss function, $V^{T}$ is the first relation feature vector, and $V^{S}$ is the second relation feature vector. The loss value between the first relation feature vector and the second relation feature vector is calculated by this formula to construct the corresponding distillation loss function 4, which makes the relation expression of the first relation feature vector and the relation expression of the second relation feature vector as close as possible.
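A sketch of this mean-square-error distillation loss, with illustrative names, might be:

```python
import numpy as np

def graph_distillation_loss(v_teacher, v_student):
    """L_graph: mean squared error between the Teacher relation feature
    vector and the Student relation feature vector (same shape)."""
    return float(np.mean((np.asarray(v_teacher) - np.asarray(v_student)) ** 2))
```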
In step 209, the server multiplies the distillation loss function by a preset weight to obtain a target distillation loss function, and performs joint training on the target distillation loss function and a loss function of the second target detection model to obtain a second target detection model after the joint training.
Please refer to the following formula:

$$L = L_{det} + \alpha L_{graph}$$

where $L_{det}$ is the loss function of the original second target detection model, $L_{graph}$ represents the distillation loss function 4, and $\alpha$ is a preset weight, for example 0.1. The distillation loss function is multiplied by the preset weight 0.1 to obtain the target distillation loss function, and the target distillation loss function and the loss function of the second target detection model 3 are jointly trained to obtain the second target detection model after joint training. The jointly trained Student model 3 thereby absorbs the Teacher model 2's learned ability to recognize relationships between objects, and can achieve a better detection effect while keeping its advantage of speed.
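Assuming the models are trained in an autograd framework such as PyTorch (the embodiment does not prescribe one), a single joint-training step could be sketched as follows; the function name and the optimizer interface are illustrative:

```python
import torch
import torch.nn.functional as F

def joint_training_step(optimizer, detection_loss, v_teacher, v_student, alpha=0.1):
    """One joint-training step: L = L_det + alpha * L_graph."""
    l_graph = F.mse_loss(v_student, v_teacher.detach())  # Teacher side is not updated
    loss = detection_loss + alpha * l_graph              # add the target distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```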
In an embodiment, the jointly trained second target detection model may be tested: a test image is input into the jointly trained second target detection model, and the joint training is stopped when the accuracy of the object recognition result is higher than a preset threshold.
In step 210, the server performs object recognition on the image to be recognized based on the jointly trained second target detection model.
Because the jointly trained Student model 3 incorporates the Teacher model 2's learned ability to recognize relationships between objects, performing object recognition on the image to be recognized with the jointly trained Student model 3 retains the advantage of rapid recognition while also achieving a better target object detection effect.
As can be seen from the above, in the embodiment of the present application, the first target feature map information and the second target feature map information are obtained by inputting the target training samples into the first target detection model and the second target detection model, respectively; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training. Therefore, the learning capability of the first target detection model on the relation between the target object identification frames is transferred to the second target detection model for supplementary learning, and the information processing efficiency is improved.
In order to better implement the information processing method provided by the embodiment of the present application, the embodiment of the present application further provides a device based on the information processing method. The terms are the same as those in the above-described information processing method, and details of implementation may refer to the description in the method embodiment.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present disclosure, where the information processing apparatus may include an input unit 301, a graph convolution unit 302, a construction unit 303, a training unit 304, a recognition unit 305, and the like.
The input unit 301 is configured to input the target training sample into the first target detection model and the second target detection model respectively to obtain first target feature map information and second target feature map information.
In some embodiments, the input unit 301 may include:
the first input subunit is used for acquiring a target training sample, and inputting the target training sample into a first target detection model to obtain first characteristic diagram information;
the second input subunit is used for inputting the target training sample into a second target detection model to obtain second characteristic diagram information;
and the conversion subunit is used for performing dimension conversion on the first feature map information and the second feature map information to generate first target feature map information and second target feature map information.
In some embodiments, the conversion subunit is configured to: carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension; performing dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame; and performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
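As a rough sketch of the conversion subunit's final reshape (the exact mapping from spatial feature maps to per-box features is not detailed here, so the even-split scheme and names below are purely assumptions):

```python
import numpy as np

def to_box_features(feature_map, num_boxes):
    """Reshape a (C, H, W) feature map into (num_boxes, feature_dim) so
    that each row holds one target object identification frame feature."""
    flat = feature_map.reshape(-1)           # flatten all C*H*W activations
    feature_dim = flat.size // num_boxes     # assumes an even split per box
    return flat[:num_boxes * feature_dim].reshape(num_boxes, feature_dim)
```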
And a graph convolution unit 302, configured to perform graph convolution calculation on the first target feature map information and the second target feature map information respectively, so as to obtain a first relationship feature vector and a second relationship feature vector between the target objects.
In some embodiments, the graph convolution unit includes:
an obtaining subunit, configured to obtain first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information;
the first graph convolution subunit is used for performing graph convolution processing on the basis of the first target characteristic graph information and the first distance matrix information to obtain a first relation characteristic vector;
and the second graph convolution subunit is used for performing graph convolution processing on the basis of the second target characteristic graph information and the second distance matrix information to obtain a second relation characteristic vector.
In some embodiments, the obtaining subunit is configured to: calculating Euclidean distance between the first target object identification frame characteristics in the first target characteristic diagram information to generate first distance matrix information; and calculating Euclidean distance between the second target object identification frame characteristics in the second target characteristic diagram information to generate second distance matrix information.
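A vectorised NumPy sketch of this Euclidean variant, with illustrative names, might be:

```python
import numpy as np

def euclidean_distance_matrix(box_features):
    """Pairwise Euclidean distances between the rows of a
    (num_boxes, feature_dim) array of identification-box features."""
    diff = box_features[:, None, :] - box_features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```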
In some embodiments, the obtaining subunit is further configured to: calculating the generalized intersection over union between the first target object identification frame features in the first target feature map information to generate first distance matrix information; and calculating the generalized intersection over union between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the obtaining subunit is further configured to: acquiring the intersection over union between the first target object identification frame features in the first target feature map information; acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information; calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information; obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box; calculating the difference between the intersection over union and the ratio to obtain the generalized intersection over union between the first target object identification frame features in the first target feature map information, and generating first distance matrix information; and calculating the generalized intersection over union between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
In some embodiments, the first graph convolution subunit is to: acquiring first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information; obtaining a diagonal node degree matrix of the first matrix information; performing symmetrical normalization processing on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information; and calculating the first target matrix information, the first parameter matrix and the first target characteristic graph information through the activation function to obtain a first relation characteristic vector.
A constructing unit 303, configured to construct a distillation loss function corresponding to the first relation feature vector and the second relation feature vector.
In some embodiments, the constructing unit 303 is configured to: and calculating the loss value between the first relation characteristic vector and the second relation characteristic vector through a mean square error function, and constructing a corresponding distillation loss function.
A training unit 304, configured to add the distillation loss function to the loss function of the second target detection model for joint training, so as to obtain a second target detection model after joint training.
In some embodiments, the training unit 304 is configured to: multiplying the distillation loss function by a preset weight to obtain a target distillation loss function; and performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain the second target detection model after joint training.
And the identifying unit 305 is configured to perform object identification on the image to be identified based on the jointly trained second target detection model.
In some embodiments, the information processing apparatus further includes an expansion unit configured to: obtaining a training sample; and cropping, expanding, and flipping the training sample to generate a corresponding target training sample.
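A rough sketch of such an expansion unit acting on a single (H, W, C) image follows; the crop ratio, zero padding, and flip probability are assumptions, and in a real detection pipeline the identification boxes would need to be transformed along with the image:

```python
import numpy as np

def expand_sample(image, rng=None):
    """Crop, expand (pad back to the original size), and randomly flip
    one (H, W, C) training sample."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = image.shape
    ch, cw = int(h * 0.9), int(w * 0.9)        # crop to 90% of each side
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    out = image[y0:y0 + ch, x0:x0 + cw]
    out = np.pad(out, ((0, h - ch), (0, w - cw), (0, 0)))  # expand by zero padding
    if rng.random() < 0.5:                     # horizontal flip
        out = out[:, ::-1, :]
    return out
```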
The specific implementation of each unit can refer to the previous embodiment, and is not described herein again.
As can be seen from the above, in the embodiment of the present application, the input unit 301 inputs the target training samples into the first target detection model and the second target detection model respectively, so as to obtain the first target feature map information and the second target feature map information; the graph convolution unit 302 performs graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; the construction unit 303 constructs a distillation loss function corresponding to the first relation feature vector and the second relation feature vector; the training unit 304 adds the distillation loss function to the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; the recognition unit 305 performs object recognition on the image to be recognized based on the jointly trained second target detection model. Therefore, the learning capability of the first target detection model on the relation between the target object identification frames is transferred to the second target detection model for supplementary learning, and the information processing efficiency is improved.
The embodiment of the present application further provides a computer device, as shown in fig. 6, which shows a schematic structural diagram of a server according to the embodiment of the present application, specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 6 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; optionally, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the respective components, and optionally, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, so as to implement the various method steps provided by the foregoing embodiments, as follows:
respectively inputting the target training sample into a first target detection model and a second target detection model to obtain first target characteristic graph information and second target characteristic graph information; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
In the above embodiments, the descriptions of the embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the information processing method, and are not described herein again.
As can be seen from the above, the computer device according to the embodiment of the present application may obtain first target feature map information and second target feature map information by inputting the target training samples into the first target detection model and the second target detection model, respectively; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training. Therefore, the learning capability of the first target detection model on the relation between the target object identification frames is transferred to the second target detection model for supplementary learning, and the information processing efficiency is improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the information processing methods provided in the embodiments of the present application. For example, the instructions may perform the steps of:
respectively inputting the target training sample into a first target detection model and a second target detection model to obtain first target characteristic graph information and second target characteristic graph information; performing graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information respectively to obtain a first relation characteristic vector and a second relation characteristic vector between target objects; constructing a distillation loss function corresponding to the first relation characteristic vector and the second relation characteristic vector; adding the distillation loss function into the loss function of the second target detection model for joint training to obtain a second target detection model after the joint training; and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations provided by the embodiments described above.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any information processing method provided in the embodiments of the present application, the beneficial effects that can be achieved by any information processing method provided in the embodiments of the present application can be achieved, and detailed descriptions are omitted here for the details, see the foregoing embodiments.
The foregoing detailed description is directed to an information processing method, an information processing apparatus, and a computer-readable storage medium provided in the embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. An information processing method characterized by comprising:
respectively inputting the target training sample into a first target detection model and a second target detection model to obtain first target characteristic graph information and second target characteristic graph information;
acquiring first distance matrix information between first target object identification frame characteristics in the first target characteristic diagram information and second distance matrix information between second target object identification frame characteristics in the second target characteristic diagram information;
performing graph convolution processing on the first target characteristic graph information and the first distance matrix information to obtain a first relation characteristic vector;
performing graph convolution processing based on the second target characteristic graph information and second distance matrix information to obtain a second relation characteristic vector;
constructing distillation loss functions corresponding to the first relation characteristic vector and the second relation characteristic vector;
multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain a second target detection model after joint training;
and carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
2. The information processing method according to claim 1, wherein the step of acquiring first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information includes:
calculating Euclidean distance between the first target object identification frame characteristics in the first target characteristic diagram information to generate first distance matrix information;
and calculating Euclidean distance between the characteristics of the second target object identification frame in the second target characteristic diagram information to generate second distance matrix information.
3. The information processing method according to claim 1, wherein the step of performing graph convolution processing based on the first target feature graph information and first distance matrix information to obtain a first relational feature vector includes:
acquiring first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information;
obtaining a diagonal node degree matrix of the first matrix information;
performing symmetrical normalization processing on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information;
and calculating the first target matrix information, the first parameter matrix and the first target characteristic graph information through the activation function to obtain a first relation characteristic vector.
4. The information processing method according to claim 1, wherein the step of acquiring first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information includes:
calculating a generalized intersection over union between the first target object identification frame features in the first target feature map information to generate first distance matrix information;
and calculating a generalized intersection over union between the second target object identification frame features in the second target feature map information to generate second distance matrix information.
5. The information processing method according to claim 4, wherein the step of calculating the generalized intersection over union between the first target object identification frame features in the first target feature map information and generating first distance matrix information includes:
acquiring the intersection over union between the first target object identification frame features in the first target feature map information;
acquiring the minimum enclosing bounding box of the first target object identification frame features in the first target feature map information;
calculating the absolute value of the difference between the minimum enclosing bounding box and the union of the first target object identification frame features in the first target feature map information;
obtaining the ratio of the absolute value of the difference to the absolute value of the minimum enclosing bounding box;
and calculating the difference between the intersection over union and the ratio to obtain the generalized intersection over union between the first target object identification frame features in the first target feature map information, and generating first distance matrix information.
6. The information processing method according to any one of claims 1 to 5, wherein the step of inputting the target training samples into the first target detection model and the second target detection model, respectively, to obtain the first target feature map information and the second target feature map information includes:
acquiring a target training sample, and inputting the target training sample into a first target detection model to obtain first characteristic diagram information;
inputting the target training sample into a second target detection model to obtain second characteristic diagram information;
and performing dimension conversion on the first feature map information and the second feature map information to generate first target feature map information and second target feature map information.
7. The information processing method according to claim 6, wherein the step of performing dimension conversion on the first feature map information and the second feature map information to generate first target feature map information and second target feature map information includes:
carrying out dimension unified processing on the first feature map information and the second feature map information to obtain first feature map information and second feature map information of the same dimension;
performing dimension conversion on the first feature map information subjected to dimension unified processing to generate first target feature map information containing the quantity dimension of the first target object identification frame and the feature dimension of the first target object identification frame;
and performing dimension conversion on the second feature map information subjected to dimension unified processing to generate second target feature map information containing the dimension of the number of the second target object identification frames and the feature dimension of the second target object identification frames.
8. The information processing method according to any one of claims 1 to 5, wherein the step of constructing the distillation loss function corresponding to the first relational feature vector and the second relational feature vector includes:
and calculating a loss value between the first relation characteristic vector and the second relation characteristic vector through a mean square error function, and constructing a corresponding distillation loss function.
9. An information processing apparatus characterized by comprising:
the input unit is used for inputting the target training samples into the first target detection model and the second target detection model respectively to obtain first target characteristic graph information and second target characteristic graph information;
the graph convolution unit is used for respectively carrying out graph convolution calculation on the first target characteristic graph information and the second target characteristic graph information to obtain a first relation characteristic vector and a second relation characteristic vector between target objects;
the graph convolution unit includes:
an obtaining subunit, configured to obtain first distance matrix information between first target object identification frame features in the first target feature map information and second distance matrix information between second target object identification frame features in the second target feature map information;
the first graph convolution subunit is used for performing graph convolution processing on the basis of the first target characteristic graph information and the first distance matrix information to obtain a first relation characteristic vector;
the second graph convolution subunit is used for performing graph convolution processing on the basis of the second target characteristic graph information and the second distance matrix information to obtain a second relation characteristic vector;
the construction unit is used for constructing distillation loss functions corresponding to the first relation characteristic vector and the second relation characteristic vector;
the training unit is used for adding the distillation loss function into the loss function of the second target detection model to perform joint training to obtain a second target detection model after the joint training;
the training unit is configured to:
multiplying the distillation loss function by a preset weight to obtain a target distillation loss function;
performing joint training on the target distillation loss function and the loss function of the second target detection model to obtain a second target detection model after joint training;
and the recognition unit is used for carrying out object recognition on the image to be recognized based on the second target detection model after the joint training.
10. The processing apparatus as claimed in claim 9, wherein the obtaining subunit is configured to:
calculating Euclidean distance between the first target object identification frame characteristics in the first target characteristic diagram information to generate first distance matrix information;
and calculating Euclidean distance between the characteristics of the second target object identification frame in the second target characteristic diagram information to generate second distance matrix information.
11. The processing apparatus as claimed in claim 9, wherein the first graph convolution subunit is configured to:
acquiring first distance matrix information, and summing the first distance matrix information and an identity matrix to obtain first matrix information;
obtaining a diagonal node degree matrix of the first matrix information;
performing symmetrical normalization processing on the first matrix information based on the first matrix information and the diagonal node degree matrix to obtain first target matrix information;
and calculating the first target matrix information, the first parameter matrix and the first target characteristic graph information through the activation function to obtain a first relation characteristic vector.
12. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the information processing method according to any one of claims 1 to 8.