CN112989088A - Visual relation example learning method based on reinforcement learning - Google Patents
- Publication number
- CN112989088A (application CN202110152379.8A)
- Authority
- CN
- China
- Prior art keywords
- visual
- agent
- search
- action
- instance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
A visual relationship is usually represented as a triple <subject, predicate, object>, which contains two objects, subject and object, and the interaction predicate between them. Visual relationship learning is a bridge between low-level image perception tasks and high-level image cognition tasks, and belongs to the intermediate-level image understanding tasks. Visual relationship instance learning is the problem of determining the two object instances involved in each visual relationship, given an image and a corresponding set of visual relationships. The problem is modeled as a sequential decision process in which two agents search the image with subject and object instance search boxes, yielding a visual relationship instance learning method based on deep reinforcement learning. For a given test image and an associated set of visual relationships, the instance boxes corresponding to the subject and the object in each visual relationship can be found quickly and accurately.
Description
Technical Field
The invention belongs to the technical field of computer application, relates to deep learning, visual relation and reinforcement learning, and particularly relates to a visual relation example learning method based on reinforcement learning.
Background
A long-standing goal in computer vision is for an agent to understand human natural language well enough to perform specific tasks in a visual environment. In current computer vision tasks, understanding of image content can be divided into the perception level and the cognition level. The object detection task belongs to the perception level: it learns the mapping between low-level visual appearance and high-level text semantics in an image. However, to understand the content expressed by an image more fully, the interactive relationships between the objects in the image must also be learned, namely visual relationship learning, which belongs to image understanding at the cognition level.
There are vast numbers of images and associated texts on the Internet, and sets of visual relationships describing image content can be extracted from them; learning these visual relationships is essential for a thorough understanding of image content. In recent years, visual relationship learning has been widely applied in a series of image understanding tasks, including image description generation, image retrieval, image synthesis, scene graph generation, visual reasoning, and visual question answering. A visual relationship typically consists of two objects, a subject and an object, and an interaction between them, usually expressed as a triple <subject, predicate, object>, e.g., <person-ride-bike>. Visual relationship learning requires not only identifying the class of each object in a given image and locating it with a bounding box, but also indicating the interaction between each pair of objects. That is, visual relationship learning is a bridge connecting low-level image perception tasks (object detection, image classification, etc.) and high-level image cognition tasks (image description generation, visual question answering, etc.), and belongs to the intermediate-level image understanding tasks. Visual relationship instance learning is the problem of determining the two object instances involved in each visual relationship, given an image and a corresponding set of visual relationships.
Existing visual relationship learning models can be divided into two categories: (1) joint models; (2) separation models. A joint model treats each visual relationship triple as a single category and then learns a classifier over those categories. For example, Plummer et al. learn a CCA model based on features of different combinations of subjects, objects, and the joint regions between them, and then score each visual relationship with a ranking SVM. However, since visual relationships usually exhibit a long-tailed distribution, joint models suffer from large scale and weak generalization. Moreover, when the number of object classes is N and the number of interaction classes is K, the learning complexity is O(N²K). A separation model instead trains a classifier for each component of the visual relationship triple, reducing the learning complexity to O(N + K). Lu et al. predict the interaction between a pair of objects using the pair's visual features together with linguistic prior knowledge. Zhang et al. propose to treat the predicate as a translation vector between subject and object, i.e., s + p ≈ o, and map the visual features of paired objects into a low-dimensional relation space to build the visual relationship classification model VTransE. Based on spatial features and the statistical dependencies among subjects, predicates, and objects, Dai et al. use a deep relational network to predict visual relationships between object pairs. In addition, Xu et al. capture context information with a graph neural network and classify visual relationships by building an iterative message passing model.
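The complexity gap between the two model families can be made concrete with a short calculation; the class counts below are hypothetical, chosen only for illustration:

```python
# Scale of the two model families for hypothetical class counts.
N = 100  # number of object classes (assumed for illustration)
K = 70   # number of interaction (predicate) classes (assumed)

joint_classes = N * N * K     # joint model: one class per possible triple, O(N^2 K)
separate_classifiers = N + K  # separation model: per-component classifiers, O(N + K)

print(joint_classes)         # 700000
print(separate_classifiers)  # 170
```

Even for modest class counts, the joint model must distinguish hundreds of thousands of triple categories, most of which have few or no training examples under a long-tailed distribution.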
However, none of the above methods solves the instance confusion problem in visual relationship learning. As shown in Fig. 1, given an image and an associated set of visual relationships, the task is to correctly find and output the instance boxes of the two objects, subject and object, in each visual relationship. Even when the object class is specified, an image often contains multiple object instances of the same class, which causes the instance confusion problem in visual relationship learning.
Disclosure of Invention
In order to solve the instance confusion problem in visual relationship learning, the invention provides a visual relationship instance learning method based on reinforcement learning. Within a deep reinforcement learning framework, visual relationship instance learning is modeled as a sequential decision problem in which two agents, S-agent and O-agent, search the image for the two objects, subject and object, involved in each visual relationship; the states, actions, and rewards of this problem are defined. For a given test image and an associated set of visual relationships, the model learned by the method can quickly and accurately find the instance boxes corresponding to the subject and the object in each visual relationship, thereby greatly improving cognition-level understanding of image content.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a visual relation example learning method based on reinforcement learning comprises the following steps:
step 1, inputting training set data, obtaining each image and a corresponding visual relationship set, and concatenating the following vectors to form a state vector: the visual features of the whole image, the visual features of the two object instance search boxes, the historical action vectors of the two agents, the spatial relationship features between the object instance search boxes, and the text features obtained by encoding the current visual relationship with a Skip-thought language model; the two agents are an S-agent for searching the subject instance search box and an O-agent for searching the object instance search box, and each historical action vector is formed by concatenating the action vectors executed at the last 10 moments;
step 2, at each moment, the S-agent and the O-agent execute transformation actions on the subject and object instance search boxes respectively, so as to generate the search boxes at the next moment, then obtain the corresponding rewards and judge whether the search terminates;
step 3, storing the state at the current moment, the action taken at the current moment, the reward obtained, the state at the next moment, and a flag indicating whether the search has terminated into an experience replay pool;
and 4, repeating the steps 1-3 until the experience playback pool reaches the minimum number capable of being sampled, randomly sampling a part of samples from the experience playback pool at the moment, respectively training the current Q networks and parameters of the S-agent and the O-agent, and respectively updating the parameters of the target Q networks of the S-agent and the O-agent by using the parameters of the current Q networks at regular intervals.
The visual relationship instance learning method based on reinforcement learning can be used to find each instance of a specified visual relationship in a visual environment, and the learned visual relationships and corresponding instance boxes can also be applied to image-text tasks such as image-text retrieval, visual reasoning, and visual question answering.
Existing methods need to use the results of an object detection algorithm as candidate instance boxes for the two objects in each visual relationship, but object detectors are often erroneous, which can degrade the performance of visual relationship learning to some extent. In the invention, the two agents search for the instance boxes of the two objects in each visual relationship starting continuously from the upper-left corner of the image, so erroneously labeled candidate boxes are never used.
Existing methods must evaluate a large number of candidate instance boxes when generating object instance boxes with an object detection algorithm, incurring unnecessary computational overhead. The invention instead learns an optimal policy for finding the correct instance box, i.e., the shortest search path: the final model can accurately locate the two object instance boxes involved in each visual relationship of an image while examining the minimum number of candidate boxes.
Based on deep reinforcement learning, the method models the instance localization problem of the objects involved in each visual relationship of a given image as a sequential decision problem in which two agents search over the image: the S-agent searching for the subject and the O-agent searching for the object both start from the upper-left corner of the image, and at each moment each executes a specific action according to its current state, until the search terminates and the instance boxes corresponding to the two objects, subject and object, are output.
When an image and a visual relation set are given, the method can quickly and accurately find two object example frames in each visual relation by the shortest search path.
Drawings
FIG. 1 gives an image and a corresponding set of visual relationships.
FIG. 2 is a block diagram of example learning of visual relationships, with solid black and white boxes representing example search boxes for S-agent and O-agent, respectively, at the current time, and dashed boxes representing the next search box to jump to after performing the action at the current time.
FIG. 3 is a diagram of the 9 predefined transformation actions; the solid box represents the search box at the current moment and the dashed box the search box at the next moment. Left and Right denote horizontal movement, Up and Down vertical movement, Bigger and Smaller scaling operations, Taller and Fatter aspect-ratio operations, and Terminate ends the search.
FIG. 4 shows examples of visual relationship instance learning: given an image and a corresponding set of visual relationships, the instance boxes of the two objects in each visual relationship are finally learned.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention relates to a visual relation example learning method based on reinforcement learning, which comprises the following steps:
Step 1, input the training set data to obtain each image and its corresponding visual relationship set; in the invention, the S-agent and the O-agent each start searching from the upper-left corner of the current image.
Specifically, the image represents the interactive environment; the two agents interact with the environment continuously to form an episode, and each episode processes a different image. The state vector s_t at moment t is defined as the concatenation

s_t = [ v(I_t), b_t^s, v(b_t^s), h_t^s, b_t^o, v(b_t^o), h_t^o, w(e_t), r_t ]

where v(I_t) is the visual feature vector of the image I_t being processed at the current moment t; b_t^s = (x_t^s, y_t^s, w_t^s, h_t^s) is the instance search box of the subject at the current moment, with (x_t^s, y_t^s) the coordinates of its upper-left corner and (w_t^s, h_t^s) its width and height; v(b_t^s) is the visual feature vector of the search box b_t^s; h_t^s is the historical action vector formed by concatenating the action vectors of the S-agent at the last 10 moments; b_t^o = (x_t^o, y_t^o, w_t^o, h_t^o) is the instance search box of the object at the current moment, v(b_t^o) its visual feature vector, and h_t^o the historical action vector formed by concatenating the action vectors of the O-agent at the last 10 moments; w(e_t) is the semantic embedding vector of the visual relationship e_t at moment t, generated by a Skip-thought language model; and r_t is the spatial relationship feature vector between the search boxes b_t^s and b_t^o, built from a 6-dimensional vector describing their geometry, where b_t^s ∩ b_t^o and b_t^s ∪ b_t^o denote the intersection and union of the two search boxes. Because a 6-dimensional vector is too small to adequately capture slight differences between spatial relationships, a GMM model is used to discretize it into a 400-dimensional vector, which serves as the final spatial relationship feature vector between the two instance search boxes.
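The spatial relationship feature between the two search boxes can be sketched as follows. The exact 6 raw components are not reproduced in the text, so the ones below (relative offsets, relative sizes, IoU, and normalized center distance) are assumptions; likewise, a small hand-rolled Gaussian-posterior discretization with 8 components stands in for the patent's 400-component GMM:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def spatial_vector(bs, bo):
    """Assumed 6-D raw spatial relation between subject box bs and object box bo."""
    xs, ys, ws, hs = bs
    xo, yo, wo, ho = bo
    return np.array([
        (xs - xo) / wo,                                  # relative horizontal offset
        (ys - yo) / ho,                                  # relative vertical offset
        ws / wo,                                         # relative width
        hs / ho,                                         # relative height
        iou(bs, bo),                                     # overlap of the two boxes
        np.hypot(xs - xo, ys - yo) / np.hypot(wo, ho),   # normalized distance
    ])

def gmm_posterior(x, means, var=1.0):
    """Discretization: posterior over equal-weight isotropic Gaussian
    components; each component contributes one dimension of the feature."""
    d2 = ((means - x) ** 2).sum(axis=1)
    logp = -0.5 * d2 / var
    p = np.exp(logp - logp.max())
    return p / p.sum()

rng = np.random.default_rng(0)
means = rng.normal(size=(8, 6))   # 8 components for the sketch; the patent uses 400
feat = gmm_posterior(spatial_vector((10, 10, 40, 60), (30, 20, 50, 50)), means)
print(feat.shape)  # (8,)
```

The posterior vector sums to 1 and places mass on the components closest to the raw 6-D vector, so nearby spatial configurations map to similar discretized features.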
Step 2, at each moment, the S-agent and the O-agent execute transformation actions on the subject and object instance search boxes respectively: each selects and executes a specific action according to its current state, so that the instance search box at the current moment jumps to the instance search box at the next moment and the corresponding reward is obtained; the process then moves to the next state, until the search terminates.
FIG. 2 is a framework diagram of visual relationship instance learning; the input is an image and a corresponding set of visual relationships, and the two agents S-agent and O-agent are two different DQN networks.
Specifically, a transformation action is defined as a 9-dimensional vector in which an element of 1 represents that the corresponding action is executed and 0 that it is not. The 9 dimensions correspond respectively to the following 9 actions: horizontal right movement (Right), horizontal left movement (Left), vertical up movement (Up), vertical down movement (Down), zoom in (Bigger), zoom out (Smaller), change the height ratio (Taller), change the width ratio (Fatter), and terminate the search (Terminate), as shown in Fig. 3.
At each moment, the S-agent executes the selected transformation action, changing the width and height of the instance search box of the subject by

δ_w^s = α · w_t^s,  δ_h^s = α · h_t^s

Similarly, at each moment the O-agent executes the selected transformation action, changing the width and height of the instance search box of the object by

δ_w^o = α · w_t^o,  δ_h^o = α · h_t^o

where α ∈ [0, 1] is a change parameter. For example, when the S-agent executes the Right action, the instance search box of the subject is transformed from (x_t^s, y_t^s, w_t^s, h_t^s) to (x_t^s + δ_w^s, y_t^s, w_t^s, h_t^s). Here δ_w^s and δ_h^s are the width and height changes of the subject's instance search box, and δ_w^o and δ_h^o are those of the object's instance search box.
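The nine transformation actions can be sketched as box updates. The step sizes follow the common convention that each action moves or resizes the box by a fraction α of its current size; since the original formulas are not reproduced in the text, the exact updates below are assumptions:

```python
# Apply one of the 9 predefined actions to a search box (x, y, w, h).
def apply_action(box, action, alpha=0.2):
    x, y, w, h = box
    dw, dh = alpha * w, alpha * h          # width/height change amounts
    if action == "Right":
        x += dw
    elif action == "Left":
        x -= dw
    elif action == "Up":
        y -= dh
    elif action == "Down":
        y += dh
    elif action == "Bigger":               # grow around the center
        x, y, w, h = x - dw / 2, y - dh / 2, w + dw, h + dh
    elif action == "Smaller":              # shrink around the center
        x, y, w, h = x + dw / 2, y + dh / 2, w - dw, h - dh
    elif action == "Taller":               # change height ratio
        y, h = y - dh / 2, h + dh
    elif action == "Fatter":               # change width ratio
        x, w = x - dw / 2, w + dw
    elif action == "Terminate":
        pass                               # box unchanged; the search ends
    return (x, y, w, h)

print(apply_action((100, 100, 50, 80), "Right", alpha=0.2))  # (110.0, 100, 50, 80)
```

With α = 0.2, the Right action shifts a 50-pixel-wide box 10 pixels to the right, matching the example transformation described above.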
When the S-agent executes an action such that the instance search box of the subject jumps from the current search box b_t^s to the next search box b_{t+1}^s, the reward obtained, r_t^s, is defined as

r_t^s = sign( IoU(b_{t+1}^s, g_s) − IoU(b_t^s, g_s) )

Similarly, when the O-agent executes an action such that the instance search box of the object jumps from the current search box b_t^o to the next search box b_{t+1}^o, the reward r_t^o is defined as

r_t^o = sign( IoU(b_{t+1}^o, g_o) − IoU(b_t^o, g_o) )

where g_s denotes the ground truth of the subject instance, g_o the ground truth of the object instance, sign(·) is the sign function, and IoU(·,·) is the intersection-over-union between two regions, i.e. IoU(b, g) = area(b ∩ g) / area(b ∪ g).
In particular, the rewards obtained by the S-agent and the O-agent after executing the Terminate action, r_T^s and r_T^o, are defined respectively as

r_T^s = +η if IoU(b_T^s, g_s) ≥ τ, and −η otherwise
r_T^o = +η if IoU(b_T^o, g_o) ≥ τ, and −η otherwise

where η is the reward for the termination action and τ is the IoU threshold for the termination action.
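The step and termination rewards can be sketched directly from the definitions above; the sign-of-IoU-change form is a reconstruction consistent with the text:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def sign(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def step_reward(box_next, box_now, g):
    """r = sign(IoU(b_{t+1}, g) - IoU(b_t, g)): +1 when the move improves
    overlap with the ground-truth box g, -1 when it worsens it."""
    return sign(iou(box_next, g) - iou(box_now, g))

def terminal_reward(box, g, eta=3.0, tau=0.5):
    """+eta when the final box overlaps ground truth enough, else -eta."""
    return eta if iou(box, g) >= tau else -eta

g = (0, 0, 100, 100)
print(step_reward((10, 10, 90, 90), (50, 50, 100, 100), g))  # 1.0 (moved closer)
print(terminal_reward((0, 0, 100, 100), g))                  # 3.0
```

The values of η and τ (here 3.0 and 0.5) are hyperparameters; the patent leaves them to the initialization step of the embodiment.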
Step 3, store the state transition at each moment, namely the state at the current moment, the action taken, the reward obtained, the state at the next moment, and a flag indicating whether the search terminated, into the experience replay pool. In the invention, the S-agent maintains two Q networks: the current Q network Q_s used for action selection and the target Q network Q'_s used for target value calculation; the O-agent likewise maintains two Q networks: the current Q network Q_o for action selection and the target Q network Q'_o for target value calculation.
Step 4, the experience replay pool stores historical data. Repeat steps 1-3 until the amount of data in the experience replay pool reaches the minimum samplable number; then randomly sample a batch of samples from the pool and train the current Q network Q_s of the S-agent and its parameters, and the current Q network Q_o of the O-agent and its parameters. At regular intervals, update the parameters of the S-agent's target Q network Q'_s with those of the current Q network Q_s, and the parameters of the O-agent's target Q network Q'_o with those of the current Q network Q_o.
The result of visual relationship instance learning is shown in Fig. 4. A large number of learned visual relationships and their two corresponding object instances can be further applied to high-level image understanding tasks such as visual question answering, visual reasoning, and image description generation.
In one embodiment of the invention, the training set consists of N samples {X_i}_{i=1}^N, where X_i = (I_i, E_i) represents the i-th image I_i and its set E_i of m_i visual relationships, i.e. E_i = {e_i1, ..., e_im_i}. Each e_ij = <s_ij, p_ij, o_ij> represents the j-th visual relationship of image I_i, where s_ij and o_ij are the categories of the two interacting objects and p_ij is the category of the interaction between them. The specific steps of this embodiment are as follows:
step 1):
initialization: experience playback pool D with capacity M and minimum samplable number Z, current Q network Q of two agents S-agent and O-agents,QoAnd its parameter thetas,θoTarget Q network Q 'of two Agents's,Q′oAnd parameters thereofThe number of iteration rounds T, IoU threshold τ,the method comprises the steps of terminating action reward eta, exploration rate epsilon, change parameter alpha, learning rate beta, attenuation factor gamma, sample number n of batch gradient decline and target Q network parameter updating frequency C.
Step 2):
When t = 1, starting from sample X_1, initialize the state s_1. With probability ε the two agents randomly select actions a_t^s and a_t^o for the subject and object instance search boxes respectively; otherwise the current optimal actions are selected according to the model:

a_t^s = argmax_{a ∈ A_s} Q_s(s_t, a; θ_s),  a_t^o = argmax_{a ∈ A_o} Q_o(s_t, a; θ_o)

where A_s and A_o denote the optional action sets of the S-agent and the O-agent. The two agents execute the selected actions a_t^s and a_t^o respectively, jumping to a new state s_{t+1} and receiving the rewards r_t^s and r_t^o together with the termination flag is_end_t; the transition (s_t, a_t^s, a_t^o, r_t^s, r_t^o, s_{t+1}, is_end_t) is stored in the experience replay pool D.
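The ε-greedy selection used in this step can be sketched as follows; `select_action` is an illustrative name, not from the patent:

```python
import random

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore a random action with probability epsilon,
    otherwise take the greedy action argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Greedy choice over the 9-action Q-values of one agent (values illustrative).
q = [0.1, 0.9, 0.3, 0.0, -0.2, 0.05, 0.4, 0.2, -1.0]
print(select_action(q, epsilon=0.0))  # 1
```

With ε = 0 the choice is purely greedy; during training ε is typically decayed so the agents explore early and exploit later.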
Step 3):
Perform steps 1)-2) until the number of samples stored in the experience replay pool D reaches the minimum samplable number Z, then sample n samples (s_j, a_j^s, a_j^o, r_j^s, r_j^o, s_{j+1}, is_end_j), j = 1, 2, ..., n, from D. The target reward value for the S-agent's action is computed as

y_j^s = r_j^s if is_end_j is true; otherwise y_j^s = r_j^s + γ max_{a'} Q'_s(s_{j+1}, a'; θ'_s)

Using the mean square error loss function (1/n) Σ_j ( y_j^s − Q_s(s_j, a_j^s; θ_s) )², the parameters θ_s are updated by back-propagating gradients through the neural network. The target reward value for the O-agent's action is computed analogously:

y_j^o = r_j^o if is_end_j is true; otherwise y_j^o = r_j^o + γ max_{a'} Q'_o(s_{j+1}, a'; θ'_o)

Using the mean square error loss function (1/n) Σ_j ( y_j^o − Q_o(s_j, a_j^o; θ_o) )², the parameters θ_o are updated by back-propagating gradients through the neural network. Every C rounds, the parameters of the target Q networks are updated: θ'_s ← θ_s, θ'_o ← θ_o.
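The target value and mean-square-error computation can be sketched numerically; the function names are illustrative:

```python
def td_target(reward, next_q_values, done, gamma=0.9):
    """y_j = r_j when the episode ended, else r_j + gamma * max_a' Q'(s_{j+1}, a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def mse_loss(targets, predictions):
    """Mean squared error between target values and current Q predictions."""
    return sum((y, q) == () or (y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)

y = td_target(reward=1.0, next_q_values=[0.2, 0.5, 0.1], done=False, gamma=0.9)
print(y)  # 1.45
```

In the full method the gradient of this loss with respect to θ_s (or θ_o) is back-propagated through the corresponding current Q network, while the target networks Q'_s and Q'_o stay frozen between the periodic parameter copies.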
step 4):
When the test data set is input, the two agents S-agent and O-agent output, at the moment the search terminates, the instance boxes of the subject and the object corresponding to each visual relationship in the image.
The instances that the invention learns in images from given image and text relationships can be applied to many vision-related tasks. For example, a robot performing a navigation task in a visual environment may need to execute the instruction "bring this cup to the woman standing in front of a car and holding an umbrella"; the robot then needs to find each instance of the following visual relationships in the visual environment: <woman-on-umbrella>, <woman-standing in front of-car>, after which it can complete the cup-delivery operation. In addition, the learned visual relationships and the corresponding instance boxes can be applied to image-text tasks such as image-text retrieval, visual reasoning, and visual question answering. For example, in image-text retrieval, with the search text <person-playing-football> as input, many pictures containing "a person playing football" can be acquired quickly: each retrieved picture contains the two objects "person" and "football" with the interaction "playing" between them, rather than being found by first searching for the two objects "person" and "football" separately and then filtering the results for the interaction.
Claims (5)
1. A visual relation example learning method based on reinforcement learning is characterized by comprising the following steps:
step 1, inputting training set data, obtaining each image and a corresponding visual relationship set, and concatenating the following vectors to form a state vector: the visual features of the whole image, the visual features of the two object instance search boxes, the historical action vectors of the two agents, the spatial relationship features between the object instance search boxes, and the text features obtained by encoding the current visual relationship with a Skip-thought language model; the two agents are an S-agent for searching the subject instance search box and an O-agent for searching the object instance search box, and each historical action vector is formed by concatenating the action vectors executed at the last 10 moments;
step 2, at each moment, the S-agent and the O-agent execute transformation actions on the subject and object instance search boxes respectively, so as to generate the search boxes at the next moment, then obtain the corresponding rewards and judge whether the search terminates;
step 3, storing the state at the current moment, the action taken at the current moment, the reward obtained, the state at the next moment, and a flag indicating whether the search has terminated into an experience replay pool;
and 4, repeating the steps 1-3 until the experience playback pool reaches the minimum number capable of being sampled, randomly sampling a part of samples from the experience playback pool, respectively training the current Q networks and parameters of the S-agent and the O-agent, and respectively updating the parameters of the target Q networks of the S-agent and the O-agent by using the parameters of the current Q network at regular intervals.
2. The reinforcement learning-based visual relationship instance learning method according to claim 1, wherein in step 1 the state vector s_t is defined as the concatenation

s_t = [ v(I_t), b_t^s, v(b_t^s), h_t^s, b_t^o, v(b_t^o), h_t^o, w(e_t), r_t ]

wherein v(I_t) is the visual feature vector of the image I_t being processed at the current moment t; b_t^s = (x_t^s, y_t^s, w_t^s, h_t^s) is the instance search box of the subject at the current moment, (x_t^s, y_t^s) being the coordinates of its upper-left corner and (w_t^s, h_t^s) its width and height; v(b_t^s) is the visual feature vector of the search box b_t^s; h_t^s is the historical action vector formed by concatenating the action vectors of the S-agent at the last 10 moments; b_t^o = (x_t^o, y_t^o, w_t^o, h_t^o) is the instance search box of the object at the current moment, v(b_t^o) its visual feature vector, and h_t^o the historical action vector formed by concatenating the action vectors of the O-agent at the last 10 moments; w(e_t) is the semantic embedding vector of the visual relationship e_t at moment t, generated by a Skip-thought language model; and r_t is the spatial relationship feature vector between the search boxes b_t^s and b_t^o, built from a 6-dimensional vector describing their geometry, b_t^s ∩ b_t^o and b_t^s ∪ b_t^o denoting the intersection and union of the two search boxes.
3. The reinforcement learning-based visual relationship instance learning method according to claim 2, wherein a GMM model is used to discretize the 6-dimensional spatial relationship vector into a 400-dimensional vector, which serves as the final spatial relationship feature vector between the two instance search boxes.
4. The method according to claim 3, wherein in step 2 the transformation action is defined as a 9-dimensional vector, an element of 1 representing that the corresponding action is executed and 0 that it is not, the 9 dimensions corresponding respectively to the following 9 actions: horizontal right movement, horizontal left movement, vertical up movement, vertical down movement, zooming in, zooming out, changing the height ratio, changing the width ratio, and terminating the search;

at each moment, the S-agent executes the selected transformation action, the width and height of the instance search box of the subject changing by δ_w^s = α · w_t^s and δ_h^s = α · h_t^s;

at each moment, the O-agent executes the selected transformation action, the width and height of the instance search box of the object changing by δ_w^o = α · w_t^o and δ_h^o = α · h_t^o, wherein α ∈ [0, 1] is a change parameter.
5. The reinforcement learning-based visual relationship instance learning method according to claim 4, wherein in step 2, when the S-agent executes an action such that the instance search box of the subject jumps from the current search box b_t^s to the next search box b_{t+1}^s, the reward obtained, r_t^s, is defined as

r_t^s = sign( IoU(b_{t+1}^s, g_s) − IoU(b_t^s, g_s) )

when the O-agent executes an action such that the instance search box of the object jumps from the current search box b_t^o to the next search box b_{t+1}^o, the reward r_t^o is defined as

r_t^o = sign( IoU(b_{t+1}^o, g_o) − IoU(b_t^o, g_o) )

wherein g_s denotes the ground truth of the subject instance, g_o the ground truth of the object instance, sign(·) is the sign function, and IoU(·,·) is the intersection-over-union between two regions, i.e. IoU(b, g) = area(b ∩ g) / area(b ∪ g);

the rewards obtained by the S-agent and the O-agent after executing the termination action, r_T^s and r_T^o, are defined respectively as

r_T^s = +η if IoU(b_T^s, g_s) ≥ τ, and −η otherwise; r_T^o = +η if IoU(b_T^o, g_o) ≥ τ, and −η otherwise

where η is the reward for the termination action and τ is the IoU threshold for the termination action.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110152379.8A CN112989088B (en) | 2021-02-04 | 2021-02-04 | Visual relation example learning method based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110152379.8A CN112989088B (en) | 2021-02-04 | 2021-02-04 | Visual relation example learning method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112989088A true CN112989088A (en) | 2021-06-18 |
CN112989088B CN112989088B (en) | 2023-03-21 |
Family
ID=76346704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110152379.8A Active CN112989088B (en) | 2021-02-04 | 2021-02-04 | Visual relation example learning method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112989088B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113554129A (en) * | 2021-09-22 | 2021-10-26 | 航天宏康智能科技(北京)有限公司 | Scene graph generation method and generation device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463881A (en) * | 2017-07-07 | 2017-12-12 | 中山大学 | A kind of character image searching method based on depth enhancing study |
WO2019035771A1 (en) * | 2017-08-17 | 2019-02-21 | National University Of Singapore | Video visual relation detection methods and systems |
CN111783852A (en) * | 2020-06-16 | 2020-10-16 | 北京工业大学 | Self-adaptive image description generation method based on deep reinforcement learning |
US20200334545A1 (en) * | 2019-04-19 | 2020-10-22 | Adobe Inc. | Facilitating changes to online computing environment by assessing impacts of actions using a knowledge base representation |
CN112256904A (en) * | 2020-09-21 | 2021-01-22 | 天津大学 | Image retrieval method based on visual description sentences |
-
2021
- 2021-02-04 CN CN202110152379.8A patent/CN112989088B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463881A (en) * | 2017-07-07 | 2017-12-12 | 中山大学 | A kind of character image searching method based on depth enhancing study |
WO2019035771A1 (en) * | 2017-08-17 | 2019-02-21 | National University Of Singapore | Video visual relation detection methods and systems |
US20200334545A1 (en) * | 2019-04-19 | 2020-10-22 | Adobe Inc. | Facilitating changes to online computing environment by assessing impacts of actions using a knowledge base representation |
CN111783852A (en) * | 2020-06-16 | 2020-10-16 | 北京工业大学 | Self-adaptive image description generation method based on deep reinforcement learning |
CN112256904A (en) * | 2020-09-21 | 2021-01-22 | 天津大学 | Image retrieval method based on visual description sentences |
Non-Patent Citations (2)
Title |
---|
CAO, QIANWEN ET AL: "3-D Relation Network for visual relation recognition in videos", 《NEUROCOMPUTING》 * |
DING WENBO ET AL: "Research progress on visual relationship detection methods based on deep learning", 《SCIENCE AND TECHNOLOGY INNOVATION HERALD》 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113554129A (en) * | 2021-09-22 | 2021-10-26 | 航天宏康智能科技(北京)有限公司 | Scene graph generation method and generation device |
Also Published As
Publication number | Publication date |
---|---|
CN112989088B (en) | 2023-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110609891B (en) | Visual dialog generation method based on context awareness graph neural network | |
CN109934261B (en) | Knowledge-driven parameter propagation model and few-sample learning method thereof | |
CN109993102B (en) | Similar face retrieval method, device and storage medium | |
US20160350653A1 (en) | Dynamic Memory Network | |
Kaluri et al. | An enhanced framework for sign gesture recognition using hidden Markov model and adaptive histogram technique. | |
US20200065560A1 (en) | Signal retrieval apparatus, method, and program | |
CN110377707B (en) | Cognitive diagnosis method based on depth item reaction theory | |
CN114495129B (en) | Character detection model pre-training method and device | |
CN112527993A (en) | Cross-media hierarchical deep video question-answer reasoning framework | |
US20200218932A1 (en) | Method and system for classification of data | |
CN112668608A (en) | Image identification method and device, electronic equipment and storage medium | |
CN115130591A (en) | Cross supervision-based multi-mode data classification method and device | |
CN115270752A (en) | Template sentence evaluation method based on multilevel comparison learning | |
CN112989088B (en) | Visual relation example learning method based on reinforcement learning | |
CN113420552B (en) | Biomedical multi-event extraction method based on reinforcement learning | |
CN113240033B (en) | Visual relation detection method and device based on scene graph high-order semantic structure | |
CN111914949B (en) | Zero sample learning model training method and device based on reinforcement learning | |
CN116452895B (en) | Small sample image classification method, device and medium based on multi-mode symmetrical enhancement | |
CN114565804A (en) | NLP model training and recognizing system | |
Wu et al. | Question-driven multiple attention (dqma) model for visual question answer | |
CN113821610A (en) | Information matching method, device, equipment and storage medium | |
CN115066690A (en) | Search normalization-activation layer architecture | |
Voruganti | Visual question answering with external knowledge | |
CN116129333B (en) | Open set action recognition method based on semantic exploration | |
CN114936297B (en) | Video question-answering method based on priori knowledge and object sensitivity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |