CN116994098A - Large model prompt learning method based on category attribute knowledge enhancement - Google Patents
Large model prompt learning method based on category attribute knowledge enhancement
- Publication number
- CN116994098A CN116994098A CN202311261605.1A CN202311261605A CN116994098A CN 116994098 A CN116994098 A CN 116994098A CN 202311261605 A CN202311261605 A CN 202311261605A CN 116994098 A CN116994098 A CN 116994098A
- Authority
- CN
- China
- Prior art keywords
- attribute
- category
- prompt
- image
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a large model prompt learning method based on category attribute knowledge enhancement. The method comprises: acquiring an image recognition training data set and generating a visual attribute set for each category through manual annotation or by querying ChatGPT; generating an attribute-aware prompt background through an attribute integration module; feeding the training images and the corresponding prompt sentences carrying visual attribute information into an image encoder and a text encoder, respectively, to obtain image features and text classification weights; performing contrastive learning between the image features and the text classification weights, and updating the parameters of the attribute integration module through this contrastive learning to obtain the trained attribute integration module; generating the most obvious visual attribute set of each test category according to the category space of the test task; and loading the images to be tested and the visual attribute sets of the test categories into the model and computing the similarity between the text classification weights and the image features, where the text with the maximum similarity gives the prediction result. The invention has the advantages of strong zero-shot recognition capability and strong extensibility.
Description
Technical Field
The invention relates to a large model prompt learning method based on category attribute knowledge enhancement, belonging to the technical fields of computer vision and transfer learning.
Background
Contextual prompt learning originated in natural language processing. The contextual-prompt paradigm formalizes downstream natural language processing tasks as masked language modeling problems and introduces pre-trained language models (e.g., BERT and GPT) that generate results when given appropriate prompt context templates. Compared with the traditional fine-tuning paradigm, the prompt-based paradigm can bridge the gap between downstream tasks and pre-training tasks. The example of contextual prompts has inspired the computer vision field to adapt pre-trained visual models (e.g., Vision Transformer and Swin Transformer) to downstream tasks through prompt learning. Radford et al. showed that a contextual-prompt paradigm can enable zero-shot prediction with CLIP. However, finding and designing appropriate prompt templates for a target task often takes a significant amount of time; in practice, prompt engineering proceeds through manual trial and error and careful design. To this end, Zhou et al. proposed the CoOp method, which trains a learnable prompt on the downstream dataset: prompt learning converts the prompt context into a set of continuous vectors that are optimized end-to-end for the downstream task. Lu et al. further promoted the diversity of CoOp prompts from a modeling perspective. Khattak et al. proposed the MaPLe method on top of CLIP to improve the alignment between vision and language embeddings.
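The CoOp idea described above — replacing a hand-written prompt template with continuous, learnable context vectors prepended to the class-name embedding — can be sketched in a few lines. This is an illustrative NumPy sketch only; the dimensions, the random initialization, and the single-token class names are assumptions, and no optimization is shown:

```python
import numpy as np

rng = np.random.default_rng(0)

M, D = 4, 512            # number of learnable context tokens, embedding dimension
K = 10                   # number of classes

# Learnable context vectors shared by all classes (in CoOp these are the only
# trainable parameters, optimized end-to-end with both encoders frozen).
context = rng.normal(scale=0.02, size=(M, D))

# Frozen class-name token embeddings (one token per class name for simplicity).
class_embeddings = rng.normal(size=(K, 1, D))

# Each class prompt = [context tokens ; class-name tokens].
prompts = np.stack([np.concatenate([context, class_embeddings[i]], axis=0)
                    for i in range(K)])
print(prompts.shape)  # (10, 5, 512): K prompts of M+1 tokens each
```

Feeding each `(M+1, D)` prompt through the frozen text encoder would yield one classification weight per class.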
Existing prompt learning methods for large visual-language models generally cause the learned prompt background to overfit the training data, which degrades the performance of the large model on zero-shot recognition tasks; performance is particularly poor on fine-grained zero-shot image recognition tasks.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a large model prompt learning method based on category attribute knowledge enhancement.
The technical scheme provided by the invention for solving the technical problems is as follows: a large model prompt learning method based on category attribute knowledge enhancement comprises the following steps:
s1, acquiring an image recognition training data set and generating a visual attribute set of each category;
s2, generating an attribute-aware prompt background from the visual attribute set of each category through an attribute integration module, and combining the attribute-aware prompt background with the corresponding category name c_i to form a prompt sentence carrying the visual attribute information of the related category;
s3, respectively placing the training images and the prompt sentences of the corresponding categories carrying visual attribute information into an image encoder and a text encoder to obtain image features and text classification weights;
s4, performing contrastive learning between the image features and the text classification weights, and updating the parameters of the attribute integration module through this contrastive learning to obtain the trained attribute integration module;
s5, generating a visual attribute set of each test category through manual annotation or using ChatGPT according to the category space of the test task;
s6, generating the attribute-aware prompt background of each test category by passing the visual attribute sets of the test categories through the trained attribute integration module, and then combining the attribute-aware prompt background of each test category with the name c_i of that test category to form the prompt sentence of each test category, carrying the visual attribute information of the related category;
s7, generating image features of the image to be tested by the image encoder;
s8, generating text classification weights of the test categories through a text encoder by using prompt sentences of the test categories, wherein the prompt sentences carry visual attribute information of the related categories;
and S9, calculating the similarity between the image features of the image to be tested and the text classification weights of all the test categories, wherein the test category with the maximum similarity is the prediction result.
The further technical scheme is that in the step S1, the visual attribute set is generated through manual annotation or ChatGPT.
The further technical scheme is that the visual attribute set is the most obvious visual attribute set of each category.
The specific process of generating the attribute-aware prompt background is as follows: the M most obvious visual attributes of each category are generated through manual annotation or ChatGPT, and the attribute set of category i is expressed as A_i = {a_i,1, a_i,2, …, a_i,M}; a learnable attribute integration module f_θ generates the class-specific prompt background v_i = f_θ(A_i); the attribute-aware prompt background v_i and the corresponding category name c_i together form the prompt sentence t_i = [v_i; c_i] carrying the visual attribute information of the related category.
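The composition of the prompt sentence from an attribute set can be illustrated as follows. This is a sketch under assumptions: attribute words are represented by random embeddings, and mean pooling stands in for the learnable attribute integration module (the patent uses a two-layer fully connected network instead):

```python
import numpy as np

rng = np.random.default_rng(1)
M, D = 16, 512  # M salient attributes per class, embedding dimension D (assumed)

# Embedded attribute set of one class (e.g. 16 attribute-word embeddings).
attribute_set = rng.normal(size=(M, D))

def integrate(attrs):
    """Stand-in for the learnable attribute integration module:
    mean pooling here, a trainable two-layer MLP in the patent."""
    return attrs.mean(axis=0)            # attribute-aware prompt background

class_name_embedding = rng.normal(size=(1, D))  # embedding of the class name

v_i = integrate(attribute_set)
# Prompt sentence: prompt background token(s) followed by the class name.
t_i = np.concatenate([v_i[None, :], class_name_embedding], axis=0)
print(t_i.shape)  # (2, 512)
```

The resulting token sequence would be fed to the frozen text encoder to produce the classification weight of the class.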
Further technical scheme is that the attribute integration module is a double-layer fully-connected neural network, and the dimension of the hidden layer is set to 512.
The further technical scheme is that in step S3 the image encoder is implemented by ResNet or ViT, and the text encoder is implemented by a Transformer.
The invention has the following beneficial effects:
1. Strong zero-shot recognition capability. By learning an attribute-aware prompt background, the invention can effectively improve the model's ability to recognize unknown classes. Experiments were performed on three zero-shot recognition benchmarks, AWA2, CUB and SUN, training on the known classes of each data set and testing on the unknown classes. Compared with CoOp, the performance of the method improves markedly on zero-shot image recognition. For example, on the conventional zero-shot recognition task the invention improves AWA2 by 0.27%, CUB by 13.47% and SUN by 8.50%; on the generalized zero-shot recognition task it improves AWA2 by 1.64%, CUB by 17.17% and SUN by 10.80%.
2. Strong extensibility. The category attribute knowledge enhancement mechanism of the invention can be applied to other prompt learning frameworks (e.g., MaPLe) without adjusting the hyper-parameters of the original prompt learning framework.
3. The traditional manual attribute labeling approach may be replaced with ChatGPT queries. Specifically, ChatGPT is first queried for the attributes of each category using the following template: "Give me 16 noun attributes for [class], each attribute just one word"; the queried category attributes are then used to generate an attribute-aware prompt background that performs comparably to manually labeled attributes.
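The query template above can be wrapped in a small helper. The template string is quoted from the description; the parsing logic and the canned reply are illustrative assumptions (no real ChatGPT call is made here):

```python
# Template quoted from the description; [class] is filled with the category name.
TEMPLATE = "Give me {m} noun attributes for [{cls}], each attribute just one word"

def build_query(class_name, m=16):
    """Build the attribute query for one category."""
    return TEMPLATE.format(m=m, cls=class_name)

def parse_attributes(reply):
    """Parse a comma- or newline-separated reply into attribute words,
    stripping list numbering and trailing punctuation (assumed format)."""
    items = reply.replace("\n", ",").split(",")
    junk = " .0123456789"
    return [w.strip(junk).lower() for w in items if w.strip(junk)]

query = build_query("zebra")
print(query)
canned_reply = "stripes, mane, hooves, tail"   # stand-in for a ChatGPT response
print(parse_attributes(canned_reply))  # ['stripes', 'mane', 'hooves', 'tail']
```

In a real pipeline the reply would come from the ChatGPT API and the parsed words would be embedded and passed to the attribute integration module.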
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following describes the embodiments of the invention clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
As shown in fig. 1, the large model prompt learning method based on category attribute knowledge enhancement of the present invention includes:
step one, acquiring an image recognition training data set, and generating a most obvious visual attribute set of each category through manual annotation or using ChatGPT;
the visual attribute set refers to the most obvious visual attributes of each category; for example, the white-bellied sea eagle in a bird data set has the obvious visual features of a white abdomen and a white chest;
step two, generating attribute-aware prompt backgrounds from the visual attribute sets of all categories through an attribute integration module;
The attribute-aware prompt background is generated as follows: the M most obvious visual attributes of each category are produced through manual annotation or ChatGPT, and the attribute set of category i is expressed as A_i = {a_i,1, …, a_i,M}; a learnable attribute integration module f_θ generates the class-specific prompt background v_i = f_θ(A_i); the attribute-aware prompt background v_i and the corresponding category name c_i together form the prompt sentence t_i = [v_i; c_i] carrying the visual attribute information of the category;
The attribute integration module adopts a double-layer full-connection structure (Linear-ReLU-Linear), and the dimension of the hidden layer is set to 512;
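The double-layer module can be sketched in NumPy. The Linear-ReLU-Linear structure and the hidden dimension of 512 come from the description; the input/output dimensions, the random initialization, and the mean pooling over the M attribute embeddings are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
D_in, H, D_out = 512, 512, 512   # hidden layer dimension 512 per the description

# Parameters of the two-layer fully connected module (Linear-ReLU-Linear).
W1 = rng.normal(scale=0.02, size=(D_in, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.02, size=(H, D_out)); b2 = np.zeros(D_out)

def attribute_integration(attrs):
    """Map an (M, D_in) attribute set to one prompt-background vector.

    Mean-pooling the M attribute embeddings before the MLP is an assumption;
    the patent does not spell out how the attributes are aggregated."""
    pooled = attrs.mean(axis=0)
    hidden = np.maximum(0.0, pooled @ W1 + b1)   # ReLU activation
    return hidden @ W2 + b2

attrs = rng.normal(size=(16, D_in))   # 16 attribute embeddings for one class
v = attribute_integration(attrs)
print(v.shape)  # (512,)
```

Only `W1`, `b1`, `W2`, `b2` would be updated during training; everything else in the model stays frozen.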
prompting background in the inventionAccording to the dynamic change of different categories, the model can sense the visual attribute information of each category, the mobility among the categories is enhanced, and the recognition capability learned on the training category is migrated to the test category.
Step three, respectively putting the training images and the prompt sentences of the corresponding categories carrying visual attribute information into an image encoder and a text encoder to obtain image features and text classification weights, wherein the text encoder is implemented by a Transformer and the image encoder is implemented by ResNet or ViT;
in order to learn the parameters effectively and prevent catastrophic forgetting of the large model during training, the parameters of the text encoder and the image encoder are frozen, and only the parameters of the attribute integration module are updated; the model needs to collect only a small number of samples per class, which reduces both the computational load and the label collection and annotation work;
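Freezing the encoders while updating only the attribute integration module amounts to restricting the gradient step to one parameter group. A toy NumPy illustration (the parameter names, shapes, and gradients are invented for the example):

```python
import numpy as np

# Toy parameter store: encoder parameters are frozen, only the attribute
# integration module's parameters receive gradient steps.
params = {
    "text_encoder.W":  np.ones((2, 2)),
    "image_encoder.W": np.ones((2, 2)),
    "attr_module.W1":  np.ones((2, 2)),
}
trainable = {name for name in params if name.startswith("attr_module")}

def sgd_step(params, grads, lr=0.1):
    for name, g in grads.items():
        if name in trainable:            # frozen parameters are skipped
            params[name] = params[name] - lr * g

grads = {name: np.ones((2, 2)) for name in params}
sgd_step(params, grads)
print(params["text_encoder.W"][0, 0], params["attr_module.W1"][0, 0])  # 1.0 0.9
```

In a deep learning framework the same effect is achieved by disabling gradients on the encoder parameters and passing only the attribute module's parameters to the optimizer.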
the prompt background carrying the visual attribute information is helpful for the model to identify unknown test categories;
step four, performing contrastive learning between the image features and the text classification weights, and updating the parameters of the attribute integration module through this contrastive learning to obtain the trained attribute integration module;
the invention adopts a contrast learning strategy and calculates model loss by using cross entropykClassification weight of classImage featurexContrast learning;
the contrast learning formula is as follows:
wherein:p(y=i|x) Is shown inxWhen it is image characteristic, it predicts labelyEqual to the firstiProbability of individual category;sim() Representing cosine similarity;Krepresenting the number of categories;τrepresenting a temperature parameter;jrepresent the firstjA category;wirefers to the firstiClassification weight of the category;wjrefers to the firstjClassification weights for the individual categories;xrepresenting image features;
the above formula would be substituted into the cross entropy function calculation model penalty to update the attribute integration module parameters.
The aim of contrastive learning is to draw each image and the prompt sentence of its own class closer in the feature space while increasing the distance to the prompt sentences of other classes, thereby maximizing the true-label score of each image;
step five, generating a most obvious visual attribute set of each test category through manual annotation or using ChatGPT according to the category space of the test task;
step six, generating the attribute-aware prompt background of each test category by passing the visual attribute set of each test category through the trained attribute integration module; the attribute-aware prompt background of each test category and the name c_i of that test category then form the prompt sentence of the test category, carrying the visual attribute information of the related category;
step seven, generating image characteristics of an image to be tested by an image encoder;
step eight, generating text classification weights of all test categories through a text encoder by using prompt sentences of all test categories, wherein the prompt sentences carry visual attribute information of related categories;
and step nine, calculating the similarity of the image characteristics of the image to be tested and the text classification weights of all the test categories, wherein the test category corresponding to the maximum similarity is the prediction result.
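Step nine reduces to an argmax over cosine similarities. A minimal sketch with toy features (the class names and vectors are invented for illustration):

```python
import numpy as np

def predict(image_feature, text_weights, class_names):
    """Return the test class whose text classification weight is most
    similar (by cosine similarity) to the image feature."""
    sims = text_weights @ image_feature / (
        np.linalg.norm(text_weights, axis=1) * np.linalg.norm(image_feature))
    return class_names[int(np.argmax(sims))]

# Toy features: the "cat" text weight is aligned with the image feature.
image_feature = np.array([1.0, 0.0, 0.0])
text_weights = np.array([[0.9, 0.1, 0.0],   # cat
                         [0.0, 1.0, 0.0],   # dog
                         [0.0, 0.0, 1.0]])  # bird
print(predict(image_feature, text_weights, ["cat", "dog", "bird"]))  # cat
```

Because the text weights for unseen test classes are built from their attribute-aware prompt sentences, this same routine performs the zero-shot prediction.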
The data sets used in the invention comprise AWA2, CUB and SUN, where AWA2 is a coarse-grained data set and CUB and SUN are fine-grained data sets. For example, a coarse-grained data set contains image collections of all animal categories, while a fine-grained data set contains image collections of refined subcategories of a certain animal.
Attribute integration module: the module uses two linear layers (Linear-ReLU-Linear). The ReLU activation function plays an important role in the neural network: it activates the neurons so that they respond to input signals, making full use of their nonlinear transformation characteristics.
The invention is not limited to the above embodiments. Any person skilled in the art may, without departing from the technical solution of the invention, make changes or modifications to obtain equivalent embodiments; any simple modification or equivalent change made to the above embodiments according to the technical substance of the invention still falls within the scope of the technical solution of the invention.
Claims (6)
1. The large model prompt learning method based on category attribute knowledge enhancement is characterized by comprising the following steps:
s1, acquiring an image recognition training data set and generating a visual attribute set of each category;
s2, generating an attribute-aware prompt background from the visual attribute set of each category through an attribute integration module, and combining the attribute-aware prompt background with the corresponding category name c_i to form a prompt sentence carrying the visual attribute information of the related category;
s3, respectively placing the training images and the prompt sentences of the corresponding categories carrying visual attribute information into an image encoder and a text encoder to obtain image features and text classification weights;
s4, performing contrastive learning between the image features and the text classification weights, and updating the parameters of the attribute integration module through this contrastive learning to obtain the trained attribute integration module;
s5, generating a visual attribute set of each test category through manual annotation or using ChatGPT according to the category space of the test task;
s6, generating the attribute-aware prompt background of each test category by passing the visual attribute sets of the test categories through the trained attribute integration module, and then combining the attribute-aware prompt background of each test category with the name c_i of that test category to form the prompt sentence of each test category, carrying the visual attribute information of the related category;
s7, generating image features of the image to be tested by the image encoder;
s8, generating text classification weights of the test categories through a text encoder by using prompt sentences of the test categories, wherein the prompt sentences carry visual attribute information of the related categories;
and S9, calculating the similarity between the image features of the image to be tested and the text classification weights of all the test categories, wherein the test category with the maximum similarity is the prediction result.
2. The large model prompt learning method based on category attribute knowledge enhancement according to claim 1, wherein the visual attribute set in step S1 is generated through manual annotation or ChatGPT.
3. The large model prompt learning method based on category attribute knowledge enhancement according to claim 1, wherein the visual attribute set is the most obvious visual attribute set of each category.
4. The large model prompt learning method based on category attribute knowledge enhancement according to claim 3, wherein the specific process of generating the attribute-aware prompt background is: the M most obvious visual attributes of each category are generated through manual annotation or ChatGPT, the attribute set of category i is expressed as A_i = {a_i,1, …, a_i,M}, a learnable attribute integration module f_θ generates the category-specific attribute-aware prompt background v_i = f_θ(A_i), and the attribute-aware prompt background v_i and the corresponding category name c_i together form the prompt sentence t_i = [v_i; c_i] carrying the visual attribute information of the related category.
5. The large model prompt learning method based on category attribute knowledge enhancement according to claim 4, wherein the attribute integration module is a double-layer fully connected neural network, and a hidden layer dimension is set to 512.
6. The large model prompt learning method based on category attribute knowledge enhancement according to claim 1, wherein the image encoder in step S3 is implemented by ResNet or ViT, and the text encoder is implemented by a Transformer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311261605.1A CN116994098B (en) | 2023-09-27 | 2023-09-27 | Large model prompt learning method based on category attribute knowledge enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311261605.1A CN116994098B (en) | 2023-09-27 | 2023-09-27 | Large model prompt learning method based on category attribute knowledge enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116994098A true CN116994098A (en) | 2023-11-03 |
CN116994098B CN116994098B (en) | 2023-12-05 |
Family
ID=88527011
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311261605.1A Active CN116994098B (en) | 2023-09-27 | 2023-09-27 | Large model prompt learning method based on category attribute knowledge enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116994098B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190244059A1 (en) * | 2018-02-06 | 2019-08-08 | Hrl Laboratories, Llc | Machine vision system for recognizing novel objects |
CN114972795A (en) * | 2021-12-30 | 2022-08-30 | 昆明理工大学 | National clothing image subtitle generation method combining attribute detection and visual perception |
CN115311389A (en) * | 2022-08-05 | 2022-11-08 | 西北大学 | Multi-mode visual prompting technology representation learning method based on pre-training model |
CN115631365A (en) * | 2022-09-29 | 2023-01-20 | 浙江大学 | Cross-modal contrast zero sample learning method fusing knowledge graph |
CN115758998A (en) * | 2022-11-24 | 2023-03-07 | 华润数字科技有限公司 | Metaphor recognition method, electronic device, and computer-readable storage medium |
US20230075862A1 (en) * | 2021-09-08 | 2023-03-09 | Samsung Electronics Co., Ltd. | Supervised contrastive learning for visual grounding |
US20230154146A1 (en) * | 2021-11-16 | 2023-05-18 | Salesforce.Com, Inc. | Systems and methods for video and language pre-training |
CN116259075A (en) * | 2023-01-16 | 2023-06-13 | 安徽大学 | Pedestrian attribute identification method based on prompt fine tuning pre-training large model |
US20230230198A1 (en) * | 2022-01-14 | 2023-07-20 | Adobe Inc. | Utilizing a generative neural network to interactively create and modify digital images based on natural language feedback |
CN116469110A (en) * | 2023-04-18 | 2023-07-21 | 平安科技(深圳)有限公司 | Image classification method, device, electronic equipment and computer readable storage medium |
CN116468725A (en) * | 2023-06-13 | 2023-07-21 | 北京航空航天大学杭州创新研究院 | Industrial defect detection method, device and storage medium based on pre-training model |
CN116503683A (en) * | 2023-06-06 | 2023-07-28 | 重庆师范大学 | Modal interaction enhanced prompt learning method of visual language model |
CN116628303A (en) * | 2023-04-26 | 2023-08-22 | 中国科学院信息工程研究所 | Semi-structured webpage attribute value extraction method and system based on prompt learning |
CN116645683A (en) * | 2023-05-31 | 2023-08-25 | 重庆西部笔迹大数据研究院 | Signature handwriting identification method, system and storage medium based on prompt learning |
CN116662565A (en) * | 2023-05-23 | 2023-08-29 | 中国人民解放军国防科技大学 | Heterogeneous information network keyword generation method based on contrast learning pre-training |
CN116702035A (en) * | 2023-06-02 | 2023-09-05 | 中国科学院合肥物质科学研究院 | Pest identification method based on multi-mode self-supervision transducer architecture |
-
2023
- 2023-09-27 CN CN202311261605.1A patent/CN116994098B/en active Active
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190244059A1 (en) * | 2018-02-06 | 2019-08-08 | Hrl Laboratories, Llc | Machine vision system for recognizing novel objects |
US20230075862A1 (en) * | 2021-09-08 | 2023-03-09 | Samsung Electronics Co., Ltd. | Supervised contrastive learning for visual grounding |
US20230154146A1 (en) * | 2021-11-16 | 2023-05-18 | Salesforce.Com, Inc. | Systems and methods for video and language pre-training |
CN114972795A (en) * | 2021-12-30 | 2022-08-30 | 昆明理工大学 | National clothing image subtitle generation method combining attribute detection and visual perception |
US20230230198A1 (en) * | 2022-01-14 | 2023-07-20 | Adobe Inc. | Utilizing a generative neural network to interactively create and modify digital images based on natural language feedback |
CN115311389A (en) * | 2022-08-05 | 2022-11-08 | 西北大学 | Multi-mode visual prompting technology representation learning method based on pre-training model |
CN115631365A (en) * | 2022-09-29 | 2023-01-20 | 浙江大学 | Cross-modal contrast zero sample learning method fusing knowledge graph |
CN115758998A (en) * | 2022-11-24 | 2023-03-07 | 华润数字科技有限公司 | Metaphor recognition method, electronic device, and computer-readable storage medium |
CN116259075A (en) * | 2023-01-16 | 2023-06-13 | 安徽大学 | Pedestrian attribute identification method based on prompt fine tuning pre-training large model |
CN116469110A (en) * | 2023-04-18 | 2023-07-21 | 平安科技(深圳)有限公司 | Image classification method, device, electronic equipment and computer readable storage medium |
CN116628303A (en) * | 2023-04-26 | 2023-08-22 | 中国科学院信息工程研究所 | Semi-structured webpage attribute value extraction method and system based on prompt learning |
CN116662565A (en) * | 2023-05-23 | 2023-08-29 | 中国人民解放军国防科技大学 | Heterogeneous information network keyword generation method based on contrast learning pre-training |
CN116645683A (en) * | 2023-05-31 | 2023-08-25 | 重庆西部笔迹大数据研究院 | Signature handwriting identification method, system and storage medium based on prompt learning |
CN116702035A (en) * | 2023-06-02 | 2023-09-05 | 中国科学院合肥物质科学研究院 | Pest identification method based on multi-mode self-supervision transducer architecture |
CN116503683A (en) * | 2023-06-06 | 2023-07-28 | 重庆师范大学 | Modal interaction enhanced prompt learning method of visual language model |
CN116468725A (en) * | 2023-06-13 | 2023-07-21 | 北京航空航天大学杭州创新研究院 | Industrial defect detection method, device and storage medium based on pre-training model |
Non-Patent Citations (1)
Title |
---|
- M. MANIPARAMBIL et al.: "Enhancing CLIP with GPT-4: harnessing visual descriptions prompts", arXiv (available online at arxiv.org/pdf/2307.11661.pdf), pages 1-15 *
Also Published As
Publication number | Publication date |
---|---|
CN116994098B (en) | 2023-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113987209B (en) | Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment | |
CN112100383B (en) | Meta-knowledge fine tuning method and platform for multitask language model | |
CN112183747B (en) | Neural network training method, neural network compression method and related equipment | |
CN111680484B (en) | Answer model generation method and system for visual general knowledge reasoning question and answer | |
CN112256866B (en) | Text fine-grained emotion analysis algorithm based on deep learning | |
CN107657313B (en) | System and method for transfer learning of natural language processing task based on field adaptation | |
CN117218498B (en) | Multi-modal large language model training method and system based on multi-modal encoder | |
CN110162789A (en) | A kind of vocabulary sign method and device based on the Chinese phonetic alphabet | |
CN116821287B (en) | Knowledge graph and large language model-based user psychological portrait system and method | |
CN116975776A (en) | Multi-mode data fusion method and device based on tensor and mutual information | |
CN111666752A (en) | Circuit teaching material entity relation extraction method based on keyword attention mechanism | |
CN115063119A (en) | Recruitment decision system and method based on adaptivity of recruitment behavior data | |
CN114781375A (en) | Military equipment relation extraction method based on BERT and attention mechanism | |
Ferlitsch | Deep Learning Patterns and Practices | |
CN113297374B (en) | Text classification method based on BERT and word feature fusion | |
CN113869005A (en) | Pre-training model method and system based on sentence similarity | |
CN112905750A (en) | Generation method and device of optimization model | |
CN116994098B (en) | Large model prompt learning method based on category attribute knowledge enhancement | |
CN112597770A (en) | Sensitive information query method based on deep learning | |
CN117093692A (en) | Multi-granularity image-text matching method and system based on depth fusion | |
CN116561272A (en) | Open domain visual language question-answering method and device, electronic equipment and storage medium | |
CN114239575B (en) | Statement analysis model construction method, statement analysis method, device, medium and computing equipment | |
CN115952360A (en) | Domain-adaptive cross-domain recommendation method and system based on user and article commonality modeling | |
Zeng | Intelligent test algorithm for English writing using English semantic and neural networks | |
CN112989068A (en) | Knowledge graph construction method for Tang poetry knowledge and Tang poetry knowledge question-answering system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |