CN116701637A - Zero sample text classification method, system and medium based on CLIP - Google Patents
- Publication number
- CN116701637A (application CN202310778409.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- classification
- image
- label
- clip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a CLIP-based zero-sample text classification method, system and medium. The method comprises the following steps. S1: acquiring the text to be classified. S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors. S3: computing the similarity between the text vector and each image vector to obtain the similarity between the text and each picture. S4: performing prediction matching according to the current classification task type and the computed similarities to obtain the text classification result. By combining text information with image information and applying the combination to natural language processing, the text classification task is reconstructed as a text-image matching task that the CLIP model can solve, and the accuracy of text classification is improved.
Description
Technical Field
The invention relates to the technical field of the Internet, and in particular to a CLIP-based zero-sample text classification method, system and medium.
Background
With the increasing maturity of Internet technology, and in particular the continuous progress of deep learning and natural language processing, text classification technology has developed greatly. Text classification is also widely applied in real life, for example in intelligent customer service and intelligent mailboxes, where it can automatically identify the type of an incoming message and automatically detect illegal content; in the video-platform field, it can help auditors automatically tag and classify related content, greatly saving manpower and material resources and improving people's experience. Meanwhile, as a model pre-trained on a massive text-image dataset, CLIP can directly complete text-image matching in a given domain without any training examples, i.e. zero-sample learning.
However, existing studies of text classification only pay attention to the semantic information in the input text and ignore very valuable image information. For example, when a person reads the phrase "the corners of his mouth are raised", a smiling picture first appears in the mind; the person described is then reasonably considered happy, and the emotion expressed by the phrase is correspondingly classified as "happy". This process combines the dual information of text and images, making the classification result more accurate. However, in the current text classification field, text information and image information have not been combined and applied to natural language tasks.
Disclosure of Invention
The invention provides a CLIP-based zero-sample text classification method, system and medium, which solve the problem that text information and image information have not been combined and applied to natural language tasks.
In a first aspect, the present invention provides a CLIP-based zero-sample text classification method, comprising:
S1: acquiring the text to be classified;
S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors;
S3: computing the similarity between the text vector and each image vector to obtain the similarity between the text and each picture;
S4: performing prediction matching according to the current classification task type and the computed similarities to obtain the text classification result.
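Steps S1–S4 above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the random 512-dimensional vectors merely stand in for the outputs of CLIP's text and image encoders (S2), and the label names and threshold are illustrative assumptions.

```python
import random

def dot(a, b):
    # Dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def classify_zero_shot(text_vec, image_vecs, labels, multi_label=False, threshold=0.5):
    # S3: similarity of the text vector with each label's image vector
    sims = [dot(text_vec, v) for v in image_vecs]
    # S4: prediction matching according to the classification task type
    if multi_label:
        return [lab for lab, s in zip(labels, sims) if s > threshold]
    return labels[max(range(len(labels)), key=sims.__getitem__)]

# Stand-in vectors; a real system would obtain these from CLIP's encoders (S2)
rng = random.Random(0)
labels = ["fear", "anger", "joy", "surprise"]
text_vec = [rng.gauss(0, 1) for _ in range(512)]                      # encoded text (S1 + S2)
image_vecs = [[rng.gauss(0, 1) for _ in range(512)] for _ in labels]  # one vector per label
prediction = classify_zero_shot(text_vec, image_vecs, labels)
```

With real CLIP embeddings the same two branches realize the single-label and multi-label matching rules described below.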
Further, the text-image set is acquired as follows:
S21: acquiring a text set and a label set from the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts may belong;
S22: randomly downloading one picture for each label in the label set to obtain an image set consisting of all downloaded pictures;
S23: converting the text-label set into a text-image set.
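A sketch of S21–S23 under stated assumptions: `download_picture_for` is a hypothetical stand-in for randomly downloading one picture per label from the Internet (S22), returning a placeholder file name instead of real image data.

```python
def download_picture_for(label):
    # Hypothetical stand-in for S22: fetch one random picture for this label.
    # Here it simply returns a placeholder file name.
    return f"{label}.jpg"

def build_text_image_set(texts, labels):
    # S22: one picture per distinct label; S23: replace each label with its
    # picture, mapping the text-label set {(x_i, y_i)} to {(x_i, v_i)}
    picture_of = {lab: download_picture_for(lab) for lab in set(labels)}
    return [(x, picture_of[y]) for x, y in zip(texts, labels)]

pairs = build_text_image_set(["I am so happy", "That scared me"], ["joy", "fear"])
```

Texts sharing a label map to the same picture, matching the one-picture-per-label variant above.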
Further, the text-image set may alternatively be acquired as follows:
S21: acquiring a text set and a label set from the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts may belong;
S22: randomly downloading several pictures for each label in the label set to perform ensemble enhancement, obtaining an image set consisting of all downloaded pictures;
S23: converting the text-label set into a text-image set.
Further, the specific process of converting the text-label set into the text-image set is as follows:
according to the type of each label, the label is replaced by the corresponding picture(s) obtained in S22, so that the text-label set {(x_i, y_i) | i = 1, …, N} is mapped to the text-image set {(x_i, V_i^M) | i = 1, …, N}, where x_i is the i-th text, y_i is the i-th label, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set.
Further, after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt + x, where Prompt is the semantic cue word for the specific task of the given text classification test set, x is a text in the test set, and x̂ is the text after the additional semantic cue word is added.
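Prompt enhancement then amounts to prepending the cue word, sketched below. The exact joining format — a colon and a space, as in the "Topic: …" example of Embodiment 2 — is an assumption; only the fact that the cue word precedes the text is specified.

```python
def add_prompt(prompt, text):
    # Prepend the task-specific semantic cue word before the text
    return f"{prompt}: {text}"

enhanced = add_prompt("Topic", 'What is an "imaginary number"?')
```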
Further, the similarity is calculated by performing a dot-product operation between the text vector and each image vector.
Further, the classification task types comprise single-label classification tasks and multi-label classification tasks.
Further, the process of performing prediction matching according to the computed similarities and the current classification task type to obtain the classification result is specifically as follows:
if the classification task type is a single-label classification task, the class with the highest similarity is selected as the final matching result;
if the classification task type is a multi-label classification task, the classes whose similarity exceeds a preset threshold are selected as the final matching result.
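The two matching rules can be sketched as follows; the label names, similarity scores and threshold are illustrative assumptions.

```python
def match(similarities, labels, multi_label, threshold=0.0):
    # Single-label task: take the class with the highest similarity.
    # Multi-label task: take every class whose similarity exceeds the threshold.
    if not multi_label:
        return labels[max(range(len(labels)), key=similarities.__getitem__)]
    return [lab for lab, s in zip(labels, similarities) if s > threshold]

labels = ["fear", "anger", "joy", "surprise"]
single = match([0.1, 0.3, 0.9, 0.2], labels, multi_label=False)
multi = match([0.1, 0.3, 0.9, 0.2], labels, multi_label=True, threshold=0.25)
```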
In a second aspect, the present invention provides a CLIP-based zero-sample text classification system, comprising:
a data acquisition module, configured to acquire the text to be classified;
an encoding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to compute the similarity between the text vector and each image vector, and to perform prediction matching according to the computed similarities and the current classification task type to obtain the text classification result.
In a third aspect, the present invention provides a computer-readable storage medium storing a computer program which, when invoked by a processor, performs the steps of the method described above.
Advantageous effects
The invention provides a CLIP-based zero-sample text classification method, system and medium. By combining text information with image information and applying the combination to natural language processing, the text classification task is reconstructed as a text-image matching task that the CLIP model can solve, and the accuracy of text classification is improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a zero sample classification method based on CLIP provided by an embodiment of the invention;
FIG. 2 is an exemplary diagram of a zero sample text classification method based on CLIP provided by an embodiment of the invention;
FIG. 3 is a text image matching architecture diagram of a CLIP model provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the prompt enhancement mode provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the ensemble enhancement mode provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the invention as defined by the claims.
Example 1
As shown in fig. 1, this embodiment provides a zero sample text classification method based on CLIP, including:
s1: and obtaining the text to be classified. In this embodiment, as shown in fig. 2, the acquired text to be classified is "Bye.
S2: inputting the text into a text encoder to obtain text vectors, and inputting the image set in the text image set into an image encoder to obtain image vectors.
Specifically, the text image set acquisition process comprises the following steps:
s21: acquiring a text set and a label set according to the text to be classified; the text set is a set of texts to be classified, and the tag set is a set of classifications to which the texts to be classified possibly belong. The number of texts in the text set is the same as the number of labels in the label set, for example, the number of texts in the text set { A, B, C, D } is 4, wherein the labels of the text A and the text B are both a, the label of the text C is C, and the label of the text D is D, and the label set corresponding to the text set is { a, a, C, D }.
In this embodiment, the test set data is a text label setThe text Set is a Test Set, and contains a plurality of texts, such as "Byeest Set={x 1 ,x 2 ,…,x N X, where x N Is the nth text; the tag Set is a Label Set, which includes a plurality of tags, such as "ear", "anger", "joy", and..the term "surrise", which is denoted Label set= { y 1 ,y 2 ,…,y N -wherein y N Is the nth tag. In this embodiment, there are multiple texts corresponding to the same tag or a single text corresponding to a single tag. Each text can be classified as one of a set of labels. The text in the text set and the labels in the label set are in one-to-one correspondence in order, i.e. x i Corresponding to y i 。
S22: and randomly downloading a picture for each tag in the tag set to obtain an image set formed by all downloaded pictures.
In this embodiment, for each label in the Label Set, one picture is randomly selected and downloaded from the Internet by class, forming the Image Set, in which the pictures correspond in order to a fear picture, a joy picture, …, and a surprise picture, denoted Image Set = {v_1, v_2, …, v_N}, where v_N is the picture corresponding to the N-th label.
S23: the text label set is converted into a text image set.
In this embodiment, according to the type of each label, the label is replaced by the corresponding picture obtained in S22: Label Set = {y_1, y_2, …, y_N} is mapped to Image Set = {v_1, v_2, …, v_N}, so that the text-label set {(x_i, y_i) | i = 1, …, N} is mapped to the text-image set {(x_i, V_i) | i = 1, …, N}, where x_i is the i-th text, y_i is the i-th label, V_i is the picture set corresponding to y_i, and N is the number of texts (equal to the number of labels) in the test set. Because each label has only one downloaded picture in this embodiment, M in V_i^M is 1; the superscript is therefore omitted and V_i is simply the single picture v_i. The texts and the labels correspond one-to-one in order, i.e. x_i corresponds to y_i.
The texts and the image set of the text-image set are input into the trained CLIP model. The text encoder of the CLIP model encodes each text x_i to obtain the text vector T_i, and the image encoder encodes each image v_i of the image set to obtain the image vector I_i, expressed as:
T_i = TextEncoder(x_i), I_i = ImageEncoder(v_i).
s3: and calculating the text vector and the image vector to obtain the similarity degree of the picture and the text.
In this embodiment, dot product operation is performed on the calculated text vector T and the image vector I to calculate the similarity between the image and the text in the image set. Wherein the text encoderA transducer network is adopted, the scale is 12 layers 512 wide, and 8 attention heads are provided; image encoder->ResNet or Vision Transformer is used.
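The dot-product similarity of S3 can be sketched as below. L2-normalizing both vectors first — which makes the dot product a cosine similarity, as in the original CLIP — is an added assumption; the text above specifies only a dot product.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity(text_vec, image_vecs, normalize=True):
    # Dot product of the text vector with each image vector (S3);
    # with normalization this is cosine similarity
    if normalize:
        text_vec = l2_normalize(text_vec)
        image_vecs = [l2_normalize(v) for v in image_vecs]
    return [sum(t * u for t, u in zip(text_vec, v)) for v in image_vecs]

# Toy 2-d vectors: the first image points the same way as the text, the second is orthogonal
sims = similarity([3.0, 0.0], [[1.0, 0.0], [0.0, 2.0]])
```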
S4: according to the calculated similarity and the current classification task type, prediction matching is carried out to obtain a classification result, namely a text label, and the specific process is as follows:
if the classification task type is a single-label classification task, selecting the category with the highest similarity as a final matching result; if the classification task type is a multi-label classification task, selecting a class with similarity larger than a preset threshold as a final matching result; the predicted final match results are as follows:
wherein, information is the final matching result; single Label Task is a single label classification task; t is a preset threshold for the degree of similarity.
As shown in FIG. 3, the classification task type is a single label classification task, text encoding vector T 1 Sum image vector i= (I 1 ,I 2 ,...,I N ) Dot product operations are performed to calculate the degree of similarity of the image and text. Because of the single label classification, the text information only comprises one type of classification, and T with the highest dot product result is taken 1 ·I 2 As a final match result. The text "I felt fear when my mother was heavily ill term" is classified as image data v 2 The corresponding label "spar".
Example 2
The present embodiment provides a CLIP-based zero-sample text classification method, which differs from Embodiment 1 in that, after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt + x, where x̂ is the text after the additional semantic cue word is added; Prompt is the semantic cue word for the specific task of the given text classification test set (for example, Prompt may be "Sentiment" for sentiment classification and "Intent" for intent classification); and x is a text in the test set.
As shown in FIG. 4, the semantic cue word is "Topic". For the text "What is an 'imaginary number'?", without prompt enhancement the CLIP model would classify it as Mathematics, which is incomplete. With prompt enhancement the text becomes "Topic: What is an 'imaginary number'?" and can be classified as Science and Mathematics.
Example 3
The present embodiment provides a zero sample text classification method based on CLIP, which is different from embodiment 1 in that the text image set acquisition process is as follows:
s21: acquiring a text set and a label set according to the text to be classified; the text set is a set of texts to be classified, and the tag set is a set of possibly belonging to the classification of the texts to be classified;
in this embodiment, the test set data is a text label set, and is recorded asWherein the text Set is a Test Set, and comprises a plurality of texts of' Bye 1 ,x 2 ,…,x N X, where x N Is the nth text; the tag Set is a Label Set, which includes the tags "ear", "anger", "joy", "surrise", noted Label set= { y 1 ,y 2 ,…,y N And N-th tag. In this embodiment, there are multiple labels with the same text or a single label with a single text. Each text can be classified as one of a set of labels.
S22: randomly downloading a plurality of pictures for each tag in a tag set to perform ensemble enhancement to obtain an image set formed by all downloaded pictures;
In this embodiment, as shown in FIG. 5, for each label in the label set, several pictures are randomly selected and downloaded from the Internet, forming the Image Set, which contains fear pictures, joy pictures, …, and surprise pictures in sequence, denoted Image Set = {v_i^j | i = 1, …, N; j = 1, …, M}, where v_i^j is the j-th of the M pictures corresponding to the i-th label, and M is the number of pictures downloaded per label for ensemble enhancement (2 in this embodiment).
S23: the text label set is converted into a text image set.
In this embodiment, according to the type of each label, the label is replaced by the corresponding pictures obtained in S22. After ensemble enhancement, the label y_i corresponds to V_i^M = {v_i^1, …, v_i^M}, i ∈ (1, N). That is, the text-label set {(x_i, y_i) | i = 1, …, N} is mapped to the text-image set {(x_i, V_i^M) | i = 1, …, N}, where x_i is the i-th text, y_i is the i-th label, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set.
When the CLIP model matches a text of the test set against the images of the image set, the similarity between the text encoding vector T and the image vectors of the i-th label is:
S_i = Σ_{j=1}^{M} T · I_i^j, where M is the number of pictures downloaded per label for ensemble enhancement and I_i^j is the vector of the j-th picture of the i-th label. In this embodiment the similarity is obtained by simple addition; in practice, a specific weight can be assigned to each picture as needed to perform a weighted sum instead.
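In Embodiment 3 the similarity is obtained by simply adding the text-image dot products over the M pictures of each label, which can be sketched as follows; the toy 2-dimensional vectors are illustrative stand-ins for CLIP embeddings.

```python
def dot(a, b):
    # Dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def ensemble_similarity(text_vec, image_vecs_per_label):
    # Sum the text-image dot products over the M pictures of each label
    return [sum(dot(text_vec, v) for v in vecs) for vecs in image_vecs_per_label]

t = [1.0, 1.0]
per_label = [
    [[1.0, 0.0], [0.0, 1.0]],  # label 1: M = 2 pictures
    [[0.5, 0.0], [0.0, 0.5]],  # label 2: M = 2 pictures
]
scores = ensemble_similarity(t, per_label)
```

A weighted variant would multiply each dot product by a per-picture weight before summing.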
Consider the text "I felt frustrated, angry, utterly dejected". Without ensemble enhancement, if the single randomly selected picture for the "anger" label and the one for the "sadness" label happen to be poorly chosen, an erroneous result is obtained, as can be seen from FIG. 5. With ensemble enhancement, the influence of an individual error in a single picture on the matching result is reduced, and accuracy is improved.
Example 4
This embodiment provides a CLIP-based zero-sample text classification system, comprising:
a data acquisition module, configured to acquire the text to be classified;
an encoding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to compute the similarity between the text vector and each image vector, and to perform prediction matching according to the computed similarities and the current classification task type to obtain the classification result, i.e. the text label.
Example 5
The present embodiment provides a computer-readable storage medium storing a computer program which, when invoked by a processor, performs the steps of the method described above.
It is to be understood that the same or similar parts of the above embodiments may refer to one another; content not detailed in one embodiment may refer to the same or similar content in other embodiments.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
It should be appreciated that in embodiments of the present invention, the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor. The memory may include read-only memory and random-access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random-access memory; for example, the memory may also store information on the device type.
The readable storage medium is a computer readable storage medium, which may be an internal storage unit of the controller according to any one of the foregoing embodiments, for example, a hard disk or a memory of the controller. The readable storage medium may also be an external storage device of the controller, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the controller. Further, the readable storage medium may also include both an internal storage unit and an external storage device of the controller. The readable storage medium is used to store the computer program and other programs and data required by the controller. The readable storage medium may also be used to temporarily store data that has been output or is to be output.
Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random-Access Memory (RAM), a magnetic disk or an optical disk.
Claims (10)
1. A CLIP-based zero-sample text classification method, comprising:
S1: acquiring the text to be classified;
S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors;
S3: computing the similarity between the text vector and each image vector to obtain the similarity between the text and each picture;
S4: performing prediction matching according to the current classification task type and the computed similarities to obtain the text classification result.
2. The CLIP-based zero-sample text classification method according to claim 1, wherein the text-image set is acquired as follows:
S21: acquiring a text set and a label set from the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts may belong;
S22: randomly downloading one picture for each label in the label set to obtain an image set consisting of all downloaded pictures;
S23: converting the text-label set into a text-image set.
3. The CLIP-based zero-sample text classification method according to claim 1, wherein the text-image set is acquired as follows:
S21: acquiring a text set and a label set from the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts may belong;
S22: randomly downloading several pictures for each label in the label set to perform ensemble enhancement, obtaining an image set consisting of all downloaded pictures;
S23: converting the text-label set into a text-image set.
4. The CLIP-based zero-sample text classification method according to claim 2 or 3, wherein the specific process of converting the text-label set into the text-image set is:
according to the type of each label, the label is replaced by the corresponding picture(s) obtained in S22, so that the text-label set {(x_i, y_i) | i = 1, …, N} is mapped to the text-image set {(x_i, V_i^M) | i = 1, …, N}, where x_i is the i-th text, y_i is the i-th label, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set.
5. The CLIP-based zero-sample text classification method according to claim 2 or 3, wherein after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt + x, where Prompt is the semantic cue word for the specific task of the given text classification test set, x is a text in the test set, and x̂ is the text after the additional semantic cue word is added.
6. The CLIP-based zero-sample text classification method according to claim 1, wherein the similarity is calculated by performing a dot-product operation between the text vector and each image vector.
7. The CLIP-based zero-sample text classification method according to claim 1, wherein the classification task types comprise single-label classification tasks and multi-label classification tasks.
8. The CLIP-based zero-sample text classification method according to claim 7, wherein the process of performing prediction matching according to the calculated similarity and the current classification task type to obtain the classification result is specifically:
if the classification task type is a single-label classification task, selecting the category with the highest similarity as the final matching result;
if the classification task type is a multi-label classification task, selecting the categories whose similarity is greater than a preset threshold as the final matching result.
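The two matching rules of claim 8 can be sketched in a few lines; the scores and the 0.5 threshold below are illustrative values, not taken from the patent:

```python
def match(scores, task="single", threshold=0.5):
    if task == "single":
        # single-label: the highest-similarity category is the final match
        return max(scores, key=scores.get)
    # multi-label: every category above the preset threshold matches
    return sorted(k for k, v in scores.items() if v > threshold)

scores = {"sports": 0.9, "finance": 0.7, "travel": 0.1}
single = match(scores, "single")
multi = match(scores, "multi")
```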
9. A CLIP-based zero-sample text classification system, comprising:
and a data acquisition module: the method comprises the steps of obtaining a text to be classified;
and a coding module: for inputting text into a text encoder to obtain text vectors; inputting an image set in the text image set into an image encoder to obtain an image vector;
and a classification prediction module: the method comprises the steps of calculating a text vector and an image vector to obtain the similarity degree of a picture and a text; and the method is used for carrying out prediction matching according to the similarity obtained by calculation and the current classification task type to obtain a text classification result.
10. A computer-readable storage medium, wherein a computer program is stored thereon which, when invoked by a processor, performs the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310778409.5A CN116701637B (en) | 2023-06-29 | 2023-06-29 | Zero sample text classification method, system and medium based on CLIP |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116701637A true CN116701637A (en) | 2023-09-05 |
CN116701637B CN116701637B (en) | 2024-03-08 |
Family
ID=87823836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310778409.5A Active CN116701637B (en) | 2023-06-29 | 2023-06-29 | Zero sample text classification method, system and medium based on CLIP |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116701637B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01131960A (en) * | 1988-10-21 | 1989-05-24 | Toshiba Corp | Document and image filing device |
EP1871064A1 (en) * | 2006-06-19 | 2007-12-26 | Research In Motion Limited | Device for transferring information |
US20080310685A1 (en) * | 2007-06-18 | 2008-12-18 | Speigle Jon M | Methods and Systems for Refining Text Segmentation Results |
US20140270347A1 (en) * | 2013-03-13 | 2014-09-18 | Sharp Laboratories Of America, Inc. | Hierarchical image classification system |
CN113449808A (en) * | 2021-07-13 | 2021-09-28 | 广州华多网络科技有限公司 | Multi-source image-text information classification method and corresponding device, equipment and medium |
CN113836298A (en) * | 2021-08-05 | 2021-12-24 | 合肥工业大学 | Text classification method and system based on visual enhancement |
CN114239560A (en) * | 2021-12-03 | 2022-03-25 | 上海人工智能创新中心 | Three-dimensional image classification method, device, equipment and computer-readable storage medium |
US20220343626A1 (en) * | 2019-08-15 | 2022-10-27 | Vision Semantics Limited | Text Based Image Search |
CN115393902A (en) * | 2022-09-26 | 2022-11-25 | 华东师范大学 | Pedestrian re-identification method based on comparison language image pre-training model CLIP |
CN115761314A (en) * | 2022-11-07 | 2023-03-07 | 重庆邮电大学 | E-commerce image and text classification method and system based on prompt learning |
CN115761757A (en) * | 2022-11-04 | 2023-03-07 | 福州大学 | Multi-mode text page classification method based on decoupling feature guidance |
CN116226688A (en) * | 2023-05-10 | 2023-06-06 | 粤港澳大湾区数字经济研究院(福田) | Data processing, image-text searching and image classifying method and related equipment |
CN116702091A (en) * | 2023-06-21 | 2023-09-05 | 中南大学 | Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP |
CN117421639A (en) * | 2023-11-03 | 2024-01-19 | 中南大学 | Multi-mode data classification method, terminal equipment and storage medium |
Non-Patent Citations (5)
Title |
---|
Libo Qin: "CLIPText: A New Paradigm for Zero-shot Text Classification", Findings of the Association for Computational Linguistics: ACL 2023, 31 July 2023 (2023-07-31), pages 1077-1088 *
Wenpeng Yin, Jamaal Hay, and Dan Roth: "Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach", Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 30 November 2018 (2018-11-30), pages 3914-3923 *
Liu Yanan, Wu Fei, Zhuang Yueting: "Video semantic mining based on multi-modal subspace correlation propagation", Journal of Computer Research and Development, no. 01, 15 January 2009 (2009-01-15), pages 3-10 *
Lyu Haifeng, Cai Ming: "Classification-fused image annotation based on a probabilistic latent semantic analysis model", Electronic Technology & Software Engineering, no. 07, 6 April 2018 (2018-04-06), pages 102-104 *
Xu Ge, Xiao Yongqiang, Wang Tao, Chen Kaizhi, Liao Xiangwen, Wu Yunbing: "Zero-shot image classification based on visual error and semantic attributes", Journal of Computer Applications, no. 04, 20 November 2018 (2018-11-20), pages 92-98 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116935418A (en) * | 2023-09-15 | 2023-10-24 | 成都索贝数码科技股份有限公司 | Automatic three-dimensional graphic template reorganization method, device and system |
CN116935418B (en) * | 2023-09-15 | 2023-12-05 | 成都索贝数码科技股份有限公司 | Automatic three-dimensional graphic template reorganization method, device and system |
Also Published As
Publication number | Publication date |
---|---|
CN116701637B (en) | 2024-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2019222819B2 (en) | Method for scaling object detection to a very large number of categories | |
CN109902271B (en) | Text data labeling method, device, terminal and medium based on transfer learning | |
CN110597961B (en) | Text category labeling method and device, electronic equipment and storage medium | |
CN109034203B (en) | Method, device, equipment and medium for training expression recommendation model and recommending expression | |
CN111324769A (en) | Training method of video information processing model, video information processing method and device | |
CN110704586A (en) | Information processing method and system | |
CN110363084A (en) | A kind of class state detection method, device, storage medium and electronics | |
CN116701637B (en) | Zero sample text classification method, system and medium based on CLIP | |
CN114358203A (en) | Training method and device for image description sentence generation module and electronic equipment | |
CN113836992A (en) | Method for identifying label, method, device and equipment for training label identification model | |
CN114218945A (en) | Entity identification method, device, server and storage medium | |
CN114298157A (en) | Short text sentiment classification method, medium and system based on public sentiment big data analysis | |
CN113435499A (en) | Label classification method and device, electronic equipment and storage medium | |
CN112667803A (en) | Text emotion classification method and device | |
CN112132075B (en) | Method and medium for processing image-text content | |
CN111445545B (en) | Text transfer mapping method and device, storage medium and electronic equipment | |
CN117349402A (en) | Emotion cause pair identification method and system based on machine reading understanding | |
CN112633394B (en) | Intelligent user label determination method, terminal equipment and storage medium | |
CN113010717B (en) | Image verse description generation method, device and equipment | |
CN114780757A (en) | Short media label extraction method and device, computer equipment and storage medium | |
CN111767710B (en) | Indonesia emotion classification method, device, equipment and medium | |
Newnham | Machine Learning with Core ML: An iOS developer's guide to implementing machine learning in mobile apps | |
Yang et al. | Automatic metadata information extraction from scientific literature using deep neural networks | |
US20240184860A1 (en) | Methods and arrangements for providing impact imagery | |
CN116150406B (en) | Context sparse entity linking method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||