CN116701637A - Zero sample text classification method, system and medium based on CLIP - Google Patents

Zero sample text classification method, system and medium based on CLIP

Info

Publication number
CN116701637A
Authority
CN
China
Prior art keywords
text
classification
image
label
clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310778409.5A
Other languages
Chinese (zh)
Other versions
CN116701637B (en)
Inventor
覃立波 (Libo Qin)
李勤政 (Qinzheng Li)
王玮赟 (Weiyun Wang)
陈麒光 (Qiguang Chen)
车万翔 (Wanxiang Che)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202310778409.5A priority Critical patent/CN116701637B/en
Publication of CN116701637A publication Critical patent/CN116701637A/en
Application granted granted Critical
Publication of CN116701637B publication Critical patent/CN116701637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a zero sample text classification method, system and medium based on CLIP, wherein the method comprises the following steps: S1: acquiring the text to be classified; S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors; S3: calculating the similarity between the picture and the text from the text vector and the image vectors; S4: performing prediction matching according to the current classification task type and the calculated similarity to obtain a text classification result. By combining text information with image information and applying them to natural language processing, the text classification task is reformulated as a text-image matching task that the CLIP model can solve, improving the accuracy of text classification.

Description

Zero sample text classification method, system and medium based on CLIP
Technical Field
The invention relates to the technical field of the Internet, and in particular to a zero sample text classification method, system and medium based on CLIP.
Background
With the increasing maturity of Internet technology, and in particular the continuous progress of deep learning and natural language processing, text classification technology has developed greatly. Text classification is also widely applied in real life, for example in intelligent customer service and intelligent mailboxes, where it can automatically identify the types of incoming messages and automatically detect illegal content; in the video platform field, it can help auditors automatically tag and classify related content, greatly saving manpower and material resources and improving people's experience of life. Meanwhile, as a model pre-trained on a massive text-image dataset, CLIP can directly complete text-image matching in a specified domain without using any examples, i.e., zero-shot (zero sample) learning.
However, existing studies of text classification problems only pay attention to the semantic information in the input text and ignore very valuable image information. For example, when a person reads the phrase "the corners of his mouth turned up", a smiling picture first appears in the mind; the person is then reasonably considered to be happy, and the emotion expressed by the phrase is accordingly classified as "happy". This process combines the dual information of text and images, making the classification result more accurate. In the current text classification field, however, text information and image information have not yet been combined and applied to natural language tasks.
Disclosure of Invention
The invention provides a zero sample text classification method, system and medium based on CLIP, which solve the problem that text information and image information have not been combined and applied to natural language tasks.
In a first aspect, the present invention provides a zero-sample text classification method based on CLIP, including:
S1: acquiring the text to be classified;
S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors;
S3: calculating the similarity between the picture and the text from the text vector and the image vectors;
S4: performing prediction matching according to the current classification task type and the calculated similarity to obtain a text classification result.
Further, the text-image set is acquired as follows:
S21: acquiring a text set and a label set according to the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts to be classified may belong;
S22: randomly downloading one picture for each label in the label set to obtain an image set formed by all downloaded pictures;
S23: converting the text-label set into a text-image set.
Further, the text-image set may instead be acquired as follows:
S21: acquiring a text set and a label set according to the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts to be classified may belong;
S22: randomly downloading a plurality of pictures for each label in the label set to perform ensemble enhancement, obtaining an image set formed by all downloaded pictures;
S23: converting the text-label set into a text-image set.
Further, the specific process of converting the text-label set into the text-image set is:
according to the type of each label, substituting the corresponding pictures obtained in S22, so that the text-label set $\{(x_i, y_i)\}_{i=1}^{N}$ is mapped to the text-image set $\{(x_i, V_i^M)\}_{i=1}^{N}$, where $x_i$ is the $i$-th text, $y_i$ is the $i$-th label, $V_i^M$ is the set of $M$ pictures corresponding to $y_i$, and $N$ is the number of texts in the test set.
Further, after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
$$\hat{x} = \text{Prompt} \oplus x$$
where Prompt is the semantic prompt for the specific task of a given text classification test set; $x$ is the text in the test set; $\hat{x}$ is the text after adding the additional semantic cue word; and $\oplus$ denotes concatenation.
Further, the similarity is calculated by performing a dot product operation on the text vector and the image vector.
Further, the classification task types comprise a single-label classification task and a multi-label classification task.
Further, performing prediction matching according to the calculated similarity and the current classification task type to obtain the classification result specifically comprises:
if the classification task type is a single-label classification task, selecting the category with the highest similarity as the final matching result;
if the classification task type is a multi-label classification task, selecting the classes whose similarity exceeds a preset threshold as the final matching result.
In a second aspect, the present invention provides a CLIP-based zero-sample text classification system, comprising:
a data acquisition module, configured to acquire the text to be classified;
an encoding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to compute the similarity between the text vector and the image vectors, and to perform prediction matching according to the computed similarity and the current classification task type to obtain the text classification result.
In a third aspect, the present invention provides a computer-readable storage medium storing a computer program which, when invoked by a processor, performs the steps of the method described above.
Advantageous effects
The invention provides a zero sample text classification method, system and medium based on CLIP. By combining text information with image information and applying them to natural language processing, the text classification task is reformulated as a text-image matching task that the CLIP model can solve, improving the accuracy of text classification.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a zero sample classification method based on CLIP provided by an embodiment of the invention;
FIG. 2 is an exemplary diagram of a zero sample text classification method based on CLIP provided by an embodiment of the invention;
FIG. 3 is a text image matching architecture diagram of a CLIP model provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the prompt enhancement mode provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of the ensemble enhancement mode provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the invention as defined by the claims.
Example 1
As shown in FIG. 1, this embodiment provides a zero sample text classification method based on CLIP, comprising:
S1: acquiring the text to be classified. In this embodiment, as shown in FIG. 2, the acquired text to be classified is "Bye…".
S2: inputting the text into a text encoder to obtain text vectors, and inputting the image set in the text image set into an image encoder to obtain image vectors.
Specifically, the text-image set is acquired as follows:
S21: acquiring a text set and a label set according to the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts to be classified may belong. The number of texts in the text set equals the number of labels in the label set. For example, the text set {A, B, C, D} contains 4 texts; if texts A and B both have label a, text C has label c, and text D has label d, then the label set corresponding to the text set is {a, a, c, d}.
In this embodiment, the test set data is a text-label set $\{(x_i, y_i)\}_{i=1}^{N}$. The text set is the test set, which contains a plurality of texts and is denoted $\text{Test Set} = \{x_1, x_2, \dots, x_N\}$, where $x_N$ is the $N$-th text. The label set contains a plurality of labels such as "fear", "anger", "joy", ..., "surprise", and is denoted $\text{Label Set} = \{y_1, y_2, \dots, y_N\}$, where $y_N$ is the $N$-th label. Multiple texts may correspond to the same label, or a single text to a single label; each text can be classified as one of the labels in the label set. The texts in the text set and the labels in the label set correspond one-to-one in order, i.e. $x_i$ corresponds to $y_i$.
S22: randomly downloading one picture for each label in the label set to obtain an image set formed by all downloaded pictures.
In this embodiment, for each label in the label set, a picture of the corresponding class is randomly selected and downloaded from the Internet to form the image set. The pictures include, in sequence, a fear picture, an anger picture, a joy picture, ..., and a surprise picture, recorded as $\text{Image Set} = \{v_1, v_2, \dots, v_N\}$, where $v_N$ is the picture corresponding to the $N$-th label.
S23: converting the text-label set into a text-image set.
In this embodiment, according to the type of each label, the corresponding picture obtained in S22 is substituted, so that $\text{Label Set} = \{y_1, y_2, \dots, y_N\}$, here {"fear", "anger", "joy", ..., "surprise"}, is mapped to $\text{Image Set} = \{v_1, v_2, \dots, v_N\}$; that is, the text-label set $\{(x_i, y_i)\}_{i=1}^{N}$ is mapped to the text-image set $\{(x_i, V_i^M)\}_{i=1}^{N}$, where $x_i$ is the $i$-th text, $y_i$ is the $i$-th label, $V_i^M$ is the set of pictures corresponding to $y_i$, and $N$ is the number of texts in the test set (equivalently, the number of labels). In this embodiment each label downloads only one picture, so $M = 1$ and $V_i^M$ reduces to the single picture $v_i$. Since the texts and labels correspond one-to-one in order ($x_i$ corresponds to $y_i$), the number of texts equals the number of labels.
The text and the images of the text-image set are input into a trained CLIP model. The text encoder, denoted $f_T(\cdot)$, encodes each text $x_i$ to obtain the text vector $T_i$, and the image encoder, denoted $f_I(\cdot)$, encodes each image $v_i$ of the image set to obtain the image vector $I_i$, expressed as:
$$T_i = f_T(x_i), \qquad I_i = f_I(v_i)$$
S3: calculating the similarity between the picture and the text from the text vector and the image vectors.
In this embodiment, a dot product operation is performed on the computed text vector $T$ and image vector $I$ to calculate the similarity between each image of the image set and the text. The text encoder $f_T$ adopts a Transformer network with 12 layers, a width of 512, and 8 attention heads; the image encoder $f_I$ adopts a ResNet or a Vision Transformer.
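As a concrete illustration of steps S2 and S3, the following sketch encodes one text and the downloaded label pictures with CLIP and returns the dot-product similarities. It assumes the open-source OpenAI CLIP package and illustrative local file paths, neither of which is specified by the patent; the L2 normalization follows the usual CLIP convention of cosine similarity, while the patent simply says "dot product".

```python
# Minimal sketch of steps S2-S3, assuming the OpenAI CLIP package
# (pip install git+https://github.com/openai/CLIP) and local copies of the
# downloaded label pictures; file names and model choice are illustrative.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # ViT image encoder

def similarity_scores(text, label_image_paths):
    """Return one text-image similarity score per label picture."""
    tokens = clip.tokenize([text]).to(device)
    images = torch.stack(
        [preprocess(Image.open(p)) for p in label_image_paths]
    ).to(device)
    with torch.no_grad():
        T = model.encode_text(tokens)    # text vector T, shape (1, d)
        I = model.encode_image(images)   # image vectors I_1..I_N, shape (N, d)
        T = T / T.norm(dim=-1, keepdim=True)   # normalize so the dot product
        I = I / I.norm(dim=-1, keepdim=True)   # is a cosine similarity
    return (T @ I.T).squeeze(0)          # T · I_i for each label i
```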
S4: according to the calculated similarity and the current classification task type, prediction matching is carried out to obtain a classification result, namely a text label, and the specific process is as follows:
if the classification task type is a single-label classification task, selecting the category with the highest similarity as a final matching result; if the classification task type is a multi-label classification task, selecting a class with similarity larger than a preset threshold as a final matching result; the predicted final match results are as follows:
wherein, information is the final matching result; single Label Task is a single label classification task; t is a preset threshold for the degree of similarity.
As shown in FIG. 3, the classification task type is a single-label classification task. The text encoding vector $T_1$ and the image vectors $I = (I_1, I_2, \dots, I_N)$ are combined by dot product operations to calculate the similarity between each image and the text. Because this is single-label classification, the text belongs to exactly one class, and the pair with the highest dot product, $T_1 \cdot I_2$, is taken as the final matching result. The text "I felt fear when my mother was heavily ill" is therefore classified under the label "fear" corresponding to the image data $v_2$.
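Step S4 can then be sketched as a small decision rule over the similarity scores; the threshold value below is an illustrative assumption, not a value taken from the patent.

```python
# Step S4 (sketch): single-label takes the arg-max over the similarity
# scores; multi-label keeps every class whose similarity exceeds the preset
# threshold t. The default t = 0.25 is an assumed, illustrative value.
def predict(scores, labels, multi_label=False, t=0.25):
    if not multi_label:                      # single-label classification task
        return labels[int(scores.argmax())]
    return [lab for lab, s in zip(labels, scores) if float(s) > t]

# e.g. predict(similarity_scores(text, pictures), ["fear", "anger", "joy"])
```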
Example 2
The present embodiment provides a zero sample text classification method based on CLIP, which differs from Embodiment 1 in that, after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
$$\hat{x} = \text{Prompt} \oplus x$$
where $\hat{x}$ is the text after adding the additional semantic cue word; Prompt is the semantic prompt for the specific task of a given text classification test set (for example, Prompt may be taken as "Sentiment" for emotion classification and "Intent" for intent classification); $x$ is the text in the test set; and $\oplus$ denotes concatenation.
As shown in FIG. 4, the semantic cue word is "Topic". For the text "What is an 'imaginary number'?", without prompt enhancement the CLIP model would classify it as Mathematics, which is incomplete. With prompt enhancement, the text becomes "Topic: What is an 'imaginary number'?" and can be classified as Science and Mathematics.
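A minimal sketch of the prompt enhancement step; the task-to-cue-word mapping mirrors the examples given above ("Sentiment", "Intent", "Topic") and is otherwise an assumption.

```python
# Prompt enhancement (sketch): prepend a task-specific semantic cue word to
# the text before it is encoded, i.e. x_hat = Prompt (+) x.
PROMPTS = {"emotion": "Sentiment", "intent": "Intent", "topic": "Topic"}

def add_prompt(text, task):
    return f"{PROMPTS[task]}: {text}"

# add_prompt('What is an "imaginary number"?', "topic")
# -> 'Topic: What is an "imaginary number"?'
```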
Example 3
The present embodiment provides a zero sample text classification method based on CLIP, which differs from Embodiment 1 in the acquisition of the text-image set:
S21: acquiring a text set and a label set according to the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts to be classified may belong.
In this embodiment, the test set data is a text-label set, recorded as $\{(x_i, y_i)\}_{i=1}^{N}$. The text set is the test set, containing a plurality of texts and denoted $\text{Test Set} = \{x_1, x_2, \dots, x_N\}$, where $x_N$ is the $N$-th text; the label set contains the labels "fear", "anger", "joy", ..., "surprise", denoted $\text{Label Set} = \{y_1, y_2, \dots, y_N\}$, where $y_N$ is the $N$-th label. Multiple texts may share the same label, or a single text may have a single label; each text can be classified as one of the labels in the label set.
S22: randomly downloading a plurality of pictures for each label in the label set to perform ensemble enhancement, obtaining an image set formed by all downloaded pictures.
In this embodiment, as shown in FIG. 5, for each label in the label set, a plurality of pictures are randomly selected and downloaded from the Internet to form the image set, which includes fear pictures, anger pictures, joy pictures, ..., and surprise pictures, recorded as $\text{Image Set} = \{V_1^M, V_2^M, \dots, V_N^M\}$ with $V_i^M = \{v_i^1, v_i^2, \dots, v_i^M\}$, where $v_i^j$ is the $j$-th of the $M$ pictures corresponding to the $i$-th label, and $M$ is the number of pictures downloaded per label for the ensemble enhancement, 2 in this example.
S23: the text label set is converted into a text image set.
In this embodiment, according to the type of each label, the corresponding pictures obtained in S22 are substituted; after the ensemble enhancement, label $y_i$ corresponds to $V_i^M$, $i \in \{1, \dots, N\}$. That is, the text-label set $\{(x_i, y_i)\}_{i=1}^{N}$ is mapped to the text-image set $\{(x_i, V_i^M)\}_{i=1}^{N}$, where $x_i$ is the $i$-th text, $y_i$ is the $i$-th label, $V_i^M$ is the set of $M$ pictures corresponding to $y_i$, and $N$ is the number of texts in the test set.
When the CLIP model matches a text of the test set against the images of the image set, the similarity between the text encoding vector $T$ and the images of the $i$-th label is:
$$\text{sim}(T, V_i^M) = \sum_{j=1}^{M} T \cdot I_i^j$$
where $I_i^j$ is the image vector of picture $v_i^j$, and $M$ is the number of pictures downloaded per label for the ensemble enhancement. In this embodiment the similarity is obtained by simple addition; in implementation, a specific weight can be assigned to a specific picture according to actual needs to perform a weighted combination.
For the text "I felt frustrated," angry, utterly detected ". The picture is selected for the" anger "tag if the picture selection effect is not good without the ensable enhancementCorrespondingly, a picture is selected for the "sadness" label->Correspondingly, it can be seen from fig. 5 that erroneous results will be obtained. After the enhancement of the ensamble is adopted, the influence of individual errors of a single picture on a matching result is reduced, and the accuracy is improved.
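A sketch of the ensemble scoring, reusing the model, preprocess and device objects assumed in the earlier sketch; the summation mirrors the simple-addition formula above, and an average or weighted sum could be substituted.

```python
# Ensemble enhancement (sketch): each label i has M pictures v_i^1..v_i^M;
# the score of label i sums the M dot products T · I_i^j.
def ensemble_scores(text, image_paths_per_label):
    """image_paths_per_label: N lists, each holding the M paths of one label."""
    tokens = clip.tokenize([text]).to(device)
    with torch.no_grad():
        T = model.encode_text(tokens)
        T = T / T.norm(dim=-1, keepdim=True)
        scores = []
        for paths in image_paths_per_label:          # one label at a time
            imgs = torch.stack(
                [preprocess(Image.open(p)) for p in paths]
            ).to(device)
            I = model.encode_image(imgs)
            I = I / I.norm(dim=-1, keepdim=True)
            scores.append(float((T @ I.T).sum()))    # sum over the M pictures
    return scores
```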
Example 4
The embodiment provides a zero sample text classification system based on CLIP, comprising:
a data acquisition module, configured to acquire the text to be classified;
an encoding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to compute the similarity between the text vector and the image vectors, and to perform prediction matching according to the computed similarity and the current classification task type to obtain a classification result, i.e. a text label.
Example 5
The present embodiment provides a computer-readable storage medium storing a computer program which, when invoked by a processor, performs the steps of the method described above.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
It should be appreciated that in embodiments of the present invention, the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information on the device type.
The readable storage medium is a computer readable storage medium, which may be an internal storage unit of the controller according to any one of the foregoing embodiments, for example, a hard disk or a memory of the controller. The readable storage medium may also be an external storage device of the controller, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the controller. Further, the readable storage medium may also include both an internal storage unit and an external storage device of the controller. The readable storage medium is used to store the computer program and other programs and data required by the controller. The readable storage medium may also be used to temporarily store data that has been output or is to be output.
Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A zero sample text classification method based on CLIP, comprising:
S1: acquiring the text to be classified;
S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors;
S3: calculating the similarity between the picture and the text from the text vector and the image vectors;
S4: performing prediction matching according to the current classification task type and the calculated similarity to obtain a text classification result.
2. The CLIP-based zero-sample text classification method of claim 1, wherein the text-image set is acquired as follows:
S21: acquiring a text set and a label set according to the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts to be classified may belong;
S22: randomly downloading one picture for each label in the label set to obtain an image set formed by all downloaded pictures;
S23: converting the text-label set into a text-image set.
3. The CLIP-based zero-sample text classification method of claim 1, wherein the text-image set is acquired as follows:
S21: acquiring a text set and a label set according to the text to be classified; the text set is the set of texts to be classified, and the label set is the set of classes to which the texts to be classified may belong;
S22: randomly downloading a plurality of pictures for each label in the label set to perform ensemble enhancement, obtaining an image set formed by all downloaded pictures;
S23: converting the text-label set into a text-image set.
4. The zero sample text classification method based on CLIP according to claim 2 or 3, wherein the specific process of converting the text-label set into the text-image set is:
according to the type of each label, substituting the corresponding pictures obtained in S22, so that the text-label set $\{(x_i, y_i)\}_{i=1}^{N}$ is mapped to the text-image set $\{(x_i, V_i^M)\}_{i=1}^{N}$, where $x_i$ is the $i$-th text, $y_i$ is the $i$-th label, $V_i^M$ is the set of $M$ pictures corresponding to $y_i$, and $N$ is the number of texts in the test set.
5. The CLIP-based zero-sample text classification method according to claim 2 or 3, wherein after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
$$\hat{x} = \text{Prompt} \oplus x$$
where Prompt is the semantic prompt for the specific task of a given text classification test set; $x$ is the text in the test set; $\hat{x}$ is the text after adding the additional semantic cue word; and $\oplus$ denotes concatenation.
6. The CLIP-based zero-sample text classification method of claim 1, wherein the similarity is calculated by performing a dot product operation on the text vector and the image vector.
7. The CLIP-based zero-sample text classification method according to claim 1, wherein the classification task types comprise a single-label classification task and a multi-label classification task.
8. The zero-sample text classification method based on CLIP according to claim 7, wherein performing prediction matching according to the calculated similarity and the current classification task type to obtain the classification result specifically comprises:
if the classification task type is a single-label classification task, selecting the category with the highest similarity as the final matching result;
if the classification task type is a multi-label classification task, selecting the classes whose similarity exceeds a preset threshold as the final matching result.
9. A CLIP-based zero-sample text classification system, comprising:
a data acquisition module, configured to acquire the text to be classified;
an encoding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to compute the similarity between the text vector and the image vectors, and to perform prediction matching according to the computed similarity and the current classification task type to obtain the text classification result.
10. A computer-readable storage medium storing a computer program which, when invoked by a processor, performs the method of any one of claims 1-8.
CN202310778409.5A 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP Active CN116701637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310778409.5A CN116701637B (en) 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310778409.5A CN116701637B (en) 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP

Publications (2)

Publication Number Publication Date
CN116701637A true CN116701637A (en) 2023-09-05
CN116701637B CN116701637B (en) 2024-03-08

Family

ID=87823836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310778409.5A Active CN116701637B (en) 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP

Country Status (1)

Country Link
CN (1) CN116701637B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935418A (en) * 2023-09-15 2023-10-24 成都索贝数码科技股份有限公司 Automatic three-dimensional graphic template reorganization method, device and system

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01131960A (en) * 1988-10-21 1989-05-24 Toshiba Corp Document and image filing device
EP1871064A1 (en) * 2006-06-19 2007-12-26 Research In Motion Limited Device for transferring information
US20080310685A1 (en) * 2007-06-18 2008-12-18 Speigle Jon M Methods and Systems for Refining Text Segmentation Results
US20140270347A1 (en) * 2013-03-13 2014-09-18 Sharp Laboratories Of America, Inc. Hierarchical image classification system
CN113449808A (en) * 2021-07-13 2021-09-28 广州华多网络科技有限公司 Multi-source image-text information classification method and corresponding device, equipment and medium
CN113836298A (en) * 2021-08-05 2021-12-24 合肥工业大学 Text classification method and system based on visual enhancement
CN114239560A (en) * 2021-12-03 2022-03-25 上海人工智能创新中心 Three-dimensional image classification method, device, equipment and computer-readable storage medium
US20220343626A1 (en) * 2019-08-15 2022-10-27 Vision Semantics Limited Text Based Image Search
CN115393902A (en) * 2022-09-26 2022-11-25 华东师范大学 Pedestrian re-identification method based on comparison language image pre-training model CLIP
CN115761314A (en) * 2022-11-07 2023-03-07 重庆邮电大学 E-commerce image and text classification method and system based on prompt learning
CN115761757A (en) * 2022-11-04 2023-03-07 福州大学 Multi-mode text page classification method based on decoupling feature guidance
CN116226688A (en) * 2023-05-10 2023-06-06 粤港澳大湾区数字经济研究院(福田) Data processing, image-text searching and image classifying method and related equipment
CN116702091A (en) * 2023-06-21 2023-09-05 中南大学 Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN117421639A (en) * 2023-11-03 2024-01-19 中南大学 Multi-mode data classification method, terminal equipment and storage medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01131960A (en) * 1988-10-21 1989-05-24 Toshiba Corp Document and image filing device
EP1871064A1 (en) * 2006-06-19 2007-12-26 Research In Motion Limited Device for transferring information
US20080310685A1 (en) * 2007-06-18 2008-12-18 Speigle Jon M Methods and Systems for Refining Text Segmentation Results
US20140270347A1 (en) * 2013-03-13 2014-09-18 Sharp Laboratories Of America, Inc. Hierarchical image classification system
US20220343626A1 (en) * 2019-08-15 2022-10-27 Vision Semantics Limited Text Based Image Search
CN113449808A (en) * 2021-07-13 2021-09-28 广州华多网络科技有限公司 Multi-source image-text information classification method and corresponding device, equipment and medium
CN113836298A (en) * 2021-08-05 2021-12-24 合肥工业大学 Text classification method and system based on visual enhancement
CN114239560A (en) * 2021-12-03 2022-03-25 上海人工智能创新中心 Three-dimensional image classification method, device, equipment and computer-readable storage medium
CN115393902A (en) * 2022-09-26 2022-11-25 华东师范大学 Pedestrian re-identification method based on comparison language image pre-training model CLIP
CN115761757A (en) * 2022-11-04 2023-03-07 福州大学 Multi-mode text page classification method based on decoupling feature guidance
CN115761314A (en) * 2022-11-07 2023-03-07 重庆邮电大学 E-commerce image and text classification method and system based on prompt learning
CN116226688A (en) * 2023-05-10 2023-06-06 粤港澳大湾区数字经济研究院(福田) Data processing, image-text searching and image classifying method and related equipment
CN116702091A (en) * 2023-06-21 2023-09-05 中南大学 Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN117421639A (en) * 2023-11-03 2024-01-19 中南大学 Multi-mode data classification method, terminal equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIBO QIN ET AL.: "CLIPText: A New Paradigm for Zero-shot Text Classification", Findings of the Association for Computational Linguistics: ACL 2023, 31 July 2023 (2023-07-31), pages 1077-1088 *
WENPENG YIN; JAMAAL HAY; DAN ROTH: "Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach", Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 30 November 2018 (2018-11-30), pages 3914-3923 *
LIU YANAN; WU FEI; ZHUANG YUETING: "Video semantic mining based on multi-modal subspace correlation propagation", Journal of Computer Research and Development, no. 01, 15 January 2009 (2009-01-15), pages 3-10 *
LYU HAIFENG; CAI MING: "Classification-fusion image annotation based on the probabilistic latent semantic analysis model", Electronic Technology & Software Engineering, no. 07, 6 April 2018 (2018-04-06), pages 102-104 *
XU GE; XIAO YONGQIANG; WANG TAO; CHEN KAIZHI; LIAO XIANGWEN; WU YUNBING: "Zero-shot image classification based on visual error and semantic attributes", Journal of Computer Applications, no. 04, 20 November 2018 (2018-11-20), pages 92-98 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935418A (en) * 2023-09-15 2023-10-24 成都索贝数码科技股份有限公司 Automatic three-dimensional graphic template reorganization method, device and system
CN116935418B (en) * 2023-09-15 2023-12-05 成都索贝数码科技股份有限公司 Automatic three-dimensional graphic template reorganization method, device and system

Also Published As

Publication number Publication date
CN116701637B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
AU2019222819B2 (en) Method for scaling object detection to a very large number of categories
CN109902271B (en) Text data labeling method, device, terminal and medium based on transfer learning
CN110597961B (en) Text category labeling method and device, electronic equipment and storage medium
CN109034203B (en) Method, device, equipment and medium for training expression recommendation model and recommending expression
CN111324769A (en) Training method of video information processing model, video information processing method and device
CN110704586A (en) Information processing method and system
CN110363084A (en) A kind of class state detection method, device, storage medium and electronics
CN116701637B (en) Zero sample text classification method, system and medium based on CLIP
CN114358203A (en) Training method and device for image description sentence generation module and electronic equipment
CN113836992A (en) Method for identifying label, method, device and equipment for training label identification model
CN114218945A (en) Entity identification method, device, server and storage medium
CN114298157A (en) Short text sentiment classification method, medium and system based on public sentiment big data analysis
CN113435499A (en) Label classification method and device, electronic equipment and storage medium
CN112667803A (en) Text emotion classification method and device
CN112132075B (en) Method and medium for processing image-text content
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN117349402A (en) Emotion cause pair identification method and system based on machine reading understanding
CN112633394B (en) Intelligent user label determination method, terminal equipment and storage medium
CN113010717B (en) Image verse description generation method, device and equipment
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium
Newnham Machine Learning with Core ML: An iOS developer's guide to implementing machine learning in mobile apps
Yang et al. Automatic metadata information extraction from scientific literature using deep neural networks
US20240184860A1 (en) Methods and arrangements for providing impact imagery
CN116150406B (en) Context sparse entity linking method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant