CN116701637B - Zero sample text classification method, system and medium based on CLIP - Google Patents


Info

Publication number
CN116701637B
Authority
CN
China
Prior art keywords
text
image
classification
label
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310778409.5A
Other languages
Chinese (zh)
Other versions
CN116701637A (en)
Inventor
覃立波
李勤政
王玮赟
陈麒光
车万翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202310778409.5A priority Critical patent/CN116701637B/en
Publication of CN116701637A publication Critical patent/CN116701637A/en
Application granted granted Critical
Publication of CN116701637B publication Critical patent/CN116701637B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a CLIP-based zero-sample text classification method, system, and medium. The method comprises the following steps: S1: acquiring the text to be classified; S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors; S3: performing a calculation on the text vector and the image vectors to obtain the degree of similarity between the pictures and the text; S4: performing prediction matching according to the current classification task type and the computed degree of similarity to obtain the text classification result. By combining text information with image information and applying the combination to natural language processing, the text classification task is reconstructed into a text-image matching task that the CLIP model can solve, and the precision of text classification is improved.

Description

Zero sample text classification method, system and medium based on CLIP
Technical Field
The invention relates to the technical field of the Internet, and in particular to a CLIP-based zero-sample text classification method, system, and medium.
Background
With the increasing maturity of Internet technology, and in particular the continuous progress of deep learning and natural language processing, text classification technology has advanced greatly. Text classification is also widely applied in real life, for example in intelligent customer service and intelligent mailboxes, where it supports services such as automatically identifying the type of incoming messages and automatically detecting illegal content; in the video-platform field it can help auditors automatically tag and classify related content, greatly saving manpower and material resources and improving people's experience of life. Meanwhile, as a model pre-trained on a massive text-image dataset, CLIP can directly complete text-image matching in a specified field without using any examples, i.e., zero-sample (zero-shot) learning.
However, existing studies of the text classification problem attend only to the semantic information in the input text and ignore very valuable image information. For example, when a person sees the words "the corners of his mouth are raised", a smiling picture first appears in the mind; the person is then reasonably considered to be happy, and the emotion expressed by the words is correspondingly classified as "happy". This process combines the dual information of text and images, making the classification result more accurate. In the current text classification field, however, text information and image information have not yet been combined and applied to natural language tasks.
Disclosure of Invention
The invention provides a CLIP-based zero-sample text classification method, system, and medium, which solve the problem that text information and image information have not been combined and applied to natural language tasks.
In a first aspect, the present invention provides a zero-sample text classification method based on CLIP, including:
S1: acquiring the text to be classified;
S2: inputting the text into a text encoder to obtain a text vector, and inputting the images of the text-image set into an image encoder to obtain image vectors;
S3: performing a calculation on the text vector and the image vectors to obtain the degree of similarity between the pictures and the text;
S4: performing prediction matching according to the current classification task type and the computed degree of similarity to obtain the text classification result (a compact sketch of how these steps compose is given below).
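For orientation, the following is a minimal sketch of how steps S1-S4 compose, assuming text and image vectors produced by a CLIP-style encoder pair; all names and the threshold value are illustrative, and the detailed variants are given in the embodiments below.

    from typing import Dict, List, Union
    import torch

    def classify(text_vector: torch.Tensor,                 # T, shape (d,)
                 label_vectors: Dict[str, torch.Tensor],    # y_i -> I_i, shape (d,)
                 single_label: bool = True,
                 threshold: float = 0.25) -> Union[str, List[str]]:
        # S3: degree of similarity as a dot product per candidate label
        scores = {y: float(text_vector @ I) for y, I in label_vectors.items()}
        # S4: prediction matching according to the task type
        if single_label:
            return max(scores, key=scores.get)               # highest-similarity class
        return [y for y, s in scores.items() if s > threshold]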
Further, the text-image set acquisition process comprises the following steps:
S21: acquiring a text set and a label set according to the texts to be classified; the text set is the set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
S22: randomly downloading one picture for each tag in the tag set to obtain an image set formed by all downloaded pictures;
S23: converting the text-label set into a text-image set.
Alternatively, the text-image set acquisition process comprises the following steps:
S21: acquiring a text set and a label set according to the texts to be classified; the text set is the set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
S22: randomly downloading a plurality of pictures for each tag in the tag set to perform ensemble enhancement, obtaining an image set formed by all downloaded pictures;
S23: converting the text-label set into a text-image set.
Further, the specific process of converting the text-label set into the text-image set is as follows:
according to the type of each label, the corresponding picture obtained in S22 is used as a replacement, so that the text-label set {(x_i, y_i)}_{i=1}^{N} is mapped to the text-image set {(x_i, V_i^M)}_{i=1}^{N}, where x_i is the i-th text, y_i is the i-th label, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set.
Further, after the text-image set is acquired, an additional semantic cue word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt ⊕ x
where Prompt is the semantic prompt for the specific task of a given text classification test set, x is the text in the test set, ⊕ denotes concatenation, and x̂ is the text after adding the additional semantic cue word.
Further, the similarity degree is calculated by performing dot product operation on the text vector and the image vector.
Further, the classification task types comprise a single-label classification task and a multi-label classification task.
Further, the process of obtaining the classified result by performing prediction matching according to the calculated similarity and the current classification task type specifically comprises the following steps:
if the classification task type is a single-label classification task, selecting the category with the highest similarity degree as a final matching result;
if the classification task type is the multi-label classification task, selecting the class with the similarity degree larger than a preset threshold value as a final matching result.
In a second aspect, the present invention provides a CLIP-based zero-sample text classification system, comprising:
A data acquisition module, configured to acquire the text to be classified;
a coding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to perform a calculation on the text vector and the image vectors to obtain the degree of similarity between the pictures and the text, and to perform prediction matching according to the computed similarity and the current classification task type to obtain the text classification result.
In a third aspect, the present invention provides a computer-readable storage medium storing a computer program which, when invoked by a processor, performs the steps of the method described above.
Advantageous effects
The invention provides a CLIP-based zero-sample text classification method, system, and medium. By combining text information with image information and applying the combination to natural language processing, the text classification task is reconstructed into a text-image matching task that the CLIP model can solve, improving the precision of text classification.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the CLIP-based zero-sample text classification method provided by an embodiment of the invention;
FIG. 2 is an exemplary diagram of a zero sample text classification method based on CLIP provided by an embodiment of the invention;
FIG. 3 is a text image matching architecture diagram of a CLIP model provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the prompt enhancement mode provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the ensemble enhancement mode provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the technical solutions of the present invention are described in detail below. It will be apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the invention as defined by the claims.
Example 1
As shown in FIG. 1, this embodiment provides a CLIP-based zero-sample text classification method, including:
S1: acquiring the text to be classified. In this embodiment, as shown in FIG. 2, the acquired text to be classified is "Bye…".
S2: inputting the text into a text encoder to obtain text vectors, and inputting the images of the text-image set into an image encoder to obtain image vectors.
Specifically, the text image set acquisition process comprises the following steps:
S21: acquiring a text set and a label set according to the texts to be classified; the text set is the set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong. The number of texts in the text set is the same as the number of labels in the label set. For example, the text set {A, B, C, D} contains 4 texts; if the labels of text A and text B are both a, the label of text C is c, and the label of text D is d, then the label set corresponding to the text set is {a, a, c, d}.
In this embodiment, the test-set data is a text-label set {(x_i, y_i)}_{i=1}^{N}. The text set is the Test Set, which contains a plurality of texts and is denoted Test Set = {x_1, x_2, …, x_N}, where x_N is the N-th text. The tag set is the Label Set, which includes a plurality of tags such as "fear", "anger", "joy", …, "surprise", and is denoted Label Set = {y_1, y_2, …, y_N}, where y_N is the N-th tag. In this embodiment, multiple texts may correspond to the same tag, or a single text may correspond to a single tag; each text can be classified as one of the set of labels. The texts in the text set and the labels in the label set correspond one-to-one in order, i.e., x_i corresponds to y_i.
S22: randomly downloading one picture for each tag in the tag set to obtain an image set formed by all downloaded pictures.
In this embodiment, for each tag in the Label Set, a picture of the corresponding class is randomly selected and downloaded from the Internet to form the Image Set. The picture data includes, in sequence, a fear picture, a happy picture, …, and a surprise picture, recorded as Image Set = {v_1, v_2, …, v_N}, where v_N is the picture corresponding to the N-th label.
S23: the text label set is converted into a text image set.
In this embodiment, according to the type of each tag, the corresponding picture obtained in S22 is used as a replacement, so that Label Set = {y_1, y_2, …, y_N} = {"fear", "anger", "joy", …, "surprise"} is mapped to Image Set = {v_1, v_2, …, v_N}. That is, the text-label set {(x_i, y_i)}_{i=1}^{N} is mapped to the text-image set {(x_i, V_i)}_{i=1}^{N}, where x_i is the i-th text, y_i is the i-th tag, V_i is the picture set corresponding to y_i, and N is the number of texts (equivalently, the number of labels) in the test set. In this embodiment only one picture is downloaded per tag, so M in V_i^M is 1 and the superscript is omitted: V_i is simply v_i. The texts and labels correspond one-to-one in order, i.e., x_i corresponds to y_i, so the number of texts equals the number of labels.
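As a concrete illustration of S21-S23, the following minimal sketch builds the text-image set from a text-label test set, assuming the per-label pictures have already been downloaded; the helper name build_text_image_set and the file paths are hypothetical, not prescribed by the method.

    from typing import Dict, List, Tuple

    def build_text_image_set(
            texts: List[str],                        # Test Set: x_1 ... x_N
            labels: List[str],                       # Label Set: y_1 ... y_N, aligned with texts
            label_to_images: Dict[str, List[str]],   # per-label picture paths downloaded in S22
    ) -> List[Tuple[str, List[str]]]:
        # S23: replace each label y_i by its picture set V_i (M = 1 in this embodiment)
        assert len(texts) == len(labels), "texts and labels correspond one-to-one"
        return [(x, label_to_images[y]) for x, y in zip(texts, labels)]

    # Illustrative usage with one picture per label:
    texts = ["I felt fear when my mother was heavily ill"]
    labels = ["fear"]
    label_to_images = {"fear": ["images/fear.jpg"], "anger": ["images/anger.jpg"]}
    text_image_set = build_text_image_set(texts, labels, label_to_images)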
The texts and the image set of the text-image set are input into a trained CLIP model. The text encoder E_text of the CLIP model encodes each text x_i to obtain the text vector T_i, and the image encoder E_image encodes each image v_i of the image set to obtain the image vector I_i, expressed as:
T_i = E_text(x_i), I_i = E_image(v_i)
S3: performing a calculation on the text vector and the image vectors to obtain the degree of similarity between the pictures and the text.
In this embodiment, a dot product operation T · I_i is performed on the computed text vector T and each image vector I_i to calculate the degree of similarity between the images in the image set and the text. The text encoder E_text adopts a Transformer network with 12 layers, a width of 512, and 8 attention heads; the image encoder E_image uses a ResNet or a Vision Transformer.
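A minimal sketch of this encoding-and-similarity step follows, using OpenAI's public clip package as one possible implementation (the patent does not mandate a particular library); the image paths are hypothetical, and the vectors are L2-normalized before the dot product, as is common practice with CLIP.

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)   # ViT image-encoder variant

    text = "I felt fear when my mother was heavily ill"
    image_paths = ["images/fear.jpg", "images/anger.jpg", "images/joy.jpg"]  # illustrative

    with torch.no_grad():
        # T = E_text(x): encode the text to be classified
        T = model.encode_text(clip.tokenize([text]).to(device))
        # I_i = E_image(v_i): encode each label's picture
        images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
        I = model.encode_image(images)

    # Degree of similarity via dot product of the normalized vectors
    T = T / T.norm(dim=-1, keepdim=True)
    I = I / I.norm(dim=-1, keepdim=True)
    similarity = (T @ I.T).squeeze(0)   # shape (N,): one score per label picture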
S4: according to the calculated similarity and the current classification task type, prediction matching is carried out to obtain a classification result, namely a text label, and the specific process is as follows:
If the classification task type is a single-label classification task, the category with the highest similarity is selected as the final matching result; if it is a multi-label classification task, every class whose similarity exceeds a preset threshold is selected into the final matching result. The predicted final matching result is:
Result = argmax_i (T · I_i) for a single-label classification task; Result = { y_i | T · I_i > t } otherwise
where Result is the final matching result, Single Label Task denotes the single-label classification task, and t is the preset similarity threshold.
As shown in FIG. 3, the classification task type is a single-label classification task: the text encoding vector T_1 and the image vectors I = (I_1, I_2, …, I_N) undergo dot product operations to calculate the degree of similarity between the images and the text. Because this is single-label classification, the text belongs to exactly one class, and the pair with the highest dot product, T_1 · I_2, is taken as the final matching result. The text "I felt fear when my mother was heavily ill" is thus classified to the label "fear" corresponding to image data v_2.
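Continuing the sketch above, the following lines turn the similarity scores into the final matching result of S4; the label list mirrors the illustrative image paths, and the threshold value is an assumption.

    labels = ["fear", "anger", "joy"]              # aligned with image_paths above

    # Single-label task: the category with the highest dot product wins
    best = labels[int(similarity.argmax())]
    print(best)                                    # e.g. "fear" for the FIG. 3 example

    # Multi-label task: every class whose similarity exceeds the preset threshold t
    t = 0.25                                       # illustrative threshold
    multi = [y for y, s in zip(labels, similarity.tolist()) if s > t]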
Example 2
The present embodiment provides a CLIP-based zero-sample text classification method, which differs from Embodiment 1 in that, after the text-image set is acquired, an additional semantic prompt word is added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt ⊕ x
where x̂ is the text after adding the additional semantic cue word; Prompt is a semantic prompt for the specific task of a given text classification test set, e.g., Prompt may be taken as "Sentiment" for emotion classification and "Intent" for intent classification; and x is the text in the test set.
As shown in FIG. 4, the semantic cue word is "Topic". For the text "What is an 'imaginary number'?", without prompt enhancement the CLIP model would classify it as Mathematics, which is incomplete. With prompt enhancement the text becomes "Topic: What is an 'imaginary number'?" and can be correctly classified as Science and Mathematics.
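A minimal sketch of this prompt enhancement follows; the mapping from task to cue word echoes the examples given above ("Topic", "Sentiment", "Intent"), and the exact strings are the practitioner's choice rather than fixed by the method.

    TASK_PROMPTS = {"topic": "Topic:", "emotion": "Sentiment:", "intent": "Intent:"}

    def prompt_enhance(x: str, task: str) -> str:
        # x_hat = Prompt (+) x: prepend the cue word before the text start
        return f"{TASK_PROMPTS[task]} {x}"

    print(prompt_enhance("What is an 'imaginary number'?", "topic"))
    # -> Topic: What is an 'imaginary number'?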
Example 3
The present embodiment provides a zero sample text classification method based on CLIP, which is different from embodiment 1 in that the text image set acquisition process is as follows:
S21: acquiring a text set and a label set according to the texts to be classified; the text set is the set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
in this embodiment, the test-set data is a text-label set, recorded as {(x_i, y_i)}_{i=1}^{N}. The text set is the Test Set, which comprises a plurality of texts and is denoted Test Set = {x_1, x_2, …, x_N}, where x_N is the N-th text; the tag set is the Label Set, including the tags "fear", "anger", "joy", …, "surprise", denoted Label Set = {y_1, y_2, …, y_N}, where y_N is the N-th tag. In this embodiment, multiple texts may correspond to the same tag, or a single text may correspond to a single tag; each text can be classified as one of the set of labels.
S22: randomly downloading a plurality of pictures for each tag in a tag set to perform ensemble enhancement to obtain an image set formed by all downloaded pictures;
in this embodiment, as shown in FIG. 5, for each tag in the Label Set, a plurality of pictures is randomly selected and downloaded from the Internet to form the Image Set, which includes fear pictures, happy pictures, …, and surprise pictures, recorded as Image Set = {V_1^M, V_2^M, …, V_N^M} with V_i^M = {v_i^1, v_i^2, …, v_i^M}, where v_i^j is the j-th of the M pictures corresponding to the i-th label, and M is the number of pictures downloaded per tag for ensemble enhancement, 2 in this example.
S23: the text label set is converted into a text image set.
In this embodiment, according to the type of each tag, the corresponding pictures obtained in S22 are used as replacements; after ensemble enhancement, the tag y_i corresponds to V_i^M for i ∈ (1, N). That is, the text-label set {(x_i, y_i)}_{i=1}^{N} is mapped to the text-image set {(x_i, V_i^M)}_{i=1}^{N}, where x_i is the i-th text, y_i is the i-th tag, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set.
When the CLIP model matches a text of the test set against the images of the image set, the degree of similarity between the text encoding vector T and the image vectors of the i-th label is:
s_i = Σ_{j=1}^{M} T · I_i^j
where M is the number of pictures downloaded for each tag for ensemble enhancement, and I_i^j is the encoding of the j-th picture of the i-th label. In this embodiment the similarity is obtained by simple addition; in a practical implementation, a specific weight can be set for a specific picture according to actual needs to perform a weighted operation.
For the text "I felt frustrated," angry, utterly detected ". The picture is selected for the" anger "tag if the picture selection effect is not good without the ensable enhancementCorrespondingly, a picture is selected for the "sadness" label->Correspondingly, it can be seen from fig. 5 that erroneous results will be obtained. After the enhancement of the ensamble is adopted, the influence of individual errors of a single picture on a matching result is reduced, and the accuracy is improved.
Example 4
The embodiment provides a zero sample text classification system based on CLIP, which comprises:
A data acquisition module, configured to acquire the text to be classified;
a coding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors;
a classification prediction module, configured to perform a calculation on the text vector and the image vectors to obtain the degree of similarity between the pictures and the text, and to perform prediction matching according to the computed similarity and the current classification task type to obtain the classification result, i.e., the text label.
Example 5
The present embodiment provides a computer-readable storage medium storing a computer program which, when invoked by a processor, performs the steps of the method described above.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
It should be appreciated that in embodiments of the present invention, the processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information on the device type.
The readable storage medium is a computer readable storage medium, which may be an internal storage unit of the controller according to any one of the foregoing embodiments, for example, a hard disk or a memory of the controller. The readable storage medium may also be an external storage device of the controller, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the controller. Further, the readable storage medium may also include both an internal storage unit and an external storage device of the controller. The readable storage medium is used to store the computer program and other programs and data required by the controller. The readable storage medium may also be used to temporarily store data that has been output or is to be output.
Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or the whole or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.

Claims (4)

1. A zero sample text classification method based on CLIP, comprising:
s1: acquiring a text to be classified;
s2: inputting the text into a text encoder to obtain text vectors, and inputting the image set in the text image set into an image encoder to obtain image vectors;
the text image set acquisition process comprises the following steps:
s21: acquiring a text set and a label set according to the text to be classified; the text set is a set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
s22: randomly downloading a picture aiming at each tag in the tag set to obtain an image set formed by all downloaded pictures;
s23: converting the text label set into a text image set;
or, the text image set acquisition process comprises the following steps:
s21: acquiring a text set and a label set according to the text to be classified; the text set is a set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
s22: randomly downloading a plurality of pictures for each tag in a tag set to perform ensemble enhancement to obtain an image set formed by all downloaded pictures;
s23: converting the text label set into a text image set;
the specific process of converting the text label set into the text image set is as follows:
according to the type of each label, the corresponding pictures obtained in S22 are used as replacements, so that the text-label set {(x_i, y_i)}_{i=1}^{N} is mapped to the text-image set {(x_i, V_i^M)}_{i=1}^{N}, where x_i is the i-th text, y_i is the i-th tag, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set;
after the text image set is acquired, additional semantic prompt words are added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt ⊕ x
where Prompt is the semantic prompt for the specific task of a given text classification test set, x is the text in the test set, and x̂ is the text after adding the additional semantic cue words;
s3: calculating the text vector and the image vector to obtain the similarity degree of the picture and the text;
s4: according to the current classification task type and the calculated similarity degree, carrying out prediction matching to obtain a text classification result;
the classification task type comprises a single-label classification task and a multi-label classification task;
the process for obtaining the classified result by carrying out prediction matching according to the similarity obtained by calculation and the current classification task type specifically comprises the following steps: if the classification task type is a single-label classification task, selecting the category with the highest similarity degree as a final matching result;
if the classification task type is the multi-label classification task, selecting the class with the similarity degree larger than a preset threshold value as a final matching result.
2. The CLIP-based zero-sample text classification method of claim 1, wherein said similarity calculation is performed by dot product operation of a text vector and an image vector.
3. A CLIP-based zero-sample text classification system, comprising:
a data acquisition module, configured to acquire the text to be classified;
a coding module, configured to input the text into a text encoder to obtain a text vector, and to input the images of the text-image set into an image encoder to obtain image vectors; wherein the text image set acquisition process comprises the following steps:
s21: acquiring a text set and a label set according to the text to be classified; the text set is a set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
s22: randomly downloading a picture aiming at each tag in the tag set to obtain an image set formed by all downloaded pictures;
s23: converting the text label set into a text image set;
or, the text image set acquisition process comprises the following steps:
s21: acquiring a text set and a label set according to the text to be classified; the text set is a set of texts to be classified, and the tag set is the set of classifications to which the texts to be classified may belong;
s22: randomly downloading a plurality of pictures for each tag in a tag set to perform ensemble enhancement to obtain an image set formed by all downloaded pictures;
s23: converting the text label set into a text image set;
the specific process of converting the text label set into the text image set is as follows:
according to the type of each label, the corresponding pictures obtained in S22 are used as replacements, so that the text-label set {(x_i, y_i)}_{i=1}^{N} is mapped to the text-image set {(x_i, V_i^M)}_{i=1}^{N}, where x_i is the i-th text, y_i is the i-th tag, V_i^M is the set of M pictures corresponding to y_i, and N is the number of texts in the test set;
after the text image set is acquired, additional semantic prompt words are added before the beginning of each text in the test set for prompt enhancement, expressed as:
x̂ = Prompt ⊕ x
where Prompt is the semantic prompt for the specific task of a given text classification test set, x is the text in the test set, and x̂ is the text after adding the additional semantic cue words;
a classification prediction module, configured to perform a calculation on the text vector and the image vectors to obtain the degree of similarity between the pictures and the text, and to perform prediction matching according to the computed similarity and the current classification task type to obtain the text classification result; the classification task types comprise a single-label classification task and a multi-label classification task;
the process for obtaining the classified result by carrying out prediction matching according to the similarity obtained by calculation and the current classification task type specifically comprises the following steps: if the classification task type is a single-label classification task, selecting the category with the highest similarity degree as a final matching result;
if the classification task type is the multi-label classification task, selecting the class with the similarity degree larger than a preset threshold value as a final matching result.
4. A computer-readable storage medium, characterized by: a computer program is stored which, when called by a processor, performs: the method of any one of claims 1-2.
CN202310778409.5A 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP Active CN116701637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310778409.5A CN116701637B (en) 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310778409.5A CN116701637B (en) 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP

Publications (2)

Publication Number Publication Date
CN116701637A CN116701637A (en) 2023-09-05
CN116701637B true CN116701637B (en) 2024-03-08

Family

ID=87823836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310778409.5A Active CN116701637B (en) 2023-06-29 2023-06-29 Zero sample text classification method, system and medium based on CLIP

Country Status (1)

Country Link
CN (1) CN116701637B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935418B (en) * 2023-09-15 2023-12-05 成都索贝数码科技股份有限公司 Automatic three-dimensional graphic template reorganization method, device and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01131960A (en) * 1988-10-21 1989-05-24 Toshiba Corp Document and image filing device
EP1871064A1 (en) * 2006-06-19 2007-12-26 Research In Motion Limited Device for transferring information
CN113449808A (en) * 2021-07-13 2021-09-28 广州华多网络科技有限公司 Multi-source image-text information classification method and corresponding device, equipment and medium
CN113836298A (en) * 2021-08-05 2021-12-24 合肥工业大学 Text classification method and system based on visual enhancement
CN114239560A (en) * 2021-12-03 2022-03-25 上海人工智能创新中心 Three-dimensional image classification method, device, equipment and computer-readable storage medium
CN115393902A (en) * 2022-09-26 2022-11-25 华东师范大学 Pedestrian re-identification method based on comparison language image pre-training model CLIP
CN115761314A (en) * 2022-11-07 2023-03-07 重庆邮电大学 E-commerce image and text classification method and system based on prompt learning
CN115761757A (en) * 2022-11-04 2023-03-07 福州大学 Multi-mode text page classification method based on decoupling feature guidance
CN116226688A (en) * 2023-05-10 2023-06-06 粤港澳大湾区数字经济研究院(福田) Data processing, image-text searching and image classifying method and related equipment
CN116702091A (en) * 2023-06-21 2023-09-05 中南大学 Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN117421639A (en) * 2023-11-03 2024-01-19 中南大学 Multi-mode data classification method, terminal equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8000535B2 (en) * 2007-06-18 2011-08-16 Sharp Laboratories Of America, Inc. Methods and systems for refining text segmentation results
US20140270347A1 (en) * 2013-03-13 2014-09-18 Sharp Laboratories Of America, Inc. Hierarchical image classification system
GB2586265B (en) * 2019-08-15 2023-02-15 Vision Semantics Ltd Text based image search

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01131960A (en) * 1988-10-21 1989-05-24 Toshiba Corp Document and image filing device
EP1871064A1 (en) * 2006-06-19 2007-12-26 Research In Motion Limited Device for transferring information
CN113449808A (en) * 2021-07-13 2021-09-28 广州华多网络科技有限公司 Multi-source image-text information classification method and corresponding device, equipment and medium
CN113836298A (en) * 2021-08-05 2021-12-24 合肥工业大学 Text classification method and system based on visual enhancement
CN114239560A (en) * 2021-12-03 2022-03-25 上海人工智能创新中心 Three-dimensional image classification method, device, equipment and computer-readable storage medium
CN115393902A (en) * 2022-09-26 2022-11-25 华东师范大学 Pedestrian re-identification method based on comparison language image pre-training model CLIP
CN115761757A (en) * 2022-11-04 2023-03-07 福州大学 Multi-mode text page classification method based on decoupling feature guidance
CN115761314A (en) * 2022-11-07 2023-03-07 重庆邮电大学 E-commerce image and text classification method and system based on prompt learning
CN116226688A (en) * 2023-05-10 2023-06-06 粤港澳大湾区数字经济研究院(福田) Data processing, image-text searching and image classifying method and related equipment
CN116702091A (en) * 2023-06-21 2023-09-05 中南大学 Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN117421639A (en) * 2023-11-03 2024-01-19 中南大学 Multi-mode data classification method, terminal equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Bench-marking zero-shot text classification:Datasets ,evaluation and entailment approach";Wenpeng Yin,Jamaal Hay,and DanRoth;《Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing》;20181130;3914-3923 *
"CLIPText: A New Paradigm for Zero-shot Text Classification";Libo QIN;《In Findings of the Association for Computational Linguistics:ACL 2023》;20230731;1077-1088页 *
基于多模态子空间相关性传递的视频语义挖掘;刘亚楠;吴飞;庄越挺;;计算机研究与发展;20090115(01);3-10 *
基于概率潜在语义分析模型的分类融合图像标注;吕海峰;蔡明;;电子技术与软件工程;20180406(07);102-104 *
基于视觉误差与语义属性的零样本图像分类;徐戈;肖永强;汪涛;陈开志;廖祥文;吴运兵;;计算机应用;20181120(04);92-98 *

Also Published As

Publication number Publication date
CN116701637A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN109117777B (en) Method and device for generating information
US10504010B2 (en) Systems and methods for fast novel visual concept learning from sentence descriptions of images
CN110597961B (en) Text category labeling method and device, electronic equipment and storage medium
CN111324769A (en) Training method of video information processing model, video information processing method and device
CN110363084A (en) A kind of class state detection method, device, storage medium and electronics
CN110704586A (en) Information processing method and system
CN116701637B (en) Zero sample text classification method, system and medium based on CLIP
CN111159417A (en) Method, device and equipment for extracting key information of text content and storage medium
CN114218945A (en) Entity identification method, device, server and storage medium
CN114298157A (en) Short text sentiment classification method, medium and system based on public sentiment big data analysis
CN113836992A (en) Method for identifying label, method, device and equipment for training label identification model
CN114358203A (en) Training method and device for image description sentence generation module and electronic equipment
CN112132075B (en) Method and medium for processing image-text content
CN116127080A (en) Method for extracting attribute value of description object and related equipment
CN113435499A (en) Label classification method and device, electronic equipment and storage medium
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
Kim et al. On text localization in end-to-end OCR-Free document understanding transformer without text localization supervision
US20180047137A1 (en) Automatic correction of facial sentiment of portrait images
CN116737938A (en) Fine granularity emotion detection method and device based on fine tuning large model online data network
CN110704581A (en) Computer-executed text emotion analysis method and device
Shang et al. Deep learning generic features for cross-media retrieval
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
Hossain et al. Attention-based image captioning using DenseNet features
Newnham Machine Learning with Core ML: An iOS developer's guide to implementing machine learning in mobile apps
Yang et al. Automatic metadata information extraction from scientific literature using deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant