CN117173511A - Category identification method, apparatus, device, storage medium, and program product - Google Patents

Category identification method, apparatus, device, storage medium, and program product

Info

Publication number
CN117173511A
CN117173511A (application CN202311160076.6A)
Authority
CN
China
Prior art keywords
picture
text
sample set
training
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311160076.6A
Other languages
Chinese (zh)
Inventor
陈祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202311160076.6A priority Critical patent/CN117173511A/en
Publication of CN117173511A publication Critical patent/CN117173511A/en
Pending legal-status Critical Current

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 - INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S - SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00 - Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50 - Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a category identification method, apparatus, device, storage medium and program product, wherein the method comprises the following steps: acquiring a picture sample set and a text sample set, wherein picture samples in the picture sample set and text samples in the text sample set have different association relations; training a set recognition model based on the picture sample set and the text sample set; and inputting the picture to be identified and the set category information into the trained recognition model to obtain a matching picture of the category information among the pictures to be identified. The scheme can greatly save time cost and labor cost and improve category identification efficiency.

Description

Category identification method, apparatus, device, storage medium, and program product
Technical Field
Embodiments of the present application relate to the field of computer technologies, and in particular, to a class identification method, apparatus, device, storage medium, and program product.
Background
Currently, visual recognition systems applied to large-scale image auditing are mainly realized by deep-learning-based methods. Such a method can ensure that the model has good generalization capability and practical application value only by accumulating a large data volume: positive samples of interest must be obtained from a large amount of data, and when applied to large-scale audit data such positive samples often require accumulations on the order of tens or hundreds of thousands. For a newly added category that has not yet been well defined in the training data, the related art mitigates the data shortfall by manually collecting more data of interest and training a separate visual recognition model for it, and in this way achieves recognition of the new category.
The above scheme retrains the identification model, which requires a great deal of labor and time: a large amount of labeling time is consumed, and in particular, when the proportion of positive samples of interest is extremely small (for example, less than one in a million), millions or even tens of millions of data items need to be labeled to obtain a sufficient number of positive samples. In practical application the time and labor costs are therefore extremely high, and improvement is required.
Disclosure of Invention
The embodiment of the application provides a category identification method, apparatus, device, storage medium and program product, which can greatly save time cost and labor cost and improve category identification efficiency.
In a first aspect, an embodiment of the present application provides a class identification method, including:
acquiring a picture sample set and a text sample set, wherein picture samples in the picture sample set and text samples in the text sample set have different association relations;
training a set recognition model based on the picture sample set and the text sample set;
and inputting the picture to be identified and the set category information into the identification model after training is completed so as to obtain a matching picture of the category information in the picture to be identified.
In a second aspect, an embodiment of the present application further provides a class identification device, including:
the acquisition module is configured to acquire a picture sample set and a text sample set, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations;
the training module is configured to train the set recognition model based on the picture sample set and the text sample set;
the identification module is configured to input the picture to be identified and the set category information into the identification model after training is completed so as to obtain a matching picture of the category information in the picture to be identified.
In a third aspect, an embodiment of the present application further provides a class identification device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the class identification method described in the embodiments of the present application.
In a fourth aspect, embodiments of the present application also provide a non-volatile storage medium storing computer-executable instructions that, when executed by a computer processor, are configured to perform the class identification method of embodiments of the present application.
In a fifth aspect, an embodiment of the present application further provides a computer program product including a computer program stored in a computer-readable storage medium; at least one processor of a device reads the computer program from the computer-readable storage medium and executes it, so that the device performs the category identification method according to the embodiment of the present application.
According to the embodiment of the application, a picture sample set and a text sample set are obtained, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations; a set recognition model is trained based on the picture sample set and the text sample set; and the picture to be recognized and the set category information are input into the trained recognition model to obtain the matching picture of the category information among the pictures to be recognized. In this category recognition mode, the recognition model trained on picture-and-text training data is used to determine the matching pictures corresponding to the set category information, so no separate model training is needed for a specific category; meanwhile, the scheme requires no sample labeling during the training of the recognition model, which saves a great deal of time and labor cost. The recognition accuracy of the model is high, it is closer to the service scenario, and it is more universal.
Drawings
FIG. 1 is a flow chart of a class identification method provided in an embodiment of the present application;
FIG. 2 is a flowchart of an identification model training method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a network structure in an identification model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for model training based on a generated association according to an embodiment of the present application;
FIG. 5 is a flowchart of another category identification method according to an embodiment of the present application;
FIG. 6 is a block diagram of a class identification device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a class identification device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not limiting of embodiments of the application. It should be further noted that, for convenience of description, only some, but not all of the structures related to the embodiments of the present application are shown in the drawings.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It is to be understood that the data so used may be interchanged where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
The category identification method provided by the embodiment of the application can be applied to the auditing of videos and pictures in the live broadcast industry, for example, in the application scenario of finding matching pictures for a newly added category.
Fig. 1 is a flowchart of a class identification method according to an embodiment of the present application, as shown in fig. 1, specifically including the following steps:
step S101, a picture sample set and a text sample set are obtained, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations.
In one embodiment, prior to training of the recognition model, a stored picture sample set and a stored text sample set are obtained, wherein the picture sample set comprises a plurality of pictures and the text sample set comprises a plurality of texts. The texts contained in the text sample set may be natural-language descriptive texts composed of words, sentences and the like. The picture samples in the picture sample set and the text samples in the text sample set have different association relations. Optionally, the association relation may be a dichotomous associated/non-associated relation.
In one embodiment, before the picture sample set and the text sample set are stored, the method further includes acquiring pictures and texts through a network and generating association relations between the pictures and the texts. Optionally, this may be: acquiring pictures and text description information from website information, and generating the picture sample set and the text sample set, together with the association relations between the picture samples and the text samples, based on the pictures and the text description information. The website information may be post information of a live broadcast platform, where a post contains pictures and corresponding texts; that is, through network information collection, the pictures in the posts are stored as picture samples in the picture sample set and the texts are stored as text samples in the text sample set. The association relation between a picture sample and a text sample can be generated automatically according to the specific sources of the pictures and texts: a picture and a text from the same source are associated, and a picture and a text from different sources are non-associated. The criterion for same or different sources can be set by the developer; for the post information in the website information, texts and pictures under the same post can be determined to be of the same source, and texts and pictures under different posts of different sources. With this generation mode of the picture sample set and the text sample set, no sample labeling work is needed and the sets can be generated automatically.
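The following is a minimal sketch of this automatic pair generation in Python. The post structure and the field names post_id, image and text are illustrative assumptions, not names from the application:

    import random

    def build_pairs(posts, negatives_per_post=1):
        """posts: a list of dicts such as {"post_id": ..., "image": ..., "text": ...}.
        Returns (image, text, label) triples: label 1 when the picture and the
        text come from the same post (same source), label 0 otherwise."""
        pairs = []
        for post in posts:
            # Same source -> associated (label 1); no manual labeling is needed.
            pairs.append((post["image"], post["text"], 1))
            # Different source -> non-associated (label 0).
            for _ in range(negatives_per_post):
                other = random.choice(posts)
                if other["post_id"] != post["post_id"]:
                    pairs.append((post["image"], other["text"], 0))
        return pairs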
In this sample generation mode, one starts from a batch of initial data in which each item contains at least a picture and a short text. Although such data has no explicit labels, it is extremely plentiful and common, for example posts in a live broadcast room or on a social platform, which contain pictures and short text descriptions. Using such data as samples for subsequent model training significantly improves the efficiency of obtaining samples, requires no labeling, and enables efficient recognition when labeled data is scarce. It also avoids the drawbacks of approaches that describe possible labels or entity sets in advance: predefining attribute information for labels can realize zero-shot recognition to some extent, but such techniques often assume that different labels or categories share similar underlying attribute information (for example, different birds differ in attributes such as color, head and abdomen). This assumption is too strict for a practical recognition system, and an expert is required to predefine the attribute vectors corresponding to the labels of the different categories. That design process is very time-consuming, and collecting a sufficient number of samples still requires a great deal of time and labor, so the practicality of those approaches is greatly compromised.
And step S102, training the set recognition model based on the picture sample set and the text sample set.
In one embodiment, after the picture sample set and the text sample set are obtained, the set recognition model is trained based on them. Optionally, the set recognition model includes a picture coding network and a text coding network. An optional training manner is shown in fig. 2, a flowchart of a training method for an identification model according to an embodiment of the present application, where the method includes:
and S1021, performing picture normalization processing on the picture samples in the picture sample set to obtain a standard picture, and performing text normalization processing on the texts in the text sample set to obtain a standard text.
In one embodiment, before the picture and the text are input into the picture coding network and the text coding network respectively, picture standardization processing is performed on the picture samples to obtain standard pictures, and text standardization processing is performed on the texts in the text sample set to obtain standard texts. The standard picture obtained can be a tensor matrix of a preset size, and the standard text a matrix of preset dimensions. For example, the standard picture may be a 224x224x3 tensor matrix, and the standard text a 76x768-dimensional matrix. The picture and text standardization can be performed with a set function or with programming-language code, for example using an imread(filename) function to read the picture, and implementing the text standardization by writing text-processing code in the Python programming language. Optionally, during the picture standardization processing, the RGB image information is normalized by the means and variances of its three channels and the image is converted into a 2-dimensional matrix; the image can also be converted into a 1-dimensional vector.
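A minimal sketch of the picture standardization step follows; the exact per-channel mean and variance values are not given in the application and are assumptions here:

    import numpy as np
    from PIL import Image

    CHANNEL_MEAN = np.array([0.485, 0.456, 0.406])  # assumed RGB channel means
    CHANNEL_STD = np.array([0.229, 0.224, 0.225])   # assumed RGB channel stds

    def standardize_picture(filename):
        # read and resize to the preset size, analogous to imread(filename)
        img = Image.open(filename).convert("RGB").resize((224, 224))
        x = np.asarray(img, dtype=np.float32) / 255.0   # 224 x 224 x 3
        # normalize each channel by its mean and standard deviation
        x = (x - CHANNEL_MEAN) / CHANNEL_STD
        return x  # standard picture: a 224 x 224 x 3 tensor matrix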
Step S1022, inputting the standard picture and the standard text to a picture coding network and a text coding network, respectively, to obtain a picture vector corresponding to the standard picture and a text vector corresponding to the standard text.
After the standard picture and the standard text are obtained, the standard picture is input into the picture coding network under training and the standard text into the text coding network under training, to obtain a picture vector and a text vector respectively. For example, a 512-dimensional picture vector E_I(I) ∈ R^512 and a 512-dimensional text vector E_T(T) ∈ R^512 can be obtained. Through the standardization processing, the massive collected pictures and texts can finally be aligned for subsequent model training. This makes multi-modal model training, and subsequent recognition, possible even when labeled data is lacking.
The picture coding network and the text coding network can be Transformer networks, and can also adopt a visual network structure based on an RNN or a CNN.
Optionally, the picture coding network and the text coding network of this scheme adopt a Transformer architecture and comprise a self-attention module, a residual neural network module and a forward network module. The network structure may be 12 layers of self-attention and residual neural network. As shown in fig. 3, a schematic diagram of a network structure in an identification model provided by the embodiment of the application, the structure includes a multi-head self-attention module and a short-cut residual-link normalization module for information extraction, together with a learnable forward network; these components form an Encoder-Block, whose information input and output are, for example, matrices fixed to 768 dimensions, the dimensions remaining unchanged. The Transformer network structure comprises 12 Encoder-Blocks arranged in cascade; after the cascaded modules extract the high-level semantic information in the images and texts, a fully connected layer outputs the picture vectors/text vectors.
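A sketch of one such Encoder-Block and the cascaded encoder in PyTorch follows; the 768-dimensional width, the 12-block depth and the final fully connected layer follow the description above, while the head count, feed-forward width and token pooling are assumptions:

    import torch
    import torch.nn as nn

    class EncoderBlock(nn.Module):
        def __init__(self, dim=768, num_heads=12, ffn_dim=3072):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.ffn = nn.Sequential(
                nn.Linear(dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, dim))
            self.norm2 = nn.LayerNorm(dim)

        def forward(self, x):                  # x: (batch, seq, 768)
            attn_out, _ = self.attn(x, x, x)   # multi-head self-attention
            x = self.norm1(x + attn_out)       # short-cut residual link + norm
            x = self.norm2(x + self.ffn(x))    # learnable forward network + residual
            return x                           # dimensions remain 768

    class Encoder(nn.Module):
        """12 cascaded Encoder-Blocks, then a fully connected layer mapping
        the extracted semantics to the 512-dimensional picture/text vector."""
        def __init__(self, depth=12, dim=768, out_dim=512):
            super().__init__()
            self.blocks = nn.Sequential(*[EncoderBlock(dim) for _ in range(depth)])
            self.proj = nn.Linear(dim, out_dim)

        def forward(self, x):
            x = self.blocks(x)
            return self.proj(x[:, 0])  # e.g. pool the first token's features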
Step S1023, calculating the similarity between the picture vector and the text vector, and training the picture coding network and the text coding network based on the association relation between the picture sample and the text sample to obtain a training recognition model.
In one embodiment, the similarity between the picture vector and the text vector may be obtained by calculating the Euclidean distance, or by calculating the cosine distance between the picture vector and the text vector; this scheme does not limit the choice. Optionally, the recognition model is trained as shown in fig. 4, a flowchart of a method for model training based on the generated association relationship according to an embodiment of the present application, where the method includes:
step S10231, calculating the similarity between the picture vector and the text vector through the set similarity calculation formula.
Step S10232, performing loss calculation based on the similarity and the association relation between the picture sample and the text sample to obtain a loss value.
Taking the dichotomous association relation as an example, the association relation includes associated and non-associated; the sample label value corresponding to associated is 1 and to non-associated is 0. For a picture vector E_I(I) and a text vector E_T(T) with label value y, the similarity calculation result is denoted Sim(I, T), and the loss value can be calculated as:
L(Sim(I, T), y) = -y · log(Sim(I, T))
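A sketch of the similarity and loss calculation (steps S10231-S10232) in PyTorch follows. The application leaves the similarity formula open, so cosine similarity is used here; squashing it into (0, 1) with a sigmoid, and the (1 - y) term that gives non-associated pairs a gradient, are assumed completions of the formula above:

    import torch
    import torch.nn.functional as F

    def similarity(img_vec, txt_vec):
        # cosine-distance-based similarity between E_I(I) and E_T(T)
        return F.cosine_similarity(img_vec, txt_vec, dim=-1)

    def pair_loss(img_vec, txt_vec, y):
        # map the cosine into (0, 1) so the log is defined for y = 0 and y = 1;
        # the sigmoid and the (1 - y) term are assumptions of this sketch
        sim = torch.sigmoid(similarity(img_vec, txt_vec))
        return -(y * torch.log(sim) + (1 - y) * torch.log(1 - sim)).mean()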
Step S10233, adjusting network parameters of the picture coding network and the text coding network based on the loss value to obtain a trained recognition model.
Through the input of different pictures and texts and the feedback of the calculated loss values, the network parameters of the picture coding network and the text coding network are continuously adjusted; after convergence, when the optimal network parameters have been obtained, the weights are no longer updated, and the coding networks with fixed parameters are used as feature extraction models for subsequent category identification, as sketched below.
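A hedged sketch of this adjustment loop, reusing pair_loss from the sketch above; the optimizer choice and learning rate are assumptions:

    import torch

    def train(picture_encoder, text_encoder, loader, epochs=10, lr=1e-4):
        params = list(picture_encoder.parameters()) + list(text_encoder.parameters())
        opt = torch.optim.Adam(params, lr=lr)
        for _ in range(epochs):
            for img, txt, y in loader:      # y: 1 associated, 0 non-associated
                loss = pair_loss(picture_encoder(img), text_encoder(txt), y.float())
                opt.zero_grad()
                loss.backward()
                opt.step()
        # after convergence, stop updating the weights and use the coding
        # networks with fixed parameters as feature extractors
        for p in params:
            p.requires_grad_(False)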
Step S103, inputting the picture to be identified and the set category information into the identification model after training is completed, so as to obtain a matching picture of the category information in the picture to be identified.
The picture to be identified can be a picture generated for auditing during live broadcast, such as a live screenshot, or any other picture set in which pictures matching a category are to be found. The category information may be a newly set category for which corresponding matching pictures are to be found. For example, a previously entered category may be volleyball, while the newly set category may be beach volleyball or beach football. Through the trained recognition model, a picture with a high degree of matching to the category information can be obtained as the matching picture.
According to the method, a picture sample set and a text sample set are obtained, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations; a set recognition model is trained based on the picture sample set and the text sample set; and the picture to be recognized and the set category information are input into the trained recognition model to obtain the matching picture of the category information among the pictures to be recognized. In this category recognition mode, the recognition model trained on picture-and-text training data is used to determine the matching pictures corresponding to the set category information, so no separate model training is needed for a specific category; meanwhile, the scheme requires no sample labeling during the training of the recognition model, which saves a great deal of time and labor cost. The recognition accuracy of the model is high, it is closer to the service scenario, and it is more universal.
Fig. 5 is a flowchart of another category identification method according to an embodiment of the present application; as shown in fig. 5, the method includes:
Step S201, acquiring a picture sample set and a text sample set, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations.
Step S202, training the set recognition model based on the picture sample set and the text sample set.
Step S203, inputting the picture to be identified and the set category information into the identification model after training is completed, obtaining the similarity value of the category information and each picture in the picture to be identified, and determining the picture corresponding to the similarity value meeting the set similarity condition as the matching picture of the category information.
In one embodiment, the set category information may be a brief description meeting the category requirements; no strict restrictions are imposed on the wording. The category information may identify a category that has not been well defined before. The pictures to be identified and the set category information are input into the trained identification model to obtain a picture vector for each picture to be identified and a text vector corresponding to the category information; the similarity between the text vector and each picture vector is then calculated to obtain the similarity between the category information and each picture, and the pictures whose similarity values are greater than a set threshold serve as the matching pictures corresponding to the category information, i.e. they are directly used as the identification result.
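A sketch of this matching step follows; the threshold value is an assumption, and pictures and category_text are assumed to be already standardized single-item inputs:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def match_pictures(picture_encoder, text_encoder, pictures, category_text,
                       threshold=0.25):
        txt_vec = text_encoder(category_text)        # text vector of the category
        matches = []
        for pic in pictures:
            img_vec = picture_encoder(pic)           # picture vector
            sim = F.cosine_similarity(img_vec, txt_vec, dim=-1)
            if sim.item() > threshold:               # set similarity condition
                matches.append(pic)                  # matching picture
        return matches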
According to the method, a picture sample set and a text sample set are obtained, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations; a set recognition model is trained based on the picture sample set and the text sample set; and the picture to be recognized and the set category information are input into the trained recognition model to obtain the matching picture of the category information among the pictures to be recognized. In this category recognition mode, the recognition model trained on picture-and-text training data is used to determine the matching pictures corresponding to the set category information, so no separate model training is needed for a specific category; meanwhile, the scheme requires no sample labeling during the training of the recognition model, which saves a great deal of time and labor cost. The recognition accuracy of the model is high, it is closer to the service scenario, and it is more universal.
In this category identification scheme, in a multi-modal visual recognition scenario with scarce data, a large number of images of a target category can be obtained simply by defining a text label or text description of the target data, i.e. the category information, which greatly reduces the time and labor costs of data collection and labeling. In such a system, a multi-modal model whose text and image information are semantically aligned can be obtained by utilizing the massive image-text pair data in the service scenario and used for subsequent efficient recognition. Meanwhile, a matching recognition result can be obtained from a description over the test sample set, avoiding the inefficiency of having to develop an additional algorithm model for recognition within a limited set of categories. The text description can be based on natural language or single words, which greatly reduces the refinement time for category labeling, is closer to the user's usage scenario, is more universal, and can respond quickly to dynamic requirements such as standard changes in the auditing service.
Fig. 6 is a block diagram of a class identification device according to an embodiment of the present application. As shown in fig. 6, the device is configured to execute the class identification method of the foregoing embodiments and has the functional modules and beneficial effects corresponding to that method. The apparatus specifically includes: an acquisition module 101, a training module 102, and an identification module 103, wherein,
an obtaining module 101, configured to obtain a picture sample set and a text sample set, where picture samples in the picture sample set and text samples in the text sample set have different association relations;
a training module 102 configured to train the set recognition model based on the picture sample set and the text sample set;
the recognition module 103 is configured to input the picture to be recognized and the set category information into the recognition model after training is completed, so as to obtain a matching picture of the category information in the picture to be recognized.
According to the method, a picture sample set and a text sample set are obtained, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations; a set recognition model is trained based on the picture sample set and the text sample set; and the picture to be recognized and the set category information are input into the trained recognition model to obtain the matching picture of the category information among the pictures to be recognized. In this category recognition mode, the recognition model trained on picture-and-text training data is used to determine the matching pictures corresponding to the set category information, so no separate model training is needed for a specific category; meanwhile, the scheme requires no sample labeling during the training of the recognition model, which saves a great deal of time and labor cost. The recognition accuracy of the model is high, it is closer to the service scenario, and it is more universal.
In one possible embodiment, the apparatus further comprises a sample generation module configured to:
before the picture sample set and the text sample set are acquired, acquiring pictures and text description information in website information;
and generating a picture sample set and a text sample set based on the picture and the text description information, and the association relation between the picture sample in the picture sample set and the text sample in the text sample set.
In one possible embodiment, the training module 102 is configured to:
performing picture standardization processing on the picture samples in the picture sample set to obtain a standard picture, and performing text standardization processing on the text in the text sample set to obtain a standard text;
respectively inputting the standard picture and the standard text into a picture coding network and a text coding network which are arranged to obtain a picture vector corresponding to the standard picture and a text vector corresponding to the standard text;
and calculating the similarity of the picture vector and the text vector, and training the picture coding network and the text coding network based on the association relation between the picture sample and the text sample to obtain a trained recognition model.
In one possible embodiment, the standard picture includes a tensor matrix of a preset size, and the standard text includes a matrix of a preset dimension.
In one possible embodiment, the training module 102 is configured to:
calculating the similarity between the picture vector and the text vector through a set similarity calculation formula;
performing loss calculation based on the similarity and the association relation between the picture sample and the text sample to obtain a loss value;
and adjusting network parameters of the picture coding network and the text coding network based on the loss value to obtain a trained identification model.
In one possible embodiment, the picture coding network and the text coding network include a self-attention module, a residual neural network module, and a forward network module.
In a possible embodiment, the identification module 103 is configured to:
inputting the picture to be identified and the set category information into the identification model after training is completed, and obtaining a similarity value of the category information and each picture in the picture to be identified;
and determining the picture corresponding to the similarity value meeting the set similarity condition as the matching picture of the category information.
Fig. 7 is a schematic structural diagram of a class identification device according to an embodiment of the present application. As shown in fig. 7, the device includes a processor 201, a memory 202, an input device 203, and an output device 204; the number of processors 201 in the device may be one or more, one processor 201 being taken as an example in fig. 7; the processor 201, memory 202, input device 203, and output device 204 in the apparatus may be connected by a bus or other means, a bus connection being taken as an example in fig. 7. The memory 202 is a computer readable storage medium and may be used to store software programs, computer executable programs, and modules, such as the program instructions/modules corresponding to the category identification method in the embodiment of the present application. The processor 201 implements the various functional applications and data processing of the device, i.e. the above-described category identification method, by running the software programs, instructions, and modules stored in the memory 202. The input device 203 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the apparatus. The output device 204 may include a display device such as a display screen.
The embodiments of the present application also provide a non-volatile storage medium containing computer executable instructions which, when executed by a computer processor, are adapted to carry out a class identification method as described in the above embodiments, comprising:
acquiring a picture sample set and a text sample set, wherein picture samples in the picture sample set and text samples in the text sample set have different association relations;
training a set recognition model based on the picture sample set and the text sample set;
and inputting the picture to be identified and the set category information into the identification model after training is completed so as to obtain a matching picture of the category information in the picture to be identified.
It should be noted that, in the embodiment of the category identifying device, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present application.
In some possible embodiments, aspects of the method provided by the present application may also be implemented in the form of a program product comprising program code which, when the program product runs on a computer device, causes the computer device to carry out the steps of the method according to the various exemplary embodiments of the application described in this specification; for example, the computer device may carry out the category identification method described in the examples of the present application. The program product may be implemented using any combination of one or more readable media.

Claims (11)

1. A category identification method, characterized by comprising the following steps:
acquiring a picture sample set and a text sample set, wherein picture samples in the picture sample set and text samples in the text sample set have different association relations;
training a set recognition model based on the picture sample set and the text sample set;
and inputting the picture to be identified and the set category information into the identification model after training is completed so as to obtain a matching picture of the category information in the picture to be identified.
2. The category identification method of claim 1, further comprising, prior to the acquiring the picture sample set and the text sample set:
acquiring pictures and text description information in website information;
and generating a picture sample set and a text sample set based on the picture and the text description information, and the association relation between the picture sample in the picture sample set and the text sample in the text sample set.
3. The category identification method of claim 1, wherein the training the set identification model based on the picture sample set and the text sample set includes:
performing picture standardization processing on the picture samples in the picture sample set to obtain a standard picture, and performing text standardization processing on the text in the text sample set to obtain a standard text;
respectively inputting the standard picture and the standard text into a picture coding network and a text coding network which are arranged to obtain a picture vector corresponding to the standard picture and a text vector corresponding to the standard text;
and calculating the similarity of the picture vector and the text vector, and training the picture coding network and the text coding network based on the association relation between the picture sample and the text sample to obtain a trained recognition model.
4. A category identification method as claimed in claim 3, wherein the standard picture comprises a tensor matrix of a preset size and the standard text comprises a matrix of a preset dimension.
5. The method of claim 3, wherein the calculating the similarity between the picture vector and the text vector, and training the picture coding network and the text coding network based on the association between the picture sample and the text sample to obtain the trained recognition model comprises:
calculating the similarity between the picture vector and the text vector through a set similarity calculation formula;
performing loss calculation based on the similarity and the association relation between the picture sample and the text sample to obtain a loss value;
and adjusting network parameters of the picture coding network and the text coding network based on the loss value to obtain a trained identification model.
6. The category identification method of claim 3, wherein the picture coding network and the text coding network include a self-attention module, a residual neural network module, and a forward network module.
7. The method for identifying a category according to any one of claims 1 to 6, wherein inputting the picture to be identified and the set category information into the identification model after training is completed, to obtain a matching picture of the category information in the picture to be identified, includes:
inputting the picture to be identified and the set category information into the identification model after training is completed, and obtaining a similarity value of the category information and each picture in the picture to be identified;
and determining the picture corresponding to the similarity value meeting the set similarity condition as the matching picture of the category information.
8. A category recognition device, characterized by comprising:
the acquisition module is configured to acquire a picture sample set and a text sample set, wherein the picture samples in the picture sample set and the text samples in the text sample set have different association relations;
the training module is configured to train the set recognition model based on the picture sample set and the text sample set;
the identification module is configured to input the picture to be identified and the set category information into the identification model after training is completed so as to obtain a matching picture of the category information in the picture to be identified.
9. A class identification device, the device comprising: one or more processors; storage means for storing one or more programs that when executed by the one or more processors cause the one or more processors to implement the class identification method of any of claims 1-7.
10. A non-transitory storage medium storing computer executable instructions which, when executed by a computer processor, are for performing the class identification method of any one of claims 1-7.
11. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the class identification method of any of claims 1-7.
CN202311160076.6A 2023-09-08 2023-09-08 Category identification method, apparatus, device, storage medium, and program product Pending CN117173511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311160076.6A CN117173511A (en) 2023-09-08 2023-09-08 Category identification method, apparatus, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311160076.6A CN117173511A (en) 2023-09-08 2023-09-08 Category identification method, apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN117173511A 2023-12-05

Family

ID=88929654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311160076.6A Pending CN117173511A (en) 2023-09-08 2023-09-08 Category identification method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN117173511A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination