CN114328906A - Multistage category determination method, model training method and related device - Google Patents

Publication number: CN114328906A
Application number: CN202111114531.XA
Authority: CN (China)
Inventor: 黄剑辉 (Huang Jianhui)
Applicant and current assignee: Tencent Technology Shenzhen Co Ltd
Legal status: Pending
Original language: Chinese (zh)
Classification: Information Retrieval; DB Structures and FS Structures Therefor

Abstract

The application discloses a multistage category determination method applicable to a variety of scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The method comprises the following steps: acquiring a text encoding vector corresponding to target text information; obtaining, based on the text encoding vector, a first distribution vector through a first classifier included in a hierarchical classification model; generating a text fusion vector from the first distribution vector and the text encoding vector; obtaining, based on the text fusion vector, a second distribution vector through a second classifier included in the hierarchical classification model; and determining the target primary category to which the target text information belongs from the first distribution vector, and the target secondary category from the second distribution vector. The application also discloses a model training method and a model training apparatus. By predicting the output of each level from the output of the level above it, the method makes full and effective use of the constraint relationship between upper and lower layers of the category hierarchy, thereby strengthening classification and improving its accuracy.

Description

Multistage category determination method, model training method and related device
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a multi-level category determination method, a model training method, and a related device.
Background
With the continuous development of computer technology, the amount of information users face grows day by day. Faced with a large volume of information, users can retrieve relevant items by searching for keywords. For example, a user enters the keyword "football" on a video platform, and the platform can then provide videos related to "football".
Search and recommendation presuppose that the information has been classified. In conventional schemes, the category to which a text belongs is determined either by extracting keywords from the text information directly, or by performing semantic recognition on the input text with a trained neural network model, thereby classifying the information.
The inventor has found that the prior art has at least the following problems: keyword-based classification is rather limited, and when text information involves keywords from different categories, misclassification easily occurs. Classification based on a neural network model improves accuracy to some extent, but still leaves considerable room for improvement.
Disclosure of Invention
The embodiments of the application provide a multi-level category determination method, a model training method, and a related device. By predicting the output of the next level based on the output of the previous level, the constraint relationship between upper and lower layers of the category hierarchy can be fully and effectively exploited, which strengthens classification and improves its accuracy.
In view of the above, an aspect of the present application provides a method for determining a multi-level category, including:
acquiring a text coding vector corresponding to target text information;
based on the text coding vector, obtaining a first distribution vector through a first classifier included in a hierarchical classification model, wherein the first distribution vector comprises M first element scores, each first element score is represented as a probability value of a primary category, and M is an integer greater than 1;
generating a text fusion vector according to the first distribution vector and the text coding vector;
based on the text fusion vector, obtaining a second distribution vector through a second classifier included in the hierarchical classification model, wherein the second distribution vector comprises N second element scores, each second element score represents a probability value of a secondary category, each secondary category is a next-level subcategory of a primary category, and N is an integer greater than 1;
and determining a target primary category to which the target text information belongs according to the first distribution vector, and determining a target secondary category to which the target text information belongs according to the second distribution vector.
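For illustration only, the following is a minimal PyTorch sketch of the pipeline these steps describe, under the assumption of a simple concatenate-and-project fusion; all names (HierarchicalClassifier, classify1, classify2, fuse) are placeholders of ours, not the patent's reference implementation:

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    # Hypothetical skeleton of the two-level pipeline described in the steps above.
    def __init__(self, encoder, dim, num_primary, num_secondary):
        super().__init__()
        self.encoder = encoder                          # any text encoder producing a dim-dimensional vector
        self.classify1 = nn.Linear(dim, num_primary)    # first classifier -> M first element scores
        self.fuse = nn.Linear(dim + num_primary, dim)   # one simple fusion choice: concatenate, then project
        self.classify2 = nn.Linear(dim, num_secondary)  # second classifier -> N second element scores

    def forward(self, text_inputs):
        l1_emb = self.encoder(text_inputs)                        # text encoding vector
        logits1 = self.classify1(l1_emb)                          # first distribution vector (softmax gives the M probability values)
        l2_emb = self.fuse(torch.cat([l1_emb, logits1], dim=-1))  # text fusion vector
        logits2 = self.classify2(l2_emb)                          # second distribution vector
        return logits1, logits2

# The target categories are then the argmax of each distribution vector:
# primary = logits1.argmax(-1); secondary = logits2.argmax(-1)
```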
Another aspect of the present application provides a method for model training, including:
acquiring a predictive text coding vector corresponding to text information to be trained, wherein the text information to be trained corresponds to a primary labeling category and a secondary labeling category;
based on the predictive text coding vector, obtaining a first predictive distribution vector through a first classifier to be trained included in a hierarchical classification model to be trained, wherein the first predictive distribution vector comprises M first element scores, each first element score is represented as a probability value of a primary category, and M is an integer greater than 1;
generating a predicted text fusion vector according to the first predicted distribution vector and the predicted text coding vector;
based on the predicted text fusion vector, obtaining a second prediction distribution vector through a second classifier to be trained included in the hierarchical classification model to be trained, wherein the second prediction distribution vector comprises N second element scores, each second element score represents a probability value of a secondary category, each secondary category is a next-level subcategory of a primary category, and N is an integer greater than 1;
and updating model parameters of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category until the model training conditions are met, and outputting the hierarchical classification model, wherein the hierarchical classification model comprises the first classifier and the second classifier referred to in the foregoing aspect.
Another aspect of the present application provides a multi-level category determining apparatus, including:
the acquisition module is used for acquiring a text coding vector corresponding to the target text information;
the acquisition module is further used for acquiring a first distribution vector through a first classifier included in the hierarchical classification model based on the text coding vector, wherein the first distribution vector comprises M first element scores, each first element score is represented as a probability value of one primary category, and M is an integer greater than 1;
the generating module is used for generating a text fusion vector according to the first distribution vector and the text coding vector;
the acquisition module is further used for acquiring a second distribution vector through a second classifier included in the hierarchical classification model based on the text fusion vector, wherein the second distribution vector comprises N second element scores, each second element score represents a probability value of a secondary category, each secondary category is a next-level subcategory of a primary category, and N is an integer greater than 1;
and the determining module is used for determining a target primary category to which the target text information belongs according to the first distribution vector and determining a target secondary category to which the target text information belongs according to the second distribution vector.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the generating module is specifically used for generating a prior semantic vector according to the first distribution vector based on a primary category vector mapping relationship, wherein the primary category vector mapping relationship comprises a one-to-one mapping relationship between M index values and M semantic vectors, and each index value corresponds to one primary category;
and generating a text fusion vector according to the prior semantic vector and the text coding vector.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the generating module is specifically used for determining, from the first distribution vector, the K first element scores with the largest probability values, wherein each first element score corresponds to the index value of one primary category, and K is an integer greater than 1 and smaller than M;
acquiring an index value corresponding to each first element score in the previous K first element scores to obtain K index values;
acquiring corresponding K semantic vectors according to the K index values based on the primary category vector mapping relation;
for each index value in the K index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the K index values to obtain a prior semantic vector.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the generating module is specifically used for determining the first element score with the largest probability value from the first distribution vector, wherein the first element score corresponds to the index value of one primary category;
if the first element score is larger than or equal to the element score threshold, acquiring an index value corresponding to the first element score;
acquiring a corresponding semantic vector according to the index value based on the primary category vector mapping relation;
and carrying out weighted calculation on the semantic vector corresponding to the index value and the first element score corresponding to the index value to obtain a prior semantic vector.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the generating module is specifically used for acquiring an index value corresponding to each first element score in the first distribution vector to obtain M index values;
acquiring corresponding M semantic vectors according to the M index values based on the primary category vector mapping relation;
for each index value in the M index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the M index values to obtain a prior semantic vector.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the generating module is specifically used for splicing the prior semantic vector and the text coding vector to obtain a text fusion vector;
or, alternatively,
the generating module is specifically used for determining a first-order vector mapping according to the prior semantic vector, the text coding vector, and a first parameter matrix, wherein the prior semantic vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [d × (p + q)]-dimensional matrix, and p, q and d are integers greater than 1;
determining a second-order vector mapping according to the prior semantic vector, the text coding vector, and a second parameter matrix, wherein the second parameter matrix is represented as a (d × p × q)-dimensional matrix;
and generating a text fusion vector according to the first-order vector mapping, the second-order vector mapping and the offset vector.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the generating module is specifically used for splicing the first distribution vector and the text coding vector to obtain a text fusion vector;
or, alternatively,
the generating module is specifically used for determining a first-order vector mapping according to the first distribution vector, the text coding vector, and a first parameter matrix, wherein the first distribution vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [d × (p + q)]-dimensional matrix, and p, q and d are integers greater than 1;
determining a second-order vector mapping according to the first distribution vector, the text encoding vector, and a second parameter matrix, wherein the second parameter matrix is represented as a (d × p × q)-dimensional matrix;
and generating a text fusion vector according to the first-order vector mapping, the second-order vector mapping and the offset vector.
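For illustration, a sketch of one plausible reading of this first-order/second-order fusion, where the first-order term applies the [d × (p + q)]-dimensional matrix to the concatenated vectors, the second-order term is a bilinear form under the (d × p × q)-dimensional tensor, and both are summed with the offset vector; the exact composition is our assumption, and the module works equally with the prior semantic vector or the first distribution vector as the first input:

```python
import torch
import torch.nn as nn

class FirstSecondOrderFusion(nn.Module):
    # Hypothetical reading of the fusion described above: a first-order term
    # W1 · [a; t], a second-order bilinear term a^T · W2 · t, and an offset vector.
    def __init__(self, p, q, d):
        super().__init__()
        self.W1 = nn.Parameter(torch.randn(d, p + q) * 0.02)  # [d x (p + q)] first parameter matrix
        self.W2 = nn.Parameter(torch.randn(d, p, q) * 0.02)   # (d x p x q) second parameter matrix
        self.b = nn.Parameter(torch.zeros(d))                 # offset vector

    def forward(self, a, t):
        # a: (batch, p) prior semantic vector (or first distribution vector)
        # t: (batch, q) text encoding vector
        first_order = torch.cat([a, t], dim=-1) @ self.W1.T          # (batch, d)
        second_order = torch.einsum("bp,dpq,bq->bd", a, self.W2, t)  # (batch, d)
        return first_order + second_order + self.b                   # text fusion vector
```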
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the acquisition module is further used for acquiring target text information corresponding to the target video before acquiring a text coding vector corresponding to the target text information, wherein the target text information comprises at least one of title information, abstract information, subtitle information and comment information of the target video;
or, alternatively,
the acquisition module is further used for acquiring target text information corresponding to the target picture before acquiring a text coding vector corresponding to the target text information, wherein the target text information comprises at least one of title information, author information, Optical Character Recognition (OCR) information and abstract information of the target picture;
or, alternatively,
the acquisition module is further used for acquiring target text information corresponding to the target commodity before acquiring a text coding vector corresponding to the target text information, wherein the target text information comprises at least one of commodity name information, place of production information, comment information and commodity description information of the target commodity;
or, alternatively,
the obtaining module is further configured to obtain target text information corresponding to the target text before obtaining a text coding vector corresponding to the target text information, where the target text information includes at least one of title information, author information, summary information, comment information, and body information of the target text.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the multi-level category determining apparatus further includes a receiving module and a sending module;
the receiving module is used for receiving a category query instruction which is sent by the terminal equipment and aims at the content to be searched;
the sending module is used for responding to the category query instruction, and sending a video search result to the terminal equipment if the content to be searched is video content;
the sending module is also used for responding to the category query instruction, and sending a picture searching result to the terminal equipment if the content to be searched is the picture content;
the sending module is also used for responding to the category inquiry instruction, and sending a commodity searching result to the terminal equipment if the content to be searched is commodity content;
and the sending module is also used for responding to the category query instruction, and sending a text search result to the terminal equipment if the content to be searched is text content.
Another aspect of the present application provides a model training apparatus, including:
the acquisition module is used for acquiring a predictive text coding vector corresponding to the text information to be trained, wherein the text information to be trained corresponds to a primary labeling category and a secondary labeling category;
the acquisition module is further used for acquiring a first prediction distribution vector through a first classifier to be trained included in a hierarchical classification model to be trained based on the predictive text coding vector, wherein the first prediction distribution vector comprises M first element scores, each first element score is represented as a probability value of a primary category, and M is an integer greater than 1;
the generating module is used for generating a predicted text fusion vector according to the first predicted distribution vector and the predicted text coding vector;
the acquisition module is further used for acquiring a second prediction distribution vector through a second classifier to be trained included in the hierarchical classification model to be trained based on the predicted text fusion vector, wherein the second prediction distribution vector comprises N second element scores, each second element score represents a probability value of a secondary category, each secondary category is a next-level subcategory of a primary category, and N is an integer greater than 1;
and the training module is used for updating model parameters of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category until the model training conditions are met, and outputting the hierarchical classification model, wherein the hierarchical classification model comprises the first classifier and the second classifier referred to in the foregoing aspect.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the training module is specifically used for calculating a first loss value of the text information to be trained by adopting a first classification loss function according to the first prediction distribution vector and the first-level labeling category;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a comprehensive loss value of the text information to be trained according to the first loss value and the second loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss values.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the training module is specifically used for calculating a first loss value of the text information to be trained by adopting a first classification loss function according to the first prediction distribution vector and the first-level labeling category;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a first element prediction score corresponding to the primary labeling category from the first prediction distribution vector, and determining a second element prediction score corresponding to the secondary labeling category from the second prediction distribution vector;
calculating a third loss value of the text information to be trained by adopting a hinge loss function according to the first element prediction score, the second element prediction score and the target hyper-parameter;
determining a comprehensive loss value of the text information to be trained according to the first loss value, the second loss value and the third loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss values.
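As a sketch only: one way the combined training objective could look in PyTorch, assuming cross-entropy for the two classification losses and a margin form for the hinge term; the exact hinge formulation, the loss weights, and the function name combined_loss are our assumptions, not specified in this text:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits1, logits2, y1, y2, gamma=0.1, w=(1.0, 1.0, 1.0)):
    # First and second classification losses (cross-entropy is one standard choice).
    loss1 = F.cross_entropy(logits1, y1)
    loss2 = F.cross_entropy(logits2, y2)
    # Element prediction scores of the labeled primary and secondary categories.
    s1 = logits1.softmax(-1).gather(1, y1.unsqueeze(1)).squeeze(1)
    s2 = logits2.softmax(-1).gather(1, y2.unsqueeze(1)).squeeze(1)
    # Hinge term with target hyper-parameter gamma; this exact form (keep the
    # primary-label score at least gamma above the secondary-label score) is
    # an assumption, not taken from the patent text.
    loss3 = F.relu(s2 - s1 + gamma).mean()
    # Comprehensive loss value as a weighted sum of the three loss values.
    return w[0] * loss1 + w[1] * loss2 + w[2] * loss3
```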
Another aspect of the present application provides a computer device, comprising: a memory, a processor, and a bus system;
wherein, the memory is used for storing programs;
the processor is used for executing the program in the memory so as to perform the method provided in the foregoing aspects according to the instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, a method for determining multi-level categories is provided: first, a text encoding vector corresponding to the target text information is obtained and fed into a first classifier to obtain a first distribution vector. The first classifier is part of a hierarchical classification model that also includes a second classifier. A text fusion vector is then generated from the first distribution vector and the text encoding vector, and the text fusion vector is used as the input of the second classifier to obtain a second distribution vector. Finally, the target primary category to which the target text information belongs is determined from the first distribution vector, and the target secondary category from the second distribution vector. In this method, the prediction result for the primary categories serves as prior knowledge and is fused with the text encoding vector to obtain the text fusion vector, which in turn serves as the basis for predicting the secondary category; in other words, the output of the next level is predicted from the output of the previous level, so the constraint relationship between upper and lower layers of the category hierarchy is fully and effectively exploited. When the second classifier makes its prediction, it can pay more attention to the secondary categories related to the primary-category prediction, which strengthens classification and improves its accuracy.
Drawings
FIG. 1 is a block diagram of an embodiment of a multi-level category determination system;
FIG. 2 is a schematic diagram of the determination of multi-level categories based on a hierarchical classification model in the embodiment of the present application;
FIG. 3 is a schematic flow chart of a multi-level category determination method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an encoder-based implementation of text information encoding in an embodiment of the present application;
FIG. 5 is a diagram illustrating a task of hierarchical classification of video content in an embodiment of the present application;
FIG. 6 is a diagram illustrating a task of hierarchical classification of picture content in an embodiment of the present application;
FIG. 7 is a diagram illustrating a task of hierarchical classification of merchandise content according to an embodiment of the present application;
FIG. 8 is a diagram illustrating a task of hierarchical classification of textual content in an embodiment of the present application;
FIG. 9 is a schematic diagram of an interface for displaying video search results according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an interface for displaying picture search results according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an interface for displaying search results of an article according to an embodiment of the present application;
FIG. 12 is a schematic diagram of an interface for displaying text search results in an embodiment of the present application;
FIG. 13 is a schematic flow chart diagram of a model training method in an embodiment of the present application;
FIG. 14 is a schematic diagram of a multi-level category determining apparatus in an embodiment of the present application;
FIG. 15 is a schematic view of a model training apparatus according to an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a server in an embodiment of the present application;
FIG. 17 is a schematic structural diagram of a terminal device in an embodiment of the present application.
Detailed Description
The embodiments of the application provide a multi-level category determination method, a model training method, and a related device. By predicting the output of the next level based on the output of the previous level, the constraint relationship between upper and lower layers of the category hierarchy can be fully and effectively exploited, which strengthens classification and improves its accuracy.
The terms "first," "second," "third," "fourth," and the like in the description, the claims, and the drawings of the present application, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It is to be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can, for example, be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," "corresponding," and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
With the continuous development of internet technology, more and more network interaction platforms have appeared. They bring great convenience to people's daily life, but also make content integration on these platforms more difficult. Content on a network interaction platform can therefore be classified to support tasks such as content search and content recommendation. For example, on a network video platform, a user can view video content of interest according to the video classification results. On a network e-commerce platform, a user can purchase goods according to the commodity classification results. On a network game platform, a user can play electronic games according to the game classification results. On a network education platform, a user can take courses according to the course classification results. On a network electronic book platform, a user can read articles according to the e-book classification results.
In order to implement more accurate content classification in the above scenarios, the present application provides a multi-level category determination method applied to the multi-level category determination system shown in FIG. 1. As shown in the figure, the system includes a server and a terminal device on which a client is deployed; the client may run on the terminal device in the form of a browser or an application (APP), and its specific presentation form is not limited here. The server related to the present application may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The terminal device includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart appliance, a vehicle-mounted terminal, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application. The numbers of servers and terminal devices are likewise not limited. The scheme provided by this application may be completed independently by the terminal device, independently by the server, or cooperatively by the terminal device and the server, which is not specifically limited here. The embodiments of the application can be applied to a variety of scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
Based on the multi-level category determination system shown in FIG. 1, a database stores a large amount of content to be classified (e.g., video content, picture content, commodity content, text content, etc.), and this content is used as the input of the hierarchical classification model to obtain its multi-level categories. Hierarchical classification is a special text classification task in which a hierarchical structural relationship exists between categories; the categories can generally be represented as a tree or an undirected graph. The multi-level categories include at least a primary category and a secondary category. On this basis, the content and its corresponding multi-level categories are stored in a category mapping table that the server of the network interaction platform can consult. When a user selects multi-level categories through the terminal device, the server pushes related content to the user's terminal device according to the category mapping table and the selected categories.
The task of classifying content based on a hierarchical classification model involves Computer Vision (CV), Natural Language Processing (NLP), Machine Learning (ML), and other technologies based on Artificial Intelligence (AI). AI is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, AI is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, giving machines the ability to perceive, reason, and make decisions. AI technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics, and the like. AI software technologies mainly include CV, speech processing, NLP, ML/deep learning, automatic driving, intelligent transportation, and other directions.
CV is the science of how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as identifying, tracking, and measuring targets, and performs further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, CV studies theories and techniques for building AI systems that can acquire information from images or multidimensional data. CV technologies generally include image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
NLP is an important direction in the fields of computer science and AI. It studies theories and methods that enable effective communication between humans and computers in natural language. NLP is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
ML is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their performance. ML is the core of AI and the fundamental way to make computers intelligent; it is applied across all fields of AI. ML and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Referring to FIG. 2, FIG. 2 is a schematic diagram of determining multi-level categories based on a hierarchical classification model in an embodiment of the present application. As shown in the figure, the hierarchical classification model uses a multi-class classifier at each of the primary and secondary levels. The target text information corresponding to the content to be classified is input into an encoder to obtain a text encoding vector. The text encoding vector is input to the first classifier, which outputs a first distribution vector. Based on the first distribution vector, the K semantic vectors corresponding to the K largest first element scores can be determined. These K semantic vectors are fused with the text encoding vector to obtain a text fusion vector. The text fusion vector is input to the second classifier, which outputs a second distribution vector. Combining the first and second distribution vectors determines the multi-level categories of the content to be classified. As can be seen, the categories stand in an upper-lower relationship: an upper-level category is the parent of its lower-level categories, and granularity becomes finer toward the lower levels.
With reference to the above description, the solution provided in the embodiments of the present application relates to AI technologies such as NLP, CV, and ML. A method for determining multi-level categories in the present application is described below with reference to FIG. 3; an embodiment of the multi-level category determination method in the embodiments of the present application includes:
110. acquiring a text coding vector corresponding to target text information;
in one or more embodiments, the multi-level category determining apparatus obtains target text information, where the target text information may be derived from a video, a picture, an article, or the like, and may be a title, a sentence, a paragraph, or the like, which is not limited here.
Specifically, assume the target text information is "Jump Jump: teaching you a strategy for 600 points", and this text is used as the input of an encoder. Whether the input needs to be segmented into words or fed in directly at character granularity depends mainly on the type of encoder: if the encoder is a Convolutional Neural Network (CNN) or a Long Short-Term Memory (LSTM) network, word segmentation is mainly used; if the encoder is a Bidirectional Encoder Representations from Transformers (BERT) model, character granularity is mainly used.
For ease of understanding, please refer to FIG. 4, which is a schematic diagram of encoder-based text information encoding in an embodiment of the present application. As shown in the figure, the target text information is input at character granularity to a trained BERT model and encoded to generate a text encoding vector. Taking a 768-dimensional semantic vector per token as an example, the output vector of the first token "[CLS]" (768 dimensions) is usually taken as the vector representation of the whole target text information, so the final text encoding vector is likewise 768-dimensional. That is, l1_emb = BERT(text), where l1_emb denotes the text encoding vector and text denotes the target text information.
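For illustration, a minimal sketch of this encoding step, assuming the Hugging Face transformers package and a Chinese BERT checkpoint; the Chinese example title is a back-translation of the example above and is itself an assumption:

```python
import torch
from transformers import BertModel, BertTokenizer  # assumes the Hugging Face transformers package

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

text = "跳一跳，教你600分攻略"  # assumed back-translation of the example title; any target text works
inputs = tokenizer(text, return_tensors="pt")  # character-granularity tokens with [CLS]/[SEP] added
with torch.no_grad():
    outputs = model(**inputs)

l1_emb = outputs.last_hidden_state[:, 0]  # [CLS] output, shape (1, 768): the text encoding vector
```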
It should be noted that the multi-level category determining apparatus may be deployed in a server, or may be deployed in a terminal device, or may be deployed in a system composed of a terminal device and a server, which is not limited herein.
120. Based on the text coding vector, obtaining a first distribution vector through a first classifier included in a hierarchical classification model, wherein the first distribution vector comprises M first element scores, each first element score is represented as a probability value of a primary category, and M is an integer greater than 1;
in one or more embodiments, the multi-level category determination device inputs the text encoding vector into a first classifier (classify1) included in the hierarchical classification model, and obtains a first distribution vector (logits1) from the first classifier; that is, logits1 = classify1(l1_emb). The first distribution vector is the prediction result for the primary categories; it comprises M first element scores, where M is the total number of primary categories and each first element score represents the probability value of the corresponding primary category.
Specifically, taking M = 5 as an example, assume the first distribution vector is (0, 0.1, 0.7, 0.2, 0), where the first element score "0" indicates that the probability of belonging to "games" is 0, "0.1" indicates that the probability of belonging to "dance" is 0.1, "0.7" indicates that the probability of belonging to "science and technology" is 0.7, "0.2" indicates that the probability of belonging to "nature" is 0.2, and the last score "0" indicates that the probability of belonging to "sports" is 0. The target text information is therefore most likely to belong to "science and technology".
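A minimal sketch of how such a first distribution vector could be produced, assuming the first classifier is a single linear layer followed by softmax (the patent does not fix its internal structure):

```python
import torch
import torch.nn as nn

M = 5  # total number of primary categories, as in the example
classify1 = nn.Sequential(nn.Linear(768, M), nn.Softmax(dim=-1))

l1_emb = torch.randn(1, 768)       # stands in for the 768-dimensional text encoding vector
logits1 = classify1(l1_emb)        # first distribution vector; in the worked example, (0, 0.1, 0.7, 0.2, 0)
primary = int(logits1.argmax(-1))  # index 2, i.e. "science and technology", in the example
```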
130. Generating a text fusion vector according to the first distribution vector and the text coding vector;
in one or more embodiments, the classification results obtained by a higher-level model (e.g., the first classifier) are generally more accurate; the first distribution vector can therefore be fed as prior knowledge into the next-level model (e.g., the second classifier).
Specifically, the multi-level category determination device generates a text fusion vector by combining the first distribution vector with the text encoding vector. That is, the text fusion vector draws on two feature vectors, the text encoding vector of the target text information and the first distribution vector output by the first classifier, so these feature vectors from different sources need to be fused.
140. Based on the text fusion vector, obtaining a second distribution vector through a second classifier included in the hierarchical classification model, wherein the second distribution vector comprises N second element scores, each second element score represents a probability value of a secondary category, each secondary category is a next-level subcategory of a primary category, and N is an integer greater than 1;
in one or more embodiments, the multi-level category determining apparatus inputs the text fusion vector (l2_emb) into a second classifier (classify2) included in the hierarchical classification model, and obtains a second distribution vector (logits2) from the second classifier; that is, logits2 = classify2(l2_emb). The second distribution vector is the prediction result for the secondary categories; it comprises N second element scores, where N is the total number of secondary categories and each second element score represents the probability value of the corresponding secondary category.
It should be noted that each secondary category is a subcategory of a primary category at the next level down. Typically, M is less than N; for example, there may be 44 coarse-grained primary categories by theme, such as "sports", "games" and "entertainment", each of which can be subdivided into a number of secondary categories, e.g., 305 fine-grained secondary categories in total.
150. And determining a target primary category to which the target text information belongs according to the first distribution vector, and determining a target secondary category to which the target text information belongs according to the second distribution vector.
In one or more embodiments, the multi-level category determining device determines, from the first distribution vector, the primary category corresponding to the largest first element score and takes it as the target primary category. Similarly, it determines, from the second distribution vector, the secondary category corresponding to the largest second element score and takes it as the target secondary category.
It should be noted that, the present application is described by taking the output of the target first-level category and the target second-level category as examples, in practical applications, more levels of categories may also be output, and accordingly, the hierarchical classification model also needs to include a classifier that outputs different categories, which is not described herein again.
In the embodiment of the application, a method for determining multi-level categories is provided. The prediction result for the primary categories is used as prior knowledge and fused with the text encoding vector to obtain the text fusion vector, which serves as the basis for predicting the secondary category; in other words, the output of the next level is predicted from the output of the previous level, so the constraint relationship between upper and lower layers of the category hierarchy can be fully and effectively exploited. When the second classifier makes its prediction, it can pay more attention to the secondary categories related to the primary-category prediction, which strengthens classification and improves its accuracy.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the generating a text fusion vector according to the first distribution vector and the text encoding vector specifically includes:
generating a prior semantic vector according to the first distribution vector based on a primary category vector mapping relationship, wherein the primary category vector mapping relationship comprises a one-to-one mapping between M index values and M semantic vectors, and each index value corresponds to one primary category;
and generating a text fusion vector according to the prior semantic vector and the text coding vector.
In one or more embodiments, a manner of generating a text fusion vector based on a prior semantic vector and a text encoding vector is presented. First, the K largest first element scores are selected from the first distribution vector, where K may be an integer greater than or equal to 1 and less than or equal to M. Then, a prior semantic vector is generated based on the primary category vector mapping relationship. Finally, the prior semantic vector and the text encoding vector are fused to obtain the text fusion vector.
Specifically, the primary category vector mapping relationship comprises a one-to-one mapping between M index values and M semantic vectors, with each index value corresponding to one primary category. A semantic vector can be expressed as a word vector, and the word vectors can be trained on a general or domain-specific corpus using common approaches such as word2vec, one-hot encoding, BERT, or matrix factorization. Taking word2vec as an example, please refer to Table 1, which illustrates a primary category vector mapping relationship.
TABLE 1

Primary category   Index value   Semantic vector
Games              1             (0.1, -0.7, 0.03, …, -0.9)
Film               2             (0.3, -0.2, 0.03, …, -0.8)
Sports             3             (-0.1, -0.9, 0.23, …, 0.1)
Live broadcast     4             (0.2, -0.1, 0.13, …, -0.7)
Variety shows      5             (0.3, -0.4, 0.22, …, -0.1)
Animation          6             (0.3, -0.6, 0.04, …, -0.9)
Each index value has a unique mapping to a primary category, and each semantic vector likewise maps uniquely to a primary category. On this basis, the K index values corresponding to the K largest first element values in the first distribution vector can be found, and the K semantic vectors corresponding to those index values determined; these K semantic vectors then serve as the basis for computing the prior semantic vector.
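For illustration, the mapping in Table 1 can be held as an embedding table; the sketch below truncates each semantic vector to its first three dimensions and is reused in the worked examples that follow:

```python
import torch

# Primary category vector mapping (Table 1), truncated to three dimensions.
# Row 0 is unused padding so that row numbers match the 1-based index values.
semantic_table = torch.tensor([
    [0.0,  0.0,  0.0],   # 0: padding
    [0.1, -0.7,  0.03],  # 1: games
    [0.3, -0.2,  0.03],  # 2: film
    [-0.1, -0.9, 0.23],  # 3: sports
    [0.2, -0.1,  0.13],  # 4: live broadcast
    [0.3, -0.4,  0.22],  # 5: variety shows
    [0.3, -0.6,  0.04],  # 6: animation
])
```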
Secondly, in the embodiment of the application, a method for generating a text fusion vector based on a prior semantic vector and a text encoding vector is provided. In this way, the primary category vector mapping relationship is introduced as the basis for generating the prior semantic vector, which strengthens the feature expression of the corresponding primary categories in the first distribution vector and helps improve classification accuracy.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, based on the first-level category vector mapping relationship, the generating a prior semantic vector according to the first distribution vector specifically includes:
determining the K first element scores with the largest probability values from the first distribution vector, wherein each first element score corresponds to the index value of one primary category, and K is an integer greater than 1 and less than M;
acquiring an index value corresponding to each first element score in the previous K first element scores to obtain K index values;
acquiring corresponding K semantic vectors according to the K index values based on the primary category vector mapping relation;
for each index value in the K index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the K index values to obtain a prior semantic vector.
In one or more embodiments, a way to compute a priori semantic vector is introduced. As can be seen from the foregoing embodiment, it is assumed that K is an integer greater than 1 and less than M, that is, the first K first element scores with the largest probability value are determined from the first distribution vector, and the following description will be given with reference to an example.
Specifically, assuming that M is 6 and the first distribution vector is (0.4,0.2,0.05,0.05,0,0.3), please refer to table 2 for easy understanding, and table 2 is an illustration of the corresponding relationship between the first element score and the index value in the first distribution vector.
TABLE 2

First element score   Index value
0.4                   1
0.2                   2
0.05                  3
0.05                  4
0                     5
0.3                   6
Based on this, assuming that K is 3, the first K first element scores with the highest probability values are determined from the first distribution vector to be "0.4", "0.2", and "0.3", respectively, and the corresponding K index values are "1", "2", and "6", respectively. In combination with the first-level category vector mapping relationship provided in table 1 in the foregoing embodiment, three corresponding semantic vectors can be obtained, that is:
the semantic vector corresponding to the index value "1" is (0.1, -0.7,0.03, …, -0.9);
the semantic vector corresponding to the index value "2" is (0.3, -0.2,0.03, …, -0.8);
the semantic vector corresponding to the index value "6" is (0.3, -0.6,0.04, …, -0.9);
therefore, each semantic vector is weighted by the first element score sharing its index value, yielding the updated semantic vector for that index value, namely:
the updated semantic vector corresponding to the index value "1" is (0.04, -0.28,0.012, …, -0.36);
the updated semantic vector corresponding to the index value "2" is (0.06, -0.04,0.006, …, -0.16);
the updated semantic vector corresponding to the index value "6" is (0.09, -0.18,0.012, …, -0.27);
and finally, summing the updated semantic vectors corresponding to the three index values, namely adding element values of corresponding positions in the updated semantic vectors to obtain the prior semantic vector of (0.19, -0.5,0.03, …, -0.79).
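Continuing the sketch above, the K = 3 worked example reduces to a top-K lookup and weighted sum (dimensions truncated to three, so only the first three elements of the result are reproduced):

```python
probs = torch.tensor([0.4, 0.2, 0.05, 0.05, 0.0, 0.3])  # first distribution vector
scores, idx = probs.topk(3)                  # K = 3 largest first element scores and their 0-based indices
vecs = semantic_table[idx + 1]               # +1 because the table is 1-indexed by index value
prior = (scores.unsqueeze(1) * vecs).sum(0)  # weighted sum of the updated semantic vectors
# prior -> tensor([0.1900, -0.5000, 0.0300]), matching (0.19, -0.5, 0.03, ...)
```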
In the embodiment of the application, a method for obtaining the prior semantic vector through calculation is provided, and through the method, the semantic vector corresponding to the index value can be queried by using the first-level category vector mapping relation, so that the feature expression of the corresponding first-level category in the first distribution vector is strengthened, and the accuracy of category classification is improved.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, based on the first-level category vector mapping relationship, the generating a prior semantic vector according to the first distribution vector specifically includes:
determining the first element score with the largest probability value from the first distribution vector, wherein the first element score corresponds to the index value of one primary category;
if the first element score is larger than or equal to the element score threshold, acquiring an index value corresponding to the first element score;
acquiring a corresponding semantic vector according to the index value based on the primary category vector mapping relation;
and carrying out weighted calculation on the semantic vector corresponding to the index value and the first element score corresponding to the index value to obtain a prior semantic vector.
In one or more embodiments, a way to compute a priori semantic vector is introduced. As can be seen from the foregoing embodiment, it is assumed that K is 1, that is, the first element score with the largest probability value is determined from the first distribution vector, and the following description will be given with reference to an example.
Specifically, assume that M is 6 and the first distribution vector is (0.4, 0.2, 0.05, 0.05, 0, 0.3); for ease of understanding, please refer to Table 2 again. On this basis, with K = 1, the first element score with the largest probability value, "0.4", is determined from the first distribution vector, and its corresponding index value is "1". In combination with the primary category vector mapping relationship provided in Table 1 in the foregoing embodiment, the corresponding semantic vector can be obtained, that is:
the semantic vector corresponding to the index value "1" is (0.1, -0.7,0.03, …, -0.9);
therefore, the semantic vector corresponding to this index value is weighted by the corresponding first element score to obtain the prior semantic vector, namely:
the prior semantic vector corresponding to the index value "1" is (0.04, -0.28, 0.012, …, -0.36).
In the embodiment of the application, another way of obtaining the prior semantic vector through calculation is provided, and through the way, the semantic vector corresponding to one index value can be queried by using the first-level category vector mapping relation, so that the feature expression of the first-level category which is most likely to appear in the first distribution vector is strengthened, and the accuracy of category classification is favorably improved.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, based on the first-level category vector mapping relationship, the generating a prior semantic vector according to the first distribution vector specifically includes:
acquiring an index value corresponding to each first element score in the first distribution vector to obtain M index values;
acquiring corresponding M semantic vectors according to the M index values based on the primary category vector mapping relation;
for each index value in the M index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the M index values to obtain a prior semantic vector.
In one or more embodiments, another way to compute the prior semantic vector is introduced. As can be seen from the foregoing embodiments, it is assumed here that K equals M, that is, every first element score in the first distribution vector is processed. The following description is given with reference to an example.
Specifically, assuming that M is 6 and the first distribution vector is (0.4,0.2,0.05,0.05,0,0.3), for ease of understanding, referring again to table 2, based on which, assuming that K is equal to M, the respective first element scores are determined from the first distribution vector as "0.4", "0.2", "0.05", "0.05", "0", and "0.3", respectively, and the corresponding K index values are "1", "2", "3", "4", "5", and "6", respectively. In combination with the first-level category vector mapping relationship provided in table 1 in the foregoing embodiment, six corresponding semantic vectors can be obtained, that is:
the semantic vector corresponding to the index value "1" is (0.1, -0.7,0.03, …, -0.9);
the semantic vector corresponding to the index value "2" is (0.3, -0.2,0.03, …, -0.8);
the semantic vector corresponding to the index value "3" is (-0.1, -0.9,0.23, …, 0.1);
the semantic vector corresponding to the index value "4" is (0.2, -0.1,0.13, …, -0.7);
the semantic vector corresponding to the index value "5" is (0.3, -0.4,0.22, …, -0.1);
the semantic vector corresponding to the index value "6" is (0.3, -0.6,0.04, …, -0.9);
therefore, for each index value, the corresponding semantic vector is weighted by the corresponding first element score to obtain the updated semantic vector for that index value, namely:
the updated semantic vector corresponding to the index value "1" is (0.04, -0.28,0.012, …, -0.36);
the updated semantic vector corresponding to the index value "2" is (0.06, -0.04,0.006, …, -0.16);
the updated semantic vector corresponding to the index value "3" is (-0.005, -0.045, 0.0115, …, 0.005);
the updated semantic vector corresponding to the index value "4" is (0.01, -0.005, 0.0065, …, -0.035);
the updated semantic vector corresponding to the index value "5" is (0, …, 0);
the updated semantic vector corresponding to the index value "6" is (0.09, -0.18,0.012, …, -0.27);
and finally, summing the updated semantic vectors corresponding to the six index values, namely adding element values of corresponding positions in the updated semantic vectors to obtain the prior semantic vector of (0.195, -0.55,0.048, …, -0.83).
In the embodiment of the application, another way of obtaining the prior semantic vector through calculation is provided, and through the way, the semantic vector corresponding to the index value can be queried by using the first-level category vector mapping relation, so that the feature expression of each first-level category in the first distribution vector is strengthened, and the accuracy of category classification is improved.
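The arithmetic of this worked example can be checked with the prior_semantic_vector sketch above, using only the first three components of the semantic vectors from table 1 (the elided components are omitted):

    dist = np.array([0.4, 0.2, 0.05, 0.05, 0.0, 0.3])
    table = np.array([[ 0.1, -0.7, 0.03],
                      [ 0.3, -0.2, 0.03],
                      [-0.1, -0.9, 0.23],
                      [ 0.2, -0.1, 0.13],
                      [ 0.3, -0.4, 0.22],
                      [ 0.3, -0.6, 0.04]])
    print(prior_semantic_vector(dist, table, k=6))   # [0.195 -0.55  0.048]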
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the generating a text fusion vector according to the prior semantic vector and the text coding vector specifically includes:
splicing the prior semantic vector and the text coding vector to obtain a text fusion vector;
or, alternatively,
generating a text fusion vector according to the prior semantic vector and the text coding vector, which specifically comprises:
determining a first-order vector mapping according to a priori semantic vector, a text coding vector and a first parameter matrix, wherein the priori semantic vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [ d (p + q) ] dimensional matrix, and p, q and d are integers more than 1;
determining a second-order vector mapping according to the prior semantic vector, the text coding vector and a second parameter matrix, wherein the second parameter matrix is expressed as a (d × p × q) dimensional matrix;
and generating a text fusion vector according to the first-order vector mapping, the second-order vector mapping and the offset vector.
In one or more embodiments, two ways of generating a text fusion vector by using a priori semantic vector and a text encoding vector are introduced, one way is direct concatenation (concat), and the other way is to introduce a first-order vector mapping way and a second-order vector mapping way to enhance the interaction of multi-dimensional vector features, which will be separately described below.
Exemplarily, assuming that the prior semantic vector (e_emb) is 300-dimensional and the text encoding vector (l1_emb) is 768-dimensional, a 1068-dimensional text fusion vector (l2_logits) is obtained after splicing.
Illustratively, a bilinear fusion mode is adopted, that is, first-order vector mapping and second-order vector mapping are mixed, which can effectively strengthen the interaction between the features of the two vectors and improve the effect. The interaction process of the two kinds of features is as follows:
l2_logits = V[l1_emb; e_emb] + l1_emb · W[1:d] · e_emb + b

where l2_logits represents the text fusion vector. V[l1_emb; e_emb] represents the first-order vector mapping. l1_emb represents the text encoding vector, and l1_emb ∈ R^q, i.e., the text encoding vector is represented as a q-dimensional vector. e_emb represents the prior semantic vector, and e_emb ∈ R^p, i.e., the prior semantic vector is represented as a p-dimensional vector. V represents the first parameter matrix, and V ∈ R^(d×(p+q)), i.e., the first parameter matrix is represented as a [d × (p + q)]-dimensional matrix. [;] denotes the splicing operation. b denotes the bias vector. l1_emb · W[1:d] · e_emb represents the second-order vector mapping, i.e., the bilinear interaction (second-order interaction). W[1:d] represents the second parameter matrix, and W[1:d] ∈ R^(p×q×d), i.e., the second parameter matrix is represented as a (d × p × q)-dimensional matrix. d represents the output dimension and is a hyper-parameter.
In the embodiment of the application, two modes of generating the text fusion vector from the prior semantic vector and the text coding vector are provided. In one implementation, the vectors are directly spliced, which realizes feature fusion while keeping the computation simple. In the other implementation, a mixture of first-order and second-order vector mappings is introduced, which effectively strengthens the interaction between the features of the two vectors and improves the effect.
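A minimal PyTorch sketch of the bilinear fusion, with parameter shapes following the formula above; the dimensions and random initialization are purely illustrative:

    import torch

    p, q, d = 300, 768, 256                  # illustrative dims; d is the hyper-parameter
    e_emb = torch.randn(p)                   # prior semantic vector
    l1_emb = torch.randn(q)                  # text encoding vector
    V = torch.randn(d, p + q)                # first parameter matrix
    W = torch.randn(p, q, d)                 # second parameter matrix
    b = torch.randn(d)                       # bias vector

    first_order = V @ torch.cat([l1_emb, e_emb])                 # V [l1_emb; e_emb]
    second_order = torch.einsum('p,pqd,q->d', e_emb, W, l1_emb)  # bilinear interaction
    l2_logits = first_order + second_order + b                   # text fusion vector, shape (d,)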
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the generating a text fusion vector according to the first distribution vector and the text encoding vector specifically includes:
splicing the first distribution vector and the text coding vector to obtain a text fusion vector;
or, alternatively,
generating a text fusion vector according to the first distribution vector and the text encoding vector, specifically comprising:
determining a first-order vector mapping according to a first distribution vector, a text coding vector and a first parameter matrix, wherein the first distribution vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [ d (p + q) ] dimensional matrix, and p, q and d are integers more than 1;
determining a second-order vector mapping according to the first distribution vector, the text encoding vector and a second parameter matrix, wherein the second parameter matrix is expressed as a (d × p × q) dimensional matrix;
and generating a text fusion vector according to the first-order vector mapping, the second-order vector mapping and the offset vector.
In one or more embodiments, two ways of generating a text fusion vector by using a first distribution vector and a text encoding vector are introduced, one way is direct concatenation (concat), and the other way is to introduce a first-order vector mapping way and a second-order vector mapping way to enhance the interaction of multi-dimensional vector features, which will be separately described below.
Illustratively, assuming that the first distribution vector (logits1) is 44-dimensional and the text encoding vector (l1_emb) is 768-dimensional, an 812-dimensional text fusion vector (l2_logits) is obtained after splicing.
Illustratively, a bilinear fusion mode is adopted, that is, first-order vector mapping and second-order vector mapping are mixed, which can effectively strengthen the interaction between the features of the two vectors and improve the effect. The interaction process of the two kinds of features is as follows:
l2_logits = V[l1_emb; logits1] + l1_emb · W[1:d] · logits1 + b

where l2_logits represents the text fusion vector. V[l1_emb; logits1] represents the first-order vector mapping. l1_emb represents the text encoding vector, and l1_emb ∈ R^q, i.e., the text encoding vector is represented as a q-dimensional vector. logits1 represents the first distribution vector, and logits1 ∈ R^p, i.e., the first distribution vector is represented as a p-dimensional vector. V represents the first parameter matrix, and V ∈ R^(d×(p+q)), i.e., the first parameter matrix is represented as a [d × (p + q)]-dimensional matrix. [;] denotes the splicing operation. b denotes the bias vector. l1_emb · W[1:d] · logits1 represents the second-order vector mapping, i.e., the bilinear interaction (second-order interaction). W[1:d] represents the second parameter matrix, and W[1:d] ∈ R^(p×q×d), i.e., the second parameter matrix is represented as a (d × p × q)-dimensional matrix. d represents the output dimension and is a hyper-parameter.
Secondly, in the embodiment of the application, two modes of generating the text fusion vector from the first distribution vector and the text coding vector are provided. In one implementation, the vectors are directly spliced, which realizes feature fusion while keeping the computation simple. In the other implementation, a mixture of first-order and second-order vector mappings is introduced, which effectively strengthens the interaction between the features of the two vectors and improves the effect.
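The same sketch carries over by substituting the first distribution vector for the prior semantic vector; reusing l1_emb, b, d and q from the block above, with the 44-dimensional example:

    p = 44                                           # logits1 replaces the 300-d e_emb
    logits1 = torch.softmax(torch.randn(p), dim=0)   # illustrative first distribution vector
    V2, W2 = torch.randn(d, p + q), torch.randn(p, q, d)   # re-sized parameter matrices
    l2_logits = V2 @ torch.cat([l1_emb, logits1]) \
                + torch.einsum('p,pqd,q->d', logits1, W2, l1_emb) + b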
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in this application embodiment, before obtaining the text encoding vector corresponding to the target text information, the method may further include:
acquiring target text information corresponding to a target video, wherein the target text information comprises at least one item of title information, abstract information, subtitle information and comment information of the target video;
or, alternatively,
acquiring target text information corresponding to a target picture, wherein the target text information comprises at least one of title information, author information, Optical Character Recognition (OCR) information and abstract information of the target picture;
or, alternatively,
acquiring target text information corresponding to a target commodity, wherein the target text information comprises at least one item of commodity name information, place of production information, comment information and commodity description information of the target commodity;
or, alternatively,
and acquiring target text information corresponding to the target text, wherein the target text information comprises at least one item of title information, author information, abstract information, comment information and text information of the target text.
In one or more embodiments, various ways of extracting target text information from various types of information are described. As can be seen from the foregoing embodiments, the target text information may come from a video platform, an official-account platform, an e-commerce platform, or a microblog. The hierarchical classification task is described below taking a target video, a target picture, a target commodity, and a target text as examples.
Task one, performing hierarchical classification on a target video;
illustratively, the video title, summary information, subtitle information, and comment information are the main components of video content, and natural language processing (NLP) algorithms are used to parse the text, thereby enhancing the understanding of the semantic information of the video, which is one of the important tasks of a video search system. The summary information may be a brief introduction of the video, the subtitle information is the subtitles extracted from the video, and the comment information is the users' comments on the video.
For convenience of understanding, please refer to fig. 5, where fig. 5 is a schematic diagram of a video content hierarchical classification task in an embodiment of the present application. As shown in the figure, it is assumed that the target text information extracted from the target video is "hero is not developed, economy is suppressed, and it is completely impossible to continue fighting"; the target text information is input into the hierarchical classification model, so that the target first-level category is "game" and the target second-level category is "mobile game".
Task two, performing hierarchical classification on the target picture;
illustratively, title information, author information, OCR information and summary information are the main components of picture content, and NLP algorithms are used to parse the text, thereby enhancing the understanding of the semantic information of the picture, which is one of the important tasks of a picture search system. The author information represents the photographer's name or maker information of the picture, the OCR information represents text recognized from the picture, and the summary information may be a brief introduction to the picture.
For easy understanding, please refer to fig. 6, fig. 6 is a schematic diagram of a task of hierarchical classification of picture content in the embodiment of the present application, and as shown in the figure, it is assumed that target text information extracted from a target picture is "a person in the picture is proud and self-esteem, she wears luxurious clothes and sits on an luxury open horse car", and the target text information is input into a hierarchical classification model, thereby obtaining a target first-level category as "painting" and a target second-level category as "person".
Task three, performing hierarchical classification on a target commodity;
illustratively, commodity name information, place-of-production information, comment information and commodity description information are the main components of commodity content, and NLP algorithms are used to parse the text, thereby enhancing the understanding of commodity semantic information, which is one of the important tasks of a commodity search system. The comment information represents buyers' comments on the commodity, and the commodity description information represents the merchant's brief introduction of the commodity.
For convenience of understanding, please refer to fig. 7, where fig. 7 is a schematic diagram of a commodity content hierarchical classification task in an embodiment of the present application. As shown in the figure, it is assumed that the target text information extracted from the target commodity is "commodity type: calendar; place of production: Zhejiang; sales volume: 5000/month; price: 18 yuan"; the target text information is input into the hierarchical classification model, thereby obtaining the target first-level category "electric appliance" and the target second-level category "telephone".
Task four, performing hierarchical classification on a target text;
illustratively, title information, author information, summary information, comment information and body information are the main components of text content, and NLP algorithms are used to parse the text, thereby enhancing the understanding of text semantic information, which is one of the important tasks of a text search system. The comment information represents readers' comments on the text, and the body information represents the main content of the text.
For convenience of understanding, please refer to fig. 8, where fig. 8 is a schematic diagram of a text content hierarchical classification task in the embodiment of the present application. As shown in the figure, it is assumed that the target text information extracted from the target text is "rose is order, …, where rose is always highly appreciated"; the target text information is input into the hierarchical classification model, so that the target primary category is "science popularization" and the target secondary category is "plant".
It should be noted that, in practical applications, more types of tasks may be included, and the four types of hierarchical classification tasks described in this application are only an illustration and should not be construed as limitations of this application.
Secondly, in the embodiment of the application, various modes for extracting the target text information from various types of information are provided, and the method can be applied to different category classification scenes, no matter videos or pictures, no matter commodities or texts, and corresponding target text information can be extracted by adopting the method provided by the application so as to perform further prediction, so that the flexibility and diversity of the scheme are improved.
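As a small illustration of how such target text information might be assembled from the listed fields before encoding (the field names and the joining strategy are assumptions, not prescribed by the application):

    def build_target_text(fields, keys=("title", "summary", "subtitles", "comments")):
        # keep whichever of the listed items are present and join them into one text
        return ". ".join(str(fields[k]) for k in keys if fields.get(k))

    text = build_target_text({"title": "hero is not developed",
                              "comments": "economy is suppressed"})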
Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 3, another optional embodiment provided in the embodiments of the present application may further include:
receiving a category query instruction sent by terminal equipment aiming at the content to be searched;
responding to the category query instruction, and if the content to be searched is video content, sending a video search result to the terminal equipment;
responding to the category query instruction, and if the content to be searched is the picture content, sending a picture search result to the terminal equipment;
responding to the category query instruction, and if the content to be searched is commodity content, sending a commodity search result to the terminal equipment;
and responding to the category query instruction, and if the content to be searched is text content, sending a text search result to the terminal equipment.
In one or more embodiments, various ways of pushing content corresponding to respective categories under a search scenario are described. As can be seen from the foregoing embodiment, the user may also send a category query instruction for the content to be searched to the server through the terminal device, and the server responds to the category query instruction and determines a search result that needs to be pushed to the terminal device according to the content to be searched. The following description will be made in conjunction with four types of application scenarios.
Firstly, searching a scene by a video;
the user can search for videos on the video platform, that is, the content to be searched is video content. For easy understanding, please refer to fig. 9, which is a schematic diagram of an interface for displaying video search results in an embodiment of the present application. As shown in (A) of fig. 9, a plurality of primary categories are displayed on the video platform; assume that the user triggers a category query instruction for "variety show", based on which the secondary categories related to the primary category "variety show" can be displayed, such as "reality show", "dance" and "emotion". Assuming that the user then triggers a category query instruction for "reality show", as shown in (B) of fig. 9, content related to the tertiary categories, such as "super screenplay killer" and "escape maze", is displayed in the video search result.
Secondly, searching scenes by pictures;
the user can search for wallpaper on the wallpaper platform, that is, the content to be searched is picture content. For easy understanding, please refer to fig. 10, which is a schematic diagram of an interface for displaying picture search results in an embodiment of the present application. As shown in (A) of fig. 10, a plurality of primary categories are displayed on the wallpaper platform; assume that the user triggers a category query instruction for "constellation wallpaper", based on which the secondary categories related to the primary category "constellation wallpaper" can be displayed. As shown in (B) of fig. 10, content related to the secondary categories, such as "Capricorn" and "Taurus", is displayed in the picture search result.
Thirdly, searching scenes for commodities;
the user can search for the commodity on the E-commerce platform, namely, the content to be searched is commodity content. For convenience of understanding, referring to fig. 11, fig. 11 is a schematic diagram of an interface displaying a product search result in an embodiment of the present application, as shown in fig. 11 (a), a plurality of primary categories are displayed on an e-commerce platform, and it is assumed that a user triggers a category query instruction for "home appliance", and based on this, secondary categories related to the primary category "home appliance" may be displayed. As shown in fig. 11 (B), contents related to the second category, for example, "television", "air conditioner", and "refrigerator", are displayed in the product search result.
Fourthly, searching scenes through texts;
the user can search for novels on the e-book platform, i.e., the content to be searched is text content. For easy understanding, referring to fig. 12, fig. 12 is a schematic diagram of an interface for displaying text search results in an embodiment of the present application, as shown in fig. 12 (a), a plurality of primary categories are displayed on an electronic book platform, and assuming that a user triggers a category query instruction for "science fiction", secondary categories related to the primary category "science fiction" can be displayed based on the category query instruction. As shown in fig. 12 (B), secondary category-related contents, such as "machine era" and "science fiction world", are displayed in the text search result.
It should be noted that, in practical applications, more application scenarios may be involved, and the four types of application scenarios described in the present application are merely illustrative and should not be construed as limitations of the present application.
In the embodiment of the application, a plurality of modes for pushing the content corresponding to the respective categories in a search scene are provided. Through these modes, the background can determine the search object (for example, video content, picture content, commodity content or text content) according to the content to be searched input by the user, and, by combining the pre-determined multi-level categories, can efficiently find the content that the user is interested in and push it to the terminal device, thereby improving search efficiency.
With reference to fig. 13, a method for training a model in the present application will be described below, and an embodiment of the method for training a model in the present application includes:
210. acquiring a predictive text coding vector corresponding to text information to be trained, wherein the text information to be trained corresponds to a primary labeling category and a secondary labeling category;
in one or more embodiments, the model training device obtains text information to be trained, wherein the text information to be trained has been previously determined to have a first-level labeling category and a second-level labeling category corresponding thereto in a manual labeling manner. For convenience of description, the following will describe training of a text message to be trained as an example, and a greater number of text messages to be trained are required in an actual training process, which is not described herein.
Specifically, the text information to be trained is taken as the input to the encoder. Whether the input needs to be segmented into words or can be fed in directly at character granularity depends on the type of the encoder. Taking a BERT model as the encoder for introduction, the text information to be trained is input into the trained BERT model at character granularity, and the predictive text coding vector is generated after encoding. Taking a 768-dimensional semantic vector for each character as an example, the output vector (768 dimensions) at the first token "CLS" is usually taken as the vector representation of the whole text information to be trained, so the final predictive text coding vector is likewise 768-dimensional.
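As a sketch of this encoding step, assuming the Hugging Face transformers library and a Chinese BERT checkpoint (the application does not prescribe a specific implementation):

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    encoder = BertModel.from_pretrained("bert-base-chinese")

    inputs = tokenizer("text information to be trained", return_tensors="pt",
                       truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    # the [CLS] position of the last hidden layer serves as the whole-text vector
    text_encoding = outputs.last_hidden_state[:, 0, :]   # shape (1, 768)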
It should be noted that the model training apparatus may be deployed in a server, or may be deployed in a terminal device, or may be deployed in a system composed of a terminal device and a server, which is not limited herein.
220. Based on the predictive text coding vector, obtaining a first predictive distribution vector through a first classifier to be trained included in a hierarchical classification model to be trained, wherein the first predictive distribution vector comprises M first element scores, each first element score is represented as a probability value of a primary category, and M is an integer greater than 1;
in one or more embodiments, the model training apparatus inputs the predictive text encoding vector into a first classifier to be trained included in the hierarchical classification model to be trained, and the first classifier to be trained acquires a first predictive distribution vector. The first prediction distribution vector is a prediction result of the first-level category, the first prediction distribution vector comprises M first element scores, M represents the total number of the first-level category, and each first element score is represented as a probability value corresponding to the first-level category.
It should be noted that step 220 is similar to that described in step 120 in the embodiment, and therefore, the description thereof is omitted here.
230. Generating a predicted text fusion vector according to the first predicted distribution vector and the predicted text coding vector;
in one or more embodiments, the first-level classification result is generally the more accurate one, and therefore the first prediction distribution vector can be input to the second classifier to be trained as prior knowledge.
Specifically, the model training device combines the first prediction distribution vector and the predictive text coding vector to generate a predictive text fusion vector. Namely, the predicted text fusion vector has two-dimensional feature vectors, namely, the predicted text coding vector of the text information to be trained per se and the first predicted distribution vector, so that the multi-dimensional feature vectors need to be fused.
It should be noted that step 230 is similar to that described in step 130 of the embodiment and the embodiments corresponding to fig. 3, and therefore, the description thereof is omitted here.
240. Based on the prediction text fusion vector, obtaining a second prediction distribution vector through a second classifier to be trained included in a hierarchical classification model to be trained, wherein the second prediction distribution vector comprises N second element scores, each second element score represents a probability value of a second-level category, the second-level category belongs to a next-level category of the first-level category, and N is an integer greater than 1;
in one or more embodiments, the model training device inputs the predictive text fusion vector into a second classifier to be trained included in the hierarchical classification model to be trained, and the second classifier to be trained acquires a second predictive distribution vector. The second prediction distribution vector is a prediction result of the secondary category, the second prediction distribution vector comprises N second element scores, N represents the total number of the secondary category, and each second element score represents a probability value corresponding to the secondary category.
It should be noted that step 240 is similar to that described in step 140 in the embodiment, and therefore is not described herein again.
250. And updating model parameters of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the first-level labeling category and the second-level labeling category until model training conditions are met, and outputting the hierarchical classification model, wherein the hierarchical classification model comprises the first classifier and the second classifier involved in the embodiment.
In one or more embodiments, the model training device calculates a comprehensive loss value from the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category by using a loss function. Then, the model parameters of the hierarchical classification model to be trained can be updated by performing gradient calculation on the comprehensive loss value using stochastic gradient descent. When the number of iterations reaches a threshold value or the comprehensive loss value converges, the model training condition is satisfied, and the hierarchical classification model comprising the trained first classifier and the trained second classifier is output.
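One such update step might be sketched as follows; the classifier shapes, the concat fusion mode, the weights λ1 = λ2 = 1, and the label indices are all illustrative stand-ins (F.cross_entropy applies the softmax and negative log internally, matching the negative log loss described in the embodiments below):

    import torch
    import torch.nn.functional as F

    q, M, N = 768, 44, 310                      # encoding dim, primary / secondary counts
    classifier1 = torch.nn.Linear(q, M)         # first classifier to be trained
    classifier2 = torch.nn.Linear(q + M, N)     # second classifier to be trained
    optimizer = torch.optim.SGD(
        list(classifier1.parameters()) + list(classifier2.parameters()), lr=1e-2)

    l1_emb = torch.randn(1, q)                  # predictive text encoding vector (stand-in)
    y1, y2 = torch.tensor([3]), torch.tensor([57])  # primary / secondary labeling categories

    logits1 = classifier1(l1_emb)                  # first prediction distribution vector
    fused = torch.cat([logits1, l1_emb], dim=1)    # predicted text fusion vector (concat mode)
    logits2 = classifier2(fused)                   # second prediction distribution vector
    loss = F.cross_entropy(logits1, y1) + F.cross_entropy(logits2, y2)
    optimizer.zero_grad()
    loss.backward()                                # gradient calculation
    optimizer.step()                               # one stochastic gradient descent update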
In the embodiment of the application, a method for training a model is provided, and by the above method, a hierarchical classification model for realizing multi-level category classification can be trained, on the basis, a prediction result corresponding to a first-level category is used as prior knowledge, a text fusion vector is obtained after a text coding vector is fused, the text fusion vector is used as a basis for predicting a second-level category, that is, output of a next level is predicted based on an output result of a previous level, and a constraint relation between an upper layer and a lower layer in a category system can be fully and effectively utilized. Therefore, when the second classifier carries out prediction, the second classifier related to the prediction result of the first class can be concerned more, so that the classification effect of the classes is enhanced, and the classification accuracy is improved.
Optionally, on the basis of each embodiment corresponding to fig. 13, in another optional embodiment provided in this application embodiment, updating the model parameters of the to-be-trained hierarchical classification model according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category specifically includes:
calculating by adopting a first classification loss function according to the first prediction distribution vector and the primary labeling category to obtain a first loss value of the text information to be trained;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a comprehensive loss value of the text information to be trained according to the first loss value and the second loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss values.
In one or more embodiments, a manner of updating the model parameters using a cross-entropy loss function is presented. As can be seen from the foregoing embodiments, a static weighted sum may be used as the final loss function of the task during training. Alternatively, a dynamic weighted sum may be used as the final loss function, for example by introducing Dynamic Task Priority (DTP) or Focal Loss; such losses adjust the relevant weight values according to the actual loss value of each module, so that the final loss focuses more on the module with the larger loss value.
Specifically, the loss function used in the present application may be composed of two parts, wherein the first prediction distribution vector and the first-level labeling category employ a first classification loss function (i.e., a negative log loss function), and the second prediction distribution vector and the second-level labeling category employ a second classification loss function (i.e., a negative log loss function).
Based on this, the overall loss function can be expressed as the following equation:
Loss = λ1 · loss_cls1 + λ2 · loss_cls2; Formula (3)

where Loss represents the comprehensive loss value of the text information to be trained; λ1 represents the first weight value (i.e., a hyper-parameter for adjusting the primary category classification task); λ2 represents the second weight value (i.e., a hyper-parameter for adjusting the secondary category classification task); loss_cls1 represents the first loss value of the text information to be trained; and loss_cls2 represents the second loss value of the text information to be trained.
Since the present application is introduced by taking one text information to be trained as an example, if a plurality of text information to be trained are involved, the comprehensive loss values of the text information to be trained need to be accumulated.
Based on this, it is also necessary to calculate a first loss value and a second loss value, respectively, and the first loss value is calculated as follows:
loss_cls1 = -Σ_{i=1}^{M} y_i · log(a_i); Formula (4)

where loss_cls1 represents the first loss value of the text information to be trained; M represents the total number of primary categories; i denotes the i-th primary category; y_i represents the primary labeling category with respect to the i-th primary category (i.e., whether the label belongs to the i-th primary category: y_i = 1 indicates belonging to the i-th primary category, and y_i = 0 indicates not belonging to it); and a_i represents the i-th first element score in the first prediction distribution vector (i.e., the predicted probability of the i-th primary category).
The second loss value is calculated as follows:
loss_cls2 = -Σ_{j=1}^{N} y_j · log(a_j); Formula (5)

where loss_cls2 represents the second loss value of the text information to be trained; N represents the total number of secondary categories; j denotes the j-th secondary category; y_j represents the secondary labeling category with respect to the j-th secondary category (y_j = 1 indicates belonging to the j-th secondary category, and y_j = 0 indicates not belonging to it); and a_j represents the j-th second element score in the second prediction distribution vector (i.e., the predicted probability of the j-th secondary category).
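Formulas (4) and (5) transcribe directly to code; a sketch with hypothetical names, reusing the 6-category first distribution vector from the earlier example with an assumed label on the first category:

    import numpy as np

    def negative_log_loss(y, a, eps=1e-12):
        # y: one-hot labeling vector; a: predicted probability vector
        return -np.sum(y * np.log(a + eps))      # eps guards against log(0)

    a1 = np.array([0.4, 0.2, 0.05, 0.05, 0.0, 0.3])
    y1 = np.array([1, 0, 0, 0, 0, 0])
    print(negative_log_loss(y1, a1))             # -log(0.4) ≈ 0.916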
Secondly, in the embodiment of the application, a mode of updating the model parameters by adopting a cross entropy loss function is provided, and through the mode, the model is trained by utilizing the cross entropy loss values corresponding to a plurality of classifiers, so that the classification effect of the classifiers can be effectively improved, and the accuracy of multi-level category classification is improved.
Optionally, on the basis of each embodiment corresponding to fig. 13, in another optional embodiment provided in this application embodiment, updating the model parameters of the to-be-trained hierarchical classification model according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category specifically includes:
calculating by adopting a first classification loss function according to the first prediction distribution vector and the primary labeling category to obtain a first loss value of the text information to be trained;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a first element prediction score corresponding to the primary labeling category from the first prediction distribution vector, and determining a second element prediction score corresponding to the secondary labeling category from the second prediction distribution vector;
calculating a third loss value of the text information to be trained by adopting a hinge loss function according to the first element prediction score, the second element prediction score and the target hyper-parameter;
determining a comprehensive loss value of the text information to be trained according to the first loss value, the second loss value and the third loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss values.
In one or more embodiments, a manner of updating model parameters using a cross-entropy loss function and a hinge loss function is described. As can be seen from the foregoing embodiments, the training may use a static weight sum or a dynamic weight sum as a final loss function of the task.
Specifically, the loss function used in the present application may be composed of three parts: the first prediction distribution vector and the first-level labeling category use a first classification loss function (i.e., a negative log loss function), and the second prediction distribution vector and the second-level labeling category use a second classification loss function (i.e., a negative log loss function). In order to ensure the consistency of the results at the two classification levels, an additional hinge loss function is added. Given that the classification of upper-level categories is always easier than that of lower-level categories, adding the hinge loss function can make the probability of the first-level category always greater than that of the corresponding second-level category.
Based on this, the overall loss function can be expressed as the following equation:
Loss = λ1 · loss_cls1 + λ2 · loss_cls2 + λ3 · loss_h; Formula (6)

where Loss represents the comprehensive loss value of the text information to be trained; λ1 represents the first weight value (i.e., a hyper-parameter for adjusting the primary category classification task); λ2 represents the second weight value (i.e., a hyper-parameter for adjusting the secondary category classification task); λ3 represents the third weight value; loss_cls1 represents the first loss value of the text information to be trained; loss_cls2 represents the second loss value of the text information to be trained; and loss_h represents the third loss value of the text information to be trained.
Since the present application is introduced by taking one text information to be trained as an example, if a plurality of text information to be trained are involved, the comprehensive loss values of the text information to be trained need to be accumulated.
Based on this, it is also necessary to calculate the first loss value, the second loss value, and the third loss value, respectively, and it should be noted that the first loss value is calculated by the equation (4) described in the foregoing embodiment, and the second loss value is calculated by the equation (5) described in the foregoing embodiment. The third loss value is calculated as follows (i.e., the hinge loss function):
loss_h = max(0, λ + l2_score - l1_score); Formula (7)

where loss_h represents the third loss value of the text information to be trained; λ represents the target hyper-parameter; l2_score represents the second element prediction score determined in the second prediction distribution vector for the secondary labeling category. For example, if the secondary labeling category is "mobile game", which corresponds to the 5th index value in the second prediction distribution vector, the second element score corresponding to the 5th index value in the second prediction distribution vector is taken as the second element prediction score. Similarly, l1_score represents the first element prediction score determined in the first prediction distribution vector for the primary labeling category; for example, if the primary labeling category is "game", which corresponds to the 30th index value in the first prediction distribution vector, the first element score corresponding to the 30th index value in the first prediction distribution vector is taken as the first element prediction score. max(·) denotes taking the maximum value.
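Formula (7) in code form, with illustrative scores and an assumed margin value for λ:

    def hinge_consistency_loss(l1_score, l2_score, lam=0.05):
        # penalized unless the primary-category probability exceeds the
        # secondary-category probability by at least the margin lam
        return max(0.0, lam + l2_score - l1_score)

    print(hinge_consistency_loss(l1_score=0.8, l2_score=0.6))   # 0.0  (consistent)
    print(hinge_consistency_loss(l1_score=0.3, l2_score=0.6))   # 0.35 (penalized)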
Secondly, in the embodiment of the application, a mode of updating the model parameters using a cross-entropy loss function together with a hinge loss function is provided. In this way, the consistency of the two levels of categories can be ensured by adding the hinge loss function: since coarse-grained upper-level classification is always easier than fine-grained lower-level classification, adding the hinge loss function ensures that the probability of the first-level category is always greater than that of the corresponding second-level category.
Referring to fig. 14, fig. 14 is a schematic view of an embodiment of the multi-level category determining apparatus 30 in the present application, which includes:
an obtaining module 310, configured to obtain a text coding vector corresponding to target text information;
the obtaining module 310 is further configured to obtain, based on the text coding vector, a first distribution vector through a first classifier included in the hierarchical classification model, where the first distribution vector includes M first element scores, each first element score is represented by a probability value of one primary category, and M is an integer greater than 1;
a generating module 320, configured to generate a text fusion vector according to the first distribution vector and the text encoding vector;
the obtaining module 310 is further configured to obtain, based on the text fusion vector, a second distribution vector through a second classifier included in the hierarchical classification model, where the second distribution vector includes N second element scores, each second element score represents a probability value of one secondary category, the secondary category belongs to a next-level category of the primary category, and N is an integer greater than 1;
the determining module 330 is configured to determine a target primary category to which the target text information belongs according to the first distribution vector, and determine a target secondary category to which the target text information belongs according to the second distribution vector.
In the embodiment of the application, a multi-level category determining device is provided, and by adopting the device, the prediction result corresponding to the first-level category is used as priori knowledge, the text encoding vector is fused to obtain a text fusion vector, and the text fusion vector is used as a basis for predicting the second-level category, namely, the output of the next level is predicted based on the output result of the previous level, so that the constraint relation between the upper layer and the lower layer in a category system can be fully and effectively utilized. Therefore, when the second classifier carries out prediction, the second classifier related to the prediction result of the first class can be concerned more, so that the classification effect of the classes is enhanced, and the classification accuracy is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
a generating module 320, configured to generate a priori semantic vector according to the first distribution vector based on a primary category vector mapping relationship, where the primary category vector mapping relationship includes mapping relationships between M index values and M semantic vectors, where each index value corresponds to one primary category;
and generating a text fusion vector according to the prior semantic vector and the text coding vector.
In the embodiment of the application, a multi-level category determining device is provided, and by adopting the device, a level category vector mapping relation is introduced as a basis for generating a prior semantic vector, and the feature expression of a corresponding level category in a first distribution vector is strengthened, so that the accuracy of category classification is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
a generating module 320, configured to determine, from the first distribution vector, the first K first element scores with the largest probability value, where each first element score corresponds to an index value of one primary class, and K is an integer greater than 1 and smaller than M;
acquiring an index value corresponding to each first element score in the previous K first element scores to obtain K index values;
acquiring corresponding K semantic vectors according to the K index values based on the primary category vector mapping relation;
for each index value in the K index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the K index values to obtain a prior semantic vector.
In the embodiment of the application, a multi-level category determining device is provided, and by adopting the device, semantic vectors corresponding to index values can be queried by utilizing a one-level category vector mapping relation, so that feature expression of corresponding one-level categories in a first distribution vector is strengthened, and accuracy of category classification is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
a generating module 320, configured to determine, from the first distribution vector, a first element score with a maximum probability value, where the first element score corresponds to an index value of a primary class;
if the first element score is larger than or equal to the element score threshold, acquiring an index value corresponding to the first element score;
acquiring a corresponding semantic vector according to the index value based on the primary category vector mapping relation;
and carrying out weighted calculation on the semantic vector corresponding to the index value and the first element score corresponding to the index value to obtain a prior semantic vector.
In the embodiment of the application, a multi-level category determining device is provided, and by adopting the device, a semantic vector corresponding to an index value can be queried by using a one-level category vector mapping relation, so that feature expression of the most likely one-level category in a first distribution vector is strengthened, and accuracy of category classification is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
the generating module 320 is specifically configured to obtain an index value corresponding to each first element score in the first distribution vector, so as to obtain M index values;
acquiring corresponding M semantic vectors according to the M index values based on the primary category vector mapping relation;
for each index value in the M index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the M index values to obtain a prior semantic vector.
In the embodiment of the application, a multi-level category determining device is provided, and by adopting the device, semantic vectors corresponding to index values can be inquired by utilizing a one-level category vector mapping relation, so that feature expression of each one-level category in a first distribution vector is strengthened, and the accuracy of category classification is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
the generating module 320 is specifically configured to perform splicing processing on the prior semantic vector and the text coding vector to obtain a text fusion vector;
or, alternatively,
a generating module 320, configured to determine a first-order vector mapping according to a prior semantic vector, a text coding vector, and a first parameter matrix, where the prior semantic vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [ d (p + q) ] dimensional matrix, and p, q, and d are integers greater than 1;
determining a second-order vector mapping according to the prior semantic vector, the text coding vector and a second parameter matrix, wherein the second parameter matrix is expressed as a (d × p × q) dimensional matrix;
and generating a text fusion vector according to the first-order vector mapping, the second-order vector mapping and the offset vector.
In the embodiment of the application, a multistage category determining device is provided, and by adopting the device, vectors can be directly spliced in one implementation, so that the operation difficulty can be reduced while the feature fusion is realized. In another implementation, a mode of mixing first-order vector mapping and second-order vector mapping is introduced, so that interaction between two dimensionality vector characteristics can be effectively strengthened, and the effect is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
the generating module 320 is specifically configured to perform splicing processing on the first distribution vector and the text coding vector to obtain a text fusion vector;
or, alternatively,
a generating module 320, configured to determine a first-order vector mapping according to a first distribution vector, a text coding vector, and a first parameter matrix, where the first distribution vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [ d × (p + q) ] dimensional matrix, and p, q, and d are integers greater than 1;
determining a second-order vector mapping according to the first distribution vector, the text encoding vector and a second parameter matrix, wherein the second parameter matrix is expressed as a (d × p × q) dimensional matrix;
and generating a text fusion vector according to the first-order vector mapping, the second-order vector mapping and the offset vector.
In the embodiment of the application, a multistage category determining device is provided, and by adopting the device, vectors can be directly spliced in one implementation, so that the operation difficulty can be reduced while the feature fusion is realized. In another implementation, a mode of mixing first-order vector mapping and second-order vector mapping is introduced, so that interaction between two dimensionality vector characteristics can be effectively strengthened, and the effect is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application,
the obtaining module 310 is further configured to obtain target text information corresponding to the target video before obtaining a text coding vector corresponding to the target text information, where the target text information includes at least one of title information, summary information, subtitle information, and comment information of the target video;
or, alternatively,
the obtaining module 310 is further configured to obtain target text information corresponding to the target picture before obtaining a text coding vector corresponding to the target text information, where the target text information includes at least one of title information, author information, Optical Character Recognition (OCR) information, and abstract information of the target picture;
or, alternatively,
the obtaining module 310 is further configured to obtain target text information corresponding to the target commodity before obtaining a text coding vector corresponding to the target text information, where the target text information includes at least one of commodity name information, place of origin information, comment information, and commodity description information of the target commodity;
or, alternatively,
the obtaining module 310 is further configured to obtain target text information corresponding to the target text before obtaining a text coding vector corresponding to the target text information, where the target text information includes at least one of title information, author information, summary information, comment information, and body information of the target text.
In the embodiment of the application, a multi-level category determining device is provided, and by adopting the device, the method can be applied to different category classification scenes, no matter videos or pictures, no matter commodities or texts, and corresponding target text information can be extracted by adopting the method provided by the application, so that further prediction is carried out, and the flexibility and diversity of the scheme are improved.
Optionally, on the basis of the embodiment corresponding to fig. 14, in another embodiment of the multi-level category determining apparatus 30 provided in the embodiment of the present application, the multi-level category determining apparatus 30 further includes a receiving module 340 and a sending module 350;
the receiving module 340 is configured to receive a category query instruction sent by a terminal device for a content to be searched;
a sending module 350, configured to respond to the category query instruction, and send a video search result to the terminal device if the content to be searched is video content;
the sending module 350 is further configured to respond to the category query instruction, and send a picture search result to the terminal device if the content to be searched is the picture content;
the sending module 350 is further configured to respond to the category query instruction, and send a commodity search result to the terminal device if the content to be searched is commodity content;
the sending module 350 is further configured to respond to the category query instruction, and send a text search result to the terminal device if the content to be searched is text content.
In the embodiment of the application, a multi-level category determining device is provided. With this device, the background can determine the type of search object (for example, video content, picture content, commodity content, or text content) from the content to be searched entered by the user. On this basis, combined with the multi-level categories determined in advance, the background can efficiently retrieve content the user is interested in and push it to the terminal device used by the user, improving search efficiency.
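A minimal sketch of the query dispatch described above follows; the handler bodies are placeholders standing in for searches over content indexed by its multi-level categories, and all names are assumptions.

```python
def handle_category_query(content_type: str, query: str) -> str:
    # Each branch stands in for a search over pre-categorized content;
    # the stub results are placeholders, not a real search backend.
    handlers = {
        "video":     lambda q: f"video search result for {q!r}",
        "picture":   lambda q: f"picture search result for {q!r}",
        "commodity": lambda q: f"commodity search result for {q!r}",
        "text":      lambda q: f"text search result for {q!r}",
    }
    return handlers[content_type](query)
```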
Referring to fig. 15, fig. 15 is a schematic view of an embodiment of a model training apparatus in an embodiment of the present application, and the model training apparatus 40 includes:
an obtaining module 410, configured to obtain a predictive text coding vector corresponding to text information to be trained, where the text information to be trained corresponds to a primary labeling category and a secondary labeling category;
the obtaining module 410 is further configured to obtain, based on the predictive text coding vector, a first predictive distribution vector through a first classifier to be trained included in a hierarchical classification model to be trained, where the first predictive distribution vector includes M first element scores, each first element score represents a probability value of a primary category, and M is an integer greater than 1;
a generating module 420, configured to generate a predictive text fusion vector according to the first predictive distribution vector and the predictive text coding vector;
the obtaining module 410 is further configured to obtain, based on the predictive text fusion vector, a second predictive distribution vector through a second classifier to be trained included in the hierarchical classification model to be trained, where the second predictive distribution vector includes N second element scores, each second element score represents a probability value of a secondary category, the secondary category belongs to a next-level category of the primary category, and N is an integer greater than 1;
the training module 430 is configured to update model parameters of a hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the first-level labeling category and the second-level labeling category until a model training condition is met, and output the hierarchical classification model, where the hierarchical classification model includes the first classifier and the second classifier related to the above aspect.
In the embodiment of the application, a model training device is provided. With this device, a hierarchical classification model for multi-level category classification can be trained. The prediction result of the first-level category serves as prior knowledge: it is fused with the text coding vector to obtain a text fusion vector, which then serves as the basis for predicting the second-level category. That is, the output of the next level is predicted from the output of the previous level, so the constraint relation between the upper and lower layers of the category system is fully and effectively utilized. When the second classifier makes its prediction, it can pay more attention to the secondary categories related to the first-level prediction result, which enhances the category classification effect and improves the classification accuracy.
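The following minimal sketch illustrates the hierarchical forward pass just described. The layer sizes, the softmax outputs, and the concatenation-based fusion are assumptions chosen for brevity; the application also describes richer fusion variants.

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    def __init__(self, text_dim=768, m_primary=30, n_secondary=200):
        super().__init__()
        self.first_classifier = nn.Linear(text_dim, m_primary)
        # The second classifier consumes the text vector fused with the
        # first-level distribution (plain concatenation in this sketch).
        self.second_classifier = nn.Linear(text_dim + m_primary, n_secondary)

    def forward(self, text_vec):
        first_dist = torch.softmax(self.first_classifier(text_vec), dim=-1)
        fused = torch.cat([text_vec, first_dist], dim=-1)
        second_dist = torch.softmax(self.second_classifier(fused), dim=-1)
        return first_dist, second_dist
```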
Alternatively, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the model training device 40 provided in the embodiment of the present application,
the training module 430 is specifically configured to calculate a first loss value of the text information to be trained by using a first classification loss function according to the first prediction distribution vector and the first-level labeling category;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a comprehensive loss value of the text information to be trained according to the first loss value and the second loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss values.
In the embodiment of the application, a model training device is provided. With this device, the model is trained using the cross-entropy loss values of multiple classifiers, which effectively improves the classification effect of the classifiers and thus the accuracy of multi-level category classification.
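A sketch of the comprehensive loss built from the two cross-entropy terms follows; the equal weighting of the two classifiers is an assumption.

```python
import torch.nn.functional as F

def comprehensive_loss(first_logits, second_logits, primary_label, secondary_label):
    loss_1 = F.cross_entropy(first_logits, primary_label)       # first classification loss
    loss_2 = F.cross_entropy(second_logits, secondary_label)    # second classification loss
    return loss_1 + loss_2                                      # comprehensive loss value
```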
Alternatively, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the model training device 40 provided in the embodiment of the present application,
the training module 430 is specifically configured to calculate a first loss value of the text information to be trained by using a first classification loss function according to the first prediction distribution vector and the first-level labeling category;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a first element prediction score corresponding to the primary labeling category from the first prediction distribution vector, and determining a second element prediction score corresponding to the secondary labeling category from the second prediction distribution vector;
calculating a third loss value of the text information to be trained by adopting a hinge loss function according to the first element prediction score, the second element prediction score and the target hyper-parameter;
determining a comprehensive loss value of the text information to be trained according to the first loss value, the second loss value and the third loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss values.
In the embodiment of the application, a model training device is provided. With this device, adding the hinge loss function ensures the consistency of the two levels of categories: coarse-grained upper-level categories are generally easier to predict than fine-grained lower-level ones, i.e., fine-grained classification is more difficult, so adding the hinge loss function helps ensure that the probability of the primary category is always greater than that of the corresponding secondary category.
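The following sketch illustrates one plausible form of such a hinge term; the margin plays the role of the target hyper-parameter, and its value here is purely illustrative.

```python
import torch

def hierarchy_hinge_loss(first_score, second_score, margin=0.1):
    # first_score, second_score: tensors holding the predicted probability of
    # the labelled primary category and labelled secondary category.
    # Penalize whenever the primary score does not exceed the secondary
    # score by at least the margin.
    return torch.clamp(margin - (first_score - second_score), min=0.0)
```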
Fig. 16 is a schematic diagram of a server structure provided in this embodiment. The server 500 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors), a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing an application program 542 or data 544. The memory 532 and the storage medium 530 may be transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530 and execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 16.
The embodiment of the present application further provides a multi-level category determining device and a model training device that can be deployed in a terminal device. As shown in fig. 17, for convenience of description, only the parts related to the embodiment of the present application are shown; for details of the specific technology that are not disclosed, please refer to the method part of the embodiment of the present application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a Point of Sale (POS) terminal, a vehicle-mounted computer, and the like. The following description takes the mobile phone as an example of the terminal device:
fig. 17 is a block diagram illustrating a partial structure of a mobile phone related to a terminal device provided in an embodiment of the present application. Referring to fig. 17, the handset includes: radio Frequency (RF) circuit 610, memory 620, input unit 630, display unit 640, sensor 650, audio circuit 660, wireless fidelity (WiFi) module 670, processor 680, and power supply 690. Those skilled in the art will appreciate that the handset configuration shown in fig. 17 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 17:
the RF circuit 610 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 680; in addition, the data for designing uplink is transmitted to the base station. In general, RF circuit 610 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 610 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 620 may be used to store software programs and modules, and the processor 680 may execute various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 620. The memory 620 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 630 may include a touch panel 631 and other input devices 632. The touch panel 631, also referred to as a touch screen, may collect touch operations of a user (e.g., operations of the user on the touch panel 631 or near the touch panel 631 by using any suitable object or accessory such as a finger or a stylus) thereon or nearby, and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 631 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 680, and can receive and execute commands sent by the processor 680. In addition, the touch panel 631 may be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 630 may include other input devices 632 in addition to the touch panel 631. In particular, other input devices 632 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 640 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The display unit 640 may include a Display panel 641; optionally, the display panel 641 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 631 can cover the display panel 641; when the touch panel 631 detects a touch operation on or near it, the touch operation is transmitted to the processor 680 to determine the type of the touch event, and the processor 680 then provides a corresponding visual output on the display panel 641 according to the type of the touch event. Although in fig. 17 the touch panel 631 and the display panel 641 are two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 631 and the display panel 641 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 650, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 641 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 641 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The audio circuit 660, speaker 661, and microphone 662 can provide an audio interface between a user and the mobile phone. The audio circuit 660 may transmit the electrical signal converted from the received audio data to the speaker 661, where it is converted into an audio signal for output; conversely, the microphone 662 converts collected sound signals into electrical signals, which the audio circuit 660 receives and converts into audio data. The audio data is processed by the processor 680 and then transmitted via the RF circuit 610 to, for example, another mobile phone, or output to the memory 620 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 670, the mobile phone can help the user receive and send e-mails, browse webpages, access streaming media, and the like, providing wireless broadband Internet access. Although fig. 17 shows the WiFi module 670, it is understood that it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 680 is a control center of the mobile phone, and connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 620 and calling data stored in the memory 620, thereby performing overall monitoring of the mobile phone. Optionally, processor 680 may include one or more processing units; optionally, the processor 680 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 680.
The mobile phone also includes a power supply 690 (e.g., a battery) for powering the various components. Optionally, the power supply may be logically connected to the processor 680 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like, which are not described herein.
The steps performed by the terminal device in the above-described embodiment may be based on the terminal device configuration shown in fig. 17.
Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (17)

1. A method for determining a multi-level category, comprising:
acquiring a text coding vector corresponding to target text information;
based on the text coding vector, obtaining a first distribution vector through a first classifier included in a hierarchical classification model, wherein the first distribution vector comprises M first element scores, each first element score represents a probability value of one primary category, and M is an integer greater than 1;
generating a text fusion vector according to the first distribution vector and the text coding vector;
based on the text fusion vector, obtaining a second distribution vector through a second classifier included in the hierarchical classification model, wherein the second distribution vector includes N second element scores, each second element score represents a probability value of a secondary category, the secondary category belongs to a next category of the primary category, and N is an integer greater than 1;
and determining a target primary category to which the target text information belongs according to the first distribution vector, and determining a target secondary category to which the target text information belongs according to the second distribution vector.
2. The method of claim 1, wherein the generating a text fusion vector according to the first distribution vector and the text coding vector comprises:
generating a prior semantic vector according to the first distribution vector based on a primary category vector mapping relation, wherein the primary category vector mapping relation comprises a one-to-one mapping relation between M index values and M semantic vectors, and each index value corresponds to one primary category;
and generating the text fusion vector according to the prior semantic vector and the text coding vector.
3. The method according to claim 2, wherein the generating a priori semantic vector from the first distribution vector based on the first-level category vector mapping relationship comprises:
determining, from the first distribution vector, the K first element scores with the largest probability values, wherein each first element score corresponds to an index value of one primary category, and K is an integer greater than 1 and less than M;
acquiring an index value corresponding to each first element score in the first K first element scores to obtain K index values;
acquiring corresponding K semantic vectors according to the K index values based on the primary category vector mapping relation;
for each index value in the K index values, carrying out weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the K index values to obtain the prior semantic vector.
4. The method according to claim 2, wherein the generating a priori semantic vector from the first distribution vector based on the first-level category vector mapping relationship comprises:
determining a first element score with a maximum probability value from the first distribution vector, wherein the first element score corresponds to an index value of one primary category;
if the first element score is larger than or equal to an element score threshold value, acquiring an index value corresponding to the first element score;
acquiring a corresponding semantic vector according to the index value based on the primary category vector mapping relation;
and performing weighted calculation on the semantic vector corresponding to the index value and the first element score corresponding to the index value to obtain the prior semantic vector.
5. The method according to claim 2, wherein the generating a priori semantic vector from the first distribution vector based on the first-level category vector mapping relationship comprises:
acquiring an index value corresponding to each first element score in the first distribution vector to obtain the M index values;
acquiring the corresponding M semantic vectors according to the M index values based on the primary category vector mapping relation;
for each index value in the M index values, performing weighted calculation on a semantic vector corresponding to the index value and a first element score corresponding to the index value to obtain an updated semantic vector corresponding to the index value;
and summing the updated semantic vectors corresponding to the M index values to obtain the prior semantic vector.
6. The method of claim 2, wherein the generating the text fusion vector according to the prior semantic vector and the text coding vector comprises:
splicing the prior semantic vector and the text coding vector to obtain the text fusion vector;
or,
the generating the text fusion vector according to the prior semantic vector and the text coding vector comprises:
determining a first-order vector mapping according to the prior semantic vector, the text coding vector and a first parameter matrix, wherein the prior semantic vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [d × (p + q)]-dimensional matrix, and p, q and d are integers greater than 1;
determining a second-order vector mapping according to the prior semantic vector, the text coding vector and a second parameter matrix, wherein the second parameter matrix is represented as a (d × p × q)-dimensional matrix;
and generating the text fusion vector according to the first-order vector mapping, the second-order vector mapping and an offset vector.
7. The method of claim 1, wherein the generating a text fusion vector according to the first distribution vector and the text coding vector comprises:
splicing the first distribution vector and the text coding vector to obtain the text fusion vector;
or,
the generating a text fusion vector according to the first distribution vector and the text coding vector comprises:
determining a first-order vector mapping according to the first distribution vector, the text coding vector and a first parameter matrix, wherein the first distribution vector is represented as a p-dimensional vector, the text coding vector is represented as a q-dimensional vector, the first parameter matrix is represented as a [d × (p + q)]-dimensional matrix, and p, q and d are integers greater than 1;
determining a second-order vector mapping according to the first distribution vector, the text coding vector and a second parameter matrix, wherein the second parameter matrix is represented as a (d × p × q)-dimensional matrix;
and generating the text fusion vector according to the first-order vector mapping, the second-order vector mapping and an offset vector.
8. The method according to claim 1, wherein before the obtaining the text encoding vector corresponding to the target text information, the method further comprises:
acquiring target text information corresponding to a target video, wherein the target text information comprises at least one item of title information, abstract information, subtitle information and comment information of the target video;
or,
acquiring target text information corresponding to a target picture, wherein the target text information comprises at least one of title information, author information, Optical Character Recognition (OCR) information and abstract information of the target picture;
or,
acquiring target text information corresponding to a target commodity, wherein the target text information comprises at least one item of commodity name information, place of production information, comment information and commodity description information of the target commodity;
or,
and acquiring the target text information corresponding to the target text, wherein the target text information comprises at least one item of title information, author information, abstract information, comment information and body information of the target text.
9. The determination method according to any one of claims 1 to 8, further comprising:
receiving a category query instruction sent by terminal equipment aiming at the content to be searched;
responding to the category query instruction, and if the content to be searched is video content, sending a video search result to the terminal equipment;
responding to the category query instruction, and if the content to be searched is picture content, sending a picture search result to the terminal equipment;
responding to the category inquiry instruction, and if the content to be searched is commodity content, sending a commodity search result to the terminal equipment;
and responding to the category inquiry instruction, and if the content to be searched is text content, sending a text search result to the terminal equipment.
10. A method of model training, comprising:
acquiring a predictive text coding vector corresponding to text information to be trained, wherein the text information to be trained corresponds to a primary labeling category and a secondary labeling category;
based on the predictive text coding vector, obtaining a first predictive distribution vector through a first classifier to be trained included in a hierarchical classification model to be trained, wherein the first predictive distribution vector comprises M first element scores, each first element score is represented as a probability value of a primary class, and M is an integer greater than 1;
generating a predictive text fusion vector according to the first predictive distribution vector and the predictive text coding vector;
based on the prediction text fusion vector, obtaining a second prediction distribution vector through a second classifier to be trained included in the hierarchical classification model to be trained, wherein the second prediction distribution vector includes N second element scores, each second element score represents a probability value of a secondary category, the secondary category belongs to a next-level category of the primary category, and N is an integer greater than 1;
updating model parameters of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category and the secondary labeling category until model training conditions are met, and outputting the hierarchical classification model, wherein the hierarchical classification model comprises the first classifier and the second classifier as claimed in any one of claims 1 to 9.
11. The method of claim 10, wherein the updating the model parameters of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category comprises:
calculating by adopting a first classification loss function according to the first prediction distribution vector and the primary labeling category to obtain a first loss value of the text information to be trained;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a comprehensive loss value of the text information to be trained according to the first loss value and the second loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss value.
12. The method of claim 10, wherein the updating the model parameters of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the primary labeling category, and the secondary labeling category comprises:
calculating by adopting a first classification loss function according to the first prediction distribution vector and the primary labeling category to obtain a first loss value of the text information to be trained;
calculating by adopting a second classification loss function according to the second prediction distribution vector and the secondary labeling category to obtain a second loss value of the text information to be trained;
determining a first element prediction score corresponding to the primary label category from the first prediction distribution vector, and determining a second element prediction score corresponding to the secondary label category from the second prediction distribution vector;
calculating a third loss value of the text information to be trained by adopting a hinge loss function according to the first element prediction score, the second element prediction score and a target hyper-parameter;
determining a comprehensive loss value of the text information to be trained according to the first loss value, the second loss value and the third loss value;
and updating the model parameters of the hierarchical classification model to be trained according to the comprehensive loss value.
13. A multi-level category determination apparatus, comprising:
the acquisition module is used for acquiring a text coding vector corresponding to the target text information;
the obtaining module is further configured to obtain, based on the text encoding vector, a first distribution vector through a first classifier included in a hierarchical classification model, where the first distribution vector includes M first element scores, each first element score is represented by a probability value of one primary class, and M is an integer greater than 1;
the generating module is used for generating a text fusion vector according to the first distribution vector and the text coding vector;
the obtaining module is further configured to obtain a second distribution vector through a second classifier included in the hierarchical classification model based on the text fusion vector, where the second distribution vector includes N second element scores, each second element score represents a probability value of a secondary category, the secondary category belongs to a next-level category of the primary category, and N is an integer greater than 1;
and the determining module is used for determining a target primary category to which the target text information belongs according to the first distribution vector and determining a target secondary category to which the target text information belongs according to the second distribution vector.
14. A model training apparatus, comprising:
the device comprises an acquisition module, a prediction module and a prediction module, wherein the acquisition module is used for acquiring a predictive text coding vector corresponding to text information to be trained, and the text information to be trained corresponds to a primary labeling category and a secondary labeling category;
the obtaining module is further configured to obtain, based on the predictive text coding vector, a first predictive distribution vector through a first classifier to be trained included in a hierarchical classification model to be trained, where the first predictive distribution vector includes M first element scores, each first element score represents a probability value of one primary category, and M is an integer greater than 1;
the generating module is used for generating a predicted text fusion vector according to the first predicted distribution vector and the predicted text coding vector;
the obtaining module is further configured to obtain a second prediction distribution vector through a second classifier to be trained included in the hierarchical classification model to be trained based on the prediction text fusion vector, where the second prediction distribution vector includes N second element scores, each second element score represents a probability value of a secondary category, the secondary category belongs to a next-level category of the primary category, and N is an integer greater than 1;
a training module, configured to update a model parameter of the hierarchical classification model to be trained according to the first prediction distribution vector, the second prediction distribution vector, the first-level labeling category, and the second-level labeling category until a model training condition is met, and output a hierarchical classification model, where the hierarchical classification model includes the first classifier and the second classifier according to any one of claims 1 to 9.
15. A computer device, comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, and the processor is configured to execute the method for determining the multi-level categories according to any one of claims 1 to 9 or the method for training the model according to any one of claims 10 to 12 according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
16. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of determining a multi-level category of any one of claims 1 to 9 or the method of model training of any one of claims 10 to 12.
17. A computer program product comprising a computer program and instructions, characterized in that the computer program or instructions, when executed by a processor, implement the method for determining multi-level categories according to any one of claims 1 to 9 or the method for model training according to any one of claims 10 to 12.
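By way of illustration of claims 3 to 5 above, which differ only in how many first-level element scores are retained (the top K, a single thresholded top-1, or all M), the following hedged sketch implements the top-K variant; the semantic lookup table and all names are assumptions, not the claimed implementation.

```python
import torch

def prior_semantic_vector(first_dist, semantic_table, k=3):
    # first_dist: (M,) first distribution vector
    # semantic_table: (M, p) one semantic vector per primary-category index value
    scores, index_values = torch.topk(first_dist, k)       # top-K first element scores
    semantic_vectors = semantic_table[index_values]        # (K, p) lookup by index value
    weighted = scores.unsqueeze(-1) * semantic_vectors     # weight each vector by its score
    return weighted.sum(dim=0)                             # summed prior semantic vector, (p,)
```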
CN202111114531.XA 2021-09-23 2021-09-23 Multistage category determination method, model training method and related device Pending CN114328906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111114531.XA CN114328906A (en) 2021-09-23 2021-09-23 Multistage category determination method, model training method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111114531.XA CN114328906A (en) 2021-09-23 2021-09-23 Multistage category determination method, model training method and related device

Publications (1)

Publication Number Publication Date
CN114328906A true CN114328906A (en) 2022-04-12

Family

ID=81044653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111114531.XA Pending CN114328906A (en) 2021-09-23 2021-09-23 Multistage category determination method, model training method and related device

Country Status (1)

Country Link
CN (1) CN114328906A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547313A (en) * 2022-04-22 2022-05-27 阿里巴巴达摩院(杭州)科技有限公司 Resource type identification method and device
CN116030295A (en) * 2022-10-13 2023-04-28 中电金信软件(上海)有限公司 Article identification method, apparatus, electronic device and storage medium
CN115631205A (en) * 2022-12-01 2023-01-20 阿里巴巴(中国)有限公司 Method, device and equipment for image segmentation and model training
WO2024139290A1 (en) * 2022-12-28 2024-07-04 深圳云天励飞技术股份有限公司 Text classification method and apparatus, and computer device and medium
WO2024139291A1 (en) * 2022-12-30 2024-07-04 深圳云天励飞技术股份有限公司 Multi-level classification model classification method, training method and apparatus, device, and medium
CN117132777A (en) * 2023-10-26 2023-11-28 腾讯科技(深圳)有限公司 Image segmentation method, device, electronic equipment and storage medium
CN117132777B (en) * 2023-10-26 2024-03-22 腾讯科技(深圳)有限公司 Image segmentation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40071014; Country of ref document: HK)