CN113627447A - Label identification method, label identification device, computer equipment, storage medium and program product - Google Patents


Info

Publication number
CN113627447A
Authority
CN
China
Prior art keywords
features
initial
label
labels
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111194237.4A
Other languages
Chinese (zh)
Other versions
CN113627447B (en)
Inventor
王赟豪
陈少华
余亭浩
张绍明
侯昊迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111194237.4A priority Critical patent/CN113627447B/en
Publication of CN113627447A publication Critical patent/CN113627447A/en
Application granted granted Critical
Publication of CN113627447B publication Critical patent/CN113627447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The present application provides a tag identification method, a tag identification device, a computer device, a storage medium and a program product, and relates to the technical fields of artificial intelligence, cloud technology, intelligent transportation, driving assistance and the like. Multi-type feature extraction is performed on information to be identified through a feature extraction network to obtain multi-type features; the matching degree between the information to be identified and each label is determined based on the multi-type features and the global feature of each label in the global label features, and the labels of the information to be identified are then determined based on those matching degrees. The global features of the at least two labels included in the global label features are determined based on the initial feature of each label and the association relationship between the at least two labels, so they can represent the relationships that exist among labels across the whole label range. Performing label identification in combination with the global correlation among multiple labels avoids the identification errors caused by processing each label in isolation, and can therefore improve the accuracy of label identification.

Description

Label identification method, label identification device, computer equipment, storage medium and program product
Technical Field
The present application relates to the technical fields of artificial intelligence, cloud technology, intelligent transportation, driving assistance and the like, and in particular to a tag identification method, a tag identification device, a computer device, a storage medium and a program product.
Background
With the development of Internet technology, many network platforms push information streams to users, and users spend a large amount of time browsing these streams every day. The quality of an information stream is therefore crucial to the user experience. In the art, labels can be used to describe the quality of an information stream, and how to identify the labels contained in a large amount of information has become a key issue.
In the related art, taking an image as an example, a neural network model is usually adopted to extract an image feature vector, which is then used to identify one or more labels that the image may contain; for example, a binary classifier for label A determines whether the image contains label A, and the labels of the image are output based on such determination results. However, the neural network model is susceptible to data distribution and data quality, resulting in low accuracy of label identification.
Disclosure of Invention
The present application provides a tag identification method, apparatus, computer device, storage medium and program product, which can solve the problem of low tag identification accuracy in the related art. The technical solution is as follows:
in one aspect, a tag identification method is provided, and the method includes:
acquiring information to be identified, and performing multi-type feature extraction on the information to be identified through a feature extraction network to obtain multi-type features of the information to be identified, wherein the information to be identified comprises at least two types of data, and the multi-type features are used for representing the data features of the at least two types of data;
respectively determining the matching degree between the information to be identified and each label based on the multi-type features and the global feature of each label in the global label features, wherein the global label features comprise global features of at least two labels, and the global features of the at least two labels are determined based on the initial feature of each label and the association relationship between the at least two labels;
and determining the label of the information to be identified based on the matching degree between the information to be identified and each label.
In another aspect, there is provided a tag identification apparatus, the apparatus including:
the system comprises a feature extraction module, a feature extraction module and a feature extraction module, wherein the feature extraction module is used for acquiring information to be identified and extracting multi-type features of the information to be identified through a feature extraction network to obtain the multi-type features of the information to be identified, the information to be identified comprises at least two types of data, and the multi-type features are used for representing the data features of the at least two types of data;
a matching degree determining module, configured to determine a matching degree between the information to be identified and each tag based on a global feature of each tag in the multi-type features and global tag features, where the global tag features include global features of at least two tags, and the global features of the at least two tags are determined based on an initial feature of each tag and an association relationship between the at least two tags;
and the identification module is used for determining the label of the information to be identified based on the matching degree between the information to be identified and each label.
In one possible implementation, the global label features are obtained through a graph convolution network of a target model, where the target model includes the feature extraction network and the graph convolution network; the apparatus also includes a model training module comprising:
a construction unit for constructing an initial model;
the global label association unit is used for inputting the initial features of the at least two labels into the initial graph convolution network and outputting initial global label features based on the initial features of the at least two labels and the feature correlation function of the initial graph convolution network;
a sample prediction unit, configured to input a sample set into the initial feature extraction network, and predict a sample label of the sample set based on a sample feature output by the initial feature extraction network and an initial global label feature output by the initial graph convolution network;
an adjusting unit, configured to adjust a feature correlation function of the initial graph convolution network based on a similarity between a true value label of the sample set and the sample label, and adjust a model parameter of the initial feature extraction network until the initial model reaches a first target condition, and stop adjusting to obtain the target model;
wherein the at least two tags belong to a specified category; the feature correlation function is used for indicating the correlation between the at least two labels, the initial model comprises an initial feature extraction network and an initial graph convolution network, and the sample set comprises a plurality of samples and truth labels of the plurality of samples.
In one possible implementation, the initial graph convolution network includes at least two initial graph convolution layers; the global tag association unit is configured to:
inputting the initial features of the at least two labels into the first initial graph convolutional layer;
for each initial graph convolutional layer, performing global correlation processing on the first features of the at least two labels through the association relationship between the at least two labels and the feature correlation function to obtain second features of the at least two labels, and inputting the second features of the at least two labels into the next initial graph convolutional layer, wherein the first features refer to the features of the at least two labels input into the layer, and the second features refer to the features of the at least two labels output by the layer;
and taking the second characteristics of at least two labels output by the last initial graph convolutional layer as the initial global label characteristics.
In one possible implementation, each graph convolution layer of the graph convolution network includes a label topology map, and the label topology map is used for representing the features of the at least two labels and the association relationship between the at least two labels;
the global label association unit is configured to construct an initial label topology map based on the association relationship between the at least two labels and the first features of the at least two labels, wherein the initial label topology map comprises at least two vertices and edges between the at least two vertices: a first vertex description matrix corresponding to the vertices is used for representing the first features of the at least two labels, and a correlation matrix corresponding to the edges is used for representing the association relationship between the at least two labels, the correlation matrix comprising the probabilities of the at least two labels occurring together;
the global label association unit is further configured to calculate, according to the feature correlation function, the product of the correlation matrix, the first vertex description matrix and a weight matrix of the feature correlation function to obtain a second vertex description matrix, wherein the second vertex description matrix is used for representing the second features of the at least two labels;
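The per-layer computation above (second vertex description matrix as the product of the correlation matrix, the first vertex description matrix and a weight matrix) is the standard graph-convolution update. The following is a minimal numpy sketch of two stacked layers over a label topology whose correlation matrix is estimated from label co-occurrence; the function names, dimensions, ReLU activation and conditional-probability estimate are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def cooccurrence_matrix(label_sets, num_labels):
    """Correlation matrix A: A[i, j] estimates P(label j | label i)
    from how often the two labels occur together on the same sample."""
    counts = np.zeros((num_labels, num_labels))
    occurs = np.zeros(num_labels)
    for labels in label_sets:
        for i in labels:
            occurs[i] += 1
            for j in labels:
                if i != j:
                    counts[i, j] += 1
    A = counts / np.maximum(occurs[:, None], 1)
    np.fill_diagonal(A, 1.0)  # each vertex also keeps its own feature
    return A

def gcn_layer(A, H, W):
    """One layer: second vertex matrix = ReLU(A @ H @ W), i.e. the product of
    the correlation matrix, the first vertex matrix and a weight matrix."""
    return np.maximum(A @ H @ W, 0.0)

# Two stacked layers: initial label features -> global label features.
rng = np.random.default_rng(0)
num_labels, d_in, d_hid, d_out = 5, 8, 16, 8
A = cooccurrence_matrix([[0, 1], [0, 1, 2], [3, 4]], num_labels)
H0 = rng.standard_normal((num_labels, d_in))    # initial features of each label
W1 = rng.standard_normal((d_in, d_hid)) * 0.1   # weight matrix of layer 1
W2 = rng.standard_normal((d_hid, d_out)) * 0.1  # weight matrix of layer 2
global_label_features = gcn_layer(A, gcn_layer(A, H0, W1), W2)
print(global_label_features.shape)  # (5, 8)
```

During training, W1 and W2 play the role of the weight matrix of the feature correlation function, and would be adjusted based on the similarity between the truth labels and the predicted sample labels.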
in a possible implementation manner, the adjusting unit is configured to adjust the weight matrix in the feature correlation function based on a similarity between a truth label of the sample set and the sample label.
In one possible implementation, the initial feature extraction network comprises an initial text network for extracting text features; the device further comprises:
the occlusion prediction module is configured to occlude words included in at least two sample texts in a first training data set to obtain at least two first sample texts, and to predict the occluded words in the at least two first sample texts through the initial text network to obtain predicted occlusion words, wherein the first training data set includes the at least two sample texts, and each sample text includes at least two words;
the context prediction module is used for predicting context information of at least two sample text pairs included in a second training data set through the initial text network to obtain predicted context information of the at least two sample text pairs, the second training data set includes at least two sample text pairs with labeling labels, and the labeling labels include the context information of the sample text pairs;
and the pre-training module is used for adjusting the model parameters of the initial text network based on the similarity between the occluded words and the predicted occlusion words of the first training data set and the similarity between the label tags of the second training data set and the predicted context information until the initial text network reaches a second target condition, and obtaining a pre-trained text network.
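The two pre-training tasks above mirror the masked-word and context (next-sentence) prediction objectives commonly used for BERT-style text networks. Below is a toy numpy illustration of how the two losses could be combined when adjusting the initial text network; the vocabulary, probabilities and equal loss weighting are invented for illustration only.

```python
import numpy as np

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the correct class."""
    return -np.log(probs[target_idx] + 1e-12)

# Task 1: predict the occluded word in a first sample text.
vocab = ["the", "cat", "sat", "mat"]
masked_probs = np.array([0.1, 0.7, 0.1, 0.1])  # network output for "the [MASK] sat"
masked_loss = cross_entropy(masked_probs, vocab.index("cat"))

# Task 2: predict the context information of a sample text pair
# (does the second text follow the first?).
context_probs = np.array([0.8, 0.2])            # [is-next, not-next]
context_loss = cross_entropy(context_probs, 0)  # annotation label: is-next

# Model parameters of the initial text network are adjusted to reduce both losses.
total_loss = masked_loss + context_loss
print(round(float(total_loss), 4))  # 0.5798
```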
In a possible implementation manner, the feature extraction module is further configured to perform feature extraction on the at least two types of data through at least two feature extraction networks, respectively, to obtain data features of the at least two types of data; and performing feature fusion on the data features of the at least two types of data through a multi-type fusion module of the target model to obtain the multi-type features.
In one possible implementation manner, the feature extraction module is further configured to, for each type of data, extract features of the data through a feature extraction network corresponding to the type; determining, by a classifier of a feature extraction network corresponding to the type, a tag confidence of the data according to the extracted features of the data, the tag confidence being used to indicate a likelihood that the data matches the at least two tags;
the feature extraction module is further configured to fuse the tag confidence degrees of the at least two types of data through the multi-type fusion module to obtain a fusion confidence degree of the information to be recognized.
In one possible implementation, the multi-type features include features of the information to be identified in a target dimension; the matching degree determining module is configured to determine, according to a feature weight of each tag in a target dimension included in the global tag feature and a feature of the target dimension included in the multi-type feature, a matching probability between each tag and the information to be identified, where the feature dimension of each tag is the same as the feature dimension of the multi-type feature.
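A minimal sketch of this matching step, assuming the feature weight of each tag is a vector with the same dimension as the multi-type feature and the matching probability is the sigmoid of their inner product (one common reading of per-label matching; the label names, weights and threshold below are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def match_labels(multi_type_feature, label_weights, threshold=0.5):
    """Matching probability of each label: sigmoid of the inner product between
    the label's feature weights and the multi-type feature (same dimension)."""
    probs = sigmoid(label_weights @ multi_type_feature)
    return probs, np.flatnonzero(probs >= threshold)

# Toy example: 3 labels, feature dimension 4.
labels = ["snake", "skin disease", "psychological fear"]
W = np.array([[ 1.0, 0.0, 0.5,  0.0],
              [-1.0, 0.5, 0.0,  0.0],
              [ 0.0, 0.0, 0.0, -2.0]])  # feature weight of each tag
f = np.array([1.2, 0.3, 0.8, 0.1])      # multi-type feature of the information
probs, matched = match_labels(f, W)
print([labels[i] for i in matched])  # → ['snake']
```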
In one possible implementation, the apparatus further comprises at least one of:
the cover image identification module is configured to, in response to a cover image acquisition request, select an image without a label of a first target category from the images to be identified as a cover image, based on the determined labels of the images to be identified, wherein the first target category is a category of images negatively fed back by the user, and the information to be identified is the images to be identified;
the information recommendation module is used for responding to an information recommendation request, and if a label of information to be recommended belongs to a second target category, reducing a recommendation weight of the information to be recommended, wherein the recommendation weight is used for indicating the possibility of recommending the information to be recommended to a user, the second target category is a category of negative feedback information of the user, and the information to be identified is the information to be recommended to the user.
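The recommendation-weight adjustment could be sketched as follows; the multiplicative penalty factor is an assumption, since the embodiment only states that the weight is reduced when a label falls in the second target category.

```python
def adjust_recommendation_weight(weight, item_labels, negative_categories, penalty=0.1):
    """Reduce the recommendation weight when any label of the information to be
    recommended belongs to a category the user has negatively fed back on.
    The multiplicative penalty is illustrative, not the patent's formula."""
    if any(label in negative_categories for label in item_labels):
        return weight * penalty
    return weight

print(adjust_recommendation_weight(1.0, ["snake", "outdoors"], {"snake"}))  # 0.1
print(adjust_recommendation_weight(1.0, ["cooking"], {"snake"}))            # 1.0
```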
In another aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the tag identification method described above.
In another aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the above-mentioned tag identification method.
In another aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the tag identification method described above.
The beneficial effects brought by the technical solutions provided by the present application are as follows:
multi-type feature extraction is performed on the information to be identified through a feature extraction network, and the obtained multi-type features can characterize the data features of at least two types of data; the global features of the at least two labels included in the global label features are determined based on the initial feature of each label and the association relationship between the at least two labels, so the global label features can represent the association relationships among labels across the global label range; performing label identification in combination with the global correlation among multiple labels avoids the identification errors caused by processing each label in isolation, and can improve the accuracy of label identification.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of an implementation environment of a tag identification method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a tag identification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an internal structure of a target model according to an embodiment of the present application;
fig. 4 is a schematic diagram of a tag topology provided in an embodiment of the present application;
FIG. 5 is a schematic flowchart of a cover image determination method based on tag identification according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a discomfort-category image provided by an embodiment of the present application;
fig. 7 is a schematic flowchart of an information recommendation method based on tag identification according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a tag identification apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" indicates an implementation as "A", an implementation as "B", or an implementation as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiments of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, driving assistance and the like. For example, the tag identification method provided by the present application can use big data processing technology from artificial intelligence to process a large amount of information to be identified so as to identify its tags; machine learning can also be used to train the target model for label recognition, and the labels of a large amount of information can then be recognized by the trained target model.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation and the like.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behaviour to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
Fig. 1 is a schematic diagram of an implementation environment of the tag identification method provided in an embodiment of the present application. Referring to Fig. 1, the implementation environment includes at least one computer device; Fig. 1 illustrates, by way of example only, an implementation environment including a plurality of computer devices. The plurality of computer devices may exchange data through a wired connection or through a wireless network connection, which is not limited in this application.
In this embodiment, the computer device 101 may identify information to be identified and obtain the tags of the information to be identified. In one possible implementation, the computer device 101 may store a target model and may perform tag recognition on the information to be recognized based on the target model. In another possible implementation, the computer device 101 may also call a target model on another computer device to perform tag identification, which is not limited in the embodiments of this application; the following description takes the case where the computer device 101 stores the target model as an example.
In one possible scenario, the implementation environment may further include a terminal 102, where the terminal 102 may send an acquisition request to the computer device 101; for example, the acquisition request may specifically be a cover image acquisition request or an information recommendation request. The computer device 101 identifies the image to be recognized by using the target model based on the cover image acquisition request, so as to select a cover image based on the determined tag of the image to be recognized; alternatively, the computer device 101 identifies information to be recommended by using the target model based on the information recommendation request, so as to recommend information based on the determined tag of the information to be recommended.
The computer device 101 may be provided as a server, which may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server or server cluster providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network) services, and big data and artificial intelligence platforms. The network connecting the devices may include, but is not limited to, a wired network or a wireless network, where the wired network includes a local area network, a metropolitan area network and a wide area network, and the wireless network includes Bluetooth, Wi-Fi and other networks enabling wireless communication. The terminal may be a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a notebook computer, a digital broadcast receiver, an MID (Mobile Internet Device), a PDA (Personal Digital Assistant), a desktop computer, a smart home appliance, a vehicle-mounted terminal (e.g., a vehicle-mounted navigation terminal or vehicle-mounted computer), a smart speaker, a smart watch, or the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, but are not limited thereto; the arrangement may also be determined based on the requirements of the actual application scenario, which is not limited herein.
Fig. 2 is a schematic flowchart of a tag identification method according to an embodiment of the present application. The execution subject of the method may be a computer device. As shown in fig. 2, the method includes the following steps.
Step 201, the computer device obtains information to be identified.
The information to be identified may include at least two types of data, which may include, but are not limited to, text, images, audio, video and the like. For example, an image to be recognized may include the image and its title; a video to be recognized may include three types of data: the images in the video, the audio, and the title of the video.
In the present application, label identification can be performed on the information to be identified through the target model so as to identify the labels matched with the information to be identified. The target model may include a feature extraction network, which may be used to extract multi-type features through the following step 202; the target model may further include a graph convolution network, and the computer device may use the graph convolution network to learn in advance the association relationships within a tag system that includes a plurality of tags, that is, the correlations among a large number of tags, and then calculate the matching degree between the information to be recognized and each tag through the following step 203, thereby obtaining one or more tags matched with the information to be recognized.
Step 202, the computer device performs multi-type feature extraction on the information to be identified through a feature extraction network to obtain multi-type features of the information to be identified.
The multi-type features are used for characterizing the data features of the at least two types of data; the computer device may extract the data features of the at least two types of data through the feature extraction network and fuse the at least two types of features into the multi-type features. In one possible implementation, this step may include: the computer device extracts features from the at least two types of data through at least two feature extraction networks, respectively, to obtain the data features of the at least two types of data; and the computer device performs feature fusion on the data features of the at least two types of data through a multi-type fusion module of the target model to obtain the multi-type features. The multi-type features include features of the information to be identified in the target dimension.
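A minimal sketch of per-type extraction followed by fusion by concatenation, using trivial stand-ins for the text and image extraction networks (the real networks are learned models; every function below is an illustrative placeholder):

```python
import numpy as np

def extract_text_features(title):
    """Stand-in for the text network: a toy fixed-size title embedding."""
    return np.full(8, len(title) / 100.0)

def extract_image_features(image):
    """Stand-in for the image network: mean value per colour channel."""
    return image.mean(axis=(0, 1))

def fuse(feature_list):
    """Multi-type fusion module: concatenate the per-type data features
    into a single multi-type feature."""
    return np.concatenate(feature_list)

image = np.zeros((4, 4, 3))  # a dummy 4x4 RGB image
multi_type = fuse([extract_text_features("a snake on a rock"),
                   extract_image_features(image)])
print(multi_type.shape)  # (11,) = 8 text dims + 3 image dims
```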
In one possible example, the at least two tags belong to a specified category, which may be a category of information negatively fed back by users. For example, the specified category may be a discomfort category covering content likely to cause user discomfort, such as psychological fear, skin disorders, snakes, bugs and the like; such information is not suitable for recommendation to all users. As shown in fig. 3, the discomfort-category labels may include, but are not limited to: pox, dense fear, skin diseases, snakes, psychological fear and the like. All labels associated with the discomfort category may be combined to construct the label topology map of the graph convolution network, so as to identify the discomfort-category labels that the image to be identified may match. Of course, the specified category may be configured as needed; for example, it may also be a holiday theme category, a workplace category or a popular movie and television show category, and the specified category is not particularly limited in the embodiments of the present application.
In one possible example, the data features of each type of data may include a tag confidence for that type of data. In this step, for each type of data, the computer device may extract the features of the data through the feature extraction network corresponding to the type, and determine, through a classifier of that feature extraction network, the tag confidence of the data according to the extracted features, the tag confidence indicating the likelihood that the data matches the at least two tags; for example, the extracted features may be semantic features of the data. The step of fusing the data features of the multiple types of data may then comprise: the computer device fuses the tag confidences of the at least two types of data through the multi-type fusion module to obtain the fusion confidence of the information to be identified. For example, when the image to be recognized includes both an image and a title, the data features of the text-type title may include the title tag confidence of the title, and the data features of the image type may include the image tag confidence of the image. The computer device may fuse the title tag confidence and the image tag confidence by concatenation or an MoE (Mixture of Experts) mechanism, among others, to obtain the fusion confidence, so as to reduce the differences between different types of data.
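A toy sketch of MoE-style fusion of the title and image tag confidences, assuming a softmax gate over the two data types; in practice the gate logits would come from a learned gating network, and the numbers below are invented:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_fuse(confidences, gate_logits):
    """Mixture-of-experts fusion: a softmax gate weights the label-confidence
    vector of each data type, and the fused confidence is their weighted sum."""
    gates = softmax(np.asarray(gate_logits, dtype=float))
    return sum(g * np.asarray(c) for g, c in zip(gates, confidences))

title_conf = [0.9, 0.1, 0.3]  # per-label confidence from the title (text) network
image_conf = [0.7, 0.4, 0.2]  # per-label confidence from the image network
fused = moe_fuse([title_conf, image_conf], gate_logits=[0.0, 0.0])
print(fused)  # equal gates give the element-wise average of the two vectors
```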
In another possible example, the data characteristics of each type of data may also include semantic characteristics of the corresponding type of data. The computer device can fuse the semantic features and then send the fused features to the classifier to obtain a fusion confidence of the information to be recognized. In this step, for each type of data, the computer device may extract a feature of the data through a feature extraction network corresponding to the type, where the feature may be a semantic feature of the data. The step of the computer device performing feature fusion on the data features of the at least two types of data through the multi-type fusion module of the target model to obtain the multi-type features may include: the computer device performs feature fusion on the features of the at least two types of data through the multi-type fusion module to obtain fusion features of the information to be recognized, and determines a fusion confidence coefficient of the information to be recognized according to the fusion features of the information to be recognized through a classifier of the feature extraction network, wherein the fusion confidence coefficient is used for indicating the possibility of matching the information to be recognized with at least two labels.
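As a minimal sketch of the confidence-fusion step, the following replaces a learned MoE gate or concatenation layer with fixed fusion weights; the function name and the weights are purely illustrative assumptions:

```python
def fuse_confidences(title_conf, image_conf, w_title=0.5, w_image=0.5):
    """Fuse per-label tag confidences from two data types by weighted averaging.

    A learned MoE gate, or concatenation followed by a linear layer, could be
    used instead; the fixed weights here stand in for such a mechanism.
    """
    assert len(title_conf) == len(image_conf)
    return [w_title * t + w_image * i for t, i in zip(title_conf, image_conf)]
```

For example, fusing a title tag confidence vector `[0.8, 0.2]` with an image tag confidence vector `[0.6, 0.4]` yields a per-label fusion confidence that balances both data types.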
The feature extraction network may include an image network model for extracting image features and a text network model for extracting text features. For example, the text network model may employ the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, and the image network model may employ the pre-trained BiT (Big Transfer, a neural network applying large-scale pre-training to transfer learning) model as the model for extracting image features; for example, the BiT model may include a ResNet50 (Residual Network) model pre-trained on ImageNet22K (a large dataset containing 22k classes). The BERT model and the BiT model may be pre-trained prior to training the target model to improve the representation capability of the network.
In this step, multiple types of features are extracted from the information to be identified, so that its semantics are accurately located from multiple types of data, and the data features of the multiple types can be fused into the multi-type features through feature fusion. The multi-type features can therefore comprehensively and accurately represent each dimension of the information to be identified. Moreover, the multi-type features may take the form of tag confidences of the information to be identified, so that a preliminary match between each tag and each type of data is obtained; the global tag features are subsequently used for further identification, improving the accuracy of subsequent tag identification.
Step 203, the computer device determines the matching degree between the information to be identified and each label based on the global feature of each label in the multi-type features and the global label features.
The global tag features comprise the global features of at least two tags, and the global features of the at least two tags are determined based on the initial features of each tag and the association relationship between the at least two tags. The degree of match between the information to be identified and each tag indicates the likelihood that the semantics represented by the information to be identified correspond to the tag. For example, the global feature of each tag may represent both the tag's own features and the associations between the tag and the other tags in the global set of tags.
In the present application, the global tag features may be obtained in advance through a graph convolution network. For example, when the target model needs to be used to identify the information to be identified, the global tag features may be read, the multi-type features extracted through step 202 above, and the matching probability between each tag and the information to be identified determined from the multi-type features and the global tag features. In one possible implementation, the feature dimension of each tag is the same as the feature dimension of the multi-type features. For example, the global tag features may include a feature weight of each of the at least two tags in a target dimension, and the multi-type features include features of the information to be identified in the target dimension; this step may then include: the computer device determines the matching probability between each tag and the information to be identified from the feature weight of each tag in the target dimension included in the global tag features and the features of the target dimension included in the multi-type features. For example, the multi-type features may include features of each of the at least two types of data in the target dimension. As a concrete example, the computer device may compute the product of the global feature matrix of the tags and the multi-type feature matrix, i.e. weight the multi-type feature matrix by the global feature matrix, to obtain the matching probability that the information to be identified is consistent with each tag.
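The weighting step can be sketched as follows, assuming the global tag features are stored as C rows of D weights and the multi-type feature is a length-D vector; applying an independent sigmoid per label is one common choice for multi-label matching probabilities and is an assumption here:

```python
import math

def match_probabilities(global_label_feats, fused_feat):
    """Weight the multi-type feature by each label's global feature.

    global_label_feats: C rows, each a length-D weight vector (one per label).
    fused_feat: length-D multi-type feature of the information to be identified.
    Returns one matching probability per label; independent sigmoids are used
    because several labels may match the same information simultaneously.
    """
    scores = [sum(w * x for w, x in zip(row, fused_feat))
              for row in global_label_feats]
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]
```

A label whose weight vector is orthogonal to the fused feature receives a score of 0, i.e. a matching probability of 0.5, while a strongly aligned label approaches 1.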
In one possible implementation, the global label features may be obtained through a graph convolution network of a target model, and the target model may include a feature extraction network and the graph convolution network. The graph convolution network comprises a label topological graph that is constructed in advance and trained; for example, an initial label topological graph can be constructed from the initial features of a plurality of labels in advance, and trained in combination with the feature extraction network to obtain the label topological graph. In one possible example, the multi-type features of a sample set can be extracted through an initial feature extraction network, the initial global label features of the plurality of labels obtained through an initial graph convolution network, and the sample labels of the sample set predicted from the multi-type features and the initial global label features; the initial feature extraction network and the initial graph convolution network are then trained based on the difference between the sample labels and the truth labels, to obtain a target model comprising the graph convolution network and the feature extraction network.
In one possible example, the training mode of the target model may be implemented through the following steps S1 to S4.
Step S1, the computer device builds an initial model.
The initial model includes an initial feature extraction network and an initial graph convolution network. The initial feature extraction network may include at least two sub-networks for correspondingly extracting the data features of the at least two types of data.
Step S2, the computer device inputs the initial features of the at least two labels into the initial graph convolutional network, and outputs an initial global label feature based on the initial features of the at least two labels and the feature correlation function of the initial graph convolutional network.
The feature correlation function is used to indicate a correlation between the at least two tags. In one possible implementation manner, the computer device may perform feature correlation processing on the initial features of the at least two tags according to the correlation between the at least two tags indicated by the feature correlation function, to obtain an initial global tag feature representing an association relationship between the at least two tags.
In one possible implementation, the initial graph convolution network may include one or more initial graph convolution layers. When the initial graph convolution network comprises at least two initial graph convolution layers, the feature correlation process may be a computation, by the feature correlation function, of the global correlation of the label features input to each layer. In one possible example, the initial graph convolution network may include at least two initial graph convolution layers, and this step may include: the computer device inputs the initial features of the at least two labels into the first initial graph convolution layer; for each initial graph convolution layer, the computer device performs global correlation processing on the first features of the at least two labels through the association relationship between the at least two labels and the feature correlation function to obtain the second features of the at least two labels, and inputs the second features of the at least two labels into the next initial graph convolution layer, where the first features are the features of the at least two labels input into the layer and the second features are the features of the at least two labels output by the layer; the computer device takes the second features output by the last initial graph convolution layer as the initial global label features. The initial global label features include the initial feature weights of the at least two labels in the target dimension. Illustratively, the output of each initial graph convolution layer is the input of the next initial graph convolution layer.
The first features input into the first initial graph convolution layer may be the initial features of the at least two labels, on which a global correlation computation is performed using the feature correlation function before the result is input into the next initial graph convolution layer. Of course, if the initial graph convolution network includes a single initial graph convolution layer, the computer device directly inputs the initial features of the at least two labels into that layer and performs global correlation processing on them through the feature correlation function and the association relationship between the at least two labels, directly obtaining the initial global features of the at least two labels output by the layer.
In one possible example, the association relationship between the labels may be represented in the form of a topological graph. Illustratively, each graph convolution layer of the graph convolution network comprises a label topological graph, which represents the features of the at least two labels and the association relationship between the at least two labels. For example, the graph convolution network may be a GCN (Graph Convolutional Network). The computer device may first construct an initial label topological graph using the features of each label and optimize it during model training. For each initial graph convolution layer, the step in which the computer device performs global correlation on the first features of the at least two labels to obtain the second features may include: the computer device constructs an initial label topological graph based on the association relationship between the at least two labels and the first features of the at least two labels, where the initial label topological graph comprises at least two vertices and the edges between them, a first vertex description matrix corresponding to the vertices represents the first features of the at least two labels, and a correlation matrix corresponding to the edges represents the association relationship between the at least two labels; the computer device then computes, according to the feature correlation function, the product of the correlation matrix, the first vertex description matrix, and the weight matrix of the feature correlation function, obtaining a second vertex description matrix.
Wherein the correlation matrix comprises the probabilities of co-occurrence between the at least two labels, and the second vertex description matrix represents the second features of the at least two labels. In one possible example, the feature correlation function may be a relational expression between the first features of the at least two labels, the association relationship between the at least two labels, and the second features. For example, the feature correlation function may take the form shown in the following formula one, where the input of the feature correlation function is the first vertex description matrix and the correlation matrix, and the second vertex description matrix output to the next layer is obtained after processing by the feature correlation function:
Formula one:
H^(l+1) = f(H^(l), A)
wherein H^(l) represents the first vertex description matrix, i.e. the vertex description matrix of the l-th layer; H^(l+1) represents the second vertex description matrix, i.e. the vertex description matrix of the (l+1)-th layer; A represents the correlation matrix, and each element in A represents the probability of co-occurrence between every two labels; f denotes the feature correlation function. In one possible example, the feature correlation function may include a weight matrix representing the magnitude of the correlation between the at least two labels; for example, the following formula two shows one possible form of the feature correlation function f:
Formula two:
H^(l+1) = f(H^(l), A) = h(Â · H^(l) · W^(l))
wherein Â represents the normalized correlation matrix, i.e. a matrix obtained by normalizing the correlation matrix A; W^(l) represents the weight matrix; as an example, a learnable transformation matrix may be used as the weight matrix, for example, the weight matrix may be a Laplacian matrix; h denotes a non-linear activation function, for example, h may be a LeakyReLU (Leaky Rectified Linear Unit). Through the feature correlation function, the complex internal relations between the vertices can be learned and modeled by stacking multiple graph convolution layers.
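Formula one and formula two can be sketched numerically as follows. Symmetric degree normalization for Â and LeakyReLU for h are specific choices assumed here, and numpy stands in for a deep-learning framework with learnable weights:

```python
import numpy as np

def normalize_adjacency(A):
    """One common normalization: A_hat = D^(-1/2) A D^(-1/2), with D the degree matrix."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H, A_hat, W, negative_slope=0.2):
    """Formula two: f(H, A) = h(A_hat @ H @ W), with h = LeakyReLU."""
    Z = A_hat @ H @ W
    return np.where(Z > 0, Z, negative_slope * Z)

def gcn_forward(H0, A, weights):
    """Stack graph convolution layers: each layer's output feeds the next,
    and the last layer's output is the (initial) global label feature."""
    A_hat = normalize_adjacency(A)
    H = H0
    for W in weights:
        H = gcn_layer(H, A_hat, W)
    return H
```

With C labels, `H0` is the C × d matrix of initial label embeddings, `A` the C × C co-occurrence matrix, and each `W` a learnable d × d' weight matrix adjusted during training.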
It should be noted that the input of the graph convolution network is the initial features of the global labels, and an initial feature may be an embedding matrix of a label. The initial features may be obtained in the three ways illustrated by the following examples one to three. Example one: a word vector of a label may be obtained from a corpus as the embedding matrix of the label; for example, for a certain label, depending on the language in which the label is expressed, an embedding matrix representing the label may be obtained from a Chinese corpus or an English corpus. Example two: the initial feature of a label may be obtained from image features; for example, the image features of each image in an image set can be extracted through a neural network, and the initial feature of each label computed from the labels of the images and their image features; for example, for each label, the mean of the image feature matrices of the images carrying that label is used as the embedding matrix of the label. Example three: the initial feature of a label can be obtained from the probability features of data of multiple data types; for example, for an image set, the text features and the image features of the image set may be extracted through neural networks, where the text features comprise the probability that the text matches each label and the image features comprise the probability that the image matches each label; the initial feature of each label is then computed from the text features and the image features. For example, the text may include the title of each image in the image set, and the text features of the image set may include the title features of each image.
In example three, the text features and the image features of the images in the image set may be stitched to obtain a stitching probability matrix of each image; and for each label, taking the average value of the splicing probability matrix of each image comprising the label as an embedding matrix of the label.
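Example two above (the initial feature of a label as the mean of the features of the samples carrying that label) can be sketched as follows; the data layout, a list of (feature vector, label set) pairs, is an assumption:

```python
def label_initial_features(samples):
    """Compute each label's initial embedding as the mean feature vector
    of all samples annotated with that label.

    samples: iterable of (feature_vector, label_set) pairs.
    Returns a dict mapping each label to its mean feature vector.
    """
    sums, counts = {}, {}
    for feat, labels in samples:
        for lab in labels:
            acc = sums.setdefault(lab, [0.0] * len(feat))
            for i, v in enumerate(feat):
                acc[i] += v
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
```

The same routine serves example three by passing spliced text-probability and image-probability vectors as the per-sample features.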
It should be noted that, as shown in fig. 3, the first graph convolution layer of the GCN receives initial feature vectors describing the features of the tags; for example, the word embedding matrices of C tags may form an initial feature matrix of dimension d. In the initial graph convolution network, an initial tag topological graph is first constructed based on the vertex description matrix H corresponding to each tag and the correlation matrix; as shown in fig. 4, the tag topological graph takes the vertex description matrices H of the tags as vertices and the correlation matrix A as edges. Each layer is computed using the feature correlation function; for example, the input of the second initial graph convolution layer is the feature matrix output by the first layer. After a plurality of initial graph convolution layers, the global tag feature with dimension D is finally output, and the feature vector of each tag is aligned with the dimension of the feature vector of the multi-type features; for example, the global tag feature may be a D × C (feature vector dimension × number of classes) label relation matrix, where D represents the target dimension and C represents the number of tags, i.e. the number of categories. Finally, the label relation matrix is multiplied with the feature vector matrix of the multi-type features to obtain the matching probability of each label, i.e. the probability that the semantics of the information to be identified are consistent with each of the C labels.
Step S3, the computer device inputs the sample set into the initial feature extraction network, and predicts the sample label of the sample set based on the sample feature output by the initial feature extraction network and the initial global label feature output by the initial graph convolution network.
The sample set includes a plurality of samples and the truth labels of those samples. The truth label of each sample may be the label that truly matches the sample. The feature dimension of the at least two labels in the initial global label features is the same as the feature dimension of the sample features. A sample feature may be the multi-type feature of a sample, characterizing the data features of the at least two types of data included in the sample. The computer device inputs the sample set annotated with truth labels into the initial feature extraction network to extract the sample feature of each sample in the set, and determines the matching probability between each label and a sample from the initial feature weights of the at least two labels in the target dimension included in the initial global label features and the features of the target dimension included in the sample feature. For example, the labels whose matching probability exceeds a matching threshold can be used as the predicted sample labels of the sample set.
In one possible example, the initial feature extraction network may be a pre-trained model, and its pre-training may use a large-scale pre-training corpus. The initial feature extraction network comprises an initial text network for extracting text features; taking the pre-training of the initial text network as an example, the process may include the following. The computer device masks words included in at least two sample texts in a first training data set to obtain at least two first sample texts, and predicts the masked words in the at least two first sample texts through the initial text network to obtain predicted masked words, where the first training data set includes at least two sample texts and each sample text includes at least two words. The computer device predicts the context information of at least two sample text pairs included in a second training data set through the initial text network to obtain predicted context information for the pairs, where the second training data set includes at least two sample text pairs with annotation labels, and the annotation labels include the context information of the pairs. The computer device then adjusts the model parameters of the initial text network based on the similarity between the masked words of the first training data set and the predicted masked words, and on the similarity between the annotation labels of the second training data set and the predicted context information, and stops adjusting when the initial text network reaches a second target condition, obtaining a pre-trained text network. For example, the computer device may acquire the first training data set and the second training data set in advance.
For example, a sample text in the first training data set may be a sentence including a plurality of words; the sentence may have a certain linguistic structure, for example a subject-predicate relationship, and a masked word is a word in the sentence. A sample text pair in the second training data set may be a sentence pair comprising two sentences, and the annotation label can be the contextual relationship of the two sentences. For example, the annotation label may be: the two sentences have a contextual relationship, and sentence A is the next sentence of sentence B; or, sentence A and sentence B have no contextual relationship; and so on. The ability of the initial text network to predict masked words in sentences and to predict the context information of sentence pairs can thus be trained, so that the initial text network fully learns contextual semantic features.
For example, the text network may employ the pre-trained BERT model. In the pre-training of the initial BERT model, the following two tasks may be trained on a large-scale unsupervised corpus using a bidirectional Transformer structure:
task one: mask LM (mask Language Model) tasks (randomly blocking part of input words and then predicting those blocked words); for example, 15% of tokens (words) in the total corpus are randomly occluded, where 80% of tokens are replaced with occluded words, 10% of tokens are replaced with arbitrary tokens, and the remaining 10% of tokens remain unchanged. When the network is trained, the initial text network needs to predict the corresponding value of token (shielded) removed by mask through context semantics.
Task two: Next Sentence Prediction, that is, given two sentences A and B, where B has a 50% probability of being the actual next sentence of A, the initial text network must predict during training whether B is the next sentence of A.
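The construction of Next Sentence Prediction training pairs can be sketched as follows; the corpus layout (an ordered list of sentences) and the 0/1 labels are assumptions:

```python
import random

def make_nsp_pairs(sentences, rng=None):
    """Build Next Sentence Prediction pairs: for each adjacent pair (A, B),
    with probability 0.5 keep the true next sentence (label 1), otherwise
    substitute a random sentence from the corpus (label 0)."""
    rng = rng or random.Random(0)
    pairs = []
    for a, b in zip(sentences, sentences[1:]):
        if rng.random() < 0.5:
            pairs.append((a, b, 1))               # true next sentence
        else:
            pairs.append((a, rng.choice(sentences), 0))  # random substitute
    return pairs
```

The network is then trained to classify each pair's label from its two sentences, learning sentence-level context relations.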
It should be noted that, by training these two tasks on a large-scale unsupervised corpus, the BERT model can make the network learn rich contextual semantic features. Because no annotated data is needed in the pre-training stage, a huge amount of unsupervised corpus can be collected for pre-training, saving labor cost, and after pre-training the representation capability of the initial text network and the initial image network is greatly improved. When the target model is subsequently trained, only fine-tuning of the BERT model is needed; that is, a good training effect can be obtained with only a small number of annotated samples.
In one possible example, the image network may adopt the pre-trained BiT model as the module for extracting image features. The BiT model is optimized for pre-training and uses a larger-scale pre-training corpus; for example, it substitutes GN (Group Normalization) and weight standardization for BN (Batch Normalization) in the pre-training stage, reducing the impact of batch size on training, and proposes the HyperRule mechanism to reduce the hyperparameter-tuning work of the fine-tuning stage. These pre-training optimizations greatly improve the representation capability of the BiT model, so that a good effect can be achieved with only a few annotated samples on a downstream fine-tuning task, improving model training efficiency and the accuracy of the trained target model in identifying labels.
Step S4, the computer device adjusts the feature correlation function of the graph convolution network based on the similarity between the truth label of the sample set and the sample label, and adjusts the model parameter of the feature extraction network until the initial model reaches the first target condition, and then stops adjusting to obtain the target model.
The computer device can compute the similarity between the truth labels of the samples and the predicted sample labels using a loss function, and adjust the feature correlation function of the graph convolution network and the model parameters of the feature extraction network according to the computed similarity; for example, the initial model may be optimized by gradient descent to obtain a target model including the trained graph convolution network and feature extraction network. The global label features can then be obtained using the trained graph convolution network in the target model, and the multi-type features of the information to be identified obtained through the feature extraction network. The target model may further include a multi-type fusion module, which may include a feature fusion network for performing feature fusion on the data features of the multiple types of data; of course, in the training stage, the model parameters of the feature fusion network may also be optimized. The first target condition may include, but is not limited to: the training time exceeds a preset time threshold, the number of iterations exceeds a preset count threshold, the difference between the truth labels and the sample labels is less than a difference threshold, and the like.
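One common loss for measuring the difference between truth labels and predicted matching probabilities in a multi-label setting is per-label binary cross-entropy; the specific choice of loss is an assumption, since the application only requires some similarity measure:

```python
import math

def multilabel_bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the C labels: each label is an
    independent yes/no decision, matching the multi-label setting where
    several labels may hold simultaneously."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)
```

Gradient descent on this loss with respect to the weight matrices of the feature correlation function and the feature extraction network implements the adjustment described above.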
In one possible implementation, the feature correlation function includes a weight matrix, and the computer device may adjust the weight matrix in the feature correlation function based on a similarity between the truth labels of the set of samples and the sample labels.
It should be noted that the above training process of the target model yields a graph convolution network capable of accurately outputting the global label features of a large number of labels. In the training process, the initial features of the global labels are input into the initial graph convolution network to obtain the initial global label features; for example, the global labels may include all labels configured as needed, and the initial features may be the word embedding matrices of the labels. The sample features of the sample set are extracted through the initial feature extraction network, and the sample labels predicted using the initial global label features and the sample features. The feature correlation function in the initial graph convolution network can therefore be adjusted based on the difference between the sample labels and the truth labels; for example, the weight matrix included in the feature correlation function can be adjusted. The weight matrix represents the magnitude of the correlation between the at least two labels, and by adjusting it the initial global label features are continuously optimized as data propagates through the graph convolution layers, yielding a matrix that accurately represents the correlation between the labels.
Moreover, the topological relations among the labels are modeled using the graph convolution network GCN, and the correlations among the multiple labels are established through the graph topology: the nodes in the topological graph are the different labels used in the multi-label classification task, the edges between nodes are the interrelations between different labels, and the weights of the edges are the correlation magnitudes of the related labels. The graph convolution network can flexibly establish the association relations among labels, and the number of labels can be expanded as needed, improving the extensibility and flexibility of label identification in this application.
Moreover, with the initial features of the labels as prior features, machine learning is used in the model training stage to obtain an interdependent D × C global label feature matrix, which is equivalent to a target classifier for the C labels. Since the mapping parameters of each graph convolution layer between the initial label features and the final D × C global label feature matrix are shared among all classes (i.e. the global labels), adjusting the feature correlation function of the GCN based on the difference between the truth labels and the sample labels when training the target model propagates the gradient back into the initial graph convolution network during training, thereby effectively optimizing the initial graph convolution network and implicitly modeling the correlations of the labels.
And step 204, the computer equipment determines the label of the information to be identified based on the matching degree between the information to be identified and each label.
The computer device obtains the matching probability of each label by calculating the product between the global feature matrix of each label and the multi-type feature matrix. The matching probability is a probability that the semantic meaning of the information to be recognized matches the label, for example, the matching probability represents a probability that the image content in the information to be recognized includes the label, a probability that the image content is related to the label, and the like. The computer device may determine, through the classifier, a tag whose matching probability meets the third target condition as the tag of the information to be identified. For example, the third target condition may include, but is not limited to: the match probability is greater than a target match threshold (e.g., the match probability is greater than 0.5, the match probability is greater than 0.7, etc.), the match probability is within a target range threshold (e.g., the match probability is greater than 0.5 and less than 1, the match probability is greater than 0.4 and less than 0.9, etc.), and so on.
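The third target condition can be sketched as a simple threshold filter over the matching probabilities; the threshold value below is illustrative only:

```python
def select_labels(match_probs, labels, threshold=0.5):
    """Keep every label whose matching probability satisfies the third
    target condition (here: probability greater than a target threshold).
    Several labels may be returned for one piece of information."""
    return [lab for lab, p in zip(labels, match_probs) if p > threshold]
```

A range condition (e.g. probability greater than 0.4 and less than 0.9) could be substituted for the single threshold in the same way.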
As shown in fig. 3, the overall structure of the target model may include a representation learning module, a GCN (Graph Convolutional Network) multi-label relation modeling module, and a multi-type fusion module. The representation learning module comprises the BiT model and the BERT model. The image to be recognized is input into the BiT model, the image features of the cartoon image in fig. 3 are extracted through the BiT model, and the image feature D1 of the image is further obtained through a classifier. The title of the cartoon image to be recognized, "Explaining the psychological-fear cartoon 'xxxxxxx', episode 4: XXXXXXX encountered in the night, XX uncovers the puzzle of XX!", is input into the BERT model, the text features are extracted through the BERT model, and the text feature D2 of the title is further obtained through a classifier. D1 and D2 are fused by the multi-type fusion module, by concatenation or a MoE mechanism or the like, to obtain the D-dimensional multi-type features. For example, the text features can be fed to the classifier through a text task to obtain the corresponding text tag confidences, and the image features fed to the classifier through an image task to obtain the corresponding image tag confidences; the text tag confidences and the image tag confidences are then fused to obtain a D-dimensional fusion confidence matrix. Alternatively, feature fusion can be performed on the image features and the text features and the fused features input into a classifier to obtain a D-dimensional fusion confidence matrix representing the multi-type features; or the fused features may be used directly as a D-dimensional fusion feature matrix representing the multi-type features.
Of course, besides feature fusion using the MoE mechanism, other feature fusion approaches may be used; for example, approaches that may be used in the multi-type fusion module include, but are not limited to: MoE, Concat, Mix Concat, Attention, LMF (Low-rank Multimodal Fusion), gcForest (deep forest model), and other feature fusion modes. Through the GCN, a global label matrix D × C for the C labels is obtained (D indicates that the global label matrix has the same dimension as the multi-type features; for example, the global label matrix may comprise C D-dimensional vectors). The matrix product between the D-dimensional matrix corresponding to the multi-type features and the D × C global label matrix is then calculated, thereby obtaining the matching probability between the image to be identified and each of the C labels.
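The Concat-fusion-then-product pipeline can be sketched end to end as below. All dimensions (d = 4 per branch, C = 3 labels) and the random features are toy stand-ins for the BiT/BERT outputs and the GCN-produced label matrix, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Image feature D1 (BiT branch) and text feature D2 (BERT branch);
# the 4-dimensional sizes are illustrative stand-ins.
img_feat = rng.standard_normal(4)
txt_feat = rng.standard_normal(4)

# Concat-style fusion yields the D-dimensional multi-type feature (D = 8 here).
fused = np.concatenate([img_feat, txt_feat])
D = fused.shape[0]

# Global label matrix D x C as produced by the GCN: one D-dimensional
# column per label (C = 3 toy labels).
C = 3
global_label_matrix = rng.standard_normal((D, C))

# Matrix product -> one matching score per label, squashed to a probability.
scores = fused @ global_label_matrix
probs = 1.0 / (1.0 + np.exp(-scores))
```

The key constraint the patent states is dimensional: the fused feature and each label column share the same dimension D, so the product directly yields C per-label scores.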
According to the tag identification method, multi-type feature extraction can be performed on the information to be identified through the feature extraction network, and the obtained multi-type features can represent the data features of at least two types of data. The global features of the at least two labels included in the global label features are determined based on the initial features of each label and the association relationship between the at least two labels, so that the global label features can represent the association relationships among the labels within the global label range. Label identification is thus performed by combining the global correlations among a plurality of labels, which avoids the identification errors caused by processing each label in isolation and can improve the accuracy of label identification.
Moreover, the topological relation among the labels is modeled with the graph convolution network, so the association relationships among the labels can be constructed flexibly and the number of labels can be enlarged as required. The method is therefore suitable for identifying the matching degree between the information to be identified and a label set of any scale, which improves the extensibility and flexibility of label identification.
In one possible implementation scenario, the information to be identified may be an image to be identified.
Fig. 5 is a flowchart of a cover image determination method based on tag identification according to an embodiment of the present application, and as shown in fig. 5, an execution subject of the method may be a server, and the method may include the following steps:
step 501, responding to a cover picture acquiring request, and acquiring at least one image to be identified by a server.
When selecting a cover image for the user, the server may perform tag identification on candidate images and filter out inappropriate cover images based on the identified tags. In one possible scenario, the server receives a cover image acquisition request sent by the terminal, acquires one or more images to be identified based on the request, and identifies the tags of the images to be identified through steps 502-504, so as to select an appropriate cover image based on those tags.
Step 502, the server performs multi-type feature extraction on each image to be identified through a feature extraction network to obtain multi-type features of each image to be identified.
The server can identify the image features of the image to be identified and the text features of the image to be identified through the feature extraction network, and perform feature fusion on the image features and the text features to obtain the multi-type features.
Step 503, for each image to be recognized, the server determines the matching degree between the image to be recognized and each label based on the multi-type features and the global feature of each label in the global label features.
The server can obtain the global label features through a graph convolution network of the target model, wherein the global label features comprise global features of at least two labels, and the global features of the at least two labels are determined based on the initial features of each label and the incidence relation between the at least two labels. Therefore, the global feature of each tag is used to represent the tag feature of the tag and the association relationship between the tag and other tags in the globally included tags. The server can calculate the product between the feature matrix of the multi-type features and the feature matrix of the global label features to obtain the matching degree between the multi-type features of the image to be recognized and the global features of each label.
Step 504, the server determines the label of each image to be recognized based on the matching degree between the image to be recognized and each label.
The server may use a label whose matching degree meets the third target condition as the label of the image to be recognized. It should be noted that the implementation of steps 502-504 is the same as the process of steps 201-204, and the details are not repeated here.
Step 505, the server selects, from the determined at least one image to be recognized, an image not including a tag of the first target category as the cover image, based on the determined tag of each image to be recognized.
The server may transmit the cover image to the terminal. The first target category is a category of images that receive negative feedback from users; for example, the first target category may be a discomfort image category, which may include content likely to cause user discomfort, such as psychological horror images or images whose content includes skin diseases, snakes, insects, and the like. Such information is not suitable for recommendation to all users. As shown in fig. 3, a label topology graph of the graph convolution network may be constructed from all the labels associated with the discomfort image category, so as to identify the labels of that category which the image to be identified may match. Of course, if the labels matching the image to be recognized include labels such as insects or pox, the image is not suitable to be selected as the cover image. As shown in fig. 6, the image carries both the trypophobia label and the insect label; with the label topology graph of the graph convolution network constructed in fig. 3, the association relationship between trypophobia and insects can be mined.
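The edge weights of such a label topology graph are typically estimated from how often labels co-occur on training samples (e.g. trypophobia images that also carry the insect label). The patent does not spell this step out; the sketch below assumes the common conditional co-occurrence estimate P(label_j present | label_i present), and the function name is invented for illustration.

```python
import numpy as np

def cooccurrence_matrix(sample_labels, num_labels):
    """Estimate the correlation matrix of the label topology graph.

    sample_labels: list of per-sample label index lists from annotations.
    Entry [i, j] approximates P(label_j | label_i), i.e. how often label_j
    appears on samples that carry label_i. The conditional-probability
    form is an assumption; other normalizations are possible.
    """
    counts = np.zeros((num_labels, num_labels))
    occur = np.zeros(num_labels)
    for labels in sample_labels:
        for i in labels:
            occur[i] += 1
            for j in labels:
                if i != j:
                    counts[i, j] += 1
    return counts / np.maximum(occur[:, None], 1)  # avoid division by zero
```

For annotations `[[0, 1], [0, 1], [0]]` (label 1 co-occurring with label 0 in two of three samples), the matrix gives P(1|0) = 2/3 and P(0|1) = 1, capturing the asymmetric association the topology graph encodes.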
In one possible example, the server obtains at least one candidate image that does not include a tag of the first target category and sends the at least one candidate image to the terminal; the terminal may transmit a candidate image selected by the user among the at least one candidate image to the server, and the server may use the candidate image selected by the user as the cover image.
In yet another possible implementation scenario, the information to be identified may be information to be recommended to the user, such as information of videos, articles, audios, images, and the like to be recommended.
Fig. 7 is a flowchart of an information recommendation method based on tag identification according to an embodiment of the present application, and as shown in fig. 7, an execution subject of the method may be a server, and the method may include the following steps:
step 701, responding to an information recommendation request, and acquiring at least one piece of information to be recommended by a server.
When recommending information to a user, for example when pushing a video stream, the computer device may perform tag identification on the information to be recommended, so as to adjust the recommendation policy in due time based on the identified tags, for example by filtering out information unsuitable for recommendation or reducing the recommendation weight of the information to be recommended. The information to be recommended may be text, images, video streams, audio, and the like. In one possible example, the server receives an information recommendation request sent by the terminal, obtains at least one piece of information to be recommended based on the request, and identifies the tags of the information to be recommended through steps 702-704, so as to perform information recommendation based on the identified tags.
Step 702, the server performs multi-type feature extraction on each piece of information to be recommended through a feature extraction network to obtain multi-type features of each piece of information to be recommended.
The server can extract the data characteristics of various types of data such as image characteristics of the information to be recommended, text characteristics of the information to be recommended, audio characteristics and the like through a characteristic extraction network, and performs characteristic fusion on the image characteristics, the text characteristics, the audio characteristics and the like to obtain various types of characteristics of each information to be recommended.
Step 703, for each piece of information to be recommended, the server determines the matching degree between the information to be recommended and each label based on the multi-type features and the global feature of each label in the global label features.
The server can obtain the global label characteristics through a graph convolution network of the target model, and the server can calculate the product of the characteristic matrix of the multi-type characteristics and the characteristic matrix of the global label characteristics to obtain the matching degree between the information to be recommended and each label.
Step 704, the server determines the label of each piece of information to be recommended based on the matching degree between the piece of information to be recommended and each label.
The server may use a tag whose matching degree meets the third target condition as the tag of the information to be recommended. It should be noted that the implementation of steps 702-704 is the same as the process of steps 201-204, and the description thereof is omitted here.
Step 705, if the label of the information to be recommended belongs to the second target category, the server reduces the recommendation weight of the information to be recommended.
The recommendation weight is used to indicate the likelihood of recommending the information to the user; for example, the recommendation weight may be the probability of recommending the information to the user. The second target category is a category of information that receives negative feedback from users; for example, the second target category may be the discomfort image category. Alternatively, the second target category may be a category the user is not interested in. The server reduces the recommendation weight of any information to be recommended that is determined to include a tag of the second target category, so that the probability of recommending that information in the subsequent recommendation process is reduced. For example, the server may decrease the recommendation weight of the information to be recommended by a certain weight-decreasing coefficient; for example, with a weight-decreasing coefficient of 0.5, the product of the recommendation weight 0.8 of video A and the coefficient is 0.8 × 0.5 = 0.4, that is, the recommendation weight of video A is decreased to 0.4.
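The down-weighting rule above can be sketched in a few lines. The function name and the set-based category check are invented for the example; the decay coefficient of 0.5 follows the patent's worked example.

```python
def adjust_weight(weight, item_labels, negative_categories, decay=0.5):
    """Reduce the recommendation weight of an item whose identified labels
    fall in a negative-feedback (second target) category.

    weight: current recommendation weight of the item.
    item_labels / negative_categories: label sets; the set representation
    is an illustrative assumption.
    """
    if any(label in negative_categories for label in item_labels):
        return weight * decay  # e.g. 0.8 -> 0.4 with the default coefficient
    return weight
```

An item tagged with a negative-feedback label drops from weight 0.8 to 0.4, matching the video A example; items without such labels keep their weight.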
In a possible example, the server adjusts the recommendation weight of each piece of information to be recommended through step 705; after step 705, the server may push to the user, according to the adjusted recommendation weights, the target information whose recommendation weight meets a fourth target condition, where the fourth target condition may include, but is not limited to: a recommendation weight of not less than 0.5, a recommendation weight greater than 0.6 and less than 0.9, and the like.
Fig. 8 is a schematic structural diagram of a tag identification apparatus according to an embodiment of the present application. As shown in fig. 8, the apparatus includes:
the feature extraction module 801 is configured to acquire information to be identified, and perform multi-type feature extraction on the information to be identified through a feature extraction network to obtain multi-type features of the information to be identified, where the information to be identified includes at least two types of data, and the multi-type features are used to represent data features of the at least two types of data;
a matching degree determining module 802, configured to determine the matching degree between the information to be identified and each tag based on the multi-type features and the global feature of each tag in the global tag features, where the global tag features include global features of at least two tags, and the global features of the at least two tags are determined based on the initial features of each tag and the association relationship between the at least two tags;
an identifying module 803, configured to determine a tag of the information to be identified based on the matching degree between the information to be identified and each tag.
In one possible implementation, the global label feature is obtained through a graph convolution network of an object model, and the object model includes the feature extraction network and the graph convolution network; the apparatus also includes a model training module comprising:
a construction unit for constructing an initial model;
the global label association unit is used for inputting the initial features of the at least two labels into the initial graph convolutional network and outputting initial global label features based on the initial features of the at least two labels and the feature correlation function of the initial graph convolutional network;
the sample prediction unit is used for inputting a sample set into the initial feature extraction network, and predicting sample labels of the sample set based on sample features output by the initial feature extraction network and initial global label features output by the initial graph convolution network;
an adjusting unit, configured to adjust a feature correlation function of the initial graph convolution network based on a similarity between a true value label of the sample set and the sample label, and adjust a model parameter of the initial feature extraction network until the initial model reaches a first target condition, and stop adjusting to obtain the target model;
wherein the at least two tags belong to a specified category; the feature correlation function is used for indicating the correlation between the at least two labels, the initial model comprises an initial feature extraction network and an initial graph convolution network, and the sample set comprises a plurality of samples and truth labels of the plurality of samples.
In one possible implementation, the initial graph convolution network includes at least two initial graph convolution layers; the global tag association unit is configured to:
inputting the initial features of the at least two labels into a first initial graph convolutional layer;
for each initial graph convolutional layer, performing global correlation processing on the first features of the at least two labels through the association relationship between the at least two labels and the feature correlation function to obtain second features of the at least two labels, and inputting the second features of the at least two labels into the next initial graph convolutional layer, wherein the first features refer to the features of the at least two labels input into the initial graph convolutional layer, and the second features refer to the features of the at least two labels output by the initial graph convolutional layer;
and taking the second characteristics of at least two labels output by the last initial graph convolutional layer as the initial global label characteristics.
In one possible implementation, each graph convolution layer of the graph convolution network includes a label topology graph, and the label topology graph is used for representing the features of the at least two labels and the association relationships between the at least two labels;
the global label association unit is configured to construct an initial label topology graph based on the association relationship between the at least two labels and the first features of the at least two labels, and to calculate, according to the feature correlation function, the product of the correlation matrix, the first vertex description matrix and the weight matrix of the feature correlation function to obtain a second vertex description matrix, where the correlation matrix includes the probabilities of the at least two labels occurring together, and the second vertex description matrix is used for representing the second features of the at least two labels;
the initial label topology graph comprises at least two vertices and edges between the at least two vertices, the first vertex description matrix corresponding to the at least two vertices is used for representing the first features of the at least two labels, and the correlation matrix corresponding to the edges between the at least two vertices is used for representing the association relationships between the at least two labels.
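The triple product described above — correlation matrix times vertex description matrix times weight matrix — is the standard graph-convolution propagation step. A minimal sketch follows; the ReLU non-linearity is an assumption (the patent only specifies the product), and the function and parameter names are invented for illustration.

```python
import numpy as np

def gcn_layer(corr, vertex_desc, weight):
    """One graph-convolution step over the label topology graph.

    corr:        (C, C) correlation matrix — edge weights holding the
                 co-occurrence probabilities of the C labels.
    vertex_desc: (C, d_in) first vertex description matrix (label features in).
    weight:      (d_in, d_out) weight matrix of the feature correlation
                 function, adjusted during training.
    Returns the (C, d_out) second vertex description matrix.
    """
    return np.maximum(corr @ vertex_desc @ weight, 0.0)  # ReLU is assumed
```

Stacking two such layers, with each layer's output fed to the next, yields the global label features described in the embodiments above.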
in one possible implementation, the adjusting unit is configured to adjust the weight matrix in the feature correlation function based on a similarity between the truth label of the sample set and the sample label.
In one possible implementation, the initial feature extraction network includes an initial text network for extracting text features; the device also includes:
the occlusion prediction module is used for occluding words included in at least two sample texts in a first training data set to obtain at least two first sample texts, predicting occluded words in the at least two first sample texts through the initial text network to obtain predicted occluded words, wherein the first training data set includes at least two sample texts, and each sample text includes at least two words;
the context prediction module is used for predicting context information of at least two sample text pairs included in a second training data set through the initial text network to obtain predicted context information of the at least two sample text pairs, the second training data set includes at least two sample text pairs with labeling labels, and the labeling labels include the context information of the sample text pairs;
and the pre-training module is used for adjusting the model parameters of the initial text network based on the similarity between the occluded words and the predicted occlusion words of the first training data set and the similarity between the label tags of the second training data set and the predicted context information until the initial text network reaches a second target condition, and obtaining a pre-trained text network.
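The occlusion step used by the occlusion prediction module can be sketched as below. The 15% mask ratio and the `[MASK]` placeholder follow BERT-style masked-language pretraining conventions and are assumptions here, not values specified by the patent.

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]"):
    """Occlude a fraction of the words in a sample text; the initial text
    network is then trained to predict the occluded words.

    Returns the masked token list and a dict mapping each masked position
    to the original word (the prediction targets).
    """
    masked, targets = [], {}
    for idx, tok in enumerate(tokens):
        if random.random() < mask_ratio:
            targets[idx] = tok          # remember the true word
            masked.append(mask_token)   # replace it with the placeholder
        else:
            masked.append(tok)
    return masked, targets
```

Training then compares the network's predicted occluded words against `targets`, alongside the context-prediction objective over sample text pairs, to adjust the text network's parameters.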
In a possible implementation manner, the feature extraction module 801 is further configured to perform feature extraction on the at least two types of data through at least two feature extraction networks, respectively, to obtain data features of the at least two types of data; and performing feature fusion on the data features of the at least two types of data through a multi-type fusion module of the target model to obtain the multi-type features.
In one possible implementation manner, the feature extraction module 801 is further configured to, for each type of data, extract features of the data through a feature extraction network corresponding to the type; determining, by a classifier of the feature extraction network corresponding to the type, a tag confidence of the data according to the extracted features of the data, the tag confidence indicating a possibility that the data matches the at least two tags;
the feature extraction module 801 is further configured to fuse the tag confidence degrees of the at least two types of data through the multi-type fusion module to obtain a fusion confidence degree of the information to be identified.
In one possible implementation, the multi-type features include features of the information to be identified in a target dimension; the matching degree determining module 802 is configured to determine, according to the feature weight of each tag in the target dimension included in the global tag feature and the feature of the target dimension included in the multi-type feature, a matching probability between each tag and the information to be identified, where the feature dimension of each tag is the same as the feature dimension of the multi-type feature.
In one possible implementation, the apparatus further comprises at least one of:
a cover image identification module, configured to select, in response to a cover image acquisition request, an image that does not include a tag of a first target category from the determined images to be identified as a cover image based on the determined tags of the images to be identified, where the first target category is a category of a negative feedback image of a user, and the information to be identified is the image to be identified;
the information recommendation module is used for responding to the information recommendation request, if the label of the information to be recommended belongs to a second target category, the recommendation weight of the information to be recommended is reduced, the recommendation weight is used for indicating the possibility of recommending the information to be recommended to the user, the second target category is the category of negative feedback information of the user, and the information to be identified is the information to be recommended to the user.
The tag identification device of the embodiment of the application can extract the multi-type features of the information to be identified through the feature extraction network, and the obtained multi-type features can represent the data features of at least two types of data. The global features of the at least two labels included in the global label features are determined based on the initial features of each label and the association relationship between the at least two labels, so that the global label features can represent the association relationships among the labels within the global label range. Label identification is thus performed by combining the global correlations among a plurality of labels, which avoids the identification errors caused by processing each label in isolation and can improve the accuracy of label identification.
Moreover, the topological relation among the labels is modeled with the graph convolution network, so the association relationships among the labels can be constructed flexibly and the number of labels can be enlarged as required. The device is therefore suitable for identifying the matching degree between the information to be identified and a label set of any scale, which improves the extensibility and flexibility of label identification.
The tag identification apparatus of this embodiment can execute the tag identification method shown in the above embodiments of this application, and the implementation principles thereof are similar, and are not described herein again.
Fig. 9 is a schematic structural diagram of a computer device provided in an embodiment of the present application. As shown in fig. 9, the computer apparatus includes: a memory and a processor; at least one program stored in the memory for execution by the processor, which when executed by the processor, implements:
performing multi-type feature extraction on information to be identified through a feature extraction network, wherein the obtained multi-type features can represent data features of at least two types of data; the global features of at least two labels included in the global label features are determined based on the initial features of each label and the incidence relation between the at least two labels, so that the global label features can represent the incidence relation among the labels in the global label range, label identification is carried out by combining the global correlation among a plurality of labels, the problem of identification errors caused by the independent processing of a single label is avoided, and the accuracy of label identification can be improved.
In an alternative embodiment, a computer device is provided, as shown in FIG. 9, the computer device 900 shown in FIG. 9 comprising: a processor 901 and a memory 903. Wherein the processor 901 is coupled to the memory 903, such as via a bus 902. Optionally, the computer device 900 may further include a transceiver 904, and the transceiver 904 may be used for data interaction between the computer device and other computer devices, such as transmission of data and/or reception of data, and the like. It should be noted that the transceiver 904 is not limited to one in practical applications, and the structure of the computer device 900 is not limited to the embodiment of the present application.
The processor 901 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor 901 may also be a combination implementing computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor, and the like.
Bus 902 may include a path that transfers information between the above components. The bus 902 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 902 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The Memory 903 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 903 is used for storing application program codes (computer programs) for executing the present application, and the processor 901 controls the execution. The processor 901 is configured to execute application program code stored in the memory 903 to implement the content shown in the foregoing method embodiments.
Among these, computer devices include, but are not limited to: a server, a terminal, or any electronic device capable of performing tag identification using a model.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding contents of the tag identification method in the foregoing method embodiments.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the tag identification method described above.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not performed in a strict order and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (14)

1. A method of tag identification, the method comprising:
acquiring information to be identified, and performing multi-type feature extraction on the information to be identified through a feature extraction network to obtain multi-type features of the information to be identified, wherein the information to be identified comprises at least two types of data, and the multi-type features are used for representing the data features of the at least two types of data;
respectively determining the matching degree between the information to be identified and each label based on the multi-type features and the global feature of each label in the global label features, wherein the global label features comprise global features of at least two labels, and the global features of the at least two labels are determined based on the initial features of each label and the association relationship between the at least two labels;
and determining the label of the information to be identified based on the matching degree between the information to be identified and each label.
2. The tag identification method according to claim 1, wherein the global tag feature is obtained through a graph convolution network of an object model, the object model including the feature extraction network and the graph convolution network; the training mode of the target model comprises the following steps:
constructing an initial model, inputting the initial characteristics of the at least two labels into an initial graph convolution network, and outputting initial global label characteristics based on the initial characteristics of the at least two labels and a characteristic correlation function of the initial graph convolution network;
inputting a sample set into an initial feature extraction network, and predicting sample labels of the sample set based on sample features output by the initial feature extraction network and initial global label features output by the initial graph convolution network;
based on the similarity between the truth value label of the sample set and the sample label, adjusting the characteristic correlation function of the initial graph convolution network, and adjusting the model parameters of the initial characteristic extraction network until the initial model reaches a first target condition, and obtaining the target model;
wherein the at least two labels belong to a specified category; the feature correlation function is used for indicating the correlation between the at least two labels, the initial model comprises the initial feature extraction network and the initial graph convolution network, and the sample set comprises a plurality of samples and the truth labels of the plurality of samples.
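The joint training in claim 2 can be sketched as alternating forward prediction with parameter adjustment until a loss-based target condition is met. The toy below collapses the graph convolution to a single scalar weight and uses a numerical gradient; every name and simplification here is a hypothetical illustration, not the patent's actual model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(sample_feat, label_inits, w):
    # "Graph convolution" collapsed to one scalar w for the sketch:
    # global label feature = w * initial label feature.
    return [sigmoid(sum(s * w * l for s, l in zip(sample_feat, init)))
            for init in label_inits]

def bce(pred, truth):
    # Binary cross-entropy between predicted and truth labels.
    eps = 1e-9
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, truth))

def train(samples, truths, label_inits, lr=0.5, target=0.1, steps=500):
    w = 0.0
    for _ in range(steps):
        total = sum(bce(predict(s, label_inits, w), t)
                    for s, t in zip(samples, truths))
        if total < target:        # the "first target condition" of the claim
            break
        # Numerical gradient on the shared weight.
        h = 1e-4
        total_h = sum(bce(predict(s, label_inits, w + h), t)
                      for s, t in zip(samples, truths))
        w -= lr * (total_h - total) / h
    return w
```

A real system would adjust full weight matrices with backpropagation; the stopping rule (loss below a target) stands in for the claim's unspecified "first target condition".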
3. The label identification method of claim 2, wherein the initial graph convolution network includes at least two initial graph convolution layers; the inputting the initial features of the at least two labels into an initial graph convolution network, and outputting initial global label features based on the initial features of the at least two labels and a feature correlation function of the initial graph convolution network, includes:
inputting the initial features of the at least two labels into a first initial graph convolution layer;
for each initial graph convolution layer, performing global correlation processing on first features of the at least two labels through the association relationship between the at least two labels and the feature correlation function to obtain second features of the at least two labels, and inputting the second features of the at least two labels into the next initial graph convolution layer, wherein the first features refer to the features of the at least two labels input into the initial graph convolution layer, and the second features refer to the features of the at least two labels output by the initial graph convolution layer;
and taking the second features of the at least two labels output by the last initial graph convolution layer as the initial global label features.
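The layer-by-layer flow above reduces to feeding each layer's output ("second features") into the next layer as its input ("first features"). A hypothetical sketch, with layer internals abstracted away as callables:

```python
def stacked_gcn(initial_features, layers):
    """Pass label features through each graph convolution layer in order.

    The features entering a layer are its "first features"; the features
    it emits are its "second features", which become the next layer's
    input. The last layer's output serves as the initial global label
    features.
    """
    feats = initial_features
    for layer in layers:
        feats = layer(feats)
    return feats
```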
4. The label identification method according to claim 3, wherein each graph convolution layer of the graph convolution network comprises a label topology graph, and the label topology graph is used for representing the features of the at least two labels and the association relationship between the at least two labels;
the performing global correlation processing on the first features of the at least two labels through the association relationship between the at least two labels and the feature correlation function to obtain the second features of the at least two labels includes:
constructing an initial label topology graph based on the association relationship between the at least two labels and the first features of the at least two labels, wherein
the initial label topology graph comprises at least two vertices and edges between the at least two vertices, a first vertex description matrix corresponding to the at least two vertices is used for representing the first features of the at least two labels, and a correlation matrix corresponding to the edges between the at least two vertices is used for representing the association relationship between the at least two labels;
and calculating a product of the correlation matrix, the first vertex description matrix and a weight matrix of the feature correlation function according to the feature correlation function to obtain a second vertex description matrix, wherein the correlation matrix comprises the probabilities of co-occurrence of the at least two labels, and the second vertex description matrix is used for representing the second features of the at least two labels.
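The per-layer computation in claim 4 is the familiar graph-convolution propagation: the second vertex description matrix is the product of the correlation (adjacency) matrix, the first vertex description matrix, and the layer's weight matrix. A dependency-free sketch follows; the names are illustrative, and a real implementation would typically also apply a nonlinearity, which the claim does not mention.

```python
def matmul(a, b):
    """Plain nested-list matrix multiplication."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(correlation, first_desc, weight):
    """Second vertex description = A @ H @ W, where
    A (correlation) holds label co-occurrence probabilities (edges),
    H (first_desc) has one row of first features per label (vertices),
    W (weight) is the weight matrix of the feature correlation function.
    """
    return matmul(matmul(correlation, first_desc), weight)
```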
5. The label identification method of claim 4, wherein the adjusting the feature correlation function of the initial graph convolution network based on the similarity between the truth labels of the sample set and the sample labels comprises:
adjusting the weight matrix in the feature correlation function based on the similarity between the truth labels of the sample set and the sample labels.
6. The label identification method of claim 3, wherein the initial feature extraction network comprises an initial text network for extracting text features; before the inputting the sample set into an initial feature extraction network and predicting the sample labels of the sample set based on the sample features output by the initial feature extraction network and the initial global label features output by the initial graph convolution network, the method further includes:
masking words included in at least two sample texts in a first training data set to obtain at least two first sample texts, and predicting the masked words in the at least two first sample texts through the initial text network to obtain predicted masked words, wherein the first training data set includes the at least two sample texts, and each sample text includes at least two words;
predicting context information of at least two sample text pairs included in a second training data set through the initial text network to obtain predicted context information of the at least two sample text pairs, wherein the second training data set includes at least two sample text pairs with annotation labels, and the annotation labels include the context information of the sample text pairs;
and adjusting the model parameters of the initial text network based on the similarity between the masked words of the first training data set and the predicted masked words, and the similarity between the annotation labels of the second training data set and the predicted context information, until the initial text network reaches a second target condition, to obtain a pre-trained text network.
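Claim 6 describes a BERT-style pre-training recipe for the text network: mask words and predict them, and predict context relations between text pairs. The toy below only illustrates the masked-word data flow; the "network" is a deliberately crude frequency model, and every name is a hypothetical stand-in for the actual initial text network.

```python
import random
from collections import Counter

MASK = "[MASK]"

def mask_words(text, rate=0.15, rng=None):
    """Replace a fraction of words with [MASK]; return masked text and
    the original words at the masked positions (the prediction targets)."""
    rng = rng or random.Random(0)
    masked, targets = [], []
    for w in text.split():
        if rng.random() < rate:
            masked.append(MASK)
            targets.append(w)
        else:
            masked.append(w)
    return " ".join(masked), targets

class ToyTextNetwork:
    """Stands in for the initial text network: 'predicts' every masked
    word as the most frequent word seen in training, a crude proxy for
    a learned masked-language model."""

    def fit(self, texts):
        self.counts = Counter(w for t in texts for w in t.split())
        return self

    def predict_masked(self, masked_text):
        guess = self.counts.most_common(1)[0][0]
        return [guess for w in masked_text.split() if w == MASK]
```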
7. The label identification method according to claim 1, wherein the performing multi-type feature extraction on the information to be identified through a feature extraction network to obtain the multi-type features of the information to be identified comprises:
respectively extracting the features of the at least two types of data through at least two feature extraction networks to obtain the data features of the at least two types of data;
and performing feature fusion on the data features of the at least two types of data through a multi-type fusion module of the target model to obtain the multi-type features.
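A minimal sketch of the fusion step in claim 7, using simple concatenation; a real multi-type fusion module would typically be learned (e.g. attention-based), so both the rule and the function name here are assumptions:

```python
def fuse_features(per_type_features):
    """Concatenate per-type data features (e.g. text, image, audio)
    into a single multi-type feature vector."""
    fused = []
    for feature in per_type_features:
        fused.extend(feature)
    return fused
```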
8. The label identification method according to claim 7, wherein the respectively extracting the features of the at least two types of data through at least two feature extraction networks to obtain the data features of the at least two types of data comprises:
for each type of data, extracting the features of the data through a feature extraction network corresponding to the type;
determining, by a classifier of the feature extraction network corresponding to the type, a label confidence of the data according to the extracted features of the data, the label confidence being used to indicate the likelihood that the data matches the at least two labels;
correspondingly, the performing feature fusion on the data features of the at least two types of data through the multi-type fusion module of the target model to obtain the multi-type features includes:
and fusing the label confidence degrees of the at least two types of data through the multi-type fusion module to obtain the fusion confidence degree of the information to be identified.
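Claim 8's variant fuses per-type label confidences rather than raw features. A sketch using simple averaging; the actual fusion rule is not specified by the claim, so the mean here is an assumption:

```python
def fuse_confidences(per_type_confidences):
    """Average the per-label confidence vectors emitted by each
    per-type classifier into one fusion confidence per label."""
    n = len(per_type_confidences)
    width = len(per_type_confidences[0])
    return [sum(conf[i] for conf in per_type_confidences) / n
            for i in range(width)]
```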
9. The label identification method according to claim 1, wherein the multi-type features include features of the information to be identified in a target dimension; the determining, based on the multi-type features and the global feature of each label in the global label features, the matching degree between the information to be identified and each label, respectively, includes:
and respectively determining the matching probability between each label and the information to be identified according to the feature weight of each label in the target dimension, included in the global label features, and the features of the target dimension, included in the multi-type features, wherein the feature dimension of each label is the same as the feature dimension of the multi-type features.
10. The label identification method of claim 1, further comprising at least one of:
in response to a cover image acquisition request, selecting, based on the determined labels of the images to be identified, an image that does not include a label of a first target category from the images to be identified as a cover image, wherein the first target category is a category of negative feedback images of a user, and the information to be identified is the images to be identified;
and in response to an information recommendation request, if a label of information to be recommended belongs to a second target category, reducing a recommendation weight of the information to be recommended, wherein the recommendation weight is used for indicating the possibility of recommending the information to be recommended to a user, the second target category is a category of negative feedback information of the user, and the information to be identified is the information to be recommended.
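The two applications in claim 10 can be sketched as simple filters over the predicted labels. The category names, the fallback rule, and the down-weighting factor below are all illustrative assumptions:

```python
def pick_cover(images_with_labels, negative_categories):
    """Pick the first image whose labels avoid the user's
    negative-feedback categories; fall back to the first image
    if every candidate carries such a label."""
    for image, labels in images_with_labels:
        if not (set(labels) & negative_categories):
            return image
    return images_with_labels[0][0]

def adjust_recommendation_weight(weight, labels, negative_categories,
                                 factor=0.5):
    """Reduce the recommendation weight when the item carries a label
    in a negative-feedback category; leave it unchanged otherwise."""
    if set(labels) & negative_categories:
        return weight * factor
    return weight
```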
11. A label identification device, the device comprising:
a feature extraction module, configured to acquire information to be identified and perform multi-type feature extraction on the information to be identified through a feature extraction network to obtain multi-type features of the information to be identified, wherein the information to be identified comprises at least two types of data, and the multi-type features are used for representing the data features of the at least two types of data;
a matching degree determining module, configured to determine a matching degree between the information to be identified and each label based on the multi-type features and the global feature of each label in global label features, wherein the global label features include global features of at least two labels, and the global features of the at least two labels are determined based on the initial features of each label and the association relationship between the at least two labels;
and the identification module is used for determining the label of the information to be identified based on the matching degree between the information to be identified and each label.
12. A computer device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the label identification method of any one of claims 1 to 10.
13. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the label identification method of any one of claims 1 to 10.
14. A computer program product comprising a computer program, wherein the computer program implements the label identification method of any one of claims 1 to 10 when executed by a processor.
CN202111194237.4A 2021-10-13 2021-10-13 Label identification method, label identification device, computer equipment, storage medium and program product Active CN113627447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111194237.4A CN113627447B (en) 2021-10-13 2021-10-13 Label identification method, label identification device, computer equipment, storage medium and program product


Publications (2)

Publication Number Publication Date
CN113627447A true CN113627447A (en) 2021-11-09
CN113627447B CN113627447B (en) 2022-02-08

Family

ID=78391379


Country Status (1)

Country Link
CN (1) CN113627447B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452778B1 (en) * 2009-11-19 2013-05-28 Google Inc. Training of adapted classifiers for video categorization
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on graph convolution
CN111291643A (en) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 Video multi-label classification method and device, electronic equipment and storage medium
CN111651981A (en) * 2019-02-19 2020-09-11 阿里巴巴集团控股有限公司 Data auditing method, device and equipment
EP3819790A2 (en) * 2020-06-30 2021-05-12 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for visual question answering, computer device and medium
US20210142046A1 (en) * 2019-11-13 2021-05-13 Nec Laboratories America, Inc. Deep face recognition based on clustering over unlabeled face data
CN113095428A (en) * 2021-04-23 2021-07-09 西安交通大学 Video emotion classification method and system fusing electroencephalogram and stimulus information
CN113204659A (en) * 2021-03-26 2021-08-03 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113221574A (en) * 2021-05-31 2021-08-06 云南锡业集团(控股)有限责任公司研发中心 Named entity recognition method, device, equipment and computer readable storage medium
CN113378784A (en) * 2021-07-01 2021-09-10 北京百度网讯科技有限公司 Training method of video label recommendation model and method for determining video label


Non-Patent Citations (2)

Title
Meihao Chen et al.: "Graph Convolutional Networks for Classification with a Structured Label Space", arXiv *
Pan Haixia et al.: "Research and Implementation of a Multi-size Aerial Image Positioning Method Based on CNN", Journal of Beijing University of Aeronautics and Astronautics *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN114443847A (en) * 2022-01-27 2022-05-06 北京字节跳动网络技术有限公司 Text classification method, text processing method, text classification device, text processing device, computer equipment and storage medium
CN116028720A (en) * 2023-03-30 2023-04-28 无锡五车人工智能科技有限公司 Target resource processing method, system and storage medium based on artificial intelligence
CN116524524A (en) * 2023-04-25 2023-08-01 上海任意门科技有限公司 Content identification method, device, equipment and storage medium
CN116524524B (en) * 2023-04-25 2024-03-15 上海任意门科技有限公司 Content identification method, device, equipment and storage medium
CN116595978A (en) * 2023-07-14 2023-08-15 腾讯科技(深圳)有限公司 Object category identification method, device, storage medium and computer equipment
CN116595978B (en) * 2023-07-14 2023-11-14 腾讯科技(深圳)有限公司 Object category identification method, device, storage medium and computer equipment
CN116842479A (en) * 2023-08-29 2023-10-03 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN116842479B (en) * 2023-08-29 2023-12-12 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN117058489A (en) * 2023-10-09 2023-11-14 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of multi-label recognition model
CN117058489B (en) * 2023-10-09 2023-12-29 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of multi-label recognition model

Also Published As

Publication number Publication date
CN113627447B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110737801B (en) Content classification method, apparatus, computer device, and storage medium
CN110263324B (en) Text processing method, model training method and device
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN116720004B (en) Recommendation reason generation method, device, equipment and storage medium
CN113761153B (en) Picture-based question-answering processing method and device, readable medium and electronic equipment
CN110795944A (en) Recommended content processing method and device, and emotion attribute determining method and device
WO2023179429A1 (en) Video data processing method and apparatus, electronic device, and storage medium
CN113704460A (en) Text classification method and device, electronic equipment and storage medium
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN116601626A (en) Personal knowledge graph construction method and device and related equipment
CN111783903A (en) Text processing method, text model processing method and device and computer equipment
CN116561570A (en) Training method, device and equipment for multi-mode model and readable storage medium
CN112861474B (en) Information labeling method, device, equipment and computer readable storage medium
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium
CN112417260B (en) Localized recommendation method, device and storage medium
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
CN113741759A (en) Comment information display method and device, computer equipment and storage medium
CN117521674B (en) Method, device, computer equipment and storage medium for generating countermeasure information
CN115223020B (en) Image processing method, apparatus, device, storage medium, and computer program product
CN113505246B (en) Data processing method, device, terminal equipment and storage medium
CN116340552A (en) Label ordering method, device, equipment and storage medium
CN116956908A (en) Natural language processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40056122

Country of ref document: HK