WO2022121485A1 - Image multi-tag classification method and apparatus, computer device, and storage medium - Google Patents

Image multi-tag classification method and apparatus, computer device, and storage medium

Info

Publication number
WO2022121485A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
label
matrix
graph
processed
Application number
PCT/CN2021/122741
Other languages
French (fr)
Chinese (zh)
Inventor
罗彤
郭彦东
李亚乾
杨林
Original Assignee
Oppo广东移动通信有限公司
上海瑾盛通信科技有限公司
Application filed by Oppo广东移动通信有限公司 and 上海瑾盛通信科技有限公司
Publication of WO2022121485A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Definitions

  • the embodiments of the present application relate to the technical field of image processing, and in particular, to a method, apparatus, computer equipment, and storage medium for multi-label classification of images.
  • the machine learning model provided by the artificial intelligence technology can intelligently determine the type of the object contained in the image, so as to label the image accordingly.
  • however, existing models suffer from problems such as an increased error rate or slow operation speed.
  • Embodiments of the present application provide a multi-label classification method, apparatus, computer device, and storage medium for images.
  • the technical solution is as follows:
  • a multi-label classification method for images comprising:
  • the image features are processed by a graph feature matrix to obtain the data to be activated.
  • the graph feature matrix is a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationship between at least two of the first labels;
  • the data to be activated is processed by the activation layer in the label classification model to obtain at least two second labels;
  • At least two of the second tags are determined as tags of the image to be processed, and the second tags belong to the first tags.
  • a multi-label classification device for images comprising:
  • a first acquisition module used for acquiring the image to be processed
  • a feature extraction module for extracting image features of the to-be-processed image
  • a second obtaining module configured to obtain the data to be activated according to the image features and the graph feature matrix, where the graph feature matrix is a matrix obtained after the knowledge graph is processed by the graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationship between at least two of the first labels;
  • a label determination module configured to obtain at least two second labels according to the data to be activated, and determine the at least two second labels as labels of the to-be-processed image, where the second labels belong to the first labels.
  • a terminal includes a processor and a memory, the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the image multi-label classification method provided by the various aspects of the present application.
  • a computer-readable storage medium having stored therein at least one instruction, the instruction being loaded and executed by a processor to implement the image multi-label classification method provided by the various aspects of the present application.
  • a computer program product comprising computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the methods provided in the various optional implementations of the above-described aspect of multi-label classification of images.
  • FIG. 1 is an architecture diagram of a label classification model provided by an embodiment of the present application.
  • Fig. 2 is an architecture diagram of a label classification model provided based on the embodiment shown in Fig. 1;
  • FIG. 3 is a flowchart of a multi-label classification method for images provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of a global feature provided based on the embodiment shown in FIG. 3;
  • FIG. 5 is a schematic diagram of a local feature provided based on the embodiment shown in FIG. 3;
  • FIG. 6 is a visual interface after image processing provided based on the embodiment shown in FIG. 3;
  • FIG. 7 is a flowchart of a multi-label classification method for images provided by another exemplary embodiment of the present application.
  • FIG. 8 is a schematic diagram of an image post-processing provided based on the embodiment shown in FIG. 7;
  • FIG. 9 is a schematic diagram of a process for automatically generating an album provided by an embodiment of the present application.
  • FIG. 10 is a structural block diagram of an apparatus for multi-label classification of images provided by an exemplary embodiment of the present application.
  • FIG. 11 is a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • a plurality means two or more.
  • “And/or” describes the association relationship of the associated objects and means that three kinds of relationships can exist; for example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone.
  • the character “/” generally indicates that the associated objects are an "or" relationship.
  • the present application provides an image multi-label classification method, wherein the method includes: extracting image features of an image to be processed through a feature extraction layer in a label classification model, where the label classification model is a neural network model that adds at least two labels to the to-be-processed image; processing the image features through a graph feature matrix to obtain data to be activated, where the graph feature matrix is a matrix obtained by processing a knowledge graph through a graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationship between at least two of the first labels;
  • processing the data to be activated through the activation layer in the label classification model to obtain at least two second labels; and determining at least two of the second labels as labels of the image to be processed, where the second labels belong to the first labels.
  • the processing of the image features through the graph feature matrix to obtain the data to be activated includes: multiplying the image feature matrix and the graph feature matrix to obtain the data matrix to be activated;
  • the processing of the data to be activated through the activation layer in the label classification model to obtain at least two second labels includes: processing the data matrix to be activated through the activation layer in the label classification model to obtain at least two of the second labels.
  • the knowledge graph includes a label relationship matrix and a node information matrix
  • the method further includes: inputting the label relationship matrix into the graph convolutional neural network, where the label relationship matrix is used to indicate at least two the relationship between the first labels; input the node information matrix into the graph convolutional neural network, the node information matrix is used to indicate the attributes of the first label itself; process through the graph convolutional neural network The label relationship matrix and the node information matrix are used to obtain the graph feature matrix.
  • the method further includes: in response to the data in the knowledge graph completing an update, obtaining an updated knowledge graph; processing the updated knowledge graph through the graph convolutional neural network to obtain an updated graph feature matrix; and updating the graph feature matrix in the label classification model with the updated graph feature matrix.
  • the scale of the image feature matrix is N*1, the scale of the graph feature matrix is C*N, and the scale of the data matrix to be activated is C*1, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.
  • the to-be-processed image includes a first image and a second image
  • the method further includes: in response to the first image and the second image having acquired their corresponding second labels, acquiring shooting time relationship information between the first image and the second image, where the shooting time relationship information is used to indicate the time sequence relationship between the shooting moments of the first image and the second image, or is used to indicate the duration between the shooting moment of the first image and the shooting moment of the second image; and, in response to the shooting time relationship information meeting a preset condition, adding a target second label to the second labels corresponding to the second image, where the target second label is a second label corresponding to the first image and not corresponding to the second image.
  • the adding the target second label to the second label corresponding to the second image in response to the photographing moment relationship information meeting a preset condition includes: in response to the target duration being less than a second threshold, The target second label is added as the second label corresponding to the second image, and the target duration is the duration between the shooting time of the first image and the shooting time of the second image.
  • the adding of the target second label to the second labels corresponding to the second image in response to the photographing moment relationship information meeting a preset condition includes: in response to the number of first images being 2k, where among the 2k first images k first images are images taken before the second image and k first images are images taken after the second image, acquiring the shooting moments of the first images; and, in response to the length of the interval covering the shooting moments of the 2k first images being less than a third threshold, adding the target second label to the second labels corresponding to the second image, where the target second label is a label corresponding to the 2k first images and not corresponding to the second image, and k is an integer greater than or equal to 1.
  • the present application provides a multi-label classification method for images: after acquiring the image to be processed, the image is input into a label classification model to obtain image features corresponding to the image; a graph feature matrix is obtained from a knowledge graph; the image features and the graph feature matrix are combined to obtain the data to be activated; and then at least two second labels corresponding to the image to be processed are obtained according to the data to be activated.
  • the knowledge graph is used to indicate the relationship between labels and the attributes of the labels themselves. Since the label classification model uses the information provided by the knowledge graph when adding multiple labels to the image to be processed, the present application improves the reliability of the multiple labels obtained for the image to be processed while reducing the complexity of acquiring them.
  • Image to be processed: the image to which labels are to be added.
  • the image to be processed is an image captured by the terminal.
  • the image to be processed is an image captured by other computer equipment, and the terminal adds a tag to the image.
  • the image to be processed may also be a virtual image generated by other computer equipment according to a specified algorithm or other image tools.
  • the present application may divide the acquisition manner of the image to be processed into two approaches.
  • the first approach is for the device applying the multi-label classification method for images provided by the present application to shoot the image through its own image acquisition component.
  • the second approach is for the image to be captured by a device other than the one applying the multi-label classification method for images provided by the present application, obtained by means such as image transmission, and then transmitted to that device for multi-label classification.
  • the image to be processed may be an image captured by the terminal through its own camera.
  • the image to be processed may be an image acquired by the server through a network and transmitted by the terminal, and the image is still an image captured by the terminal through a camera.
  • Neural network model: a complex network system formed by interconnecting a large number of simple processing units, where a processing unit may also be referred to as a neuron.
  • the neural network model can reflect many basic features of human brain function, and is essentially a nonlinear dynamic learning system.
  • a neural network model is a mathematical model to which a neural network structure is applied.
  • a part of the neural network model adopts a neural network structure, and the other part may adopt other data structures, and the above parts cooperate with each other to process the data and obtain the result desired by the designer.
  • the label classification model used in this application can add at least two second labels to the image to be processed, so as to realize the multi-label classification capability of the image.
  • CNN (Convolutional Neural Network): a class of feedforward neural networks and one of the most widely used algorithms in deep learning.
  • CNN includes input layer, hidden layer and output layer.
  • the input layer is used to receive data that is fed into the CNN.
  • the input layer can process multi-dimensional data.
  • for an image, the three-dimensional input data received by the input layer is used to indicate the coordinates of the pixels and the RGB (red, green, blue) channels.
  • normalization can be performed to scale the value of a pixel's RGB channels from [0, 255] to [0, 1], so as to improve the learning efficiency and inference capability of the CNN.
  • the hidden layer can include three common structures: convolutional layer, pooling layer and fully connected layer.
  • the convolutional layer can be introduced from three perspectives: the convolution kernel, the parameters of the convolutional layer, and the activation function.
  • A. Convolution kernel The convolutional layer contains multiple convolution kernels. For a convolution kernel, it includes several elements, and each element corresponds to a weight coefficient and a bias vector. Among them, the elements are similar to neurons in a feedforward neural network.
  • the parameters of the convolutional layer include the size of the convolution kernel, the stride and the padding.
  • the above three parameters together determine the size of the output feature map of the convolutional layer and are hyperparameters of the convolutional neural network.
  • the kernel size is an arbitrary value smaller than the input image size.
  • the larger the convolution kernel, the more complex the input features that can be extracted.
  • the convolution stride defines the distance between the positions of the convolution kernel in two adjacent scans of the feature map. With a stride of 1, the convolution kernel sweeps through the elements of the feature map one by one; with a stride of n, it skips n-1 elements in the next scan. It should be noted that the purpose of padding is to maintain the feature dimensions processed by the convolution kernel.
  • Activation function: the role of the activation function is to help express more complex features.
  • the output feature map is passed to the pooling layer for feature selection and information filtering.
  • the pooling layer is set with a preset pooling function, and the function of the pooling layer is to replace the result of a single point in the feature map with the feature map statistics of its adjacent areas.
  • the fully connected layer is used to non-linearly combine the features extracted by the aforementioned layers to obtain the output, and output the data to the output layer.
  • the upstream of the output layer in a CNN is usually a fully connected layer.
  • the output layer uses a logistic function or a normalized exponential function (softmax function) to output the classification labels.
  • in object detection, the output layer can be designed to output the center coordinates, size, and classification of objects.
  • in semantic segmentation, the output layer outputs the classification result of each pixel.
  • GCN: Graph Convolutional Neural Network.
  • Knowledge graph: graph data used to indicate the respective attributes of multiple nodes and the relationships between those nodes.
  • in this application, the knowledge graph includes a label relationship matrix and a node information matrix; the combination of these two matrices can be called a knowledge graph.
  • the present application provides a multi-label classification method for images, which can effectively alleviate the problems of high error rates or slow operation speed that arise in the related art when a single image is given multiple labels. It should be noted that, in the related art, since only feature extraction is performed on the image and labels are determined according to how closely the image features match them, multiple accurate labels can be determined only when the features corresponding to those labels are prominent or obvious. If the feature of an object to be labeled in the image is not obvious, it is difficult for the related art to determine the corresponding label. The solution provided by the present application, however, is able to identify such labels; please refer to the introduction of the following embodiments.
  • a label classification model can be constructed in combination with the structure of a neural network, so as to realize the above-mentioned multi-label classification method for images. It should be noted that before the label classification model is applied, that is, before the inference stage, it needs to go through a training process, which is described as follows.
  • FIG. 1 is an architecture diagram of a label classification model provided by an embodiment of the present application.
  • the label classification model 100 includes a convolutional neural network 110 , a matrix multiplication module 120 and an activation layer 130 .
  • the convolutional neural network 110 is used to receive the to-be-processed image 1a; after the to-be-processed image 1a is processed by the convolutional neural network 110, a corresponding image feature matrix 1b is obtained. Subsequently, the image feature matrix 1b and the graph feature matrix 1c are multiplied in the matrix multiplication module 120 to obtain the data to be activated 1d, which is input into the activation layer 130; the activation layer 130 processes the data to be activated to obtain a second tag group 1e.
  • the second tag group 1e includes 3 second tags.
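  • As a minimal sketch of the FIG. 1 pipeline, assuming a PyTorch-style backbone that outputs an N-dimensional image feature vector and sigmoid as the activation layer (neither choice is specified by the patent), the forward pass could look like this:

```python
# Sketch of FIG. 1: CNN backbone -> image feature matrix -> multiplication
# with the graph feature matrix (C x N) -> activation layer -> label scores.
# Module and variable names here are illustrative assumptions.
import torch
import torch.nn as nn

class LabelClassificationModel(nn.Module):
    def __init__(self, backbone: nn.Module, graph_feature_matrix: torch.Tensor):
        super().__init__()
        self.backbone = backbone                 # convolutional neural network 110
        # graph feature matrix 1c, shape (C, N), precomputed by the GCN branch
        self.register_buffer("graph_features", graph_feature_matrix)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(image)              # image feature matrix 1b, shape (batch, N)
        logits = feat @ self.graph_features.t()  # matrix multiplication module 120 -> (batch, C)
        return torch.sigmoid(logits)             # activation layer 130: per-label probabilities
```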
  • FIG. 2 is an architecture diagram of a label classification model provided based on the embodiment shown in FIG. 1 .
  • the knowledge graph includes a label relationship matrix 2a and a node information matrix 2b.
  • the knowledge graph can be input into the graph convolutional neural network 200 to obtain the graph feature matrix 1c.
  • the computer device can re-acquire the label relationship matrix 2a and the node information matrix 2b from the updated knowledge graph and input them into the graph convolutional neural network 200 to obtain the graph feature matrix 1c.
  • the graph feature matrix 1c is only updated when the knowledge graph changes.
  • the graph feature matrix 1c keeps the value obtained by the last calculation and participates in the calculation process shown in FIG. 1 .
  • the computer device can train the architecture shown in FIG. 2 when constructing the label classification model.
  • the structure shown in FIG. 2 is also referred to as a dual-branch architecture.
  • the knowledge graph needs to be constructed first.
  • the construction can be divided into a keyword collection stage and a knowledge graph construction stage.
  • the server in the cloud can collect a large amount of data of users using mobile phone photo albums.
  • the data collected by the server for the use of the album is desensitized data, and does not involve any user's private information.
  • the server can extract the keywords frequently searched by the user.
  • the keywords frequently searched by the user may be the top n keywords that appear most frequently among the keywords collected by the server.
  • keywords can include entities, scenes, behaviors, or events.
  • entities can include entity objects such as cats, dogs, flowers, vehicles, cakes, balloons, dishes, drinks, shops, rivers, beaches, and oceans.
  • the scene can include scene information such as sunrise and sunset, banquet, playground or sports scene.
  • Behaviors include information such as walking, running, eating, and standing.
  • Events include information such as travel, shopping, or eating.
  • the server may build a tag list including the above keywords. It should be noted that the tag in the tag list here may be the first tag.
  • the server will build a knowledge graph from the list of tags.
  • the server can implement the construction of the knowledge graph by performing the following steps a) to h).
  • Step a) extract the textual label relationship from the textual knowledge graph.
  • the text-based knowledge graph may include a knowledge graph such as ConceptNet or WordNet.
  • the textual label relationships can include the labels' own semantic relationships, such as inclusion relationships or predicate relationships.
  • the server preselects a text-based knowledge graph and extracts the textual label relationships from it. It should be noted that, in this step, the server can select a knowledge graph whose textual label relationships work well in the current field.
  • the above-mentioned specific knowledge graphs are only exemplary, and this application does not limit the specific textual knowledge graph that is used.
  • Step b) extract the interrelationship of labels in the image from the specified image class dataset.
  • the correlation may be a conditional probability.
  • for the calculation method of the conditional probability, refer to the following formula:
  • P(A|B) = P(AB) / P(B)
  • where P(A|B) is the conditional probability that label A appears when label B appears, P(AB) is the probability that label A and label B appear at the same time, and P(B) is the probability that label B appears.
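  • A small sketch of estimating these conditional probabilities from an image dataset's annotations, assuming the annotations are given as one label set per image (the dataset format is an assumption, not from the patent):

```python
# Estimate P(A|B) = P(AB) / P(B) from label co-occurrence counts:
# the ratio of co-occurrence count to single-label count gives the same value.
from itertools import combinations
from collections import Counter

def conditional_probabilities(annotations):
    """annotations: list of sets of labels, one set per image."""
    single = Counter()   # images containing each label
    pair = Counter()     # images containing each unordered label pair
    for labels in annotations:
        single.update(labels)
        pair.update(frozenset(p) for p in combinations(sorted(labels), 2))
    cond = {}
    for p, n_ab in pair.items():
        a, b = tuple(p)
        cond[(a, b)] = n_ab / single[b]   # P(A|B)
        cond[(b, a)] = n_ab / single[a]   # P(B|A)
    return cond

print(conditional_probabilities([{"ocean", "beach"}, {"ocean"}, {"beach", "dog"}]))
```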
  • Step c) calculate the weights between the labels.
  • for the weights of textual label relationships, refer to the weights in the textual knowledge graph used in step a), for example the weights in knowledge graphs such as ConceptNet or WordNet.
  • if multiple textual label relationships are merged, the weighted average of their weights is used as the merged relationship weight; for image label relationships, refer to the conditional probability calculation in step b), and these are generally not merged. If no textual label relationship weight exists, the weight is filled with a value of 0 or 1, where 0 means there is no relationship between the two nodes and 1 means there is a relationship. It should be noted that 0 and 1 are used to fill the relationships between nodes that have a purely logical relationship.
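  • A sketch of these weight rules, assuming illustrative data layouts (textual relationships as lists of weights per edge, image relationships as conditional probabilities per edge); the names and a plain average in place of the weighted average are assumptions:

```python
# Combine edge weights from the three sources described in step c).
def edge_weights(text_relations, image_relations, logical_edges):
    weights = {}
    # textual label relationships: merged relationships take the average of
    # their weights (a plain average stands in for the weighted average here)
    for edge, ws in text_relations.items():
        weights[edge] = sum(ws) / len(ws)
    # image label relationships: the conditional probabilities from step b),
    # generally not merged
    weights.update(image_relations)
    # edges with only a logical relationship: fill with 1 (related) or 0 (not)
    for edge, related in logical_edges.items():
        weights.setdefault(edge, 1.0 if related else 0.0)
    return weights
```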
  • the nodes in the knowledge graph are used to represent labels. For example, the name of a node in the knowledge graph represents the name of a label mentioned in this application.
  • Step d) merge the textual label relationships and the image label relationships as the edges of the knowledge graph.
  • Step e) manually sort attributes such as definitions, keywords, and synonyms of the labels.
  • in this step, technicians read and check the knowledge graph to logically determine whether it is close to real-life situations, and manually adjust abnormal data.
  • the purpose of this step is to improve the ability of the knowledge graph to describe the correlation between photos in real life.
  • Step f) use a specified algorithm to embed the labels and obtain word embeddings.
  • the specified algorithm may be an algorithm with embedding capability, such as GloVe.
  • Step g) obtain definitions, keywords, synonyms, and word vectors from the above data as the node attributes of the knowledge graph.
  • Step h) merge the edges and nodes to obtain the constructed knowledge graph.
  • each node represents a label
  • edges represent the relationship between the labels
  • these relationships include but are not limited to the upper-lower relationship, the correlation relationship, the position relationship in the image and the predicate relationship, etc.
  • the upper-lower relationship is used to indicate the relationship between the upper-level concept and the lower-level concept.
  • the embedding is the word vector obtained by processing the tag name with NLP (Natural Language Processing) algorithms.
  • Node types can include objects, scenes, or events, etc.
  • the knowledge graph can be constructed through the above process.
  • the relevant data in the knowledge graph is also fixed.
  • the server can train the label classification model applied in this application according to the architecture shown in FIG. 2 to obtain a label classification model that can be used for inference.
  • the training process of the entire label classification model is introduced.
  • the data in the label classification model that needs to be updated during the training phase are the parameters in the convolutional neural network 110 and the parameters in the graph convolutional neural network 200 .
  • in the inference phase, the parameters in the convolutional neural network 110 and the parameters in the graph convolutional neural network 200 are fixed.
  • each graph convolutional layer can be represented by the following formula:
  • H^(l+1) = σ(A · H^(l) · W^(l))
  • where A is the label relationship matrix input into the graph convolutional neural network 200; H^(l) is the input of the current graph convolutional layer, and H^(1), the input of the first graph convolutional layer in the graph convolutional neural network 200, is the node information matrix; W^(l) is the parameter to be learned during training, and σ(·) is the activation function.
  • each graph convolutional layer processes the node information output by the previous graph convolutional layer to obtain new node information and outputs it to the next graph convolutional layer; the structure of the graph convolutional neural network 200 itself does not change.
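  • A minimal numpy sketch of one such layer; the symmetric normalization of the label relationship matrix and the choice of ReLU for σ(·) are common conventions assumed here, not taken from the patent:

```python
import numpy as np

def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the label relationship matrix."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_hat: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """H: (C, d_in) node information, W: (d_in, d_out) learned parameters."""
    return np.maximum(A_hat @ H @ W, 0.0)   # sigma(.) chosen as ReLU in this sketch

# Stacking such layers over the node information matrix yields the
# graph feature matrix of shape (C, N).
```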
  • after training is complete, the computer device can use the model to perform the multi-label classification method for images shown in this application.
  • FIG. 3 is a flowchart of a multi-label classification method for images provided by an embodiment of the present application.
  • the method shown in FIG. 3 can be applied to a computer device.
  • the computer device can be either a terminal or a server; for the execution of this method, please refer to the following introduction.
  • a computer device can acquire images to be processed.
  • the manner of acquiring the image to be processed may be different according to the specific implementation manner of the computer device.
  • when the computer device is a terminal, the terminal can directly use its image acquisition component to capture images and use the captured images as the images to be processed.
  • the terminal may acquire images from other computer devices, and use the acquired images as the images to be processed.
  • the terminal can also synthesize a virtual image according to a specified instruction and data through an installed image synthesizing application, and use the virtual image as the image to be processed.
  • the server may receive an image uploaded by the terminal, and use the image as an image to be processed.
  • the server can also synthesize a virtual image through an installed image synthesizing application according to specified instructions and data, and use the virtual image as the image to be processed.
  • the number of images to be processed may be one or multiple.
  • the computer device may choose to process the multiple images to be processed in a serial manner or in a parallel manner.
  • in the serial manner, the computer device processes the next image only after an image has been successfully tagged with at least two second labels.
  • in the parallel manner, the computer device processes several images simultaneously, and those images obtain their corresponding second labels at the same time.
  • Step 310 Extract the image features of the image to be processed through the feature extraction layer in the label classification model, where the label classification model is a neural network model for adding at least two labels to the image to be processed.
  • the computer device after acquiring the image to be processed, the computer device will be able to extract image features from the image to be processed.
  • the computer device extracts the image features of the image to be processed through the feature extraction layer in the label classification model.
  • the label classification model provided in this application is used to provide at least two labels for an image to be processed.
  • the image features may include global features and local features according to different application scenarios.
  • in response to the image features being global features, the computer device uses the entire image as material and extracts the features of the entire image as the image features of the image to be processed.
  • in response to the image features being local features, the computer device uses one or more identified local regions in the image as material and extracts the corresponding features as the image features of the image to be processed.
  • FIG. 4 is a schematic diagram of a global feature provided based on the embodiment shown in FIG. 3 .
  • each pixel in the to-be-processed image 400 is used as a material, and after being processed by a computer device, a global feature 420 is extracted, and the global feature 420 is used to indicate the feature of the to-be-processed image 400 .
  • FIG. 5 is a schematic diagram of another partial feature provided based on the embodiment shown in FIG. 3 .
  • after the to-be-processed image 400 is processed by the computer device, three candidate boxes appear; the computer device then continues processing and obtains three sets of local features from the local images in the three candidate boxes, namely the local feature 510, the local feature 520, and the local feature 530.
  • the sum of the local features 510, the local features 520 and the local features 530 is referred to as an image feature.
  • Step 320 Process the image features through the graph feature matrix to obtain the data to be activated.
  • the graph feature matrix is a matrix obtained after the knowledge graph is processed by the graph convolutional neural network.
  • the knowledge graph is used to indicate the attributes of the first labels themselves and the relationship between at least two first labels.
  • after obtaining the image features, the computer device will obtain the graph feature matrix.
  • the graph feature matrix is a specified matrix obtained from the knowledge graph.
  • as long as the knowledge graph does not change or is not updated, the graph feature matrix will not change; that is, the computer device updates the corresponding graph feature matrix only when the internally stored knowledge graph is updated.
  • if the knowledge graph stored in the computer device has not changed, the originally stored graph feature matrix is not updated.
  • when the computer device has both the image features and the graph feature matrix, it processes the image features through the graph feature matrix to obtain the data to be activated. It should be noted that the calculation method can be adjusted according to the form of the image features.
  • when the image features are in the form of a matrix, the computer device performs matrix multiplication of the image feature matrix and the graph feature matrix and uses the result as the data to be activated.
  • the knowledge graph applied in the embodiment of the present application is used to indicate not only the attribute of the first tag itself, but also the relationship between at least two first tags.
  • Step 330 Process the data to be activated through the activation layer in the label classification model to obtain at least two second labels.
  • the computer equipment processes the data to be activated through the activation layer in the label classification model to obtain at least two second labels.
  • the second labels are used to indicate features in the image to be processed, and each second label is used to indicate that there is a feature in the image to be processed that matches the label. It should be noted that the second labels are labels filtered from the first labels.
  • the second label may be 3 labels as shown in Table 2.
  • the second label belongs to the first label, and the second label is the label in the first label that best matches the characteristics of the image to be processed.
  • the first tag may also include a person, and the tag of the person may be specific to the person's name, or may only be a tag that represents the person's age, gender, occupation, and other characteristics.
  • the shown tags are the second tags screened out from the first tags shown in Table 1 in the embodiment of the present application, including 4 second tags in total.
  • the computer device obtains that the second tags ocean, beach, dog and landscape are all tags that conform to the characteristics of the image to be processed.
  • for example, suppose the image to be processed includes an ocean and a dog with obvious features, and also includes a beach with less obvious features.
  • with the solution in the related art, only the two labels ocean and dog would most likely be marked on the image to be processed.
  • the graph feature matrix, however, can provide the strong correlation between ocean and beach, between ocean and landscape, and between beach and landscape; that is, when it identifies the ocean, the method provided by this application is more likely to also use beach, dog, and landscape as second labels for the image to be processed.
  • each first label has its own corresponding threshold.
  • if the probability value of a first label obtained by the activation layer is greater than the corresponding threshold, the activation layer determines that first label as a second label.
  • illustratively, the data shown in Table 1 and Table 2 are taken as examples; please refer to Table 3.
  • the data shown in Table 3 give, for each first label shown in Table 1, the measured probability obtained from the data to be activated together with the preset threshold; the measured probabilities are included in the data to be activated processed by the activation layer.
  • the preset threshold corresponding to each first label may be pre-stored in the activation layer.
  • the activation layer can obtain the measured probability between the image to be processed and each label and compare it with a preset threshold, and determine the first label whose measured probability is higher than the preset threshold as the second label. For example, according to the data shown in Table 3, the first tags corresponding to serial numbers 1, 2, 5 and 8 are determined as the second tags.
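  • A sketch of this per-label threshold comparison; the probabilities and thresholds below are made up for illustration and are not the values from Table 3:

```python
# Keep every first label whose measured probability exceeds its own threshold.
first_labels = ["ocean", "beach", "dog", "landscape", "cake"]
measured_probability = [0.93, 0.71, 0.88, 0.64, 0.05]   # from the data to be activated
preset_threshold = [0.50, 0.60, 0.50, 0.55, 0.70]       # one threshold per first label

second_labels = [
    label
    for label, p, t in zip(first_labels, measured_probability, preset_threshold)
    if p > t
]
print(second_labels)   # ['ocean', 'beach', 'dog', 'landscape']
```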
  • Step 340 at least two second tags are determined as tags of the image to be processed, and the second tags belong to the first tags.
  • after the computer device determines at least two second labels, they are used as the labels of the image to be processed.
  • the second label may be displayed on the processed image to be processed as visual information.
  • FIG. 6 is a visual interface after image processing provided based on the embodiment shown in FIG. 3 .
  • three second labels attached to it can be displayed below, which are the first second label 620, the second second label 630 and the third second label 640, respectively. They are "Beach”, “Trees” and "Ocean”.
  • the second label may not be used as visual information, but as a kind of attribute information of the image.
  • the attribute information may be stored in the attribute frame of the image, or may be additionally stored in a file designated by the computer device. Among them, the attribute frame of the image, as a part of the image, is copied with the copying of the image, and disappears with the deletion of the image.
  • the computer device can intelligently generate albums according to the multiple tags. For example, when “beach”, “ocean” and “landscape” appear in several images, these images are intelligently combined into a photo album named "Seaside Play". It should be noted that, the operation of intelligently generating an album can be completed on the terminal side or on the server side.
  • the terminal can upload the image captured by the local end to the server through cloud backup or other forms.
  • the server realizes the operation of intelligently generating a photo album for multiple images.
  • the multi-label classification method for images provided by this application can, after extracting the image features of the to-be-processed image, combine them with the graph feature matrix to obtain the data to be activated, obtain at least two second labels according to the data to be activated, and use the second labels as the labels of the image to be processed.
  • the graph feature matrix is a matrix obtained after the knowledge graph is processed by the graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first label itself, and the relationship between at least two first labels.
  • a knowledge graph reflecting the relationship between the first labels is introduced, and the graph feature matrix obtained from the knowledge graph is used to assist in determining the second labels, which helps avoid missing second labels whose corresponding features in the image are not obvious.
  • an embodiment of the present application provides a multi-label classification method for images based on the label classification model.
  • the present application can obtain a more accurate multi-label classification result for an image to be processed.
  • FIG. 7 is a flowchart of a method for classifying images with multiple labels according to another exemplary embodiment of the present application.
  • the multi-label classification method of the image can be applied to the terminal or server shown above.
  • the multi-label classification method for this image includes:
  • Step 711 acquiring the image to be processed.
  • the server acquires the image to be processed from the data transmitted from the terminal.
  • the manner in which the terminal transmits data to the server may include scenarios such as cloud album synchronization, smart album creation, or cloud backup.
  • when the image to be processed is acquired by the terminal, the terminal extracts the image to be processed from the locally stored gallery; the image can either be shot by the terminal itself or be an image sent to the terminal after being shot by another terminal.
  • the implementation process of the embodiment shown in FIG. 7 is introduced by taking the method applied to the terminal as an example.
  • Step 712 input the image to be processed into the convolutional neural network.
  • the image to be processed can be directly input into the convolutional neural network, and the image is processed through the convolutional neural network.
  • Step 713 Process the image to be processed through a convolutional neural network to obtain an image feature matrix.
  • the convolutional neural network includes several layers, and the image to be processed passes through these layers in sequence to obtain the image feature matrix.
  • the label classification model includes an input layer, a convolutional layer and a pooling layer.
  • the process of processing the image to be processed through the convolutional neural network may include inputting the image to be processed into the input layer, and through the above layer-by-layer processing, an image feature matrix is finally obtained.
  • the computer device can input the image to be processed into the input layer to obtain the first intermediate data, input the first intermediate data into the convolutional layer to obtain the second intermediate data, and input the second intermediate data into the pooling layer to obtain the image feature matrix.
  • the first intermediate data is obtained after processing by the input layer.
  • the input layer in the neural network is connected with the convolution layer, and the convolution layer processes the first intermediate data to obtain the second intermediate data.
  • the pooling layer is connected to the convolutional layer. After the pooling layer processes the second intermediate data, the image feature matrix is obtained.
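  • A minimal sketch of this input-layer/convolutional-layer/pooling-layer flow, assuming torch-style layers with illustrative sizes that the patent does not specify:

```python
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),  # convolutional layer
    nn.ReLU(),                                             # activation function
    nn.AdaptiveAvgPool2d(1),                               # pooling layer
    nn.Flatten(),                                          # -> image feature matrix (batch, N)
)

image = torch.rand(1, 3, 224, 224)         # RGB values already normalized to [0, 1]
image_features = feature_extractor(image)  # first/second intermediate data arise inside
print(image_features.shape)                # torch.Size([1, 64]), i.e. N = 64 here
```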
  • when the graph feature matrix has not yet been stored, the computer device may execute steps 721 to 723 to obtain the graph feature matrix.
  • when the graph feature matrix is already stored, the computer device can directly use the stored graph feature matrix in the process of marking the image to be processed with multiple second labels, without performing steps 721 to 723.
  • Step 721 input the label relationship matrix into the graph convolutional neural network, where the label relationship matrix is used to indicate the relationship between at least two first labels.
  • the label relationship matrix is used as one of the inputs, which will be input into the graph convolutional neural network in this embodiment of the present application.
  • the label relationship matrix is used to indicate the relationship between at least two first labels.
  • Step 722 Input the node information matrix into the graph convolutional neural network, where the node information matrix is used to indicate the attributes of the first label itself.
  • when inputting the label relationship matrix into the graph convolutional neural network, the computer device can also input the node information matrix into the graph convolutional neural network.
  • the node information matrix and the label relationship matrix together form a knowledge graph.
  • Step 723 Process the label relationship matrix and the node information matrix through a graph convolutional neural network to obtain a graph feature matrix.
  • in response to an update of the knowledge graph that generates the graph feature matrix, the computer device regenerates a new graph feature matrix according to the updated knowledge graph and stores it for processing the image features to obtain the data to be activated.
  • that is, in response to the data in the knowledge graph completing an update, the computer device obtains the updated knowledge graph, processes the updated knowledge graph through the graph convolutional neural network to obtain an updated graph feature matrix, and updates the graph feature matrix in the label classification model with the updated graph feature matrix.
  • the update of the knowledge graph can be performed on the server side; after the knowledge graph is updated, the server calculates the updated graph feature matrix and pushes it to the terminal as new information.
  • the terminal then processes the image features according to the new graph feature matrix to obtain the data to be activated.
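  • A sketch of this caching behavior, assuming a hypothetical knowledge-graph object with a version field (an illustrative mechanism, not from the patent):

```python
# Recompute the graph feature matrix only when the knowledge graph changes;
# otherwise reuse the value obtained by the last calculation.
class GraphFeatureCache:
    def __init__(self, gcn, knowledge_graph):
        self.gcn = gcn
        self.kg = knowledge_graph
        self._version = None
        self._matrix = None

    def graph_feature_matrix(self):
        if self._version != self.kg.version:      # knowledge graph was updated
            # reprocess the updated knowledge graph through the GCN
            self._matrix = self.gcn(self.kg.label_relation_matrix,
                                    self.kg.node_information_matrix)
            self._version = self.kg.version
        return self._matrix                       # otherwise keep the cached value
```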
  • the scale of the graph feature matrix is C*N, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.
  • correspondingly, the size of the image feature matrix is N*1 and the size of the data matrix to be activated is C*1.
  • in the data matrix to be activated, each row of data corresponds to the data after one first label is activated.
  • Step 731 Multiply the image feature matrix and the graph feature matrix to obtain the data matrix to be activated.
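  • A quick numpy check of the scales involved in step 731, using illustrative values of C and N:

```python
# (C x N) @ (N x 1) = (C x 1): one activated value per first label.
import numpy as np

C, N = 80, 512                              # illustrative label count / feature dimension
graph_feature_matrix = np.random.rand(C, N)
image_feature_matrix = np.random.rand(N, 1)
data_to_activate = graph_feature_matrix @ image_feature_matrix
print(data_to_activate.shape)               # (80, 1)
```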
  • Step 732 Process the data matrix to be activated through the activation layer in the label classification model to obtain at least two second labels.
  • the terminal may perform step (3a), step (3b), and step (3c) to realize the effect, shown in step 732, of obtaining at least two second labels.
  • Step (3a) input the data matrix to be activated into the activation layer.
  • Step (3b) process the data to be activated through the activation layer to obtain a probability value corresponding to each first label, where the probability value is used to indicate the probability that the first label conforms to the image to be processed.
  • Step (3c) in response to the probability value being higher than the corresponding first threshold, determine the corresponding first label as the second label, and the first threshold is a threshold for judging whether the first label conforms to the image to be processed.
  • the first thresholds may be in one-to-one correspondence with the first labels.
  • that is, if the number of first labels is i, the number of first thresholds is also i.
  • the image to be processed has obtained at least two second tags.
  • Each image to be processed can obtain at least two second labels to which it belongs through the above process.
  • the computer device can achieve the effect of multi-label classification by performing steps 711 to 732 provided in this embodiment of the present application on multiple images to be processed.
  • the embodiment of the present application may also add an image post-processing process, which determines whether to add a specified second label to the to-be-processed image based on features other than the image content of the to-be-processed image.
  • the computer device acquires the shooting time relationship information between the first image and the second image in response to the first image and the second image having acquired their corresponding second labels.
  • Both the first image and the second image are images to be processed to which the second label has been added.
  • Table 4 shows a situation of the second labels after the first image and the second image are processed.
  • after a plurality of second labels are applied through the scheme shown in this application, the second image includes three second labels, namely "ocean", "dog", and "landscape".
  • after the first image is marked with a plurality of second labels through the solution shown in this application, it includes four second labels, namely "ocean", "beach", "dog", and "landscape".
  • the computer device will acquire the shooting time relationship information between the first image and the second image.
  • the shooting time relationship information is used to indicate the time sequence relationship between the first image and the second image at the shooting time, or the shooting time relationship information is used to indicate the duration between the shooting time of the first image and the shooting time of the second image .
  • the shooting time relationship information indicates a timing relationship.
  • the time sequence relationship includes two cases: the first case is that the shooting moment of the first image is earlier than that of the second image, and the second case is that it is later. It should be noted that, because the first image and the second image processed by this application are by default images captured by the same terminal, in terms of image capture logic the shooting moment of the first image cannot equal the shooting moment of the second image.
  • the first image and the second image are images captured by the same terminal through the same set of cameras.
  • a smartphone's cameras usually include two groups, a front camera group and a rear camera group, and the smartphone can select one group of cameras to capture images.
  • the smartphone includes only one set of cameras, but the set of cameras can be flipped to capture the front side or flip to capture the back side.
  • the first image and the second image shown in the embodiment of the present application are two images captured by the camera group facing the same side. Among them, the smartphone can determine the current orientation of the camera through the status information.
  • when the shooting time relationship information indicates duration information, the duration information is the duration between the shooting moment of the first image and the shooting moment of the second image; this information can be a fixed value, and its precision can be minutes, seconds, milliseconds, and so on. It should be noted that the precision may vary according to the application scenario.
  • the accuracy may be on the order of seconds.
  • in scenes such as a high-speed moving person, a vehicle, or particles in a microscopic scene, the accuracy may be milliseconds.
  • the accuracy may be minutes.
  • the embodiments of the present application only provide a schematic introduction to the accuracy of the shooting relationship information, and do not limit the actual scene.
  • the computer device can add the target second label to the second labels corresponding to the second image when the shooting time relationship information meets the preset condition, where the target second label is a second label corresponding to the first image and not corresponding to the second image.
  • the preset condition may be used to indicate that the shooting time of the first image is close to the shooting time of the second image. That is, when the shooting time relationship information indicates that the shooting time of the first image and the second image are close, the corresponding shooting time relationship information meets the preset condition.
  • the computer device implements the operation of adding a label to the second image through steps (4a) and (4b).
  • Step (4a) in response to the first image and the second image having acquired the respective corresponding second labels, acquire the target duration.
  • the target duration is the duration between the shooting time of the first image and the shooting time of the second image.
  • Step (4b) in response to the target duration being less than the second threshold, adding the target second label to the second label corresponding to the second image.
  • in other words, the computer device can take a label that is in the first image but not in the second image and, as a second label, also apply it to the second image.
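  • A sketch of steps (4a) and (4b), assuming a simple image record with a shooting moment and a label set (an illustrative format and threshold, not from the patent):

```python
# If the duration between the two shooting moments is below the second
# threshold, copy labels the first image has and the second image lacks.
from datetime import datetime

def propagate_labels(first_image, second_image, second_threshold_s=60.0):
    target_duration = abs(
        (first_image["shot_at"] - second_image["shot_at"]).total_seconds()
    )                                                              # step (4a)
    if target_duration < second_threshold_s:                       # step (4b)
        target_second_labels = first_image["labels"] - second_image["labels"]
        second_image["labels"] |= target_second_labels

first = {"shot_at": datetime(2021, 8, 1, 10, 0, 5),
         "labels": {"ocean", "beach", "dog", "landscape"}}
second = {"shot_at": datetime(2021, 8, 1, 10, 0, 40),
          "labels": {"ocean", "dog", "landscape"}}
propagate_labels(first, second)
print(second["labels"])   # "beach" has been added
```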
  • the computer device implements the operation of adding a label to the second image through steps (5a) and (5b).
  • Step (5a) in response to the first image and the second image having acquired their corresponding second labels, where the number of first images is 2k and, among the 2k first images, k first images are images taken before the second image and k first images are images taken after the second image, acquire the shooting moments of the first images.
  • Step (5b) in response to the length of the interval covering the shooting moments of the 2k first images being less than the third threshold, add the target second label to the second labels corresponding to the second image, where k is an integer greater than or equal to 1.
  • the number of first images may be selected as a total of 2k images, and these first images are images continuously captured by the terminal before and after capturing the second image.
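  • A sketch of steps (5a) and (5b) under the same assumed image-record format as above; the intersection of the neighbors' label sets stands in for "labels corresponding to the 2k first images":

```python
# Given 2k first images shot around the second image, add to the second image
# every label shared by all of them, if their shooting moments span less than
# the third threshold.
def propagate_from_neighbors(first_images, second_image, third_threshold_s=60.0):
    times = [img["shot_at"] for img in first_images]               # step (5a)
    span = (max(times) - min(times)).total_seconds()
    if span < third_threshold_s:                                   # step (5b)
        shared = set.intersection(*(img["labels"] for img in first_images))
        target = shared - second_image["labels"]   # labels the second image lacks
        second_image["labels"] |= target
```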
  • FIG. 8 is a schematic diagram of an image post-processing provided based on the embodiment shown in FIG. 7 .
  • the value of k is 3.
  • the terminal continuously shoots the first first image 811, the second first image 812, the third first image 813, Second image 820 , fourth first image 814 , fifth first image 815 , and sixth first image 816 .
  • Table 5 shows the shooting time of each image.
  • the 6 first images all have the second label "beach", and the second image 820 does not correspond to the second label "beach”.
  • the second labels of the second image 820 are “tree” and “ocean”.
  • the second labels of the other 6 first images are “tree”, “ocean” and “beach”.
  • the computer device determines that the three first images are located before the shooting time of the second image, and the other three first images are located after the shooting time of the second image, and obtains the shooting time of each first image .
  • the third threshold is 60 seconds
  • the duration between the shooting moment of the first first image 811 and the shooting moment of the sixth first image 816 is 46 seconds; that is, since the length of the interval in which the six first images were taken is less than the third threshold of 60 seconds, the computer device will, in the second processing stage 8B, copy the second label "beach" of the six first images to the second labels corresponding to the second image as the target second label. It should be noted that, in the first processing stage 8A and the second processing stage 8B, the second labels corresponding to the first images do not change; therefore, the first images are not shown repeatedly in FIG. 8.
  • FIG. 9 is a schematic diagram of a process of automatically generating an album provided by an embodiment of the present application.
  • the computer device acquires the several images that need to be processed. If the computer device is a server, the image acquisition stage 9A may consist of receiving photos uploaded by the terminal, for example when the terminal performs a cloud backup or album backup on the server. If the computer device is a terminal, the image acquisition stage 9A may be the process of taking pictures; after the pictures are taken and stored, the computer device has obtained several images to be processed.
  • the computer device may add at least two second labels to the to-be-processed image through the label classification model provided in the present application in the multi-label determination stage 9B.
  • in the image post-processing stage 9C, the computer device can divide the images to be processed into first images and second images, and determine, according to whether the shooting time relationship information between the first image and the second image meets the preset condition, whether to supplement the second image with a target second label, which is a label corresponding to the first image and not corresponding to the second image.
  • the computer device can generate a designated album according to a preset strategy, and the album includes the to-be-processed image.
  • for example, the computer device can select m tags, generate a first album from the images to be processed that carry the m tags, and name the album according to the selected m tags.
  • the computer device can also specify a shooting location together with m tags, and generate a second album of similar content shot at the specified location.
  • likewise, the computer device can specify a shooting time together with m tags, and generate a third album of similar content shot at the specified time. A sketch of these strategies follows.
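  • As a minimal sketch of these three album strategies (the record layout, field names, and filter logic are illustrative assumptions, not from the specification):

```python
def generate_album(images, m_tags, location=None, time_range=None):
    """Collect the to-be-processed images that carry all m tags, optionally
    restricted to one shooting location or one shooting time span."""
    album = []
    for img in images:
        if not set(m_tags).issubset(img["labels"]):
            continue                       # first album: images carrying the m tags
        if location is not None and img.get("location") != location:
            continue                       # second album: one shooting location
        if time_range is not None and not (time_range[0] <= img["shot_at"] <= time_range[1]):
            continue                       # third album: one shooting time span
        album.append(img)
    # the album name is generated from the selected m tags
    return {"name": ", ".join(m_tags), "images": album}
```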
  • the solutions provided by the embodiments of the present application can add multiple tags to the images to be processed with high accuracy, and can intelligently generate corresponding albums on this basis, which improves the efficiency and accuracy of automatically generating albums and reduces the chance that images which actually meet the album's criteria are missed when the album is generated.
  • the label classification model used in this embodiment includes the structure of a convolutional neural network.
  • the convolutional neural network is used to extract the image content in the image to be processed.
  • after the convolutional neural network extracts the image feature matrix, the matrix can be processed by the graph feature matrix derived from the knowledge graph to obtain the data to be activated.
  • after the data to be activated is processed by the activation layer, at least two second labels can be obtained, achieving the effect of marking the image to be processed with multiple labels.
  • the embodiment of the present application can also introduce a graph convolutional neural network to process the knowledge graph, so as to obtain a graph feature matrix for processing the image feature matrix, so that, when multiple second labels are added to the image to be processed, the data entering the activation layer is checked and balanced by the mutual relationships between the first nodes in the knowledge graph, thereby avoiding the omission of inconspicuous labels in the image to be processed and improving the accuracy of adding multiple second labels to it.
  • further, the computer device detects whether an image adjacent to the image to be processed carries a second label that is not yet marked on the image to be processed. If the shooting times of the adjacent image and the image to be processed are relatively close, that second label is also marked on the image to be processed, thereby further improving the accuracy of the second-label labeling.
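  • As a minimal sketch of this forward pass (the sigmoid activation, threshold, and function names are illustrative assumptions; the specification only requires an activation layer):

```python
import numpy as np

def classify(image_feature, graph_feature_matrix, label_names, threshold=0.5):
    """image_feature: N*1 vector from the CNN feature extraction layer.
    graph_feature_matrix: C*N matrix precomputed by the GCN from the
    knowledge graph. Returns the second labels whose activation exceeds
    the threshold; the matrix shapes follow the embodiment."""
    data_to_activate = graph_feature_matrix @ image_feature   # C*1 pre-activations
    scores = 1.0 / (1.0 + np.exp(-data_to_activate))          # assumed sigmoid activation layer
    picked = np.flatnonzero(scores.ravel() > threshold)
    return [label_names[i] for i in picked]
```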
  • FIG. 10 is a structural block diagram of an apparatus for classifying images with multiple labels according to an exemplary embodiment of the present application.
  • the image multi-label classification device can be implemented as all or a part of the terminal through software, hardware or a combination of the two.
  • the apparatus includes a feature extraction module 1010, a first acquisition module 1020, a label acquisition module 1030 and a label determination module 1040; the specific functions of these modules are introduced below.
  • the feature extraction module 1010 is configured to extract image features of the image to be processed through a feature extraction layer in a label classification model, where the label classification model is a neural network model for adding at least two labels to the image to be processed.
  • the first acquisition module 1020 is configured to process the image features through a graph feature matrix to obtain the data to be activated.
  • the graph feature matrix is a matrix obtained after the knowledge graph is processed by a graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationships between at least two of the first labels.
  • the label obtaining module 1030 is configured to process the data to be activated through the activation layer in the label classification model to obtain at least two second labels.
  • the label determination module 1040 is configured to determine at least two of the second labels as labels of the image to be processed, and the second labels belong to the first labels.
  • the first acquisition module 1020 is configured to multiply the image feature matrix and the graph feature matrix to obtain a data matrix to be activated.
  • the label obtaining module 1030 is configured to process the data matrix to be activated through the activation layer in the label classification model to obtain at least two second labels.
  • the knowledge graph involved in the apparatus includes a label relationship matrix and a node information matrix.
  • the apparatus further includes a first input module, a second input module, and a second acquisition module.
  • the first input module is configured to input the label relationship matrix into the graph convolutional neural network, where the label relationship matrix is used to indicate the relationships between at least two of the first labels;
  • the second input module is configured to input the node information matrix into the graph convolutional neural network, where the node information matrix is used to indicate the attributes of the first labels themselves;
  • the second acquisition module is configured to process the label relationship matrix and the node information matrix through the graph convolutional neural network to obtain the graph feature matrix.
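  • A minimal two-layer GCN sketch of how a C*C label relationship matrix A and a C*d node information matrix X could yield the C*N graph feature matrix; the symmetric normalization, ReLU, and layer count are common GCN choices assumed here, not mandated by the specification:

```python
import numpy as np

def gcn_graph_features(A, X, W1, W2):
    """A: C*C label relationship matrix, X: C*d node information matrix,
    W1 (d*h) and W2 (h*N): learned weights. Returns the C*N graph feature
    matrix used to process image features."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt      # symmetric normalization
    H = np.maximum(A_norm @ X @ W1, 0.0)          # first graph convolution + ReLU
    return A_norm @ H @ W2                        # second graph convolution -> C*N
```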
  • the apparatus further includes a third acquiring module, a fourth acquiring module and a matrix updating module.
  • the third acquisition module is configured to acquire, in response to the data in the knowledge graph having been updated, the updated knowledge graph;
  • the fourth acquisition module is configured to process the updated knowledge graph through the graph convolutional neural network to obtain the updated graph feature matrix;
  • the matrix update module is configured to update the graph feature matrix in the label classification model through the updated graph feature matrix.
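  • A hedged sketch of this update path, reusing the GCN sketch above (the model object and attribute names are illustrative assumptions):

```python
def refresh_graph_features(model, updated_graph):
    """Recompute the graph feature matrix only after the knowledge graph has
    been updated; otherwise the previously computed matrix keeps being used
    in the label classification model."""
    A = updated_graph.label_relation_matrix
    X = updated_graph.node_info_matrix
    model.graph_feature_matrix = gcn_graph_features(A, X, model.W1, model.W2)
```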
  • the scale of the graph feature matrix involved in the apparatus is C*N, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.
  • the size of the image feature matrix involved in the apparatus is N*1, the size of the graph feature matrix is C*N, and the size of the data matrix to be activated is C*1, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.
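  • A worked shape check under illustrative sizes (C = 80 labels and N = 512 feature dimensions are assumptions for illustration):

```python
import numpy as np

C, N = 80, 512                        # illustrative sizes
image_feature = np.random.rand(N, 1)  # N*1 from the feature extraction layer
graph_feature = np.random.rand(C, N)  # C*N from the graph convolutional network
to_activate = graph_feature @ image_feature
assert to_activate.shape == (C, 1)    # one pre-activation score per first label
```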
  • the to-be-processed images involved in the apparatus include a first image and a second image
  • the apparatus further includes a post-processing module, configured to: in response to the first image and the second image having acquired their respective corresponding second labels, acquire the shooting time relationship information between the first image and the second image, where the shooting time relationship information is used to indicate the time sequence relationship between the first image and the second image at the shooting moment, or to indicate the duration between the shooting moment of the first image and the shooting moment of the second image; and, in response to the shooting time relationship information meeting the preset condition, add the target second label to the second labels corresponding to the second image, where the target second label is a second label corresponding to the first image and not corresponding to the second image.
  • the post-processing module is configured to, in response to the target duration being less than a second threshold, add the target second label to the second label corresponding to the second image, the The target duration is the duration between the capture moment of the first image and the capture moment of the second image.
  • the post-processing module is configured to: in response to the number of first images being 2k, where among the 2k first images k were taken before the second image and k were taken after it, acquire the shooting moments of the first images; and, in response to the length of the interval in which the shooting moments of the 2k first images fall being less than the third threshold, add the target second label to the second labels corresponding to the second image, where the target second label is a label corresponding to all 2k first images and not corresponding to the second image, and k is an integer greater than or equal to 1.
  • the multi-label classification method for images shown in the embodiments of the present application may be applied to a computer device, and the computer device may be a terminal having a display screen and a multi-label image classification function.
  • Terminals can include mobile phones, tablet computers, laptop computers, desktop computers, all-in-one computers, servers, workstations, TVs, set-top boxes, smart glasses, smart watches, digital cameras, MP4 playback terminals, MP5 playback terminals, learning machines, point-reading machines, electronic paper books, electronic dictionaries, vehicle-mounted terminals, virtual reality (VR) playback terminals, augmented reality (AR) playback terminals, and the like.
  • FIG. 11 is a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • the terminal includes a processor 1120 and a memory 1140; the memory 1140 stores at least one instruction, and the instruction is loaded and executed by the processor 1120 to implement the multi-label classification method for images according to the various method embodiments of the present application.
  • the processor 1120 may include one or more processing cores.
  • the processor 1120 uses various interfaces and lines to connect the various parts of the terminal 110, and performs the various functions of the terminal 110 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 1140 and calling the data stored in the memory 1140.
  • the processor 1120 may be implemented in hardware in the form of at least one of digital signal processing (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA).
  • the processor 1120 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like.
  • the CPU mainly handles the operating system, user interface, and application programs; the GPU is used to render and draw the content that needs to be displayed on the display screen; the modem is used to handle wireless communication. It can be understood that the above-mentioned modem may alternatively not be integrated into the processor 1120 and may instead be implemented by a separate chip.
  • the memory 1140 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory, ROM).
  • the memory 1140 includes a non-transitory computer-readable storage medium.
  • Memory 1140 may be used to store instructions, programs, codes, sets of codes, or sets of instructions.
  • the memory 1140 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), and instructions used to implement the following method embodiments; the data storage area can store data and the like involved in the following method embodiments.
  • the computer device may also be a server, and the structure of the server may refer to the structure shown in FIG. 12 .
  • FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server is used to implement the multi-label classification method for images provided by the above embodiments. Specifically:
  • the server 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory (RAM) 1202 and a read only memory (ROM) 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201.
  • the server 1200 also includes a basic input/output (I/O) system 1206 that helps to transfer information between the various devices in the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214 and other program modules 1215.
  • the basic input/output system 1206 includes a display 1208 for displaying information and input devices 1209 such as a mouse, keyboard, etc., for user input of information.
  • the display 1208 and the input device 1209 are both connected to the central processing unit 1201 through the input and output controller 1210 connected to the system bus 1205.
  • the basic input/output system 1206 may also include an input output controller 1210 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus.
  • input output controller 1210 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205.
  • the mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the server 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • Computer-readable media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM (Electrical Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory, electrically erasable programmable read only memory), flash memory or other solid-state storage technologies, CD-ROM, DVD (Digital Video Disc, High Density Digital Video Disc) or other optical storage, cassette, magnetic tape, magnetic disk storage or other magnetic storage device.
  • the server 1200 may also run by being connected to a remote computer on a network such as the Internet. That is, the server 1200 can be connected to the network 1212 through the network interface unit 1211 connected to the system bus 1205, or may use the network interface unit 1211 to connect to other types of networks or remote computer systems.
  • Embodiments of the present application further provide a computer-readable medium that stores at least one instruction, where the at least one instruction is loaded and executed by the processor to implement the multi-label classification method for images according to the above embodiments.
  • when the apparatus for multi-label classification of images provided in the above embodiments executes the method for multi-label classification of images, the division into the above-mentioned functional modules is used only as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the apparatus for multi-label classification of images provided by the above embodiments and the embodiments of the multi-label classification method for images belong to the same concept, and the specific implementation process is detailed in the method embodiments, which will not be repeated here.


Abstract

The embodiments of the present application disclose an image multi-tag classification method and apparatus, a computer device, and a storage medium, belonging to the technical field of image processing. The image multi-tag classification method provided by the present application can, upon acquisition of an image to be processed, input said image into a tag classification model to obtain image features corresponding to said image, obtain a graph feature matrix from a knowledge graph, combine the image features and the graph feature matrix to obtain data to be activated, and then obtain, from said data, at least two second tags corresponding to said image. The knowledge graph is used to indicate the relationships between tags and the attributes of the tags themselves. Because the tag classification model uses information provided by the knowledge graph when adding multiple tags to said image, the present application improves the reliability of the multiple tags obtained for said image and reduces the complexity of acquiring the multiple tags.

Description

Image multi-label classification method, apparatus, computer device, and storage medium

This application claims priority to the Chinese patent application with application number 202011451978.1, entitled "Multi-label classification method, apparatus, computer equipment and storage medium for images", filed on December 9, 2020, the entire contents of which are incorporated by reference in this application.

Technical Field

The embodiments of the present application relate to the technical field of image processing, and in particular, to a method, apparatus, computer device, and storage medium for multi-label classification of images.
Background

With the rapid development of artificial intelligence technology, the ability of terminals to intelligently classify the images in albums is getting stronger and stronger.

In the related art, the machine learning model provided by artificial intelligence technology can intelligently determine the type of the object contained in an image, so as to label the image accordingly. However, when facing scenes in which multiple labels need to be added to a single image, existing models exhibit problems such as an increased error rate or a slow operation speed.
Summary of the Invention

Embodiments of the present application provide a multi-label classification method, apparatus, computer device, and storage medium for images. The technical solution is as follows:

According to an aspect of the present application, a multi-label classification method for images is provided, the method comprising:

extracting image features of an image to be processed through a feature extraction layer in a label classification model, where the label classification model is a neural network model for adding at least two labels to the image to be processed;

processing the image features through a graph feature matrix to obtain data to be activated, where the graph feature matrix is a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationships between at least two of the first labels;

processing the data to be activated through an activation layer in the label classification model to obtain at least two second labels; and

determining at least two of the second labels as labels of the image to be processed, where the second labels belong to the first labels.
According to another aspect of the present application, a multi-label classification apparatus for images is provided, the apparatus comprising:

a first acquisition module, configured to acquire an image to be processed;

a feature extraction module, configured to extract image features of the image to be processed;

a second acquisition module, configured to obtain data to be activated according to the image features and a graph feature matrix, where the graph feature matrix is a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationships between at least two of the first labels; and

a label determination module, configured to obtain at least two second labels according to the data to be activated and determine the at least two second labels as labels of the image to be processed, where the second labels belong to the first labels.
According to another aspect of the present application, a terminal is provided, where the terminal includes a processor and a memory, the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the multi-label classification method for images provided by the various aspects of the present application.

According to another aspect of the present application, a computer-readable storage medium is provided, where the storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the multi-label classification method for images provided by the various aspects of the present application.

According to an aspect of the present application, a computer program product is provided, the computer program product comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the methods provided in the various optional implementations of the above multi-label classification aspect.
Description of Drawings

In order to describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is an architecture diagram of a label classification model provided by an embodiment of the present application;

FIG. 2 is an architecture diagram of a label classification model provided based on the embodiment shown in FIG. 1;

FIG. 3 is a flowchart of a multi-label classification method for images provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a global feature provided based on the embodiment shown in FIG. 3;

FIG. 5 is a schematic diagram of a local feature provided based on the embodiment shown in FIG. 3;

FIG. 6 is a visual interface after image processing provided based on the embodiment shown in FIG. 3;

FIG. 7 is a flowchart of a multi-label classification method for images provided by another exemplary embodiment of the present application;

FIG. 8 is a schematic diagram of image post-processing provided based on the embodiment shown in FIG. 7;

FIG. 9 is a schematic diagram of a process of automatically generating an album provided by an embodiment of the present application;

FIG. 10 is a structural block diagram of an apparatus for multi-label classification of images provided by an exemplary embodiment of the present application;

FIG. 11 is a structural block diagram of a terminal provided by an exemplary embodiment of the present application;

FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description

In order to make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as recited in the appended claims.

In the description of the present application, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and should not be construed as indicating or implying relative importance. It should also be noted that, unless otherwise expressly specified and limited, the terms "connected" and "connection" should be understood in a broad sense; for example, a connection may be fixed, detachable, or integrated; mechanical or electrical; direct, or indirect through an intermediate medium. For those of ordinary skill in the art, the specific meanings of the above terms in this application can be understood according to the specific situation. In addition, unless otherwise specified, "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
The present application provides a multi-label classification method for images, where the method includes: extracting image features of an image to be processed through a feature extraction layer in a label classification model, where the label classification model is a neural network model for adding at least two labels to the image to be processed; processing the image features through a graph feature matrix to obtain data to be activated, where the graph feature matrix is a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationships between at least two of the first labels; processing the data to be activated through an activation layer in the label classification model to obtain at least two second labels; and determining at least two of the second labels as labels of the image to be processed, where the second labels belong to the first labels.

Optionally, processing the image features through the graph feature matrix to obtain the data to be activated includes: multiplying the image feature matrix and the graph feature matrix to obtain a data matrix to be activated; and processing the data to be activated through the activation layer in the label classification model to obtain at least two second labels includes: processing the data matrix to be activated through the activation layer in the label classification model to obtain the at least two second labels.

Optionally, the knowledge graph includes a label relationship matrix and a node information matrix, and the method further includes: inputting the label relationship matrix into the graph convolutional neural network, where the label relationship matrix is used to indicate the relationships between at least two of the first labels; inputting the node information matrix into the graph convolutional neural network, where the node information matrix is used to indicate the attributes of the first labels themselves; and processing the label relationship matrix and the node information matrix through the graph convolutional neural network to obtain the graph feature matrix.

Optionally, the method further includes: in response to the data in the knowledge graph having been updated, acquiring the updated knowledge graph; processing the updated knowledge graph through the graph convolutional neural network to obtain an updated graph feature matrix; and updating the graph feature matrix in the label classification model with the updated graph feature matrix.

Optionally, the scale of the graph feature matrix is C*N, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.

Optionally, the scale of the image feature matrix is N*1, the scale of the graph feature matrix is C*N, and the scale of the data matrix to be activated is C*1, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.

Optionally, the image to be processed includes a first image and a second image, and the method further includes: in response to the first image and the second image having acquired their respective corresponding second labels, acquiring shooting time relationship information between the first image and the second image, where the shooting time relationship information is used to indicate the time sequence relationship between the first image and the second image at the shooting moment, or the duration between the shooting moment of the first image and the shooting moment of the second image; and, in response to the shooting time relationship information meeting a preset condition, adding a target second label to the second labels corresponding to the second image, where the target second label is a second label corresponding to the first image and not corresponding to the second image.

Optionally, adding the target second label in response to the shooting time relationship information meeting the preset condition includes: in response to a target duration being less than a second threshold, adding the target second label to the second labels corresponding to the second image, where the target duration is the duration between the shooting moment of the first image and the shooting moment of the second image.

Optionally, adding the target second label in response to the shooting time relationship information meeting the preset condition includes: in response to the number of first images being 2k, where among the 2k first images k were taken before the second image and k were taken after it, acquiring the shooting moments of the first images; and, in response to the length of the interval in which the shooting moments of the 2k first images fall being less than a third threshold, adding the target second label to the second labels corresponding to the second image, where the target second label is a label corresponding to all 2k first images and not corresponding to the second image, and k is an integer greater than or equal to 1.

The present application provides a multi-label classification method for images which, after acquiring an image to be processed, can input the image into a label classification model to obtain the corresponding image features, obtain a graph feature matrix from a knowledge graph, combine the image features and the graph feature matrix to obtain data to be activated, and then obtain at least two second labels corresponding to the image from the data to be activated. The knowledge graph is used to indicate the relationships between labels and the attributes of the labels themselves. Because the label classification model uses the information provided by the knowledge graph when adding multiple labels to the image to be processed, the present application improves the reliability of the multiple labels obtained for the image while reducing the complexity of acquiring them.
To make the solutions shown in the embodiments of the present application easier to understand, several terms appearing in the embodiments are introduced below.

Image to be processed: the image to which labels are to be added. In one possible manner, the image to be processed is an image captured by the terminal. In another possible manner, the image to be processed is an image captured by other computer equipment, and the terminal adds labels to that image. In yet another possible manner, the image to be processed may also be a virtual image generated by other computer equipment according to a specified algorithm or other image tools.

Optionally, when the image to be processed is a real image captured by a device, the present application divides the acquisition of the image to be processed into two approaches. Illustratively, the first approach is that the device applying the multi-label classification method provided by the present application shoots the image through its own image acquisition component. The second approach is that a device other than the one applying the multi-label classification method acquires the image and transmits it, by image transmission or other means, to the device that performs the multi-label classification.

For example, when the embodiments of the present application are applied in a terminal such as a mobile phone, the image to be processed may be an image captured by the terminal through its own camera. When the embodiments of the present application are applied in a device such as a server, the image to be processed may be an image transmitted by a terminal and acquired by the server through a network; the image is still an image captured by the terminal through a camera.
Neural network model: a complex network system formed by a large number of simple processing units connected to one another, where a processing unit may also be called a neuron. A neural network model can reflect many basic features of human brain function and is essentially a nonlinear dynamic learning system. In the present application, a neural network model is a mathematical model that applies a neural network structure. In one possible implementation, part of the neural network model adopts a neural network structure while another part adopts other data structures; these parts cooperate to process data and obtain the result the designer expects. For example, the label classification model used in this application can mark the image to be processed with at least two second labels, realizing multi-label classification of images.
Convolutional neural network (CNN): a class of feedforward neural networks that include convolutional computation and have a deep structure; it is one of the most widely used algorithms in deep learning.

Illustratively, the usual structure of a CNN includes an input layer, hidden layers, and an output layer.

First, the input layer is used to receive the data fed into the CNN. Generally, the input layer can process multi-dimensional data; when the input is an image, the input layer receives three-dimensional input data indicating pixel coordinates and the RGB (red, green, blue) channels. Optionally, before the pixel data enters the input layer, normalization may be performed to map the RGB channel values from [0, 255] to [0, 1], so as to improve the learning efficiency and inference capability of the CNN.

Second, the hidden layers can include three common structures: convolutional layers, pooling layers, and fully connected layers.

(1) Convolutional layer, whose function is to extract features from the input data.

The convolutional layer can be introduced from three angles: the convolutional kernel, the convolutional layer parameters, and the activation function.

A. Convolutional kernel. A convolutional layer contains multiple convolution kernels. Each kernel comprises several elements, and each element corresponds to a weight coefficient and a bias vector; the elements are analogous to the neurons of a feedforward neural network.

B. Convolutional layer parameters. The parameters of a convolutional layer include the kernel size, the stride, and the padding; together, these three hyperparameters determine the size of the feature map output by the layer. The kernel size is any value smaller than the input image size; the larger the kernel, the more complex the input features that can be extracted. The stride defines the distance between two adjacent positions at which the kernel scans the feature map: with a stride of 1 the kernel sweeps the elements of the feature map one by one, while with a stride of n it skips n-1 elements in the next scan. Padding serves to preserve the feature dimensions after kernel processing.
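As a commonly used relation implied by these parameters (an assumption for illustration; the specification does not state it explicitly), the output width of a convolutional layer with input width W, kernel size K, padding P, and stride S is:

W_out = floor((W - K + 2P) / S) + 1

For example, W = 224, K = 7, P = 3, S = 2 gives W_out = floor(223 / 2) + 1 = 112.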
C. Activation function. The role of the activation function is to help express more complex features.

(2) Pooling layer.

After the convolutional layer performs feature extraction, the output feature map is passed to the pooling layer for feature selection and information filtering. The pooling layer is configured with a preset pooling function whose role is to replace the result of a single point in the feature map with a statistic of the feature map over its neighboring region.

(3) Fully connected layer.

The fully connected layer is used to non-linearly combine the features extracted by the preceding layers to obtain an output, which is passed to the output layer.

Finally, the layer upstream of the output layer in a CNN is usually a fully connected layer. For image classification problems, the output layer uses a logistic function or a normalized exponential function (softmax) to output classification labels. In object detection problems, the output layer can be designed to output the center coordinates, size, and class of an object. In semantic segmentation, the output layer outputs a classification result for each pixel.
Graph convolutional network (GCN): a convolutional neural network used to extract features from graph data.

Knowledge graph: graph data used to indicate the respective attributes of multiple nodes and the relationships between them. In one possible manner, the knowledge graph includes a label relationship matrix and a node information matrix; the combination of these two matrices can be called the knowledge graph.
The present application provides a multi-label classification method for images that effectively mitigates the high error rate or slow operation speed of multi-label classification of a single image in the related art. It should be noted that, because the related art only performs feature extraction on the image and determines which labels the image is closest to from those features, it can determine multiple accurate labels only when the features corresponding to the labels are prominent or obvious. If the features of an object that needs to be labeled are not obvious in the image, the related art has difficulty determining the corresponding label; the solution provided by the present application, however, can identify such labels, as described in the following embodiments.

The embodiments of the present application can build a label classification model with a neural network structure to implement the above multi-label classification method. It should be noted that before the label classification model is applied, that is, before the inference stage, it must first go through a training process, described as follows.
Please refer to FIG. 1, which is an architecture diagram of a label classification model provided by an embodiment of the present application. In FIG. 1, the label classification model 100 includes a convolutional neural network 110, a matrix multiplication module 120, and an activation layer 130.

In the label classification model 100 shown in FIG. 1, the convolutional neural network 110 receives the image to be processed 1a; after the image 1a is processed by the convolutional neural network 110, the corresponding image feature matrix 1b is obtained. Subsequently, the image feature matrix 1b and the graph feature matrix 1c are multiplied in the matrix multiplication module 120 to obtain the data to be activated 1d, which is input into the activation layer 130; the activation layer 130 processes the data to be activated to obtain the second label group 1e. In FIG. 1, the second label group 1e includes 3 second labels.

It should be noted that the graph feature matrix 1c in FIG. 1 is data that is updated as the knowledge graph is updated. Combining the update path of the graph feature matrix 1c yields another label classification model architecture; please refer to FIG. 2, which is an architecture diagram of a label classification model provided based on the embodiment shown in FIG. 1.

In FIG. 2, the knowledge graph includes a label relationship matrix 2a and a node information matrix 2b. The knowledge graph can be input into the graph convolutional neural network 200 to obtain the graph feature matrix 1c. After the knowledge graph is updated, the computer device can re-acquire the label relationship matrix 2a and the node information matrix 2b from the updated knowledge graph and input them into the graph convolutional neural network 200 to obtain the graph feature matrix 1c. Illustratively, the graph feature matrix 1c is updated only when the knowledge graph changes; when the knowledge graph does not change, the value obtained in the last computation continues to participate in the computation process shown in FIG. 1.

Based on the classification architecture shown in FIG. 1 and FIG. 2, the computer device can train the architecture shown in FIG. 2 when constructing the label classification model. In another possible manner, the structure shown in FIG. 2 is also called a dual-branch architecture.
When training the model shown in FIG. 2, the knowledge graph must be constructed first. The construction process can be divided into a keyword collection stage and a knowledge graph construction stage.

In the keyword collection stage, a server in the cloud can collect massive amounts of data on users' use of mobile phone albums. It should be noted that the album usage data collected by the server is desensitized and does not involve any user's private information. From this data, the server can extract the keywords users frequently search for, which may be the top n keywords that appear most frequently among the collected keywords. Keyword types can include entities, scenes, behaviors, and events. Entities can include objects such as cats, dogs, flowers, vehicles, cakes, balloons, dishes, drinks, shops, rivers, beaches, and oceans. Scenes can include information such as sunrise and sunset, banquets, playgrounds, or sports scenes. Behaviors include information such as walking, running, eating, and standing. Events include information such as traveling, shopping, or dining.

After the server determines the keywords users frequently search for, it can build a label list including those keywords. It should be noted that the labels in this label list may be the first labels.

Next, the server builds the knowledge graph from the label list. In this stage, the server can construct the knowledge graph by performing the following steps a) to h).
Step a), extract text-based label relationships from a text-based knowledge graph.
Illustratively, the text-based knowledge graph may be a graph such as ConceptNet or WordNet. Text-based label relationships may include relationships that labels carry semantically, such as inclusion relationships or predicate relationships. In this step, the server preselects a text-based knowledge graph and extracts the text-based label relationships from it. It should be noted that in this step the server may select whichever knowledge graph in the current field gives the best results for text-based label relationships; the graphs named above are only exemplary, and this application does not limit the specific text-based knowledge graph used.
Step b), extract the interrelationships of labels in images from a specified image dataset.
Illustratively, the interrelationship may be a conditional probability. When the interrelationship is a conditional probability, it can be calculated with the following formula.
$$P(A \mid B) = \frac{P(AB)}{P(B)}$$
where P(A|B) is the conditional probability that label A appears given that label B appears, P(AB) is the probability that label A and label B appear at the same time, and P(B) is the probability that label B appears.
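As a minimal sketch (not part of the patent text), the conditional probabilities of step b) could be estimated from a binary image-label matrix as follows; the function name and array layout are illustrative assumptions:

```python
import numpy as np

def conditional_probability_matrix(labels):
    """Estimate P(A|B) for every label pair from a binary label matrix.

    labels: (num_images, num_labels) array with labels[i, j] == 1 if image i
    carries label j. Returns a (num_labels, num_labels) matrix M with
    M[a, b] = P(label a | label b).
    """
    counts = labels.T @ labels              # counts[a, b] = images carrying both a and b
    occurrences = np.diag(counts)           # occurrences[b] = images carrying label b
    # P(A|B) = P(AB) / P(B) = count(A and B) / count(B)
    return counts / np.maximum(occurrences[None, :], 1)
```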
Step c), calculate the weights between labels.
In this step, the weights between text-based relationship labels follow the weights in the text-based knowledge graph used in step a), for example the weights in ConceptNet or WordNet. If multiple text-based label relationships are merged, the merged relationship weight is the weighted average of the original weights. Image-based label relationships follow the conditional probability calculation of step b) and are generally not merged. If a text-based label relationship carries no weight, the weight is filled with 0 or 1, where 0 means there is no relationship between the two nodes and 1 means there is; 0 and 1 are used to fill the relationship between two nodes that stand in a logical relationship. In the embodiments of the present application, the nodes of the knowledge graph represent labels; for example, the name of a node in the knowledge graph is the name of a label mentioned in this application.
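A hypothetical helper for the merging rules of step c) might look as follows (a sketch only; the importance weights and the fallback value are assumptions consistent with the text above):

```python
def merge_relation_weights(weights, importance=None):
    """Merge several text-based relation weights into one edge weight.

    When several text sources give a weight for the same label pair, take
    their (optionally importance-weighted) average as the merged weight;
    when a relationship exists but carries no weight, fall back to 1
    ("an edge exists"), with 0 reserved for "no relationship".
    """
    if not weights:
        return 1.0
    if importance is None:
        importance = [1.0] * len(weights)
    return sum(w * s for w, s in zip(weights, importance)) / sum(importance)
```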
Step d), merge the text-based label relationships and the image-based label relationships as the edges of the knowledge graph.
Step e), manually curate label attributes such as definitions, keywords, and synonyms.
In this step, technicians read through the knowledge graph to check, from a logical standpoint, whether it matches real-life situations, and manually adjust anomalous data. The purpose of this step is to improve the knowledge graph's ability to describe the correlations between real-life photos.
Step f), use a specified algorithm to perform embedding on the labels, obtaining word embeddings.
The specified algorithm may be any algorithm with embedding capability, such as GloVe.
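As a sketch of step f) (not part of the patent text), label word vectors could be read from a standard GloVe .txt file, where each line has the form "word v1 v2 ... vd"; averaging the token vectors of multi-word labels is a common heuristic and an assumption here:

```python
import numpy as np

def load_glove_embeddings(path, labels):
    """Load a word vector for each label from a GloVe text file."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    dim = len(next(iter(vectors.values())))
    out = {}
    for label in labels:
        tokens = [vectors[t] for t in label.lower().split() if t in vectors]
        out[label] = np.mean(tokens, axis=0) if tokens else np.zeros(dim)
    return out
```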
Step g), obtain definitions, keywords, synonyms, or word vectors from the above data as the node attributes of the knowledge graph.
Step h), merge the edges and nodes to obtain the constructed knowledge graph.
In the knowledge graph constructed above, each node represents a label and each edge represents a relationship between labels. These relationships include, but are not limited to, hypernym-hyponym relationships, correlation relationships, positional relationships in images, and predicate relationships. A hypernym-hyponym relationship indicates the relationship between a superordinate concept and a subordinate concept.
For example, "pet" is a hypernym of "cat", and "cat" is a hyponym of "pet". A correlation relationship indicates the probability that two labels appear in the same image. A positional relationship in an image indicates where two labels sit relative to each other in the image, for example "apple" is above "table" and "floor" is below "table". A predicate relationship indicates the definition of some labels; for example, "apple" is "food".
As for the attributes of each node, they include, but are not limited to, the embedding, the node type, and synonyms. The embedding is the word vector obtained by applying NLP (Natural Language Processing) algorithms to the label name. Node types may include objects, scenes, events, and so on.
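As a minimal sketch of steps d) through h) (not part of the patent text), the nodes, attributes, and weighted edges could be assembled with networkx; the attribute values shown are placeholders:

```python
import networkx as nx

graph = nx.Graph()

# Nodes represent labels; attributes hold the embedding, node type, and synonyms.
graph.add_node("cat", embedding=[0.1, 0.2], node_type="object", synonyms=["kitty"])
graph.add_node("pet", embedding=[0.3, 0.4], node_type="object", synonyms=[])

# Edges carry the merged relation weight and the relation kind (step d).
graph.add_edge("cat", "pet", weight=1.0, relation="hypernym")

# The label relationship matrix A can then be exported for the graph convolutional
# network, with the node attributes forming the node information matrix.
A = nx.to_numpy_array(graph, weight="weight")
```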
To sum up, the knowledge graph can be constructed through the above process, and once a knowledge graph is constructed, the relevant data in it is fixed. Next, the server can train the label classification model used in this application according to the architecture shown in FIG. 2, obtaining a label classification model that can be used for inference.
Taking the label classification model shown in FIG. 2 as an example, the training process of the entire model is introduced below. The data in the label classification model that needs to be updated during the training stage consists of the parameters of the convolutional neural network 110 and the parameters of the graph convolutional neural network 200. When the training process ends, the parameters of the convolutional neural network 110 and of the graph convolutional neural network 200 are fixed.
When the graph convolutional neural network 200 is trained, each graph convolutional layer can be expressed by the following formula.
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
In the above formula, $H^{(l)}$ is the input of the current graph convolutional layer; $H^{(1)}$, the input of the first graph convolutional layer in the graph convolutional neural network 200, is the node information matrix fed into the network. $\tilde{A} = A + I$ is the label relationship matrix A with self-connections added, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and multiplying by $\tilde{D}^{-\frac{1}{2}}$ on both sides amounts to normalizing the adjacency matrix $\tilde{A}$. $W^{(l)}$ is the parameter to be learned during training, and $\sigma(\cdot)$ is the activation function.
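A minimal PyTorch sketch of one such layer is given below (not part of the patent text; the class name and the choice of LeakyReLU as σ are assumptions):

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolutional layer: H' = sigma(D~^-1/2 (A+I) D~^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))   # W^(l)
        nn.init.xavier_uniform_(self.weight)
        self.act = nn.LeakyReLU(0.2)                               # sigma

    def forward(self, h, adj):
        a_tilde = adj + torch.eye(adj.size(0), device=adj.device)  # A~ = A + I
        deg = a_tilde.sum(dim=1)                                   # degrees of A~
        d_inv_sqrt = torch.diag(deg.pow(-0.5))                     # D~^{-1/2}
        a_norm = d_inv_sqrt @ a_tilde @ d_inv_sqrt                 # normalized adjacency
        return self.act(a_norm @ h @ self.weight)
```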
During training, each graph convolutional layer processes the node information output by the previous graph convolutional layer and outputs new node information to the next graph convolutional layer, while the underlying graph structure A does not change throughout the graph convolutional neural network 200.
After the training of the label classification model is completed, the computer device can use the model to perform the multi-label image classification method shown in this application. For details, please refer to the introduction of FIG. 3.
Please refer to FIG. 3, which is a flowchart of a multi-label image classification method provided by an embodiment of the present application. The method of FIG. 3 can be applied to a computer device; in the embodiments of this application, the computer device may be either a terminal or a server. The execution process of the method is introduced below.
In this application, the computer device can acquire an image to be processed.
The manner of acquiring the image to be processed may differ depending on the specific implementation of the computer device.
Illustratively, when the computer device is a terminal, the terminal can directly capture an image through an image acquisition component and use the captured image as the image to be processed. In another possible manner, the terminal can acquire an image from another computer device and use the acquired image as the image to be processed. In yet another possible manner, the terminal can synthesize a virtual image according to specified instructions and data through an installed image synthesis application and use the virtual image as the image to be processed.
Illustratively, when the computer device is a server, the server can receive an image uploaded by a terminal and use that image as the image to be processed. Alternatively, the server can synthesize a virtual image according to specified instructions and data through an installed image synthesis application and use the virtual image as the image to be processed.
As for the number of images to be processed, there may be one or several. When there are several, the computer device can choose to process them in a serial manner or a parallel manner.
In the serial manner, the computer device processes the next image only after one image has been successfully tagged with at least two second labels.
In the parallel manner, the computer device processes several images simultaneously, and the images obtain their corresponding second labels at the same time.
It should be noted that the number of images to be processed, and whether the computer device adopts the serial or the parallel manner, vary with the actual application scenario, which is not limited by the embodiments of the present application.
Step 310, extract the image features of the image to be processed through the feature extraction layer in the label classification model, where the label classification model is a neural network model for adding at least two labels to the image to be processed.
Illustratively, after acquiring the image to be processed, the computer device is able to extract image features from it. In this example, the computer device extracts the image features of the image to be processed through the feature extraction layer in the label classification model. In practical applications, the label classification model provided by this application is used to provide at least two labels for one image to be processed.
Optionally, the image features may include global features and local features, depending on the application scenario.
When the image features are global features, the computer device takes the entire image as material and extracts the features of the entire image as the image features of the image to be processed.
When the image features are local features, the computer device takes one or more identified local regions of the image as material and extracts the corresponding features as the image features of the image to be processed.
Please refer to FIG. 4, which is a schematic diagram of a global feature provided based on the embodiment shown in FIG. 3. In FIG. 4, every pixel of the image to be processed 400 is used as material; after processing by the computer device, a global feature 420 is extracted, and the global feature 420 is used to indicate the features of the image to be processed 400.
Please refer to FIG. 5, which is a schematic diagram of local features provided based on the embodiment shown in FIG. 3. In FIG. 5, after the image to be processed 400 is processed by the computer device, three candidate boxes appear; the computer device then continues processing and obtains three sets of local features from the local images in the three candidate boxes, namely local feature 510, local feature 520, and local feature 530. In this embodiment of the present application, the combination of local feature 510, local feature 520, and local feature 530 is referred to as the image features.
Step 320, process the image features through the graph feature matrix to obtain the data to be activated, where the graph feature matrix is a matrix obtained after the knowledge graph is processed by the graph convolutional neural network, and the knowledge graph is used to indicate the attributes of the first labels themselves and the relationships between at least two first labels.
After the computer device obtains the image features, it obtains the graph feature matrix. It should be noted that the graph feature matrix is a specific matrix derived from the knowledge graph. When the knowledge graph has not changed or been updated, the graph feature matrix does not change. That is, the computer device updates the corresponding graph feature matrix when the internally stored knowledge graph is updated; when the internally stored knowledge graph has not changed, the stored graph feature matrix is not updated.
Once the computer device has both the image features and the graph feature matrix, it processes the image features through the graph feature matrix to obtain the data to be activated. It should be noted that the calculation method can be adjusted according to the form of the image features. When the image features are in matrix form, the computer device multiplies the graph feature matrix with the image feature matrix and takes the resulting product as the data to be activated.
It should be noted that the knowledge graph applied in the embodiments of the present application indicates both the attributes of the first labels themselves and the relationships between at least two first labels.
Step 330, process the data to be activated through the activation layer in the label classification model to obtain at least two second labels.
In this application, the computer device processes the data to be activated through the activation layer in the label classification model to obtain at least two second labels. These second labels indicate features of the image to be processed: each one indicates that the image to be processed contains a feature matching that label. It should be noted that the second labels are labels screened out from the first labels.
For example, if the first labels include the 9 labels shown in Table 1, the second labels may be the 4 labels shown in Table 2. The second labels belong to the first labels and are the first labels that best match the features of the image to be processed.
No.      1       2       3                4      5      6         7          8           9
Label    beach   ocean   sunrise/sunset   cat    dog    banquet   shopping   landscape   car

Table 1
Table 1 shows one possible set of first labels. It should be noted that the categories shown in Table 1 are only illustrative and do not limit the types of first labels used in the embodiments of the present application. In one possible manner, the first labels may also include a person; the person's label may be as specific as the person's name, or may only represent characteristics such as the person's age, gender, or occupation.
No.      1       2       3      4
Label    ocean   beach   dog    landscape

Table 2
Table 2 shows the second labels screened out from the first labels of Table 1 in this embodiment of the present application, 4 second labels in total. In other words, for the image to be processed, the computer device concludes that the second labels ocean, beach, dog, and landscape all match the features of the image to be processed.
It should be noted that the image to be processed may contain an ocean and a dog whose features are fairly obvious, as well as a beach whose features are not. Under the schemes of the related art, there is a high probability that only the two labels ocean and dog would be attached to the image. Under the scheme provided by this application, however, the graph feature matrix is introduced into the judgment process. This matrix is derived from the knowledge graph, which indicates the relationships between first labels, so the graph feature matrix effectively provides the strong correlation between ocean and beach, between ocean and landscape, and between beach and landscape. The method provided by this application is therefore much more likely to identify ocean, beach, dog, and landscape together as the second labels of the image to be processed.
In a practical implementation, each first label has its own corresponding threshold. When the probability value of a first label produced at the activation layer is greater than its corresponding threshold, the activation layer determines that first label to be a second label. Illustratively, the data shown in Table 1 and Table 2 are used as an example.
Please refer to Table 3, which shows, for the first labels of Table 1, the probability values indicated by the activation data together with the preset thresholds.
No.                    1       2       3                4      5      6         7          8           9
Label                  beach   ocean   sunrise/sunset   cat    dog    banquet   shopping   landscape   car
Preset threshold       0.8     0.95    0.96             0.98   0.98   0.88      0.75       0.85        0.95
Measured probability   0.83    0.98    0.65             0.32   0.99   0.11      0.32       0.93        0.26

Table 3
In Table 3, after the image to be processed is processed by the label classification model, the measured probability of each first label is obtained; these measured probabilities are contained in the data to be activated that is processed by the activation layer. The preset threshold corresponding to each first label can be pre-stored in the activation layer. The activation layer compares the measured probability between the image to be processed and each label with the preset threshold, and determines the first labels whose measured probabilities are higher than their preset thresholds to be second labels. For example, according to the data shown in Table 3, the first labels with serial numbers 1, 2, 5, and 8 are determined to be second labels.
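As a minimal sketch (not part of the patent text), the per-label thresholding could look as follows; the labels and thresholds mirror Table 3, and treating the activation as a sigmoid is an assumption:

```python
import numpy as np

labels = ["beach", "ocean", "sunrise/sunset", "cat", "dog",
          "banquet", "shopping", "landscape", "car"]
thresholds = np.array([0.8, 0.95, 0.96, 0.98, 0.98, 0.88, 0.75, 0.85, 0.95])

def select_second_labels(to_activate):
    """Activate the data to be activated and keep the first labels whose
    measured probability exceeds their per-label preset threshold."""
    probs = 1.0 / (1.0 + np.exp(-to_activate))   # activation layer (sigmoid)
    return [label for label, p, t in zip(labels, probs, thresholds) if p > t]
```

Comparing the measured probabilities of Table 3 with these thresholds keeps exactly the labels at serial numbers 1, 2, 5, and 8.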
Step 340, determine at least two second labels as the labels of the image to be processed, where the second labels belong to the first labels.
In this embodiment of the present application, after the computer device determines at least two second labels, they serve as the labels of the image to be processed.
In one possible manner, the second labels can be displayed as visual information on the processed image. Please refer to FIG. 6, which shows a visual interface after image processing based on the embodiment shown in FIG. 3. In the user interface 600, after the image 610 has been processed, the three second labels attached to it can be displayed below it: the first second label 620, the second second label 630, and the third second label 640, which are "beach", "tree", and "ocean" respectively.
In another possible manner, the second labels may serve not as visual information but as attribute information of the image. Optionally, this attribute information may be stored in an attribute frame of the image, or separately in a file designated by the computer device. The attribute frame of the image, as part of the image, is copied when the image is copied and disappears when the image is deleted.
In a practical application scenario, if several images have all been tagged with multiple second labels, the computer device can intelligently generate an album based on those labels. For example, if "beach", "ocean", and "landscape" all appear in several images, those images are intelligently combined into an album named "Seaside Play". It should be noted that this intelligent album-generation operation can be completed either on the terminal side or on the server side.
When the operation is completed on the server, the terminal can upload the images it has captured to the server through cloud backup or other means, and the server then performs the operation of intelligently generating an album from the multiple images.
To sum up, the multi-label image classification method provided by this application can, after extracting the image features of the image to be processed, combine them with the graph feature matrix to obtain the data to be activated, obtain at least two second labels from the data to be activated, and use the second labels as the labels of the image to be processed. The graph feature matrix is a matrix obtained after the knowledge graph is processed by the graph convolutional neural network, and the knowledge graph indicates the attributes of the first labels themselves as well as the relationships between at least two first labels. Because this application introduces, into the process of determining the second labels, a knowledge graph that reflects the relationships between the first labels, and uses the graph feature matrix derived from that graph to assist in determining the second labels, it avoids the problem of second labels being missed because their features in the image are inconspicuous, and improves the accuracy of determining multiple labels for an image.
Based on the trained label classification model described above, an embodiment of the present application provides a multi-label image classification method using that model. Through the label classification model, the present application can obtain a more accurate multi-label classification result for an image to be processed. For details, see the introduction below.
Please refer to FIG. 7, which is a flowchart of a multi-label image classification method provided by another exemplary embodiment of the present application. This method can be applied to the terminal or the server described above. In FIG. 7, the multi-label image classification method includes:
Step 711, acquire the image to be processed.
In this embodiment of the present application, the method of acquiring the image to be processed may differ depending on the execution subject.
In one possible manner, when the image to be processed is acquired by a server, the server obtains it from data transmitted by a terminal. Optionally, the manner in which the terminal transmits data to the server may include scenarios such as cloud album synchronization, smart album creation, or cloud backup.
In another possible manner, when the image to be processed is acquired by a terminal, the terminal extracts it from a locally stored gallery; the image may have been captured by the terminal itself or captured by another terminal and then sent to this terminal.
In the subsequent steps, the implementation process of the embodiment shown in FIG. 7 is introduced by taking the application of the method in a terminal as an example.
Step 712, input the image to be processed into the convolutional neural network.
The image to be processed can be input directly into the convolutional neural network, which processes the image.
Step 713, process the image to be processed through the convolutional neural network to obtain an image feature matrix.
In this example, the convolutional neural network includes several layers, and the image passes through them in sequence, yielding the image feature matrix.
In one possible manner, the label classification model includes an input layer, a convolutional layer, and a pooling layer. Processing the image to be processed through the convolutional neural network may include inputting the image into the input layer and, through the layer-by-layer processing described below, finally obtaining the image feature matrix.
Illustratively, the computer device can input the image to be processed into the input layer to obtain first intermediate data; input the first intermediate data into the convolutional layer to obtain second intermediate data; and input the second intermediate data into the pooling layer to obtain the image feature matrix.
After the image to be processed is input into the input layer, it is processed there to produce the first intermediate data. In the neural network, the input layer is connected to the convolutional layer, which processes the first intermediate data to produce the second intermediate data. The convolutional layer is in turn connected to the pooling layer, which processes the second intermediate data to produce the image feature matrix.
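A toy PyTorch stand-in for this input-convolution-pooling pipeline is sketched below (not the patent's network 110; in practice the convolutional stage would be a deep backbone, and the layer sizes here are assumptions):

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Input layer -> convolutional layer -> pooling layer, producing an
    N-dimensional image feature vector per image."""
    def __init__(self, feature_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(3, feature_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)   # global pooling, one value per channel

    def forward(self, image):
        x = self.conv(image)                  # second intermediate data
        x = self.pool(x)                      # (batch, feature_dim, 1, 1)
        return x.flatten(1)                   # image feature matrix, (batch, N)
```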
It should be noted that the computer device can execute steps 721 to 723 to obtain the graph feature matrix when it has not yet stored one. If the graph feature matrix stored by the computer device already corresponds to the latest version of the knowledge graph, the computer device can directly use the stored graph feature matrix when attaching multiple second labels to the image to be processed, without executing steps 721 to 723.
Step 721, input the label relationship matrix into the graph convolutional neural network, where the label relationship matrix is used to indicate the relationships between at least two first labels.
The graph convolutional neural network takes two matrices as input. The label relationship matrix, as one of them, is input into the graph convolutional neural network in this embodiment of the present application. This matrix indicates the relationships between at least two first labels.
Step 722, input the node information matrix into the graph convolutional neural network, where the node information matrix is used to indicate the attributes of the first labels themselves.
Optionally, when inputting the label relationship matrix into the graph convolutional neural network, the computer device can also input the node information matrix. Illustratively, the node information matrix and the label relationship matrix together constitute the knowledge graph.
Step 723, process the label relationship matrix and the node information matrix through the graph convolutional neural network to obtain the graph feature matrix.
It should be noted that when the knowledge graph from which the graph feature matrix is generated is updated, the computer device regenerates a new graph feature matrix from the updated knowledge graph and stores it, so as to process image features and obtain the data to be activated.
In practice, when the data in the knowledge graph has been updated, the computer device acquires the updated knowledge graph, processes it through the graph convolutional neural network to obtain an updated graph feature matrix, and uses the updated graph feature matrix to replace the one in the label classification model.
It should be noted that the update of the knowledge graph can be performed on the server side: after the knowledge graph is updated, the server computes the updated graph feature matrix and pushes it to the terminal as new information. The terminal then processes image features with the new graph feature matrix to obtain the data to be activated.
In one possible implementation, the size of the graph feature matrix is C*N, where C is the number of first labels, N is the feature dimension, and both C and N are positive integers.
Correspondingly, the size of the image feature matrix is N*1 and the size of the data matrix to be activated is C*1.
Since the graph feature matrix is C*N and the image feature matrix is N*1, the resulting data matrix to be activated is C*1. Put plainly, each row of the data matrix to be activated corresponds to the pre-activation data of one first label.
Step 731, multiply the graph feature matrix by the image feature matrix to obtain the data matrix to be activated.
Step 732, process the data matrix to be activated through the activation layer in the label classification model to obtain at least two second labels.
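A numpy sketch of step 731 with random placeholder matrices, purely to show the shapes described above (not part of the patent text):

```python
import numpy as np

C, N = 9, 512                                  # number of first labels, feature dimension
graph_features = np.random.rand(C, N)          # graph feature matrix, C*N
image_features = np.random.rand(N, 1)          # image feature matrix, N*1

to_activate = graph_features @ image_features  # data matrix to be activated, C*1
assert to_activate.shape == (C, 1)             # one row per first label
```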
In this application, the terminal can achieve the effect of step 732, obtaining at least two second labels, by performing steps (3a), (3b), and (3c) instead.
Step (3a), input the data matrix to be activated into the activation layer.
Step (3b), process the data to be activated through the activation layer to obtain a probability value for each first label, where the probability value indicates the probability that the first label matches the image to be processed.
Step (3c), in response to a probability value being higher than the corresponding first threshold, determine the corresponding first label to be a second label, where the first threshold is a threshold used to judge whether a first label matches the image to be processed.
It should be noted that the first thresholds may correspond one-to-one with the first labels: when the number of first labels used in the label classification model is i, the number of first thresholds is also i.
In this embodiment of the present application, once the computer device has finished executing step 732, the image to be processed has obtained at least two second labels, and every image to be processed can obtain its own at least two second labels through the above flow. The computer device can achieve multi-label classification by executing steps 711 to 732 provided in this embodiment on each of the images to be processed. However, to provide an even more accurate classification scheme, this embodiment can also add an image post-processing flow that decides, based on features other than the image content, whether to attach an additional specified second label to an image to be processed.
Illustratively, when a first image and a second image have each obtained their corresponding second labels, the computer device acquires the shooting time relationship information between them. Both the first image and the second image are images to be processed to which second labels have already been added. For example, see Table 4, which shows the second labels of a first image and a second image after processing.
Image          Second labels
First image    ocean, beach, dog, landscape
Second image   ocean, dog, landscape

Table 4
As the data in Table 4 show, after being tagged through the scheme of this application, the second image carries 3 second labels, namely "ocean", "dog", and "landscape", while the first image carries 4 second labels, namely "ocean", "beach", "dog", and "landscape". In this case, the computer device acquires the shooting time relationship information between the first image and the second image.
The shooting time relationship information indicates either the temporal order of the first image and the second image, or the duration between the shooting time of the first image and the shooting time of the second image.
In the first case, the shooting time relationship information indicates a temporal order, which covers two situations: the first image was shot earlier than the second image, or the first image was shot later than the second image. It should be noted that the first image and the second image processed in this application are, by default, images captured by the same terminal; therefore, in terms of image capture logic, the shooting time of the first image cannot equal that of the second image.
Optionally, in this embodiment of the present application, the first image and the second image are images captured by the same terminal through the same group of cameras. When the terminal is a smartphone, its cameras usually comprise a front camera group and a rear camera group, and the smartphone can select one group to capture images. In a rarer scenario, the smartphone includes only one group of cameras, but that group can flip to face the front or the back. The first image and the second image in this embodiment are two images captured by that group of cameras facing the same side; the smartphone can determine the current orientation of the cameras from its status information.
In the second case, the shooting time relationship information indicates a duration, namely the time elapsed between the shooting of the first image and the shooting of the second image. This may be a fixed value with a precision of minutes, seconds, milliseconds, and so on, where the precision depends on the application scenario: for everyday shooting of portraits and landscapes, second-level precision suffices; for shooting fast-moving subjects such as fast-moving people, vehicles, or particles in microscopic scenes, millisecond-level precision may be needed; and for monitoring a nature reserve, minute-level precision may be enough. The embodiments of this application introduce the precision of the shooting time relationship information only schematically, without limiting the actual scenario.
Combining the content indicated by the two kinds of shooting time relationship information, the computer device can, when the shooting time relationship information satisfies a preset condition, add a target second label as a second label of the second image, where the target second label is a label corresponding to the first image but not to the second image. The preset condition can be used to indicate that the shooting time of the first image is close to that of the second image; that is, when the shooting time relationship information indicates that the two shooting times are close, the shooting time relationship information satisfies the preset condition.
Concrete scenarios in which the shooting time relationship information satisfies the preset condition are introduced below. When the shooting time relationship information indicates a temporal order, the computer device adds a label to the second image through steps (4a) and (4b).
Step (4a), in response to the first image and the second image having obtained their corresponding second labels, acquire the target duration.
The target duration is the time elapsed between the shooting of the first image and the shooting of the second image.
Step (4b), in response to the target duration being less than a second threshold, add the target second label as a second label of the second image.
In this embodiment of the present application, when the target duration is less than the second threshold, the shooting time of the first image is close to that of the second image. In that scenario, the scene in the first image is very likely similar to the scene in the second image. Therefore, the computer device can take a label that the first image has and the second image lacks and attach it to the second image as a second label as well.
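Steps (4a) and (4b) could be sketched as follows (not part of the patent text; times are assumed to be POSIX timestamps and the 60-second default for the second threshold is illustrative):

```python
def propagate_labels(first_labels, second_labels, first_time, second_time,
                     second_threshold=60.0):
    """If two images were shot within `second_threshold` seconds of each
    other, copy the target second labels (labels the first image has and
    the second image lacks) onto the second image."""
    target_duration = abs(first_time - second_time)       # step (4a)
    if target_duration < second_threshold:                # step (4b)
        second_labels |= (first_labels - second_labels)   # add target second labels
    return second_labels
```

For the images of Table 4, shot close together, this would copy "beach" onto the second image.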
When the shooting time relationship information indicates a duration, the computer device adds a label to the second image through steps (5a) and (5b).
Step (5a), in response to the first images and the second image having obtained their corresponding second labels, where the number of first images is 2k, with k first images shot before the second image and k first images shot after it, acquire the shooting times of the first images.
Step (5b), in response to the length of the interval spanned by the shooting times of the 2k first images being less than a third threshold, add the target second label as a second label of the second image, where k is an integer greater than or equal to 1.
In this embodiment of the present application, the number of first images can be set to 2k in total; these are images continuously captured by the terminal before and after it captured the second image.
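Steps (5a) and (5b) could be sketched as follows (not part of the patent text; taking the intersection over all 2k neighbors and the 60-second default, which mirrors the worked example below, are assumptions):

```python
def propagate_from_neighbors(second_labels, neighbor_label_sets, neighbor_times,
                             third_threshold=60.0):
    """Given the label sets and shooting times of the 2k first images around
    the second image, copy every label carried by all 2k neighbors onto the
    second image when the neighbors' shooting times span less than
    `third_threshold` seconds."""
    span = max(neighbor_times) - min(neighbor_times)        # interval length, step (5a)
    if span < third_threshold:                              # step (5b)
        second_labels |= set.intersection(*neighbor_label_sets)
    return second_labels
```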
For example, please refer to FIG. 8, which is a schematic diagram of image post-processing based on the embodiment shown in FIG. 7. In FIG. 8, k is 3. In order of shooting time from earliest to latest, the terminal successively captures the first first image 811, the second first image 812, the third first image 813, the second image 820, the fourth first image 814, the fifth first image 815, and the sixth first image 816. Table 5 shows the shooting time of each image.
Image   811        812        813        820        814        815        816
Time    10:24:49   10:24:56   10:25:06   10:25:17   10:25:24   10:25:29   10:25:35

Table 5
Among the 7 images shown in Table 5, all 6 first images carry the second label "beach", while the second image 820 does not.
In the first processing stage 8A, the second labels of the second image 820 are "tree" and "ocean", and the second labels of the other 6 first images are "tree", "ocean", and "beach". In the second processing stage 8B, the computer device determines that 3 first images were shot before the second image and the other 3 after it, and acquires the shooting time of each first image.
In this example, the third threshold is 60 seconds, and the duration from the shooting time of the first first image 811 to that of the sixth first image 816 is 46 seconds. That is, the interval spanned by the shooting times of the 6 first images is shorter than the 60-second third threshold, so in the second processing stage 8B the computer device takes the second label "beach", carried by all 6 first images, as the target second label and copies it into the second labels of the second image. It should be noted that in the first processing stage 8A and the second processing stage 8B the second labels of the first images do not change, which is why the first images are not shown again in FIG. 8.
Please refer to FIG. 9, which is a schematic diagram of a process for automatically generating an album provided by an embodiment of the present application. In the image acquisition stage 9A of FIG. 9, the computer device acquires the several images to be processed. If the computer device is a server, the image acquisition stage 9A may consist of receiving photos uploaded by terminals, for example through cloud backup or album backup performed by a terminal on the server. If the computer device is a terminal, the image acquisition stage 9A may be the process of taking photos; once the photos are taken and stored, the computer device has obtained several images to be processed.
After the computer device has collected the images to be processed, in the multi-label determination stage 9B it can add at least two second labels to each of them through the label classification model provided by this application.
When at least two labels have been added to each of the several images to be processed, the computer device can, in the image post-processing stage 9C, divide the images to be processed into first images and a second image and, according to whether the shooting time relationship information between the first images and the second image satisfies the preset condition, determine whether to supplement the second image with a target second label, namely a label that the first images carry and the second image lacks.
After the images to be processed have passed through the image post-processing stage 9C, the computer device can generate designated albums containing them according to preset strategies. In one possible strategy, the computer device selects m labels, generates a first album from the images to be processed that carry the m labels, and names the album based on the m selected labels. In another possible strategy, the computer device constrains a designated shooting location together with m labels, generating a second album of similar content shot at that location. In yet another possible strategy, the computer device constrains a designated shooting time together with m labels, generating a third album of similar content shot at that time. It can be seen that the scheme provided by the embodiments of this application can, with high accuracy, attach multiple labels to the images to be processed and intelligently generate corresponding albums on that basis, improving the efficiency and accuracy of automatic album generation and reducing the omission of images that actually meet an album's criteria.
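The first album strategy could be sketched as follows (not part of the patent text; the parameter defaults and the naming rule are illustrative assumptions):

```python
from itertools import combinations

def generate_albums(image_labels, m=2, min_size=3):
    """Group images by every m-label combination they carry and keep the
    groups large enough to form an album, named after the m labels.

    image_labels maps an image id to its set of second labels.
    """
    albums = {}
    for image_id, labels in image_labels.items():
        for combo in combinations(sorted(labels), m):
            albums.setdefault(combo, []).append(image_id)
    return {" & ".join(combo): ids
            for combo, ids in albums.items() if len(ids) >= min_size}
```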
To sum up, the label classification model used in this embodiment includes a convolutional neural network structure. The convolutional neural network is used to extract the image content of the to-be-processed image; once it has extracted the image feature matrix, that matrix can be processed by the graph feature matrix derived from the knowledge graph to obtain the to-be-activated data. After the to-be-activated data is processed by the activation layer, at least two second labels are obtained, which achieves the effect of marking the to-be-processed image with multiple labels.
Optionally, the embodiments of the present application can further introduce a graph convolutional neural network to process the knowledge graph, thereby obtaining the graph feature matrix used to process the image feature matrix. As a result, when multiple second labels are attached to the to-be-processed image, the data entering the activation layer is checked and balanced by the mutual relationships between the first nodes in the knowledge graph, which prevents inconspicuous labels in the to-be-processed image from being missed and improves the accuracy of attaching multiple second labels to the to-be-processed image.
Optionally, in this embodiment, after the to-be-processed image has been marked with at least two second labels, a data post-processing stage can further detect whether any second label has not yet been marked on the to-be-processed image. In this post-processing stage, the computer device detects whether an image adjacent to the to-be-processed image carries a second label that is not marked on the to-be-processed image; if the shooting moment of that adjacent image is close to the shooting moment of the to-be-processed image, the present application marks that second label on the to-be-processed image as well, improving the accuracy of the second-label annotation.
Optionally, when a given second label is present in both the k images preceding and the k images following the to-be-processed image, and the time interval spanned by those preceding and following images is within a specified duration, that second label is also marked on the to-be-processed image, further improving the accuracy of the second-label annotation.
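For readability, the post-processing rule described in the two preceding paragraphs can be sketched as follows. This is a minimal illustration, assuming a time-sorted list of images in which each element carries a `labels` set and a `timestamp`; the function name and the `max_span` parameter are hypothetical stand-ins for the specified duration threshold, which the embodiments leave open.

```python
from datetime import timedelta

def propagate_neighbor_labels(images, k, max_span):
    """Post-processing sketch: `images` is sorted by shooting time and each
    element already carries its set of second labels in `img.labels`.
    `max_span` stands in for the specified duration threshold (an assumed
    parameter, not a value fixed by the original document)."""
    for i in range(k, len(images) - k):
        before = images[i - k:i]            # k images shot before image i
        after = images[i + 1:i + 1 + k]     # k images shot after image i
        neighbors = before + after
        # Length of the time interval spanned by the 2k neighboring images.
        span = neighbors[-1].timestamp - neighbors[0].timestamp
        if span >= max_span:
            continue
        # Labels present on every one of the 2k neighbors ...
        common = set.intersection(*(img.labels for img in neighbors))
        # ... but missing on the middle image are propagated onto it.
        images[i].labels |= common

# Example: propagate labels shared by the 3 images on each side, provided
# they were all shot within a 10-minute window.
# propagate_neighbor_labels(photos, k=3, max_span=timedelta(minutes=10))
```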
The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Please refer to FIG. 10, which is a structural block diagram of an apparatus for multi-label classification of images provided by an exemplary embodiment of the present application. The apparatus can be implemented as all or part of a terminal through software, hardware, or a combination of the two. The apparatus includes a feature extraction module 1010, a first acquisition module 1020, a label acquisition module 1030, and a label determination module 1040. The specific functions of these modules are introduced below.
The feature extraction module 1010 is configured to extract image features of a to-be-processed image through a feature extraction layer in a label classification model, where the label classification model is a neural network model for adding at least two labels to the to-be-processed image.
The first acquisition module 1020 is configured to process the image features through a graph feature matrix to obtain to-be-activated data, where the graph feature matrix is a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph is used to indicate attributes of the first labels themselves and the relationship between at least two of the first labels.
The label acquisition module 1030 is configured to process the to-be-activated data through an activation layer in the label classification model to obtain at least two second labels.
The label determination module 1040 is configured to determine at least two of the second labels as labels of the to-be-processed image, where the second labels belong to the first labels.
In an optional embodiment, the first acquisition module 1020 is configured to multiply the image feature matrix by the graph feature matrix to obtain a to-be-activated data matrix, and the label acquisition module 1030 is configured to process the to-be-activated data matrix through the activation layer in the label classification model to obtain at least two of the second labels.
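As a concrete illustration of this optional embodiment, the NumPy sketch below multiplies a C×N graph feature matrix by an N×1 image feature vector (the scales stated later in this section) and activates the result. The sigmoid activation and the 0.5 decision threshold are assumptions; the document does not fix the activation function or the decision rule.

```python
import numpy as np

def classify(image_features, graph_features, label_names, threshold=0.5):
    """image_features: (N, 1) column vector from the feature extraction layer.
    graph_features:  (C, N) matrix derived from the knowledge graph.
    Returns the labels whose activated score exceeds the threshold."""
    pre_activation = graph_features @ image_features       # (C, 1) to-be-activated data
    scores = 1.0 / (1.0 + np.exp(-pre_activation))         # sigmoid activation (assumed)
    return [name for name, s in zip(label_names, scores.ravel()) if s > threshold]

# Example with C = 4 candidate first labels and N = 8 feature dimensions.
rng = np.random.default_rng(0)
labels = classify(rng.normal(size=(8, 1)),
                  rng.normal(size=(4, 8)),
                  ["cat", "dog", "grass", "sky"])
```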
In an optional embodiment, the knowledge graph involved in the apparatus includes a label relationship matrix and a node information matrix, and the apparatus further includes a first input module, a second input module, and a second acquisition module. The first input module is configured to input the label relationship matrix into the graph convolutional neural network, where the label relationship matrix is used to indicate the relationship between at least two of the first labels; the second input module is configured to input the node information matrix into the graph convolutional neural network, where the node information matrix is used to indicate the attributes of the first labels themselves; and the second acquisition module is configured to process the label relationship matrix and the node information matrix through the graph convolutional neural network to obtain the graph feature matrix.
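A minimal sketch of how such a graph convolutional network might turn the label relationship matrix A and the node information matrix X into the graph feature matrix is given below. The two-layer propagation rule H' = σ(ÂHW) in the style of Kipf and Welling, and the weight shapes, are assumptions; the embodiments do not specify which GCN variant is used.

```python
import numpy as np

def normalize_adjacency(A):
    """Compute the usual GCN normalization (assumed): A_hat = D^{-1/2}(A + I)D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_graph_features(A, X, W1, W2):
    """A: (C, C) label relationship matrix; X: (C, F) node information matrix.
    Returns the (C, N) graph feature matrix, N being W2's output width."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)   # first graph convolution + ReLU
    return A_norm @ H @ W2                 # second graph convolution

# Example: C = 4 labels, F = 16 node attributes, N = 8 output dimensions.
rng = np.random.default_rng(1)
A = (rng.random((4, 4)) > 0.5).astype(float)
graph_matrix = gcn_graph_features(A, rng.normal(size=(4, 16)),
                                  rng.normal(size=(16, 32)),
                                  rng.normal(size=(32, 8)))
```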
In an optional embodiment, the apparatus further includes a third acquisition module, a fourth acquisition module, and a matrix update module. The third acquisition module is configured to acquire an updated knowledge graph in response to the data in the knowledge graph having been updated; the fourth acquisition module is configured to process the updated knowledge graph through the graph convolutional neural network to obtain an updated graph feature matrix; and the matrix update module is configured to update the graph feature matrix in the label classification model with the updated graph feature matrix.
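The resulting update flow can be sketched as follows, assuming the GCN computation (for example, the sketch above) is available as a callable; the `LabelClassifier` class and its method names are hypothetical and only illustrate that the cached matrix is swapped while the CNN branch stays fixed.

```python
class LabelClassifier:
    """Hypothetical wrapper illustrating the update flow: the CNN feature
    extractor is untouched, while the cached graph feature matrix is
    recomputed from the updated knowledge graph and swapped in."""
    def __init__(self, recompute, A, X):
        self._recompute = recompute          # e.g. gcn_graph_features above
        self.graph_matrix = recompute(A, X)  # cached (C, N) graph feature matrix

    def on_knowledge_graph_updated(self, A_new, X_new):
        # Re-run only the graph branch; no retraining of the CNN is implied.
        self.graph_matrix = self._recompute(A_new, X_new)
```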
In an optional embodiment, the scale of the graph feature matrix involved in the apparatus is C*N, where C is the number of the first labels, N is the feature dimension, and both C and N are positive integers.
In an optional embodiment, the scale of the image feature matrix involved in the apparatus is N*1, the scale of the graph feature matrix is C*N, and the scale of the to-be-activated data matrix is C*1, where C is the number of the first labels, N is the feature dimension, and both C and N are positive integers.
In an optional embodiment, the to-be-processed images involved in the apparatus include a first image and a second image, and the apparatus further includes a post-processing module. The post-processing module is configured to: in response to the first image and the second image each having acquired their corresponding second labels, acquire shooting-moment relationship information between the first image and the second image, where the shooting-moment relationship information is used to indicate the time-sequence relationship between the shooting moments of the first image and the second image, or to indicate the duration between the shooting moment of the first image and the shooting moment of the second image; and, in response to the shooting-moment relationship information meeting a preset condition, add a target second label as a second label corresponding to the second image, the target second label being a second label that corresponds to the first image but does not correspond to the second image.
In an optional embodiment, the post-processing module is configured to add the target second label as the second label corresponding to the second image in response to a target duration being less than a second threshold, where the target duration is the duration between the shooting moment of the first image and the shooting moment of the second image.
In an optional embodiment, the post-processing module is configured to: in response to the number of first images being 2k, where among the 2k first images k first images were shot before the second image and k first images were shot after the second image, acquire the shooting moments of the first images; and, in response to the length of the interval spanned by the shooting moments of the 2k first images being less than a third threshold, add the target second label as the second label corresponding to the second image, where the target second label is a label corresponding to all 2k first images and not corresponding to the second image, and k is an integer greater than or equal to 1.
Exemplarily, the multi-label classification method for images shown in the embodiments of the present application can be applied to a computer device, and the computer device can be a terminal that has a display screen and a multi-label image classification function. The terminal may include a mobile phone, a tablet computer, a laptop computer, a desktop computer, an all-in-one computer, a server, a workstation, a television, a set-top box, smart glasses, a smart watch, a digital camera, an MP4 playback terminal, an MP5 playback terminal, a learning machine, a point-reading machine, an e-book reader, an electronic dictionary, a vehicle-mounted terminal, a virtual reality (VR) playback terminal, an augmented reality (AR) playback terminal, or the like.
Please refer to FIG. 11, which is a structural block diagram of a terminal provided by an exemplary embodiment of the present application. As shown in FIG. 11, the terminal includes a processor 1120 and a memory 1140, where the memory 1140 stores at least one instruction, and the instruction is loaded and executed by the processor 1120 to implement the multi-label classification method for images described in the various method embodiments of the present application.
The processor 1120 may include one or more processing cores. The processor 1120 uses various interfaces and lines to connect the parts of the entire terminal 110, and executes the various functions of the terminal 110 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1140 and by calling the data stored in the memory 1140. Optionally, the processor 1120 may be implemented in at least one of the hardware forms of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 1120 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content that the display screen needs to display; and the modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 1120 and may instead be implemented by a separate chip.
The memory 1140 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 1140 includes a non-transitory computer-readable storage medium. The memory 1140 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1140 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described above, and the like; the data storage area may store the data involved in the method embodiments described above, and the like.
In this embodiment of the present application, the computer device may also be a server, and for the structure of the server, reference may be made to the structure shown in FIG. 12.
Please refer to FIG. 12, which is a schematic structural diagram of a server provided by an embodiment of the present application. The server is used to implement the multi-label classification method for images provided by the foregoing embodiments. Specifically:
The server 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory (RAM) 1202 and a read-only memory (ROM) 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201. The server 1200 further includes a basic input/output system (I/O system) 1206 that helps transfer information between the devices within the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and an input device 1209, such as a mouse or a keyboard, for the user to input information. The display 1208 and the input device 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 connected to the system bus 1205. The basic input/output system 1206 may further include the input/output controller 1210 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable medium provide non-volatile storage for the server 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technologies, CD-ROM, DVD (Digital Video Disc), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above. The system memory 1204 and the mass storage device 1207 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 1200 may also run by being connected, through a network such as the Internet, to a remote computer on the network. That is, the server 1200 may be connected to a network 1212 through a network interface unit 1211 connected to the system bus 1205, or the network interface unit 1211 may be used to connect to other types of networks or remote computer systems.
Embodiments of the present application further provide a computer-readable medium storing at least one instruction, where the at least one instruction is loaded and executed by the processor to implement the multi-label classification method for images described in the foregoing embodiments.
It should be noted that, when the apparatus for multi-label classification of images provided by the foregoing embodiments executes the method for multi-label classification of images, the division into the foregoing functional modules is used merely as an example. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments for multi-label classification of images provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only exemplary, implementable embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (20)

  1. A multi-label classification method for images, wherein the method comprises:
    extracting image features of a to-be-processed image through a feature extraction layer in a label classification model, the label classification model being a neural network model for adding at least two labels to the to-be-processed image;
    processing the image features through a graph feature matrix to obtain to-be-activated data, the graph feature matrix being a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph being used to indicate attributes of first labels themselves and a relationship between at least two of the first labels;
    processing the to-be-activated data through an activation layer in the label classification model to obtain at least two second labels; and
    determining at least two of the second labels as labels of the to-be-processed image, the second labels belonging to the first labels.
  2. The method according to claim 1, wherein the processing the image features through the graph feature matrix to obtain the to-be-activated data comprises:
    multiplying the image feature matrix by the graph feature matrix to obtain a to-be-activated data matrix; and
    wherein the processing the to-be-activated data through the activation layer in the label classification model to obtain at least two second labels comprises:
    processing the to-be-activated data matrix through the activation layer in the label classification model to obtain at least two of the second labels.
  3. The method according to claim 2, wherein the knowledge graph comprises a label relationship matrix and a node information matrix, and the method further comprises:
    inputting the label relationship matrix into the graph convolutional neural network, the label relationship matrix being used to indicate the relationship between at least two of the first labels;
    inputting the node information matrix into the graph convolutional neural network, the node information matrix being used to indicate the attributes of the first labels themselves; and
    processing the label relationship matrix and the node information matrix through the graph convolutional neural network to obtain the graph feature matrix.
  4. The method according to claim 3, further comprising:
    acquiring an updated knowledge graph in response to the data in the knowledge graph having been updated;
    processing the updated knowledge graph through the graph convolutional neural network to obtain an updated graph feature matrix; and
    updating the graph feature matrix in the label classification model with the updated graph feature matrix.
  5. The method according to claim 3, wherein the scale of the graph feature matrix is C*N, where C is the number of the first labels, N is the feature dimension, and both C and N are positive integers.
  6. The method according to claim 2, wherein the scale of the image feature matrix is N*1, the scale of the graph feature matrix is C*N, and the scale of the to-be-activated data matrix is C*1, where C is the number of the first labels, N is the feature dimension, and both C and N are positive integers.
  7. The method according to claim 1, wherein the to-be-processed images comprise a first image and a second image, and the method further comprises:
    in response to the first image and the second image each having acquired their corresponding second labels, acquiring shooting-moment relationship information between the first image and the second image, the shooting-moment relationship information being used to indicate a time-sequence relationship between the shooting moments of the first image and the second image, or being used to indicate a duration between the shooting moment of the first image and the shooting moment of the second image; and
    in response to the shooting-moment relationship information meeting a preset condition, adding a target second label as a second label corresponding to the second image, the target second label being a second label that corresponds to the first image and does not correspond to the second image.
  8. The method according to claim 7, wherein the adding, in response to the shooting-moment relationship information meeting the preset condition, the target second label as the second label corresponding to the second image comprises:
    in response to a target duration being less than a second threshold, adding the target second label as the second label corresponding to the second image, the target duration being the duration between the shooting moment of the first image and the shooting moment of the second image.
  9. The method according to claim 7, wherein the adding, in response to the shooting-moment relationship information meeting the preset condition, the target second label as the second label corresponding to the second image comprises:
    in response to the number of first images being 2k, where among the 2k first images k first images are images shot before the second image and k first images are images shot after the second image, acquiring the shooting moments of the first images; and
    in response to the length of the interval spanned by the shooting moments of the 2k first images being less than a third threshold, adding the target second label as the second label corresponding to the second image, the target second label being a label corresponding to all 2k first images and being a second label not corresponding to the second image, k being an integer greater than or equal to 1.
  10. A multi-label classification apparatus for images, wherein the apparatus comprises:
    a feature extraction module, configured to extract image features of a to-be-processed image through a feature extraction layer in a label classification model, the label classification model being a neural network model for adding at least two labels to the to-be-processed image;
    a first acquisition module, configured to process the image features through a graph feature matrix to obtain to-be-activated data, the graph feature matrix being a matrix obtained after a knowledge graph is processed by a graph convolutional neural network, and the knowledge graph being used to indicate attributes of first labels themselves and a relationship between at least two of the first labels;
    a label acquisition module, configured to process the to-be-activated data through an activation layer in the label classification model to obtain at least two second labels; and
    a label determination module, configured to determine at least two of the second labels as labels of the to-be-processed image, the second labels belonging to the first labels.
  11. The apparatus according to claim 10, wherein:
    the first acquisition module is configured to multiply the image feature matrix by the graph feature matrix to obtain a to-be-activated data matrix; and
    the label acquisition module is configured to process the to-be-activated data matrix through the activation layer in the label classification model to obtain at least two of the second labels.
  12. The apparatus according to claim 11, wherein the knowledge graph comprises a label relationship matrix and a node information matrix, and the apparatus further comprises a first input module, a second input module, and a second acquisition module;
    the first input module is configured to input the label relationship matrix into the graph convolutional neural network, the label relationship matrix being used to indicate the relationship between at least two of the first labels;
    the second input module is configured to input the node information matrix into the graph convolutional neural network, the node information matrix being used to indicate the attributes of the first labels themselves; and
    the second acquisition module is configured to process the label relationship matrix and the node information matrix through the graph convolutional neural network to obtain the graph feature matrix.
  13. The apparatus according to claim 12, further comprising a third acquisition module, a fourth acquisition module, and a matrix update module;
    the third acquisition module is configured to acquire an updated knowledge graph in response to the data in the knowledge graph having been updated;
    the fourth acquisition module is configured to process the updated knowledge graph through the graph convolutional neural network to obtain an updated graph feature matrix; and
    the matrix update module is configured to update the graph feature matrix in the label classification model with the updated graph feature matrix.
  14. The apparatus according to claim 12, wherein the scale of the graph feature matrix is C*N, where C is the number of the first labels, N is the feature dimension, and both C and N are positive integers.
  15. The apparatus according to claim 11, wherein the scale of the image feature matrix is N*1, the scale of the graph feature matrix is C*N, and the scale of the to-be-activated data matrix is C*1, where C is the number of the first labels, N is the feature dimension, and both C and N are positive integers.
  16. The apparatus according to claim 10, wherein the to-be-processed images comprise a first image and a second image, and the apparatus further comprises a post-processing module configured to:
    in response to the first image and the second image each having acquired their corresponding second labels, acquire shooting-moment relationship information between the first image and the second image, the shooting-moment relationship information being used to indicate a time-sequence relationship between the shooting moments of the first image and the second image, or being used to indicate a duration between the shooting moment of the first image and the shooting moment of the second image; and
    in response to the shooting-moment relationship information meeting a preset condition, add a target second label as a second label corresponding to the second image, the target second label being a second label that corresponds to the first image and does not correspond to the second image.
  17. The apparatus according to claim 16, wherein the post-processing module is configured to add the target second label as the second label corresponding to the second image in response to a target duration being less than a second threshold, the target duration being the duration between the shooting moment of the first image and the shooting moment of the second image.
  18. The apparatus according to claim 16, wherein the post-processing module is configured to:
    in response to the number of first images being 2k, where among the 2k first images k first images are images shot before the second image and k first images are images shot after the second image, acquire the shooting moments of the first images; and
    in response to the length of the interval spanned by the shooting moments of the 2k first images being less than a third threshold, add the target second label as the second label corresponding to the second image, the target second label being a label corresponding to all 2k first images and being a second label not corresponding to the second image, k being an integer greater than or equal to 1.
  19. A computer device, wherein the computer device comprises a processor, a memory connected to the processor, and program instructions stored on the memory, and the processor, when executing the program instructions, implements the multi-label classification method for images according to any one of claims 1 to 9.
  20. A computer-readable storage medium storing program instructions, wherein the program instructions, when executed by the processor according to claim 19, implement the multi-label classification method for images according to any one of claims 1 to 9.
PCT/CN2021/122741 2020-12-09 2021-10-09 Image multi-tag classification method and apparatus, computer device, and storage medium WO2022121485A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011451978.1A CN112487207A (en) 2020-12-09 2020-12-09 Image multi-label classification method and device, computer equipment and storage medium
CN202011451978.1 2020-12-09

Publications (1)

Publication Number Publication Date
WO2022121485A1

Family

ID=74941444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122741 WO2022121485A1 (en) 2020-12-09 2021-10-09 Image multi-tag classification method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112487207A (en)
WO (1) WO2022121485A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112883731B (en) * 2021-04-29 2021-08-20 腾讯科技(深圳)有限公司 Content classification method and device
CN114312236B (en) * 2021-12-29 2024-02-09 上海瑾盛通信科技有限公司 Motion sickness relieving method and related products
CN114707004B (en) * 2022-05-24 2022-08-16 国网浙江省电力有限公司信息通信分公司 Method and system for extracting and processing case-affair relation based on image model and language model
CN116842479B (en) * 2023-08-29 2023-12-12 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019536137A (en) * 2016-10-25 2019-12-12 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Knowledge diagnosis based clinical diagnosis support
CN111291643B (en) * 2020-01-20 2023-08-22 北京百度网讯科技有限公司 Video multi-label classification method, device, electronic equipment and storage medium
CN111476315B (en) * 2020-04-27 2023-05-05 中国科学院合肥物质科学研究院 Image multi-label identification method based on statistical correlation and graph convolution technology

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190362490A1 (en) * 2018-05-25 2019-11-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for inspecting corrosion defect of ladle
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110084296A (en) * 2019-04-22 2019-08-02 中山大学 A kind of figure expression learning framework and its multi-tag classification method based on certain semantic
CN110807495A (en) * 2019-11-08 2020-02-18 腾讯科技(深圳)有限公司 Multi-label classification method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332282A (en) * 2023-11-29 2024-01-02 之江实验室 Knowledge graph-based event matching method and device
CN117332282B (en) * 2023-11-29 2024-03-08 之江实验室 Knowledge graph-based event matching method and device
CN117392470A (en) * 2023-12-11 2024-01-12 安徽中医药大学 Fundus image multi-label classification model generation method and system based on knowledge graph
CN117392470B (en) * 2023-12-11 2024-03-01 安徽中医药大学 Fundus image multi-label classification model generation method and system based on knowledge graph

Also Published As

Publication number Publication date
CN112487207A (en) 2021-03-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21902193; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21902193; Country of ref document: EP; Kind code of ref document: A1)