CN113723513A - Multi-label image classification method and device and related equipment - Google Patents

Multi-label image classification method and device and related equipment

Info

Publication number
CN113723513A
CN113723513A (application CN202111011719.1A)
Authority
CN
China
Prior art keywords
label
semantic
target
classified
image
Prior art date
Legal status: Granted
Application number
CN202111011719.1A
Other languages
Chinese (zh)
Other versions
CN113723513B (en)
Inventor
张玉琪
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202111011719.1A priority Critical patent/CN113723513B/en
Publication of CN113723513A publication Critical patent/CN113723513A/en
Application granted granted Critical
Publication of CN113723513B publication Critical patent/CN113723513B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical fields of artificial intelligence and digital healthcare, and provides a multi-label image classification method, apparatus, computer device, and storage medium. The multi-label image classification method comprises the following steps: calling a semantic conversion model to process the labels of pre-labeled classified images, obtaining a label semantic vector for each label; calling a feature extraction model to process the classified images, obtaining feature semantic vectors; calculating an actual correlation value from the label semantic vectors and the feature semantic vectors, and training a multi-label image classification model with the classified images as input vectors and their corresponding labels as output vectors; calling the trained multi-label image classification model to process an image to be classified, obtaining an initial label set; calling the semantic conversion model to process the initial label set, obtaining a target label semantic vector for each initial label in the set; and acquiring the semantic relationships among the target label semantic vectors and outputting the target label set corresponding to the image to be classified according to those relationships. The method and apparatus can improve the accuracy of multi-label image classification.

Description

Multi-label image classification method and device and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a multi-label image classification method, apparatus, computer device, and medium.
Background
With the development of computer vision technology, image classification has been widely applied. Multi-label image classification is a common visual computing problem: automatically generating a description containing multiple labels for a single picture — for example, automatically identifying the various objects (e.g., pedestrians, animals, trees) and scene-related descriptions (e.g., blue sky, white clouds, sunrise) in an image of a complex scene.
In the course of implementing the present application, the applicant found the following problems in the prior art. Existing approaches solve multi-label image classification by converting it into an object detection problem. This requires a large number of picture samples, which greatly increases the difficulty of data annotation; when picture samples are scarce in a real scene, the training effect of the model cannot be guaranteed, so the accuracy of multi-label image classification is low. In addition, some multi-label classification tasks cannot be converted into object detection at all: when the attribute to be identified is a state or a style, such a category is usually judged from the image as a whole rather than from a local region as in an object detection model, which also leads to low accuracy of multi-label image classification.
Therefore, it is necessary to provide a multi-label image classification method capable of improving the accuracy of multi-label image classification.
Disclosure of Invention
In view of the foregoing, there is a need for a multi-label image classification method, a multi-label image classification apparatus, a computer device, and a medium, which can improve the accuracy of multi-label image classification.
A first aspect of an embodiment of the present application provides a multi-label image classification method, where the multi-label image classification method includes:
acquiring a label in a classified image labeled with a label in advance, and calling a pre-trained semantic conversion model to process the label to obtain a label semantic vector corresponding to the label;
calling a pre-trained feature extraction model to process the classified images to obtain feature semantic vectors corresponding to the classified images;
calculating an actual correlation value according to the tag semantic vector and the feature semantic vector;
taking the classified image as an input vector and the label corresponding to the classified image as an output vector to train a multi-label image classification model, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, with convergence of the loss function as the training objective; training of the multi-label image classification model is complete when the loss function converges;
calling the multi-label image classification model to process the image to be classified to obtain an initial label set contained in the image to be classified;
calling the semantic conversion model to process the initial label set to obtain a target label semantic vector corresponding to each initial label in the initial label set;
and acquiring semantic relation among the semantic vectors of each target label, and outputting a target label set corresponding to the image to be classified according to the semantic relation.
Further, in the multi-label image classification method provided in the embodiment of the present application, the calling a pre-trained semantic conversion model to process the label to obtain a label semantic vector corresponding to the label includes:
acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
Further, in the multi-label image classification method provided in the embodiment of the present application, before the step of calling a pre-trained feature extraction model to process the classified image and obtaining a feature semantic vector corresponding to the classified image, the method further includes:
performing gray-scale processing on the classified images to obtain target classified images;
acquiring a plurality of pre-marked area labels corresponding to the target classified images, and determining a target area set of the target classified images according to the area labels;
extracting a feature corpus corresponding to each target area in the target area set, and converting the feature corpus into a feature semantic vector of the target dimension;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
Further, in the above multi-label image classification method provided in an embodiment of the present application, the calculating an actual correlation value according to the label semantic vector and the feature semantic vector includes:
multiplying the label semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain the actual correlation value.
Further, in the multi-label image classification method provided in an embodiment of the present application, the obtaining a semantic relationship between each of the target label semantic vectors includes:
calculating a similarity value between semantic vectors of each target label;
acquiring a target interval to which the similarity value belongs;
and traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
Further, in the multi-label image classification method provided in the embodiment of the present application, the training process of the semantic conversion model includes:
acquiring a training sample which takes a target attribute corresponding to a label as input data and takes a label semantic vector corresponding to the label as output data;
splitting the training sample into a training set and a test set according to a preset splitting ratio;
inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
inputting the test set into the initial semantic conversion model, and calculating the accuracy of the model;
detecting whether the accuracy rate exceeds a preset accuracy rate threshold value;
and when the detection result shows that the accuracy rate exceeds the preset accuracy rate threshold value, determining that the training of the semantic conversion model is finished.
Further, in the above multi-label image classification method provided in an embodiment of the present application, the outputting a target label corresponding to the image to be classified according to the semantic relationship includes:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
arranging the label sets according to the preset label format to obtain a target label set;
and outputting the target label set.
A second aspect of the embodiments of the present application further provides a multi-label image classification device, including:
the label obtaining module is used for obtaining labels in the classified images labeled with the labels in advance, calling a pre-trained semantic conversion model to process the labels, and obtaining label semantic vectors corresponding to the labels;
the image processing module is used for calling a pre-trained feature extraction model to process the classified images to obtain feature semantic vectors corresponding to the classified images;
the correlation calculation module is used for calculating an actual correlation value according to the label semantic vector and the feature semantic vector;
the model training module is used for training a multi-label image classification model by taking the classified images as input vectors and the labels corresponding to the classified images as output vectors, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, with convergence of the loss function as the training objective; training is complete when the loss function converges;
the label determining module is used for calling the multi-label image classification model to process the image to be classified to obtain an initial label set contained in the image to be classified;
the tag calling module is used for calling the semantic conversion model to process the initial tag set to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set;
and the label output module is used for acquiring the semantic relation between the semantic vectors of each target label and outputting the target label set corresponding to the image to be classified according to the semantic relation.
The third aspect of the embodiments of the present application further provides a computer device, comprising a processor and a memory; the processor is configured to implement any one of the multi-label image classification methods above when executing a computer program stored in the memory.
The fourth aspect of the embodiments of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any one of the multi-label image classification methods described above.
According to the multi-label image classification method, apparatus, computer device, and computer-readable storage medium provided by the embodiments of the application, during training of the multi-label image classification model the loss function is determined by the actual correlation value, computed from the label semantic vector and the feature semantic vector, together with a preset target correlation value; training proceeds with convergence of the loss function as the objective and is complete when the loss function converges. Because the label semantic vectors of the image are fused with its feature semantic vectors, the method can alleviate the poor training effect caused by the difficulty of annotating the large number of image samples required in real scenes, thereby improving the accuracy of multi-label image classification. In addition, the semantic relationships between labels can be determined from the label semantic vectors, and the labels can be output according to those relationships, expressing the semantic relations between labels more clearly and intuitively. The method and system can be applied to various functional modules of smart cities such as digital healthcare and smart transportation, for example a multi-label image classification module for smart government affairs, and can promote the rapid development of smart cities.
Drawings
Fig. 1 is a flowchart of a multi-label image classification method according to an embodiment of the present application.
Fig. 2 is a structural diagram of a multi-label image classification apparatus according to a second embodiment of the present application.
Fig. 3 is a schematic structural diagram of a computer device provided in the third embodiment of the present application.
The following detailed description will further illustrate the present application in conjunction with the above-described figures.
Detailed Description
In order that the above objects, features and advantages of the present application can be more clearly understood, a detailed description of the present application will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present application, and the described embodiments are a part, but not all, of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) refers to the theories, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The multi-label image classification method provided by the embodiments of the application is executed by a computer device; correspondingly, the multi-label image classification apparatus runs in the computer device.
Fig. 1 is a flowchart of a multi-label image classification method according to a first embodiment of the present application. As shown in fig. 1, the multi-label image classification method may include the following steps, and the order of the steps in the flowchart may be changed and some may be omitted according to different requirements:
s11, obtaining the labels in the classified images labeled with the labels in advance, and calling a pre-trained semantic conversion model to process the labels to obtain the label semantic vectors corresponding to the labels.
In at least one embodiment of the application, a training set is preset and used for training an initial neural network model to obtain a multi-label image classification model. The training set consists of classified images labeled with labels in advance, and its size can be set according to model training requirements. The training set is stored in a preset database; considering the privacy and reliability of data storage, the preset database may be a target node in a blockchain. A label may refer to a standardized image feature contained in a single image; for example, the labels contained in one image may be cat, person, child, stool, long sleeve, tree, spring, winter, and so on. As another example, when the image is a medical image, its labels may be doctor, patient, medical instrument, body organ, and so on, without limitation here. The number of labels contained in one image may be one or more; the application takes the case where one image contains multiple labels as an example. The semantic conversion model is used to convert a label into an embedding vector of a target dimension, where the target dimension is a preset dimension. The semantic conversion model may be a BERT model, a Word2vec model, or the like.
Optionally, the calling a pre-trained semantic conversion model to process the tag, and obtaining a tag semantic vector corresponding to the tag includes:
acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
The target attributes are attributes preset by system personnel for identifying labels, including but not limited to the label name and the spatial position of the label in the classified image. The label name may be person, cat, tree, long sleeve, and so on; the spatial position of the label in the classified image is the position information of the object corresponding to the label, for example identified by two-dimensional coordinates. The preset data format is a predetermined format for combining the target attributes, for example concatenating the label name and the spatial position of the label in the classified image in an agreed order to obtain a target attribute sequence.
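The patent does not fix a concrete layout for the target attribute sequence, so the following is only a minimal sketch: a hypothetical "preset data format" that concatenates a label name with its bounding-box coordinates into one string, which would then be fed to the semantic conversion model.

```python
def build_attribute_sequence(label_name, bbox):
    """Combine a label's target attributes (name + spatial position)
    into a single target attribute sequence.

    The layout used here — name, then two-dimensional coordinates in
    square brackets — is a hypothetical preset data format; the patent
    leaves the concrete format to the implementer.
    """
    x1, y1, x2, y2 = bbox
    return f"{label_name} [{x1},{y1},{x2},{y2}]"

# The resulting sequence would be passed to a semantic conversion model
# (e.g. a BERT or Word2vec model) to obtain a label semantic vector of
# the target dimension.
seq = build_attribute_sequence("cat", (10, 20, 110, 180))
```

Keeping the attribute order fixed matters because the semantic conversion model is trained on sequences in this exact format.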
Optionally, the training process of the semantic conversion model includes:
acquiring a training sample which takes a target attribute corresponding to a label as input data and takes a label semantic vector corresponding to the label as output data;
splitting the training sample into a training set and a test set according to a preset splitting ratio;
inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
inputting the test set into the initial semantic conversion model, and calculating the accuracy of the initial semantic conversion model;
detecting whether the accuracy rate exceeds a preset accuracy rate threshold value;
and when the detection result shows that the accuracy rate exceeds the preset accuracy rate threshold value, determining that the training of the semantic conversion model is finished.
The preset splitting ratio is a preset ratio of splitting the training set and the test set, for example, the preset splitting ratio may be 8:2, and is not limited herein. The preset accuracy threshold is a preset threshold for evaluating the accuracy of the model, for example, the preset accuracy threshold may be 85%, and is not limited herein.
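The split-then-threshold procedure described above can be sketched as follows. The 8:2 ratio and the 85% accuracy threshold are taken from the examples in the text; the shuffle seed is an illustrative assumption.

```python
import random

def split_samples(samples, ratio=0.8, seed=0):
    """Split training samples into a training set and a test set at a
    preset splitting ratio (8:2 here, as in the example above)."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

def training_finished(correct, total, threshold=0.85):
    """Training of the semantic conversion model is considered finished
    once test-set accuracy exceeds the preset threshold (85% here)."""
    return correct / total > threshold

train_set, test_set = split_samples(list(range(100)))
```

If the accuracy does not exceed the threshold, the model would be trained further (or with more data) and re-tested.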
And S12, calling a pre-trained feature extraction model to process the classified images to obtain feature semantic vectors corresponding to the classified images.
In at least one embodiment of the present application, the feature extraction model is used to extract image features from the classified images; it may be a ResNet model or an EfficientNet model. The feature semantic vector is the embedding vector corresponding to the features in the image, and it has the same dimension as the label semantic vector. In one embodiment, a fully connected layer may be added to the feature extraction model (e.g., a ResNet or EfficientNet model); this fully connected layer ensures that the feature semantic vector and the label semantic vector have the same dimension.
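The role of the added fully connected layer — projecting whatever dimension the backbone produces down to the target dimension shared with the label semantic vector — can be illustrated with a minimal pure-Python stand-in (a real implementation would use a deep-learning framework's linear layer on top of ResNet/EfficientNet; the weights and sizes below are illustrative only).

```python
def fully_connected(features, weights, bias):
    """Minimal stand-in for the added fully connected layer: projects a
    backbone feature vector to the target dimension so that the feature
    semantic vector matches the label semantic vector's dimension.

    `weights` has shape (target_dim, input_dim); `bias` has length
    target_dim. Returns y = W @ x + b.
    """
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]

# Illustrative: project a 2-dim backbone output to a 3-dim target space.
y = fully_connected(
    features=[1.0, 2.0],
    weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    bias=[0.0, 0.0, 1.0],
)
```

In practice the backbone output is hundreds or thousands of dimensions and the target dimension matches the label embedding (e.g. 512, per the example later in the text).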
Optionally, the training process of the feature extraction model includes:
performing gray-scale processing on the classified images to obtain target classified images;
acquiring a plurality of pre-marked area labels corresponding to the target classified images, and determining a target area set of the target classified images according to the area labels;
extracting a feature corpus corresponding to each target area in the target area set, and converting the feature corpus into a feature semantic vector of the target dimension;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
Subjecting the classified images to gray-scale processing can solve the problem of uneven brightness distribution and increase the clarity of the classified images. The target classified image includes a plurality of pre-marked area labels, which may be numeric, letter, or color labels, without limitation here. Each area label represents one region: when the target classified image includes 3 area labels, it includes 3 regions. The classified images may be composed of different target areas, each representing a different feature of the picture — cat, child, stool, tree, and so on. Each target area has a corresponding feature corpus, which may describe proportional features, geometric features, position features, and the like of the entity in the target area. For example, when the target area is a cat, the feature corpus may include the cat's part-scale features, shape features, and position in the classified image. The feature corpus is converted to obtain the feature semantic vector; illustratively, the feature corpus is processed by calling a BERT model or a Word2vec model to obtain the feature semantic vector.
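The patent does not specify which gray-scale conversion is used; one common choice is the ITU-R BT.601 luma formula, sketched per-pixel below (a real pipeline would apply this over the whole image with an image library).

```python
def to_grayscale(pixel):
    """Convert one RGB pixel to a gray value using the ITU-R BT.601
    luma weights — one common way to realize the gray-scale processing
    step described above; the patent does not name a specific formula.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b
```

The weights sum to 1.0, so pure white (255, 255, 255) maps to 255 and pure black to 0, preserving the dynamic range while discarding color.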
The initial neural network is trained with the classified images as input vectors and the corresponding feature semantic vectors as output vectors, serving as training data and test data, to obtain the trained feature extraction model. Taking a ResNet model as the initial neural network as an example, the training process inputs the training data into the ResNet model and, after multiple convolutions, pooling operations, and activations, yields the trained feature extraction model; calling the feature extraction model to process the classified images then yields the feature semantic vectors corresponding to the classified images.
S13, calculating an actual correlation value according to the label semantic vector and the feature semantic vector.
In at least one embodiment of the present application, the actual correlation value is used to evaluate the degree of correlation between the label semantic vector and the feature semantic vector. In one embodiment, when the actual correlation value is greater than 0.5, the degree of correlation is judged to be high, that is, the classified image contains the label corresponding to the label semantic vector; when the actual correlation value is less than 0.5, the degree of correlation is judged to be low, that is, the classified image does not contain that label. The loss function is a cross-entropy function of the actual correlation value and the target correlation value.
Optionally, the calculating an actual correlation value according to the tag semantic vector and the feature semantic vector comprises:
multiplying the tag semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain an actual correlation value.
The preset function may be the Sigmoid function, also called the Logistic function, which is used for hidden-layer neuron output and has a value range of (0, 1). Illustratively, taking the label semantic vector as a 512-dimensional vector a = (x1, y1, z1, …, n1) and the feature semantic vector, of the same dimension, as b = (x2, y2, z2, …, n2), multiplying a and b gives the initial value c = x1x2 + y1y2 + z1z2 + … + n1n2. The Sigmoid function is then called to process the initial value, giving the actual correlation value.
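The dot-product-plus-sigmoid computation above can be written directly:

```python
import math

def actual_correlation(label_vec, feature_vec):
    """Compute the actual correlation value: the dot product of the two
    same-dimension semantic vectors (the initial value c above),
    squashed by the sigmoid / logistic function into (0, 1)."""
    initial = sum(a * b for a, b in zip(label_vec, feature_vec))
    return 1.0 / (1.0 + math.exp(-initial))

# Per the embodiment above: correlation > 0.5 means the classified
# image is taken to contain the label; < 0.5 means it does not.
```

Note that sigmoid(0) = 0.5, so orthogonal vectors land exactly on the decision boundary between "contains the label" and "does not".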
S14, the classified images are used as input vectors and the labels corresponding to the classified images as output vectors to train a multi-label image classification model, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value; training proceeds with convergence of the loss function as the objective, and is complete when the loss function converges.
In at least one embodiment of the present application, the classified images are used as input vectors and the labels corresponding to the classified images as output vectors to generate training data and test data. The training data is input into an initial neural network model for training to obtain an initial multi-label image classification model; the test data is input into the initial multi-label image classification model for testing, and the loss function of the model is calculated from the actual correlation value corresponding to the test data and the preset target correlation value. Whether the loss function has converged is then judged: when the loss function converges, training of the multi-label image classification model is determined to be complete; when it does not converge, training data is added and the initial multi-label image classification model is retrained until the loss function converges. Determining whether a loss function has converged is known in the art and is not described further here.
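Since the text describes the loss as a cross-entropy of the actual correlation value against a target correlation value, a per-label sketch looks like binary cross-entropy (the target is assumed to be 1 when the image carries the label and 0 otherwise; the clamping epsilon is an implementation detail added here for numerical safety).

```python
import math

def correlation_loss(actual, target, eps=1e-12):
    """Binary cross-entropy between the computed actual correlation
    value (in (0, 1)) and the preset target correlation value
    (assumed 1 if the image carries the label, 0 otherwise) —
    a sketch of the loss described above, not the patent's exact form.
    """
    actual = min(max(actual, eps), 1.0 - eps)  # clamp away from 0 and 1
    return -(target * math.log(actual) + (1 - target) * math.log(1 - actual))
```

The total loss over a batch would sum (or average) this term over every label of every classified image; training stops when this quantity converges.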
S15, calling the multi-label image classification model to process the image to be classified, and obtaining an initial label set contained in the image to be classified.
In at least one embodiment of the present application, the image to be classified is an image without a label, the multi-label image classification model is called to process the image to be classified, an initial label set included in the image to be classified can be obtained, and the number of the initial labels in the initial label set may be one or multiple. Each initial label in the initial label set contains a corresponding target attribute, the target attribute can be represented by adding a mark, and the target attribute comprises a label name and a spatial position of the label in the classified image.
And S16, calling the semantic conversion model to process the initial label set to obtain a target label semantic vector corresponding to each initial label in the initial label set.
In at least one embodiment of the present application, the invoking the semantic conversion model to process the initial tag set to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set includes:
acquiring a target attribute corresponding to each initial label in the initial label set, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
And querying a mark carried by each initial label in the initial label set to obtain a target attribute corresponding to each initial label.
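The attribute-combination step above can be sketched as follows. The separator format and the hash-based stand-in for the pre-trained semantic conversion model (a Bert or Word2vec model in the text) are both assumptions made purely for illustration; a real implementation would feed the attribute sequence through the trained encoder.

```python
import hashlib

def attribute_sequence(name, position, fmt="{name}|{x},{y}"):
    # Combine the label name and its 2-D spatial position into a single
    # target attribute sequence; the preset data format here is assumed
    return fmt.format(name=name, x=position[0], y=position[1])

def to_semantic_vector(seq, dim=8):
    # Stand-in for the pre-trained semantic conversion model: hash the
    # sequence into a deterministic fixed-dimension vector in [0, 1]
    digest = hashlib.sha256(seq.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

seq = attribute_sequence("cat", (120, 48))
vec = to_semantic_vector(seq)
print(seq, len(vec))
```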
S17, obtaining semantic relations among the semantic vectors of each target label, and outputting a target label set corresponding to the image to be classified according to the semantic relations.
In at least one embodiment of the present application, for example, given the labels cat, person, child, stool, long sleeve, leisure, tree, spring, winter, there is a semantic relationship between the labels: for example, "child" is a subclass of "person", "spring" and "winter" are opposite categories, and "long sleeve" will typically appear together with "person". The semantic relationship can be an affiliation, an opposition, or an association between the tags. According to the method and the device, the semantic relationships among the plurality of initial label semantic vectors are analyzed, so that the initial labels of the image to be classified are given a semantic description, label classification information is obtained, and the accuracy of multi-label image classification can be improved.
Optionally, the obtaining the semantic relationship between each of the target tag semantic vectors includes:
calculating a similarity value between semantic vectors of each target label;
acquiring a target interval to which the similarity value belongs;
and traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
Wherein the range of similarity values is (0, 1). And a mapping relation exists between the interval and the semantic relation, and a target semantic relation corresponding to the target interval can be obtained by inquiring the mapping relation.
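The interval-to-relation lookup above can be sketched as follows. Cosine similarity is one common choice for comparing semantic vectors, and the interval cut-points in `INTERVAL_RELATIONS` are illustrative assumptions; the application only states that a preset mapping between intervals and semantic relations exists.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical mapping between similarity intervals and semantic relations;
# the cut-points are illustrative, not prescribed by the text
INTERVAL_RELATIONS = [
    ((0.8, 1.0), "affiliation"),
    ((0.4, 0.8), "association"),
    ((0.0, 0.4), "opposition"),
]

def relation_for(similarity):
    # Find the target interval the similarity value belongs to, then
    # return the semantic relation mapped to that interval
    for (lo, hi), relation in INTERVAL_RELATIONS:
        if lo < similarity <= hi:
            return relation
    return None

child = [0.9, 0.8, 0.1]
person = [0.85, 0.75, 0.2]
sim = cosine_similarity(child, person)
print(relation_for(sim))
```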
Optionally, the outputting the target tag set corresponding to the image to be classified according to the semantic relationship includes:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
arranging the label sets according to the preset label format to obtain a target label set;
and outputting the target label set.
And for different semantic relationships, corresponding preset label formats exist. The preset label format is a preset data format arranged between labels, and the semantic relationship between the labels can be visually expressed by arranging the labels with the semantic relationship according to the preset label format.
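The per-relation arrangement step can be sketched as follows. The concrete format strings are hypothetical: the text only says that each semantic relationship has its own preset label format that makes the relationship visually apparent in the output.

```python
# Hypothetical per-relation output formats; the separators are assumed
FORMATS = {
    "affiliation": "{a} > {b}",    # parent class listed before subclass
    "opposition":  "{a} <-> {b}",  # mutually exclusive categories
    "association": "{a} + {b}",    # labels that co-occur
}

def arrange(relation, a, b):
    # Arrange a pair of related labels according to the preset label format
    return FORMATS[relation].format(a=a, b=b)

target_label_set = [
    arrange("affiliation", "person", "child"),
    arrange("opposition", "spring", "winter"),
    arrange("association", "person", "long sleeve"),
]
print(target_label_set)
```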
In the multi-label image classification method provided by the embodiment of the application, during the training of the multi-label image classification model, a loss function of the model is determined from the actual correlation value calculated from the label semantic vector and the feature semantic vector together with a preset target correlation value, and convergence of the loss function is taken as the target at which training of the multi-label image classification model is complete. Fusing the label semantic vectors of an image with its feature semantic vectors can alleviate the poor model training effect caused by the high cost of annotating the large number of image samples found in actual scenes, thereby improving the accuracy of multi-label image classification. In addition, because the label semantic vectors are fused with the feature semantic vectors, the semantic relationships between labels can be determined through the label semantic vectors, and the labels can be output according to those semantic relationships, so that the relationships between labels are expressed more clearly and intuitively. The method and the system can be applied to various functional modules of smart cities such as digital medical treatment and smart traffic, for example, a multi-label image classification module for smart government affairs, and can thereby promote the rapid development of smart cities.
Fig. 2 is a structural diagram of a multi-label image classification apparatus according to a second embodiment of the present application.
In some embodiments, the multi-label image classification apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the multi-label image classification apparatus 20 may be stored in a memory of a computer device and executed by at least one processor to perform the function of multi-label image classification (described in detail in fig. 1).
In this embodiment, the multi-label image classification apparatus 20 may be divided into a plurality of functional modules according to the functions performed by the multi-label image classification apparatus. The functional module may include: the label acquiring module 201, the image processing module 202, the correlation calculating module 203, the model training module 204, the label determining module 205, the label calling module 206 and the label outputting module 207. A module as referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in a memory. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The label obtaining module 201 may be configured to obtain a label in a classified image labeled with a label in advance, and call a pre-trained semantic conversion model to process the label, so as to obtain a label semantic vector corresponding to the label.
In at least one embodiment of the application, a training set is preset and used for training an initial neural network model to obtain a multi-label image classification model. The training set consists of classified images labeled with labels in advance, and its size can be set according to model training requirements. The training set is stored in a preset database, and in consideration of the privacy and reliability of data storage, the preset database can be a target node in a blockchain. A label may refer to a standardized image feature contained in a single image; for example, the labels contained in one image may be cat, person, child, stool, long sleeve, tree, spring, winter, and the like; for another example, when the image is a medical image, the labels contained in it may be doctor, patient, medical instrument, body organ, and the like, which is not limited herein. The number of labels contained in one image may be one or more; the present application takes the case in which one image contains multiple labels as an example. The semantic conversion model is used for converting the labels into embedding vectors of a target dimension, the target dimension being a preset dimension. The semantic conversion model can be a Bert model, a Word2vec model, or the like.
Optionally, the calling a pre-trained semantic conversion model to process the tag, and obtaining a tag semantic vector corresponding to the tag includes:
acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
The target attributes are attributes which are preset by system personnel and used for identifying tags, and include, but are not limited to, tag names and spatial positions of the tags in the classified images, wherein the tag names can be names of people, cats, trees, long sleeves and the like; the spatial position of the tag in the classified image is the position information of the object corresponding to the tag in the image, for example, the spatial position of the tag in the classified image is identified by means of two-dimensional coordinates. The preset data format is a preset format for combining the target attributes, for example, a target attribute sequence is obtained by combining the tag types, the tag lengths and the arrangement sequence of the spatial positions of the tags in the classified images.
Optionally, the training process of the semantic conversion model includes:
acquiring a training sample which takes a target attribute corresponding to a label as input data and takes a label semantic vector corresponding to the label as output data;
splitting the training sample into a training set and a test set according to a preset splitting ratio;
inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
inputting the test set into the initial semantic conversion model, and calculating the accuracy of the initial semantic conversion model;
detecting whether the accuracy rate exceeds a preset accuracy rate threshold value;
and when the detection result shows that the accuracy rate exceeds the preset accuracy rate threshold value, determining that the training of the semantic conversion model is finished.
The preset splitting ratio is a preset ratio of splitting the training set and the test set, for example, the preset splitting ratio may be 8:2, and is not limited herein. The preset accuracy threshold is a preset threshold for evaluating the accuracy of the model, for example, the preset accuracy threshold may be 85%, and is not limited herein.
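The split-ratio and accuracy-threshold checks described above can be sketched as follows, using the example values from the text (an 8:2 split and an 85% threshold); the function names are illustrative.

```python
def split_samples(samples, ratio=(8, 2)):
    # Split the training samples by the preset splitting ratio, e.g. 8:2
    cut = len(samples) * ratio[0] // sum(ratio)
    return samples[:cut], samples[cut:]

def accuracy(predictions, labels):
    hits = sum(1 for p, t in zip(predictions, labels) if p == t)
    return hits / len(labels)

samples = list(range(10))
train, test = split_samples(samples)
print(len(train), len(test))

# Training is considered finished only when the accuracy on the test set
# exceeds the preset accuracy threshold (85% in the text's example)
acc = accuracy([1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
               [1, 1, 0, 1, 0, 1, 0, 1, 0, 1])
print(acc > 0.85)
```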
The image processing module 202 may be configured to invoke a pre-trained feature extraction model to process the classified image, so as to obtain a feature semantic vector corresponding to the classified image.
In at least one embodiment of the present application, the feature extraction model is used to extract image features from the classified images, and the feature extraction model may be a Resnet model or an Efficientnet model. The feature semantic vector is the embedding vector corresponding to the features in the image, and it has the same dimension as the tag semantic vector. In an embodiment, a fully connected layer may be added to the feature extraction model, such as a Resnet model or an Efficientnet model, and the fully connected layer is used to ensure that the feature semantic vector and the tag semantic vector have the same dimension.
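The added fully connected layer can be sketched as a single matrix projection. The dimensions are assumptions chosen to match the text's examples: a 2048-dimensional pooled feature vector (as produced by, e.g., a ResNet-50 backbone) projected down to the 512-dimensional space of the tag semantic vectors. Random weights stand in for trained ones.

```python
import random

def fully_connected(features, weights, bias):
    # One fully connected layer: project the backbone's feature vector to
    # the tag semantic vector's dimension (one output per weight row)
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

random.seed(0)
in_dim, out_dim = 2048, 512  # e.g. ResNet-50 pooled features -> 512-d space
features = [random.random() for _ in range(in_dim)]
weights = [[random.uniform(-0.01, 0.01) for _ in range(in_dim)]
           for _ in range(out_dim)]
bias = [0.0] * out_dim

feature_semantic_vector = fully_connected(features, weights, bias)
print(len(feature_semantic_vector))
```

With matching dimensions, the feature semantic vector can be multiplied directly with a tag semantic vector to produce the initial correlation value described earlier.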
Optionally, the training process of the feature extraction model includes:
processing the classified images in a gray scale manner to obtain target classified images;
acquiring a plurality of pre-marked area labels corresponding to the classified target images, and determining a target area set of the classified target images according to the area labels;
extracting a feature corpus corresponding to each target area in the target area set, and converting the feature corpus into a feature semantic vector of the target dimension;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
The classified images are subjected to gray level processing, so that the problem of uneven brightness distribution of the classified images can be solved, and the effect of increasing the definition of the classified images is achieved. The classified target image includes a plurality of pre-marked area labels, and the area labels may be numeric labels, letter labels, or color labels, which is not limited herein. Each of the region labels represents a region, and when the target classified image includes 3 region labels, the target classified image includes 3 regions. The classified images may be images composed of different target areas, each of which may represent a different feature of the picture, and the target areas may be cat, child, stool, tree, etc. For each target region, a corresponding feature corpus exists, and the feature corpus may refer to a proportional feature corpus, a geometric feature corpus, a location feature corpus, and the like of an entity in the target region. For example, when the target region is a cat, the feature corpus may be part scale features of the cat, shape features of the cat, and a position feature of the cat in the classified image. And converting the characteristic linguistic data to obtain a characteristic semantic vector. Illustratively, the feature corpus is processed by calling a Bert model or a Word2vec model, so that a feature semantic vector can be obtained.
The classified images are taken as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to form training data and test data, and an initial neural network is trained on them to obtain a trained feature extraction model. Taking the initial neural network model as a ResNet model as an example, the training process can input the training data into the ResNet model, obtain a trained feature extraction model after multiple convolution, pooling, and activation operations of the initial neural network model, and call the feature extraction model to process the classified images, so as to obtain the feature semantic vectors corresponding to the classified images.
The relevance computation module 203 may be configured to compute an actual relevance value from the tag semantic vector and the feature semantic vector.
In at least one embodiment of the present application, the actual correlation value is used to evaluate a degree of correlation between the tag semantic vector and the feature semantic vector, and in an embodiment, when the actual correlation value is greater than 0.5, it is determined that the degree of correlation between the tag semantic vector and the feature semantic vector is high, that is, the classified image includes a tag corresponding to the tag semantic vector; when the actual correlation value is less than 0.5, determining that the degree of correlation between the tag semantic vector and the feature semantic vector is low, that is, the classified image does not contain a tag corresponding to the tag semantic vector. The loss function is a cross-entropy function consisting of the actual correlation value and the target correlation value.
Optionally, the calculating an actual correlation value according to the tag semantic vector and the feature semantic vector comprises:
multiplying the tag semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain an actual correlation value.
The preset function can be a Sigmoid function; the Sigmoid function, also called the Logistic function, is used for hidden layer neuron output and has a value range of (0, 1). Exemplarily, taking the tag semantic vector as a 512-dimensional vector, the tag semantic vector may be a = (x1, y1, z1, …, n1), the feature semantic vector may have the same dimension as the tag semantic vector, the feature semantic vector may be b = (x2, y2, z2, …, n2), and a and b are multiplied to obtain an initial value c = x1x2 + y1y2 + z1z2 + … + n1n2. The Sigmoid function is then called to process the initial value to obtain an actual correlation value.
The model training module 204 may be configured to train a multi-label image classification model using the classified images as input vectors and labels corresponding to the classified images as output vectors, where a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and the loss function is converged as a target until the training of the multi-label image classification model is completed.
In at least one embodiment of the present application, the classified image is used as an input vector, the label corresponding to the classified image is used as an output vector to generate training data and test data, and the training data is input into an initial neural network model for training to obtain an initial multi-label image classification model; inputting the test data into the initial multi-label image classification model for testing, and calculating a loss function of the model according to an actual correlation value corresponding to the test data and a preset target correlation value; judging whether the loss function is converged; when the loss function is converged, determining that the multi-label image classification model is trained; and when the loss function is not converged, adding training data to retrain the initial multi-label image classification model until the loss function is converged. Judging whether the loss function is converged belongs to the prior art, and is not described herein again.
The label determining module 205 may be configured to invoke the multi-label image classification model to process an image to be classified, so as to obtain an initial label set included in the image to be classified.
In at least one embodiment of the present application, the image to be classified is an image without a label, the multi-label image classification model is called to process the image to be classified, an initial label set included in the image to be classified can be obtained, and the number of the initial labels in the initial label set may be one or multiple. Each initial label in the initial label set contains a corresponding target attribute, the target attribute can be represented by adding a mark, and the target attribute comprises a label category, a label length and a spatial position of the label in the classified image.
The tag calling module 206 may be configured to call the semantic conversion model to process the initial tag set, so as to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set.
In at least one embodiment of the present application, the invoking the semantic conversion model to process the initial tag set to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set includes:
acquiring a target attribute corresponding to each initial label in the initial label set, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
And querying a mark carried by each initial label in the initial label set to obtain a target attribute corresponding to each initial label.
The tag output module 207 may be configured to obtain a semantic relationship between each target tag semantic vector, and output a target tag set corresponding to the image to be classified according to the semantic relationship.
In at least one embodiment of the present application, for example, given the labels cat, person, child, stool, long sleeve, leisure, tree, spring, winter, there is a semantic relationship between the labels: for example, "child" is a subclass of "person", "spring" and "winter" are opposite categories, and "long sleeve" will typically appear together with "person". The semantic relationship can be an affiliation, an opposition, or an association between the tags. According to the method and the device, the semantic relationships among the plurality of initial label semantic vectors are analyzed, so that the initial labels of the image to be classified are given a semantic description, label classification information is obtained, and the accuracy of multi-label image classification can be improved.
Optionally, the obtaining the semantic relationship between each of the target tag semantic vectors includes:
calculating a similarity value between semantic vectors of each target label;
acquiring a target interval to which the similarity value belongs;
and traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
Wherein the range of similarity values is (0, 1). And a mapping relation exists between the interval and the semantic relation, and a target semantic relation corresponding to the target interval can be obtained by inquiring the mapping relation.
Optionally, the outputting the target tag set corresponding to the image to be classified according to the semantic relationship includes:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
arranging the label sets according to the preset label format to obtain a target label set;
and outputting the target label set.
And for different semantic relationships, corresponding preset label formats exist. The preset label format is a preset data format arranged between labels, and the semantic relationship between the labels can be visually expressed by arranging the labels with the semantic relationship according to the preset label format.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present application. In the preferred embodiment of the present application, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 is not a limitation of the embodiments of the present application, and may be a bus-type configuration or a star-type configuration, and that the computer device 3 may include more or less hardware or software than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example, and other existing or future electronic products, such as those that may be adapted to the present application, are also included in the scope of the present application and are incorporated herein by reference.
In some embodiments, the memory 31 has stored therein a computer program which, when executed by the at least one processor 32, implements all or part of the steps of the multi-label image classification method as described. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions and processes data of the computer device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the multi-label image classification method described in the embodiments of the present application; or implement all or part of the functionality of the multi-label image classification apparatus. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present application and not for limiting, and although the present application is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

Claims (10)

1. A multi-label image classification method is characterized by comprising the following steps:
acquiring a label in a classified image labeled with a label in advance, and calling a pre-trained semantic conversion model to process the label to obtain a label semantic vector corresponding to the label;
calling a pre-trained feature extraction model to process the classified images to obtain feature semantic vectors corresponding to the classified images;
calculating an actual correlation value according to the tag semantic vector and the feature semantic vector;
training a multi-label image classification model by taking the classified image as an input vector and the label corresponding to the classified image as an output vector, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and training proceeds with convergence of the loss function as the objective until training of the multi-label image classification model is complete;
calling the multi-label image classification model to process the image to be classified to obtain an initial label set contained in the image to be classified;
calling the semantic conversion model to process the initial label set to obtain a target label semantic vector corresponding to each initial label in the initial label set;
and acquiring the semantic relation between the target label semantic vectors, and outputting a target label set corresponding to the image to be classified according to the semantic relation.
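The inference half of this claim (initial label set, label embedding, semantic relation, target label set) can be sketched in Python as follows. The functions `classify_fn` and `embed_fn` are hypothetical stand-ins for the trained multi-label image classification model and the semantic conversion model, and ordering the output by overall relatedness is an illustrative choice, since the claim does not fix how the semantic relation determines the output:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two semantic vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def infer_target_labels(image, classify_fn, embed_fn):
    """Sketch: obtain the initial label set, embed each label,
    derive pairwise semantic relations, and output the target
    label set ordered by overall relatedness."""
    initial = classify_fn(image)               # initial label set
    vecs = [embed_fn(lab) for lab in initial]  # target label semantic vectors
    # semantic relation: pairwise cosine similarity between label vectors
    relation = [[cosine(a, b) for b in vecs] for a in vecs]
    scores = [sum(row) for row in relation]
    return [lab for _, lab in sorted(zip(scores, initial), reverse=True)]
```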
2. The multi-label image classification method according to claim 1, wherein the step of calling a pre-trained semantic conversion model to process the label to obtain the label semantic vector corresponding to the label comprises:
acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
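The attribute-combination step above can be sketched as follows; the `|`-separated layout and the `(x, y, w, h)` box encoding of the spatial position are assumptions, since the claim only requires "a preset data format":

```python
def build_attribute_sequence(label_name, position):
    """Combine the target attributes (label name and spatial position
    in the classified image) into a target attribute sequence under an
    assumed '|'-separated data format."""
    x, y, w, h = position
    return "|".join([label_name, str(x), str(y), str(w), str(h)])
```

The resulting sequence string would then be fed to the semantic conversion model to produce the label semantic vector of the target dimension.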
3. The multi-label image classification method according to claim 2, wherein before the step of calling a pre-trained feature extraction model to process the classified image to obtain the feature semantic vector corresponding to the classified image, the method further comprises:
performing gray-scale processing on the classified image to obtain a target classified image;
acquiring a plurality of pre-marked area labels corresponding to the target classified image, and determining a target area set of the target classified image according to the area labels;
extracting a feature corpus corresponding to each target area in the target area set, and converting the feature corpus into a feature semantic vector of the target dimension;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
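A minimal sketch of the gray-scale and target-area steps of this claim; the ITU-R BT.601 luma weights and the `(x, y, w, h)` box format for area labels are assumptions, as the claim does not fix either:

```python
import numpy as np

def to_grayscale(rgb):
    """Gray-scale processing using the common ITU-R BT.601 luma
    weights (an assumption; the claim names no particular formula)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def extract_target_areas(gray, area_labels):
    """Crop each pre-marked target area, given as (x, y, w, h) boxes,
    from the target classified image."""
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in area_labels]
```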
4. The multi-label image classification method according to claim 1, wherein the step of calculating an actual correlation value according to the label semantic vector and the feature semantic vector comprises:
multiplying the tag semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain an actual correlation value.
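The two steps of this claim can be sketched directly; the claim only says "a preset function", so the sigmoid used below is an assumption chosen to map the initial value into (0, 1):

```python
import math

def actual_correlation(label_vec, feature_vec):
    """Multiply the label semantic vector and the feature semantic
    vector (a dot product) to obtain an initial value, then apply a
    preset function, assumed here to be the sigmoid."""
    initial = sum(l * f for l, f in zip(label_vec, feature_vec))
    return 1.0 / (1.0 + math.exp(-initial))
```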
5. The multi-label image classification method according to claim 1, wherein the step of acquiring the semantic relation between the target label semantic vectors comprises:
calculating a similarity value between the target label semantic vectors;
acquiring a target interval to which the similarity value belongs;
and traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
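The interval lookup described above amounts to a small table traversal; the boundary values and relation names below are illustrative assumptions, not values fixed by the claim:

```python
import bisect

def semantic_relation_for(similarity, boundaries=(0.3, 0.7),
                          relations=("unrelated", "related", "strongly related")):
    """Find the target interval that the similarity value belongs to,
    then return the semantic relation mapped to that interval."""
    return relations[bisect.bisect_right(boundaries, similarity)]
```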
6. The multi-label image classification method according to claim 1, wherein the training process of the semantic conversion model comprises:
acquiring a training sample which takes a target attribute corresponding to a label as input data and takes a label semantic vector corresponding to the label as output data;
splitting the training sample into a training set and a test set according to a preset splitting ratio;
inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
inputting the test set into the initial semantic conversion model, and calculating an accuracy rate of the model;
detecting whether the accuracy rate exceeds a preset accuracy rate threshold value;
and when the detection result shows that the accuracy rate exceeds the preset accuracy rate threshold value, determining that the training of the semantic conversion model is finished.
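The training procedure of this claim can be sketched as a split-train-evaluate loop; `train_fn` and `accuracy_fn` are hypothetical stand-ins for the real model code, and the split ratio, threshold, and round limit are illustrative defaults:

```python
import random

def train_semantic_model(samples, train_fn, accuracy_fn,
                         split_ratio=0.8, threshold=0.9, max_rounds=20):
    """Split the training samples by a preset ratio, train an initial
    model on the training set, and finish once the test-set accuracy
    rate exceeds the preset accuracy rate threshold."""
    random.shuffle(samples)
    cut = int(len(samples) * split_ratio)
    train_set, test_set = samples[:cut], samples[cut:]
    model = None
    for _ in range(max_rounds):
        model = train_fn(train_set, model)           # initial / refined model
        if accuracy_fn(model, test_set) > threshold:
            return model                              # training finished
    raise RuntimeError("accuracy threshold not reached within max_rounds")
```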
7. The multi-label image classification method according to claim 1, wherein the step of outputting the target label set corresponding to the image to be classified according to the semantic relation comprises:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
arranging the label sets according to the preset label format to obtain a target label set;
and outputting the target label set.
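The arrangement step above can be sketched as a relation-to-format lookup; the particular relation names and separator formats below are illustrative assumptions, since the claim only requires that each semantic relation have a preset label format:

```python
def output_target_label_set(labels, relation):
    """Arrange the label set according to the preset label format
    selected by the semantic relation, then return the target set."""
    formats = {
        "strongly related": lambda ls: " + ".join(sorted(ls)),
        "related": lambda ls: ", ".join(sorted(ls)),
    }
    fmt = formats.get(relation, lambda ls: "; ".join(sorted(ls)))
    return fmt(labels)
```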
8. A multi-label image classification apparatus, characterized in that the multi-label image classification apparatus comprises:
the label obtaining module is used for obtaining labels from classified images that have been labeled in advance, calling a pre-trained semantic conversion model to process the labels, and obtaining label semantic vectors corresponding to the labels;
the image processing module is used for calling a pre-trained feature extraction model to process the classified images to obtain feature semantic vectors corresponding to the classified images;
the correlation calculation module is used for calculating an actual correlation value according to the label semantic vector and the feature semantic vector;
the model training module is used for training a multi-label image classification model by taking the classified images as input vectors and the labels corresponding to the classified images as output vectors, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and training proceeds with convergence of the loss function as the objective until training of the multi-label image classification model is complete;
the label determining module is used for calling the multi-label image classification model to process the image to be classified to obtain an initial label set contained in the image to be classified;
the tag calling module is used for calling the semantic conversion model to process the initial tag set to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set;
and the label output module is used for acquiring the semantic relation between the target label semantic vectors and outputting the target label set corresponding to the image to be classified according to the semantic relation.
9. A computer device, characterized in that the computer device comprises a processor for implementing the multi-label image classification method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the multi-label image classification method according to any one of claims 1 to 7.
CN202111011719.1A 2021-08-31 2021-08-31 Multi-label image classification method and device and related equipment Active CN113723513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111011719.1A CN113723513B (en) 2021-08-31 2021-08-31 Multi-label image classification method and device and related equipment

Publications (2)

Publication Number Publication Date
CN113723513A true CN113723513A (en) 2021-11-30
CN113723513B CN113723513B (en) 2024-05-03

Family

ID=78679651

Country Status (1)

Country Link
CN (1) CN113723513B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114582470A * 2022-04-29 2022-06-03 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Model training method and device and medical image report labeling method
CN115841596A * 2022-12-16 2023-03-24 华院计算技术(上海)股份有限公司 Multi-label image classification method and training method and device of multi-label image classification model
CN115841596B * 2022-12-16 2023-09-15 华院计算技术(上海)股份有限公司 Multi-label image classification method and training method and device for model thereof
CN116824305A * 2023-08-09 2023-09-29 中国气象服务协会 Ecological environment monitoring data processing method and system applied to cloud computing
CN116824305B * 2023-08-09 2024-06-04 中国气象服务协会 Ecological environment monitoring data processing method and system applied to cloud computing
CN117876797A * 2024-03-11 2024-04-12 中国地质大学(武汉) Image multi-label classification method, device and storage medium
CN117876797B * 2024-03-11 2024-06-04 中国地质大学(武汉) Image multi-label classification method, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644047A (en) * 2016-07-22 2018-01-30 华为技术有限公司 Tag Estimation generation method and device
CN110147499A (en) * 2019-05-21 2019-08-20 智者四海(北京)技术有限公司 Label method, recommended method and recording medium
CN111626362A (en) * 2020-05-28 2020-09-04 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111783712A (en) * 2020-07-09 2020-10-16 腾讯科技(深圳)有限公司 Video processing method, device, equipment and medium
CN112465071A (en) * 2020-12-18 2021-03-09 深圳赛安特技术服务有限公司 Image multi-label classification method and device, electronic equipment and medium
WO2021151296A1 (en) * 2020-07-22 2021-08-05 平安科技(深圳)有限公司 Multi-task classification method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant