CN113723513B - Multi-label image classification method and device and related equipment


Info

Publication number
CN113723513B
CN113723513B (application CN202111011719.1A)
Authority
CN
China
Prior art keywords
label
semantic
target
image
classified
Prior art date
Legal status
Active
Application number
CN202111011719.1A
Other languages
Chinese (zh)
Other versions
CN113723513A (en)
Inventor
张玉琪
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202111011719.1A
Publication of CN113723513A
Application granted
Publication of CN113723513B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2431 Multiple classes
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions


Abstract

The application relates to the technical fields of artificial intelligence and digital medicine, and provides a multi-label image classification method, apparatus, computer device and storage medium. The method comprises: invoking a semantic conversion model to process each label and obtain the corresponding label semantic vector; invoking a feature extraction model to process the classified image and obtain a feature semantic vector; calculating an actual correlation value from the label semantic vector and the feature semantic vector, and training a multi-label image classification model with the classified image as the input vector and the labels corresponding to the classified image as the output vector; invoking the multi-label image classification model to process an image to be classified and obtain an initial label set; invoking the semantic conversion model to process the initial label set and obtain a target label semantic vector for each initial label in the set; and acquiring the semantic relationships among the target label semantic vectors, and outputting, according to those relationships, the target label set corresponding to the image to be classified. The method and device can improve the accuracy of multi-label image classification.

Description

Multi-label image classification method and device and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a computer device, and a medium for classifying multi-label images.
Background
With the development of computer vision technology, image classification has been widely used. Multi-label image classification is a very common type of visual computing problem that is used to automatically generate descriptions containing multiple labels for a single picture, e.g., automatically identifying multiple objects (e.g., pedestrians, animals, trees, etc.) and scene-related descriptions (e.g., blue sky, white cloud, sunrise, etc.) in an image containing a complex scene.
In carrying out the present application, the applicant found the following problems in the prior art. Existing multi-label image classification converts the task into a target detection problem. This not only requires a large number of image samples but also greatly increases the difficulty of data annotation, and when image samples are scarce in a real scenario the training effect cannot be guaranteed, so the accuracy of multi-label image classification is low. Moreover, some attributes cannot be cast as target detection at all: when the attribute to be resolved is a state or a style, the decision usually has to be made from the whole image rather than from the local image regions a target detection model examines, which again lowers the accuracy of multi-label image classification.
Therefore, it is necessary to provide a multi-label image classification method capable of improving accuracy of multi-label image classification.
Disclosure of Invention
In view of the foregoing, there is a need for a multi-label image classification method, a multi-label image classification apparatus, a computer device, and a medium that can improve the accuracy of multi-label image classification.
An embodiment of the present application provides a multi-label image classification method, including:
Obtaining labels in the classified images with the labels marked in advance, and calling a pre-trained semantic conversion model to process the labels to obtain label semantic vectors corresponding to the labels;
Invoking a pre-trained feature extraction model to process the classified image to obtain a feature semantic vector corresponding to the classified image;
Calculating an actual correlation value according to the tag semantic vector and the feature semantic vector;
Taking the classified image as an input vector and the label corresponding to the classified image as an output vector to train a multi-label image classification model, wherein the loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and training proceeds with convergence of the loss function as the goal until training of the multi-label image classification model is completed;
Invoking the multi-label image classification model to process an image to be classified to obtain an initial label set contained in the image to be classified;
invoking the semantic conversion model to process the initial tag set to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set;
and acquiring semantic relations among semantic vectors of each target label, and outputting a target label set corresponding to the image to be classified according to the semantic relations.
Further, in the multi-label image classification method provided by the embodiment of the present application, the step of calling a pre-trained semantic conversion model to process the label, and the step of obtaining a label semantic vector corresponding to the label includes:
Acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
Combining the target attributes according to a preset data format to obtain a target attribute sequence;
And calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
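The attribute-combination step above can be sketched as follows. The patent does not specify the exact "preset data format", so the `[SEP]`-joined layout and the bounding-box coordinate representation below are illustrative assumptions only:

```python
def build_attribute_sequence(label_name, bbox):
    """Combine a label's target attributes (name + spatial position in
    the classified image) into a single target attribute sequence.

    The concrete preset data format is not given in the source;
    "name [SEP] x1 y1 x2 y2" is an assumed illustration.
    """
    x1, y1, x2, y2 = bbox
    return f"{label_name} [SEP] {x1} {y1} {x2} {y2}"

# The resulting sequence would then be fed to a pretrained encoder
# (e.g. BERT or Word2vec) to obtain a label semantic vector of the
# target dimension.
seq = build_attribute_sequence("cat", (10, 20, 110, 180))
```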
Further, in the above multi-label image classification method provided by the embodiment of the present application, before the invoking the feature extraction model trained in advance to process the classified image to obtain the feature semantic vector corresponding to the classified image, the method further includes:
gray processing the classified image to obtain a target classified image;
acquiring a plurality of area labels which correspond to the target classified images and are marked in advance, and determining a target area set of the target classified images according to the area labels;
Extracting feature corpus corresponding to each target region in the target region set, and converting the feature corpus into feature semantic vectors of the target dimensions;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
Further, in the multi-label image classification method provided by the embodiment of the present application, the calculating the actual correlation value according to the label semantic vector and the feature semantic vector includes:
computing the product of the tag semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain an actual correlation value.
Further, in the above multi-label image classification method provided by the embodiment of the present application, the obtaining the semantic relationship between each of the target label semantic vectors includes:
calculating a similarity value between semantic vectors of each target label;
Obtaining a target interval to which the similarity value belongs;
Traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
Further, in the multi-label image classification method provided by the embodiment of the present application, the training process of the semantic conversion model includes:
Acquiring training samples which take target attributes corresponding to labels as input data and label semantic vectors corresponding to the labels as output data;
splitting the training sample into a training set and a testing set according to a preset splitting proportion;
Inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
inputting the test set into the initial semantic conversion model, and calculating the accuracy of the model;
detecting whether the accuracy exceeds a preset accuracy threshold;
And when the detection result is that the accuracy exceeds the preset accuracy threshold, determining that the training of the semantic conversion model is completed.
Further, in the method for classifying multi-label images provided in the embodiment of the present application, outputting, according to the semantic relationship, the target label corresponding to the image to be classified includes:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
Arranging the tag sets according to the preset tag format to obtain a target tag set;
Outputting the target label set.
The second aspect of the embodiment of the present application further provides a multi-label image classification device, where the multi-label image classification device includes:
The label acquisition module is used for acquiring labels in the classified images with the labels marked in advance, and calling a pre-trained semantic conversion model to process the labels to obtain label semantic vectors corresponding to the labels;
the image processing module is used for calling a pre-trained feature extraction model to process the classified images to obtain feature semantic vectors corresponding to the classified images;
The correlation calculation module is used for calculating an actual correlation value according to the tag semantic vector and the feature semantic vector;
The model training module is used for taking the classified image as an input vector, taking a label corresponding to the classified image as an output vector to train a multi-label image classification model, determining a loss function of the multi-label image classification model according to the actual correlation value and a preset target correlation value, and taking convergence of the loss function as a target until the multi-label image classification model training is completed;
the label determining module is used for calling the multi-label image classification model to process the image to be classified to obtain an initial label set contained in the image to be classified;
The label calling module is used for calling the semantic conversion model to process the initial label set to obtain a target label semantic vector corresponding to each initial label in the initial label set;
The label output module is used for acquiring the semantic relation between the semantic vectors of each target label and outputting a target label set corresponding to the image to be classified according to the semantic relation.
A third aspect of the embodiment of the present application further provides a computer device, where the computer device includes a processor, where the processor is configured to implement a multi-label image classification method according to any one of the above when executing a computer program stored in a memory.
The fourth aspect of the embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement any one of the above multi-label image classification methods.
According to the multi-label image classification method, apparatus, computer device and computer-readable storage medium provided by the embodiments of the application, during training of the multi-label image classification model the loss function is determined by the actual correlation value, calculated from the label semantic vector and the feature semantic vector, together with a preset target correlation value, and training proceeds with convergence of the loss function as the goal until training of the multi-label image classification model is completed. Fusing the label semantic vectors of an image with its feature semantic vectors alleviates the poor training effect that otherwise arises when a large number of image samples is required and data annotation is difficult, and thereby improves the accuracy of multi-label image classification. In addition, by fusing the label semantic vectors with the feature semantic vectors, the semantic relationships between labels can be determined through the label semantic vectors, and the labels are output according to those relationships, so that the semantic relationships between labels are expressed more clearly and intuitively. The method can be applied to various functional modules of smart cities, such as digital medicine, intelligent transportation and smart government affairs (for example, their multi-label image classification modules), promoting the rapid development of smart cities.
Drawings
Fig. 1 is a flowchart of a multi-label image classification method according to an embodiment of the application.
Fig. 2 is a block diagram of a multi-label image classification apparatus according to a second embodiment of the present application.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present application.
The application will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application; the described embodiments are some, but not all, of the embodiments of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The multi-label image classification method provided by the embodiments of the application is executed by a computer device, and correspondingly, the multi-label image classification apparatus runs in the computer device.
Fig. 1 is a flowchart of a multi-label image classification method according to a first embodiment of the present application. As shown in fig. 1, the multi-label image classification method may include the following steps, the order of the steps in the flowchart may be changed according to different needs, and some may be omitted:
s11, obtaining labels in the classified images with the labels marked in advance, and calling a pre-trained semantic conversion model to process the labels to obtain label semantic vectors corresponding to the labels.
In at least one embodiment of the present application, a training set is preset for training an initial neural network model to obtain the multi-label image classification model. The training set consists of classified images with labels annotated in advance, and its size can be set according to model training requirements. The training set is stored in a preset database; in consideration of the privacy and reliability of data storage, the preset database may be a target node in a blockchain. A label may refer to a standardized image feature contained in a single image; for example, the labels contained in a single image may be cat, person, child, stool, long sleeve, tree, spring, winter, and so on. As another example, when the image is a medical image, the labels it contains may be doctor, patient, medical instrument, body organ, and the like, without limitation. The number of labels contained in one image may be one or more; the application is described taking the case of multiple labels per image. The semantic conversion model converts a label into an embedding vector of a target dimension, where the target dimension is preset. The semantic conversion model may be a BERT model, a Word2vec model, or the like.
Optionally, the calling the pre-trained semantic conversion model to process the tag, and obtaining the tag semantic vector corresponding to the tag includes:
Acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
Combining the target attributes according to a preset data format to obtain a target attribute sequence;
And calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
The target attribute is an attribute preset by a system staff and used for identifying a tag, and the target attribute comprises, but is not limited to, a tag name and a spatial position of the tag in the classified image, wherein the tag name can be names of people, cats, trees, long sleeves and the like; the spatial position of the label in the classified image is the position information of the label corresponding object in the image, for example, the spatial position of the label in the classified image is identified by a two-dimensional coordinate mode. The preset data format is a preset format for combining the target attribute, for example, according to the label category, the label length and the arrangement sequence of the spatial positions of the labels in the classified images, a target attribute sequence is obtained.
Optionally, the training process of the semantic conversion model includes:
Acquiring training samples which take target attributes corresponding to labels as input data and label semantic vectors corresponding to the labels as output data;
splitting the training sample into a training set and a testing set according to a preset splitting proportion;
Inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
Inputting the test set into the initial semantic conversion model, and calculating the accuracy of the initial semantic conversion model;
detecting whether the accuracy exceeds a preset accuracy threshold;
And when the detection result is that the accuracy exceeds the preset accuracy threshold, determining that the training of the semantic conversion model is completed.
The preset splitting ratio is a preset ratio of splitting the training set and the test set, for example, the preset splitting ratio may be 8:2, which is not limited herein. The preset accuracy threshold is a preset threshold for evaluating accuracy of the model, for example, the preset accuracy threshold may be 85%, which is not limited herein.
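The splitting and accuracy check described above can be sketched as follows, using the example values from this section (an 8:2 split ratio and an 85% accuracy threshold):

```python
import random

def split_samples(samples, ratio=0.8, seed=0):
    """Split labelled training samples into a training set and a test
    set according to a preset splitting ratio (here 8:2)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

def training_complete(accuracy, threshold=0.85):
    """Training of the semantic conversion model is deemed complete
    once test-set accuracy exceeds the preset accuracy threshold."""
    return accuracy > threshold
```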
S12, invoking a pre-trained feature extraction model to process the classified image, and obtaining a feature semantic vector corresponding to the classified image.
In at least one embodiment of the present application, the feature extraction model is used to extract image features from the classified image, and may be a ResNet model or an EfficientNet model. The feature semantic vector is the embedding vector corresponding to the features in the image, and has the same dimension as the label semantic vector. In an embodiment, a fully connected layer may be added to the feature extraction model (e.g. the ResNet or EfficientNet model); this fully connected layer projects the feature semantic vector to the same dimension as the label semantic vector.
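The dimension-matching fully connected layer can be sketched in plain Python. In practice the input might be a 2048-d pooled ResNet feature vector and the output a 512-d label-embedding vector; the tiny 4-to-2 dimensions below are assumed only for clarity:

```python
def linear_projection(features, weight, bias):
    """Fully connected layer: projects backbone features down to the
    label-embedding dimension so the feature semantic vector and the
    label semantic vector can be compared directly."""
    out_dim = len(bias)
    return [
        sum(f * weight[i][j] for i, f in enumerate(features)) + bias[j]
        for j in range(out_dim)
    ]

# 4-d backbone features projected into a 2-d "label embedding" space.
features = [1.0, 2.0, 3.0, 4.0]
weight = [[1, 0], [0, 1], [1, 0], [0, 1]]  # 4x2 projection matrix
bias = [0.5, -0.5]
projected = linear_projection(features, weight, bias)  # [4.5, 5.5]
```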
Optionally, the training process of the feature extraction model includes:
gray processing the classified image to obtain a target classified image;
acquiring a plurality of area labels which correspond to the target classified images and are marked in advance, and determining a target area set of the target classified images according to the area labels;
Extracting feature corpus corresponding to each target region in the target region set, and converting the feature corpus into feature semantic vectors of the target dimensions;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
The classified images are subjected to gray level processing, so that the problem of uneven brightness distribution of the classified images can be solved, and the effect of increasing the definition of the classified images is achieved. The target classified image contains a number of pre-marked region labels, which may be numeric, alphabetic, or color labels, without limitation. Each of the region labels represents a region, and when the target classified image contains 3 region labels, the target classified image contains 3 regions. The classified images may be images composed of different target areas, each of which may represent a different feature of the picture, which may be areas of a cat, child, stool, tree, etc. For each target region, there is a corresponding feature corpus, where the feature corpus may refer to a proportional feature corpus, a geometric feature corpus, a position feature corpus, and the like of an entity in the target region. For example, when the target region is a cat, the feature corpus may be features of portions of the cat, features of shapes of the cat, and features of positions of the cat in the classified image. And converting the feature corpus to obtain feature semantic vectors. Illustratively, a Bert model or a Word2vec model is called to process the feature corpus, so that feature semantic vectors can be obtained.
The classified image is taken as the input vector and the feature semantic vector corresponding to the classified image as the output vector to form training data and test data, and an initial neural network is trained to obtain the trained feature extraction model. Taking a ResNet model as the initial neural network model, for example, the training process inputs the training data into the ResNet model, and after multiple convolution, pooling and activation operations the trained feature extraction model is obtained. Calling the feature extraction model to process a classified image then yields the feature semantic vector corresponding to that classified image.
S13, calculating an actual correlation value according to the tag semantic vector and the feature semantic vector.
In at least one embodiment of the present application, the actual correlation value is used to evaluate the degree of correlation between the tag semantic vector and the feature semantic vector, and in one embodiment, when the actual correlation value is greater than 0.5, it is determined that the degree of correlation between the tag semantic vector and the feature semantic vector is high, that is, the classified image includes a tag corresponding to the tag semantic vector; and when the actual correlation value is smaller than 0.5, determining that the correlation degree between the tag semantic vector and the feature semantic vector is low, namely that the classified image does not contain the tag corresponding to the tag semantic vector. The loss function is a cross entropy function consisting of the actual correlation value and the target correlation value.
Optionally, the calculating the actual correlation value according to the tag semantic vector and the feature semantic vector includes:
computing the product of the tag semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain an actual correlation value.
The preset function may be a Sigmoid function (also called the logistic function), which is commonly used for hidden-layer neuron outputs and whose range is (0, 1). Illustratively, taking the label semantic vector as a 512-dimensional vector a = (x1, y1, z1, …, n1), and the feature semantic vector, which has the same dimension, as b = (x2, y2, z2, …, n2), the product of a and b gives the initial value c = x1·x2 + y1·y2 + z1·z2 + … + n1·n2. Calling the Sigmoid function to process the initial value then yields the actual correlation value.
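The product-plus-Sigmoid computation above, together with the 0.5 decision rule from S13, can be written directly:

```python
import math

def actual_correlation(label_vec, feature_vec):
    """Initial value c = x1*x2 + y1*y2 + ... (dot product of the two
    same-dimension vectors), squashed into (0, 1) by a Sigmoid."""
    c = sum(a * b for a, b in zip(label_vec, feature_vec))
    return 1.0 / (1.0 + math.exp(-c))

def label_present(label_vec, feature_vec, threshold=0.5):
    """Correlation above the threshold means the classified image
    contains the label corresponding to the label semantic vector."""
    return actual_correlation(label_vec, feature_vec) > threshold
```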
S14, taking the classified image as an input vector, taking a label corresponding to the classified image as an output vector to train a multi-label image classification model, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and convergence is taken as a target until the multi-label image classification model training is completed.
In at least one embodiment of the present application, the classified image is used as an input vector, the label corresponding to the classified image is used as an output vector to generate training data and test data, and the training data is input into an initial neural network model for training, so as to obtain an initial multi-label image classification model; inputting the test data into the initial multi-label image classification model for testing, and calculating a loss function of the model according to an actual correlation value corresponding to the test data and a preset target correlation value; judging whether the loss function converges or not; when the loss function converges, determining that the training of the multi-label image classification model is completed; and when the loss function is not converged, training data is added to retrain the initial multi-label image classification model until the loss function is converged. Judging whether the loss function converges belongs to the prior art, and will not be described in detail herein.
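The loss described above, a cross entropy between the actual correlation values from S13 and the preset 0/1 target correlation values, might be sketched as follows (the per-label averaging is an assumption; the patent does not fix the reduction):

```python
import math

def correlation_loss(actual, target, eps=1e-12):
    """Cross entropy between predicted correlation values and preset
    0/1 target correlation values, averaged over the labels.
    Training continues until this loss converges."""
    total = 0.0
    for p, t in zip(actual, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical safety
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(actual)
```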
S15, calling the multi-label image classification model to process the image to be classified, and obtaining an initial label set contained in the image to be classified.
In at least one embodiment of the present application, the image to be classified is an image without labels. The multi-label image classification model is invoked to process the image to be classified, so that an initial label set contained in the image to be classified can be obtained, where the number of initial labels in the initial label set may be one or more. Each initial tag in the initial tag set carries a corresponding target attribute, where the target attribute may be represented by an added mark and includes a tag name and the spatial position of the tag in the classified image.
S16, calling the semantic conversion model to process the initial tag set, and obtaining a target tag semantic vector corresponding to each initial tag in the initial tag set.
In at least one embodiment of the present application, the invoking the semantic conversion model to process the initial tag set, to obtain the target tag semantic vector corresponding to each initial tag in the initial tag set includes:
acquiring a target attribute corresponding to each initial tag in the initial tag set, wherein the target attribute comprises a tag name and a spatial position of the tag in the classified image;
Combining the target attributes according to a preset data format to obtain a target attribute sequence;
And calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
The target attribute corresponding to each initial tag can be obtained by inquiring the tag carried by each initial tag in the initial tag set.
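A hypothetical sketch of combining each tag's name and spatial position into a target attribute sequence; the "name@(x,y)" format and semicolon separator are invented for illustration, since the application leaves the preset data format open:

```python
def build_attribute_sequence(tags):
    """Combine (name, (x, y)) target attributes into one attribute sequence string."""
    return ";".join(f"{name}@({x},{y})" for name, (x, y) in tags)
```

The resulting sequence would then be fed to the pre-trained semantic conversion model (e.g. a BERT or Word2vec model) to obtain the tag semantic vector of the target dimension.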
S17, acquiring semantic relations among semantic vectors of each target label, and outputting a target label set corresponding to the image to be classified according to the semantic relations.
In at least one embodiment of the application, taking the tags cat, person, child, stool, long sleeve, leisure, tree, spring and winter as an example, semantic relationships exist between the tags: for example, "child" is a subclass of "person", "spring" and "winter" are opposite categories, and "long sleeve" generally appears together with "person". The semantic relationship may be a subordinate relationship, a contradictory relationship or an association relationship between tags. By analyzing the semantic relationships among the semantic vectors of the plurality of initial labels, the present application performs a semantic description of the initial labels of the image to be classified to obtain label classification information, which can improve the accuracy of multi-label image classification.
Optionally, the acquiring the semantic relation between the semantic vectors of each target tag includes:
calculating a similarity value between semantic vectors of each target label;
Obtaining a target interval to which the similarity value belongs;
Traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
The similarity value lies in the range (0, 1). A mapping relation exists between each interval and a semantic relation, and the target semantic relation corresponding to the target interval can be obtained by querying the mapping relation.
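The interval-to-relation lookup can be sketched as follows; the three interval boundaries and their assignments are invented for illustration, as the application does not fix them:

```python
INTERVAL_RELATIONS = [
    ((0.0, 0.3), "contradictory relation"),  # dissimilar vectors, e.g. spring vs winter
    ((0.3, 0.7), "association relation"),    # co-occurring tags, e.g. long sleeve and person
    ((0.7, 1.0), "subordinate relation"),    # subclasses, e.g. child under person
]

def relation_for(similarity):
    """Map a similarity value in (0, 1) to its target semantic relation."""
    for (low, high), relation in INTERVAL_RELATIONS:
        if low < similarity <= high:
            return relation
    raise ValueError("similarity value must lie in (0, 1)")
```

The similarity itself could be, for example, the cosine similarity between two target tag semantic vectors.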
Optionally, the outputting, according to the semantic relation, the target tag set corresponding to the image to be classified includes:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
Arranging the tag sets according to the preset tag format to obtain a target tag set;
Outputting the target label set.
For each semantic relationship, there is a corresponding preset label format. The preset label format is a preset data format for arranging the labels; by arranging labels that have a semantic relationship according to the preset label format, the semantic relationship among the labels can be represented intuitively.
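One hypothetical way to arrange a pair of related tags according to a relation-specific format; the format strings themselves are assumptions, chosen only to make the relation visible in the output:

```python
RELATION_FORMATS = {
    "subordinate": "{0} > {1}",      # e.g. person > child
    "contradictory": "{0} <-> {1}",  # e.g. spring <-> winter
    "association": "{0} + {1}",      # e.g. long sleeve + person
}

def arrange_tags(relation, tag_a, tag_b):
    """Arrange two related tags according to the preset format for their relation."""
    return RELATION_FORMATS[relation].format(tag_a, tag_b)
```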
According to the multi-label image classification method provided by the embodiment of the application, during training of the multi-label image classification model, the loss function of the model is determined from the actual correlation value, calculated from the label semantic vector and the feature semantic vector, and the preset target correlation value, with convergence of the loss function as the target until training is completed. Fusing the label semantic vector of the image with the feature semantic vector of the image alleviates the problem of poor model training effect caused by the large annotation difficulty of the many picture samples needed in a practical scene, and improves the accuracy of multi-label image classification. In addition, through this fusion the semantic relationship between labels can be determined from the label semantic vectors, and the labels are output according to the semantic relationship, so that the semantic relationship between labels is expressed more clearly and intuitively. The method can be applied to various functional modules of smart cities, such as digital medical treatment, intelligent transportation and the multi-label image classification module of intelligent government affairs, and can promote the rapid development of smart cities.
Fig. 2 is a block diagram of a multi-label image classification apparatus according to a second embodiment of the present application.
In some embodiments, the multi-label image classification device 20 may include a plurality of functional modules composed of computer program segments. The computer program of the individual program segments in the multi-label image classification apparatus 20 may be stored in a memory of a computer device and executed by at least one processor to perform the multi-label image classification functions (see fig. 1 for details).
In this embodiment, the multi-label image classification device 20 may be divided into a plurality of functional modules according to the functions performed by the device. The functional module may include: a tag acquisition module 201, an image processing module 202, a correlation calculation module 203, a model training module 204, a tag determination module 205, a tag calling module 206, and a tag output module 207. The module referred to in the present application refers to a series of computer program segments capable of being executed by at least one processor and of performing a fixed function, stored in a memory. In the present embodiment, the functions of the respective modules will be described in detail in the following embodiments.
The tag obtaining module 201 may be configured to obtain a tag in a classified image with a pre-labeled tag, and call a pre-trained semantic conversion model to process the tag, so as to obtain a tag semantic vector corresponding to the tag.
In at least one embodiment of the present application, a training set is preset, and is used for training an initial neural network model to obtain a multi-label image classification model, where the training set is a classified image with labels labeled in advance, and the number of the training sets can be set according to model training requirements. The training set is stored in a preset database, and the preset database can be a certain target node in the blockchain in consideration of the privacy and reliability of data storage. The label may refer to a standardized image feature contained in a single image, for example, the label contained in a single image may be a cat, a person, a child, a stool, a long sleeve, a tree, spring, winter, etc.; for another example, when the image is a medical image, the label contained in the medical image may be a doctor, a patient, a medical instrument, a body organ, or the like, without limitation. The number of labels contained in one image may be single or plural, and the present application is described by taking the number of labels contained in one image as plural. The semantic conversion model is used for converting the label into embedding vectors of target dimensions, and the target dimensions are preset dimensions. The semantic conversion model can be a Bert model or a Word2vec model, etc.
Optionally, the calling the pre-trained semantic conversion model to process the tag, and obtaining the tag semantic vector corresponding to the tag includes:
Acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
Combining the target attributes according to a preset data format to obtain a target attribute sequence;
And calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
The target attribute is an attribute preset by system staff for identifying a tag, and includes, but is not limited to, a tag name and the spatial position of the tag in the classified image. The tag name may be a name such as person, cat, tree or long sleeve; the spatial position of the tag in the classified image is the position information of the object corresponding to the tag in the image, identified, for example, by two-dimensional coordinates. The preset data format is a preset format for combining the target attributes; for example, a target attribute sequence is obtained by arranging the tag category, the tag length and the spatial position of the tag in the classified image in order.
Optionally, the training process of the semantic conversion model includes:
Acquiring training samples which take target attributes corresponding to labels as input data and label semantic vectors corresponding to the labels as output data;
splitting the training sample into a training set and a testing set according to a preset splitting proportion;
Inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
Inputting the test set into the initial semantic conversion model, and calculating the accuracy of the initial semantic conversion model;
detecting whether the accuracy exceeds a preset accuracy threshold;
And when the detection result is that the accuracy exceeds the preset accuracy threshold, determining that the training of the semantic conversion model is completed.
The preset splitting ratio is a preset ratio of splitting the training set and the test set, for example, the preset splitting ratio may be 8:2, which is not limited herein. The preset accuracy threshold is a preset threshold for evaluating accuracy of the model, for example, the preset accuracy threshold may be 85%, which is not limited herein.
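The 8:2 split and the 85% accuracy gate given as examples above can be sketched as follows (both values are only the illustrative defaults named in the text):

```python
def split_samples(samples, train_ratio=0.8):
    """Split samples into a training set and a test set at the preset ratio."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

def training_complete(correct, total, threshold=0.85):
    """True when the measured accuracy exceeds the preset accuracy threshold."""
    return correct / total > threshold
```

In practice the samples would usually be shuffled before splitting so the two sets are drawn from the same distribution.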
The image processing module 202 may be configured to invoke a pre-trained feature extraction model to process the classified image, so as to obtain a feature semantic vector corresponding to the classified image.
In at least one embodiment of the present application, the feature extraction model is used to extract image features from the classified image, and may be a ResNet model or an EfficientNet model. The feature semantic vector is the embedding vector corresponding to the features in the image and has the same dimension as the tag semantic vector. In an embodiment, a fully connected layer may be added to the feature extraction model, such as the ResNet model or the EfficientNet model, and used to keep the feature semantic vector the same dimension as the label semantic vector.
Optionally, the training process of the feature extraction model includes:
gray processing the classified image to obtain a target classified image;
acquiring a plurality of area labels which correspond to the target classified images and are marked in advance, and determining a target area set of the target classified images according to the area labels;
Extracting feature corpus corresponding to each target region in the target region set, and converting the feature corpus into feature semantic vectors of the target dimensions;
and training an initial neural network by taking the classified images as input vectors and the feature semantic vectors corresponding to the classified images as output vectors to obtain a trained feature extraction model.
The classified images are subjected to gray level processing, so that the problem of uneven brightness distribution of the classified images can be solved, and the effect of increasing the definition of the classified images is achieved. The target classified image contains a number of pre-marked region labels, which may be numeric, alphabetic, or color labels, without limitation. Each of the region labels represents a region, and when the target classified image contains 3 region labels, the target classified image contains 3 regions. The classified images may be images composed of different target areas, each of which may represent a different feature of the picture, which may be areas of a cat, child, stool, tree, etc. For each target region, there is a corresponding feature corpus, where the feature corpus may refer to a proportional feature corpus, a geometric feature corpus, a position feature corpus, and the like of an entity in the target region. For example, when the target region is a cat, the feature corpus may be features of portions of the cat, features of shapes of the cat, and features of positions of the cat in the classified image. And converting the feature corpus to obtain feature semantic vectors. Illustratively, a Bert model or a Word2vec model is called to process the feature corpus, so that feature semantic vectors can be obtained.
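The gray processing step is commonly implemented with a luminance weighting of the RGB channels; the weights below are the usual ITU-R BT.601 choice, assumed here rather than specified by the application:

```python
def to_gray(r, g, b):
    """Convert one RGB pixel to a single gray level in 0-255 (BT.601 luminance)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```

Applying this per pixel evens out the brightness information into one channel, which is the effect the gray processing step relies on.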
The classified images are used as input vectors and the feature semantic vectors corresponding to the classified images are used as output vectors to form training data and test data, with which an initial neural network is trained to obtain a trained feature extraction model. Taking a ResNet model as an example of the initial neural network model, the training data is input into the ResNet model, and the trained feature extraction model is obtained after multiple convolution, pooling and activation operations of the initial neural network model; the feature extraction model is then called to process the classified image, so that the feature semantic vector corresponding to the classified image can be obtained.
The correlation calculation module 203 may be configured to calculate an actual correlation value from the tag semantic vector and the feature semantic vector.
In at least one embodiment of the present application, the actual correlation value is used to evaluate the degree of correlation between the tag semantic vector and the feature semantic vector, and in one embodiment, when the actual correlation value is greater than 0.5, it is determined that the degree of correlation between the tag semantic vector and the feature semantic vector is high, that is, the classified image includes a tag corresponding to the tag semantic vector; and when the actual correlation value is smaller than 0.5, determining that the correlation degree between the tag semantic vector and the feature semantic vector is low, namely that the classified image does not contain the tag corresponding to the tag semantic vector. The loss function is a cross entropy function consisting of the actual correlation value and the target correlation value.
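The cross entropy between the actual and target correlation values mentioned above can be sketched as follows (the clamping with `eps` is an added safeguard against log(0), not part of the application's description):

```python
import math

def cross_entropy(actual, target, eps=1e-12):
    """Binary cross entropy of the actual correlation value against the target value."""
    actual = min(max(actual, eps), 1.0 - eps)  # clamp into (0, 1) to avoid log(0)
    return -(target * math.log(actual) + (1.0 - target) * math.log(1.0 - actual))
```

The loss shrinks toward zero as the actual correlation value approaches the target correlation value, which is what drives the convergence criterion during training.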
Optionally, the calculating the actual correlation value according to the tag semantic vector and the feature semantic vector includes:
the product is used for processing the tag semantic vector and the feature semantic vector to obtain an initial value;
and calling a preset function to process the initial value to obtain an actual correlation value.
The preset function may be a Sigmoid function. The Sigmoid function, also called the Logistic function, is used for hidden layer neuron output, and its value range is (0, 1). Illustratively, taking a 512-dimensional tag semantic vector as an example, the tag semantic vector may be A = (x1, y1, z1, …, n1), and the feature semantic vector, which has the same dimension as the tag semantic vector, may be B = (x2, y2, z2, …, n2). Taking the dot product of A and B gives an initial value C = x1x2 + y1y2 + z1z2 + … + n1n2. The Sigmoid function is then called to process the initial value, thereby obtaining the actual correlation value.
The model training module 204 may be configured to train a multi-label image classification model with the classified image as an input vector and the label corresponding to the classified image as an output vector, where a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, with convergence of the loss function as the target until training of the multi-label image classification model is completed.
In at least one embodiment of the present application, the classified image is used as an input vector and the label corresponding to the classified image is used as an output vector to generate training data and test data. The training data is input into an initial neural network model for training to obtain an initial multi-label image classification model; the test data is input into the initial multi-label image classification model for testing, and the loss function of the model is calculated according to the actual correlation value corresponding to the test data and the preset target correlation value; it is then judged whether the loss function converges. When the loss function converges, it is determined that training of the multi-label image classification model is completed; when the loss function does not converge, training data is added and the initial multi-label image classification model is retrained until the loss function converges. Judging whether a loss function converges belongs to the prior art and is not described in detail herein.
The tag determination module 205 may be configured to invoke the multi-tag image classification model to process an image to be classified, so as to obtain an initial tag set contained in the image to be classified.
In at least one embodiment of the present application, the image to be classified is an image without labels. The multi-label image classification model is invoked to process the image to be classified, so that an initial label set contained in the image to be classified can be obtained, where the number of initial labels in the initial label set may be one or more. Each initial tag in the initial tag set carries a corresponding target attribute, where the target attribute may be represented by an added mark and includes a tag category, a tag length and the spatial position of the tag in the classified image.
The tag calling module 206 may be configured to call the semantic conversion model to process the initial tag set, so as to obtain a target tag semantic vector corresponding to each initial tag in the initial tag set.
In at least one embodiment of the present application, the invoking the semantic conversion model to process the initial tag set, to obtain the target tag semantic vector corresponding to each initial tag in the initial tag set includes:
acquiring a target attribute corresponding to each initial tag in the initial tag set, wherein the target attribute comprises a tag name and a spatial position of the tag in the classified image;
Combining the target attributes according to a preset data format to obtain a target attribute sequence;
And calling a pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
The target attribute corresponding to each initial tag can be obtained by inquiring the tag carried by each initial tag in the initial tag set.
The tag output module 207 may be configured to obtain a semantic relationship between semantic vectors of each of the target tags, and output a target tag set corresponding to the image to be classified according to the semantic relationship.
In at least one embodiment of the application, taking the tags cat, person, child, stool, long sleeve, leisure, tree, spring and winter as an example, semantic relationships exist between the tags: for example, "child" is a subclass of "person", "spring" and "winter" are opposite categories, and "long sleeve" generally appears together with "person". The semantic relationship may be a subordinate relationship, a contradictory relationship or an association relationship between tags. By analyzing the semantic relationships among the semantic vectors of the plurality of initial labels, the present application performs a semantic description of the initial labels of the image to be classified to obtain label classification information, which can improve the accuracy of multi-label image classification.
Optionally, the acquiring the semantic relation between the semantic vectors of each target tag includes:
calculating a similarity value between semantic vectors of each target label;
Obtaining a target interval to which the similarity value belongs;
Traversing the mapping relation between the preset interval and the semantic relation according to the target interval to obtain the target semantic relation corresponding to the target interval.
The similarity value lies in the range (0, 1). A mapping relation exists between each interval and a semantic relation, and the target semantic relation corresponding to the target interval can be obtained by querying the mapping relation.
Optionally, the outputting, according to the semantic relation, the target tag set corresponding to the image to be classified includes:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
Arranging the tag sets according to the preset tag format to obtain a target tag set;
Outputting the target label set.
For each semantic relationship, there is a corresponding preset label format. The preset label format is a preset data format for arranging the labels; by arranging labels that have a semantic relationship according to the preset label format, the semantic relationship among the labels can be represented intuitively.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present application. In the preferred embodiment of the present application, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not limit the embodiments of the present application; either a bus-type or a star-type configuration is possible, and the computer device 3 may include more or less hardware or software than shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client by way of a keyboard, mouse, remote control, touch pad, or voice control device, such as a personal computer, tablet, smart phone, digital camera, etc.
It should be noted that the computer device 3 is only used as an example, and other electronic products that may be present in the present application or may be present in the future are also included in the scope of the present application by way of reference.
In some embodiments, the memory 31 stores a computer program which, when executed by the at least one processor 32, performs all or part of the steps in the multi-label image classification method described above. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc memory, magnetic tape memory, or any other medium that can be used to carry or store data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a string of data blocks generated in association using cryptographic methods, each block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects the various components of the entire computer device 3 using various interfaces and lines, and performs various functions and processes of the computer device 3 by running or executing programs or modules stored in the memory 31, and invoking data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the multi-label image classification method described in embodiments of the application; or to implement all or part of the functionality of the multi-label image classification device. The at least one processor 32 may be comprised of integrated circuits, such as a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functionality, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like.
In some embodiments, the at least one communication bus 33 is arranged to enable connected communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further comprise a power source (such as a battery) for powering the various components, preferably the power source is logically connected to the at least one processor 32 via a power management means, whereby the functions of managing charging, discharging, and power consumption are performed by the power management means. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The computer device 3 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described in detail herein.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a computer device, or a network device, etc.) or processor (processor) to perform portions of the methods described in the various embodiments of the application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from its spirit or essential characteristics. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other elements and that the singular does not exclude the plural. Several of the elements or devices recited in the specification may be embodied by one and the same item of software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims (8)

1. A multi-label image classification method, characterized in that the multi-label image classification method comprises:
obtaining a label in a classified image in which labels have been marked in advance, and invoking a pre-trained semantic conversion model to process the label to obtain a label semantic vector corresponding to the label;
invoking a pre-trained feature extraction model to process the classified image to obtain a feature semantic vector corresponding to the classified image;
calculating an actual correlation value according to the label semantic vector and the feature semantic vector, including: computing a product of the label semantic vector and the feature semantic vector to obtain an initial value; and invoking a preset function to process the initial value to obtain the actual correlation value;
when the actual correlation value is greater than 0.5, determining that the classified image contains the label corresponding to the label semantic vector; when the actual correlation value is less than 0.5, determining that the classified image does not contain the label corresponding to the label semantic vector;
training a multi-label image classification model by taking the classified image as an input vector and the label corresponding to the classified image as an output vector, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and training continues with convergence of the loss function as the goal until training of the multi-label image classification model is completed;
invoking the multi-label image classification model to process an image to be classified to obtain an initial label set contained in the image to be classified;
invoking the semantic conversion model to process the initial label set to obtain a target label semantic vector corresponding to each initial label in the initial label set;
obtaining a semantic relation between the target label semantic vectors, including: calculating a similarity value between the target label semantic vectors; obtaining a target interval to which the similarity value belongs; and looking up, according to the target interval, a preset mapping between intervals and semantic relations to obtain the semantic relation corresponding to the target interval;
and outputting a target label set corresponding to the image to be classified according to the semantic relation.
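For illustration only (not part of the claims), the correlation and interval-mapping steps recited in claim 1 could be sketched as follows. The sigmoid used as the "preset function", the interval boundaries, and the relation names are all assumptions; the claim does not fix any of them:

```python
import math

# Hypothetical interval-to-relation mapping; the claim only recites that a
# preset mapping between intervals and semantic relations exists.
RELATION_INTERVALS = [
    ((0.8, 1.01), "synonym"),
    ((0.5, 0.8), "related"),
    ((0.0, 0.5), "unrelated"),
]

def actual_correlation(label_vec, feature_vec):
    """Product of the two semantic vectors, then a 'preset function'.

    A sigmoid is assumed for the preset function, mapping the initial
    value into (0, 1) so the 0.5 threshold applies.
    """
    initial = sum(a * b for a, b in zip(label_vec, feature_vec))
    return 1.0 / (1.0 + math.exp(-initial))

def contains_label(label_vec, feature_vec):
    # greater than 0.5 -> the classified image contains the label
    return actual_correlation(label_vec, feature_vec) > 0.5

def semantic_relation(similarity):
    """Return the semantic relation for the interval the similarity falls in."""
    for (low, high), relation in RELATION_INTERVALS:
        if low <= similarity < high:
            return relation
    return "unknown"
```

With these assumptions, a positive dot product between a label semantic vector and a feature semantic vector yields a correlation above 0.5, i.e. the label is deemed present.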
2. The multi-label image classification method according to claim 1, wherein invoking the pre-trained semantic conversion model to process the label to obtain the label semantic vector corresponding to the label comprises:
acquiring a target attribute corresponding to the label, wherein the target attribute comprises a label name and a spatial position of the label in the classified image;
combining the target attributes according to a preset data format to obtain a target attribute sequence;
and invoking the pre-trained semantic conversion model to process the target attribute sequence to obtain a label semantic vector of a target dimension corresponding to the label.
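A minimal sketch of the attribute-combination step in claim 2. The pipe-separated template is a hypothetical "preset data format", not one specified by the patent:

```python
def build_attribute_sequence(label_name, position,
                             template="{name}|{x1},{y1},{x2},{y2}"):
    """Combine a label's target attributes (name + spatial position)
    into a single target attribute sequence.

    `position` is assumed to be a bounding box (x1, y1, x2, y2) of the
    label in the classified image; the claim only says 'spatial position'.
    """
    x1, y1, x2, y2 = position
    return template.format(name=label_name, x1=x1, y1=y1, x2=x2, y2=y2)
```

The resulting sequence would then be fed to the semantic conversion model to produce the label semantic vector of the target dimension.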
3. The multi-label image classification method according to claim 2, wherein before invoking the pre-trained feature extraction model to process the classified image to obtain the feature semantic vector corresponding to the classified image, the method further comprises:
performing grayscale processing on the classified image to obtain a target classified image;
acquiring a plurality of pre-marked area labels corresponding to the target classified image, and determining a target area set of the target classified image according to the area labels;
extracting a feature corpus corresponding to each target area in the target area set, and converting the feature corpus into a feature semantic vector of the target dimension;
and training an initial neural network by taking the classified image as an input vector and the feature semantic vector corresponding to the classified image as an output vector, to obtain the trained feature extraction model.
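The grayscale-processing step in claim 3 might look like the following; the BT.601 luma weights are an assumption, since the claim does not fix the conversion:

```python
def to_grayscale(rgb_image):
    """Grayscale a classified image given as rows of (r, g, b) pixel tuples.

    ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are assumed here;
    any standard RGB-to-gray conversion would satisfy the claim.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]
```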
4. The multi-label image classification method according to claim 1, wherein the training process of the semantic conversion model comprises:
acquiring training samples that take the target attribute corresponding to a label as input data and the label semantic vector corresponding to the label as output data;
splitting the training samples into a training set and a test set according to a preset splitting ratio;
inputting the training set into an initial neural network model to obtain an initial semantic conversion model;
inputting the test set into the initial semantic conversion model, and calculating the accuracy of the model;
detecting whether the accuracy exceeds a preset accuracy threshold;
and when the accuracy exceeds the preset accuracy threshold, determining that training of the semantic conversion model is completed.
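The split-and-evaluate procedure of claim 4 can be sketched as follows; the 0.8 split ratio and 0.9 accuracy threshold are hypothetical stand-ins for the "preset" values the claim leaves open:

```python
def split_samples(samples, train_ratio=0.8):
    """Split pre-built training samples by a preset ratio (0.8 assumed)."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

def training_complete(predictions, expected, accuracy_threshold=0.9):
    """Return True when test-set accuracy exceeds the preset threshold."""
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected) > accuracy_threshold
```

In practice one would keep retraining (or tuning) the initial semantic conversion model until `training_complete` returns True on the held-out test set.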
5. The multi-label image classification method according to claim 1, wherein outputting the target label set corresponding to the image to be classified according to the semantic relation comprises:
acquiring a label set corresponding to the semantic relation;
acquiring a preset label format corresponding to the semantic relation;
arranging the label set according to the preset label format to obtain the target label set;
and outputting the target label set.
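Claim 5's format-then-output step, sketched with a hypothetical mapping from semantic relations to label formats (the patent does not specify the formats themselves):

```python
# Hypothetical per-relation label formats; the claim leaves the
# 'preset label format' unspecified.
LABEL_FORMATS = {
    "synonym": lambda labels: sorted(set(labels)),  # collapse near-duplicates
    "related": lambda labels: sorted(labels),       # keep all, ordered
}

def output_target_labels(labels, relation):
    """Arrange the label set in the preset format for its semantic relation."""
    arrange = LABEL_FORMATS.get(relation, list)
    return arrange(labels)
```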
6. A multi-label image classification apparatus, characterized in that the multi-label image classification apparatus comprises:
a label acquisition module, configured to acquire a label in a classified image in which labels have been marked in advance, and to invoke a pre-trained semantic conversion model to process the label to obtain a label semantic vector corresponding to the label;
an image processing module, configured to invoke a pre-trained feature extraction model to process the classified image to obtain a feature semantic vector corresponding to the classified image;
a correlation calculation module, configured to calculate an actual correlation value according to the label semantic vector and the feature semantic vector, including: computing a product of the label semantic vector and the feature semantic vector to obtain an initial value; and invoking a preset function to process the initial value to obtain the actual correlation value;
the correlation calculation module being further configured to determine that the classified image contains the label corresponding to the label semantic vector when the actual correlation value is greater than 0.5, and to determine that the classified image does not contain the label corresponding to the label semantic vector when the actual correlation value is less than 0.5;
a model training module, configured to train a multi-label image classification model by taking the classified image as an input vector and the label corresponding to the classified image as an output vector, wherein a loss function of the multi-label image classification model is determined according to the actual correlation value and a preset target correlation value, and training continues with convergence of the loss function as the goal until training of the multi-label image classification model is completed;
a label determining module, configured to invoke the multi-label image classification model to process an image to be classified to obtain an initial label set contained in the image to be classified;
a label calling module, configured to invoke the semantic conversion model to process the initial label set to obtain a target label semantic vector corresponding to each initial label in the initial label set;
a label output module, configured to obtain a semantic relation between the target label semantic vectors, including: calculating a similarity value between the target label semantic vectors; obtaining a target interval to which the similarity value belongs; and looking up, according to the target interval, a preset mapping between intervals and semantic relations to obtain the semantic relation corresponding to the target interval;
the label output module being further configured to output a target label set corresponding to the image to be classified according to the semantic relation.
7. A computer device, comprising a processor, wherein the processor, when executing a computer program stored in a memory, implements the multi-label image classification method according to any one of claims 1 to 5.
8. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the multi-label image classification method according to any one of claims 1 to 5.
CN202111011719.1A 2021-08-31 2021-08-31 Multi-label image classification method and device and related equipment Active CN113723513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111011719.1A CN113723513B (en) 2021-08-31 2021-08-31 Multi-label image classification method and device and related equipment

Publications (2)

Publication Number Publication Date
CN113723513A CN113723513A (en) 2021-11-30
CN113723513B true CN113723513B (en) 2024-05-03

Family

ID=78679651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111011719.1A Active CN113723513B (en) 2021-08-31 2021-08-31 Multi-label image classification method and device and related equipment

Country Status (1)

Country Link
CN (1) CN113723513B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114582470B (en) * 2022-04-29 2022-09-09 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Model training method and device and medical image report labeling method
CN115841596B (en) * 2022-12-16 2023-09-15 华院计算技术(上海)股份有限公司 Multi-label image classification method and training method and device for model thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644047A (en) * 2016-07-22 2018-01-30 华为技术有限公司 Tag Estimation generation method and device
CN110147499A (en) * 2019-05-21 2019-08-20 智者四海(北京)技术有限公司 Label method, recommended method and recording medium
CN111626362A (en) * 2020-05-28 2020-09-04 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111783712A (en) * 2020-07-09 2020-10-16 腾讯科技(深圳)有限公司 Video processing method, device, equipment and medium
CN112465071A (en) * 2020-12-18 2021-03-09 深圳赛安特技术服务有限公司 Image multi-label classification method and device, electronic equipment and medium
WO2021151296A1 (en) * 2020-07-22 2021-08-05 平安科技(深圳)有限公司 Multi-task classification method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN113723513A (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant