CN114255363A - Image tag identification method and device - Google Patents

Image tag identification method and device

Info

Publication number
CN114255363A
Authority
CN
China
Prior art keywords
image
tag
label
sample set
tags
Prior art date
Legal status
Pending
Application number
CN202011004911.3A
Other languages
Chinese (zh)
Inventor
许静
张磊
刘超
张奇
迟颖
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202011004911.3A
Publication of CN114255363A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image tag identification method and device. The method comprises: identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by machine learning using multiple groups of data, and the data in the multiple groups of data comprise: the image and the values corresponding to the image tags included in the image; converting the numerical values in the first identification result into a preset numerical interval to obtain a second identification result, wherein the first identification result comprises at least one numerical value and each numerical value corresponds to an image tag; and determining the image tag of the target image according to the second identification result. The invention solves the technical problem of low accuracy in identifying image tags.

Description

Image tag identification method and device
Technical Field
The invention relates to the technical field of image processing, in particular to an image tag identification method and device.
Background
In recent years, with the rapid development of computer-aided diagnosis, medical imaging has become an important basis for clinical diagnosis. The screening and summarization of medical images are usually performed by experienced medical professionals, and obtaining the important information from a huge medical image data set is a time-consuming and laborious task.
In the task of medical image label identification, the prior art mostly performs single-label concept identification on large-scale image data sets, or a small amount of multi-label concept identification on small-scale medical image sets. In these model construction methods, visual information is extracted by a convolutional neural network, and the medical images are learned either through various additional methods or by adopting a deeper convolutional model. Still other approaches combine convolutional and recurrent neural networks in an encoder-decoder structure to generate the set of concepts corresponding to a medical image. However, the deeper network models do not bring obvious improvement, and the effect of the schemes combining convolutional and recurrent neural networks also depends highly on the correlation of the multiple concepts within the same image: labels with higher correlation are recognized better, while low-frequency labels are recognized with lower accuracy and are easily disturbed by noise labels.
No effective solution has yet been proposed for the problem of low image label identification accuracy in the prior art.
Disclosure of Invention
The embodiment of the invention provides an image tag identification method and device, which at least solve the technical problem of low accuracy of image tag identification.
In order to achieve the above object, according to one aspect of the present application, there is provided an image tag identification method. The method comprises the following steps: identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by using multiple groups of data through machine learning, and the data in the multiple groups of data comprises: the image and the value corresponding to the image label included in the image; converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag; and determining the image label of the target image according to the second recognition result.
Further, before processing each image in the set of images to be recognized using the image tag recognition model, the method further comprises: carrying out low-frequency filtering processing on image labels in the image sample set to obtain a first image label set; performing relevance processing on image tags in the first image tag set to obtain a second image tag set; determining image labels included in the images in the image sample set according to the second image label set to obtain a third image label set, wherein the third image label set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and performing learning training according to the third image label set and the image sample set to generate an image label identification model.
Further, before the image labels in the image sample set are subjected to low-frequency filtering processing, the method comprises the following steps: acquiring all image tags in a verification image set to obtain a verification set image tag set; determining image tags which do not appear in the image tags of the image sample set in the verification set image tag set to obtain an image tag subset to be expanded; and performing expansion processing on the image tags of the image sample set and the image sample set according to the subset of the image tags to be expanded to obtain the image tags of the image sample set and the image sample set after the expansion processing.
Further, the expanding the image tag of the image sample set and the image sample set according to the subset of the image tag to be expanded includes: determining an image to be expanded according to the image tag subset to be expanded; carrying out random transformation processing on the image to be expanded; adding the image to be expanded after the transformation processing into the image sample set.
Further, the low-frequency filtering processing of the image labels in the image sample set includes: counting the number of times that the image labels of the image sample set appear in the image sample set; and filtering the image labels with the times lower than a first preset threshold value from the image labels of the image sample set.
Further, performing relevance processing on the image tags in the first image tag set to obtain a second image tag set includes: sequentially calculating the support degree and the credibility among the image labels in the first image label set; and combining the image labels with the support degree larger than a second preset threshold and the reliability degree larger than a third preset threshold to obtain a second image label set.
Further, before performing learning training according to the third image label set and the image sample set to generate an image label recognition model, the method includes: randomly overturning and adjusting the size of the image in the image sample set according to a preset probability; and cutting the processed image.
In order to achieve the above object, according to another aspect of the present application, there is provided an image tag identification apparatus. The device includes: the identification unit is used for identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by machine learning by using a plurality of groups of data, and the data in the plurality of groups of data comprises: the image and the value corresponding to the image label included in the image; the conversion unit is used for converting numerical values in the first identification result into a preset numerical value interval to obtain a second identification result, wherein the first identification result comprises at least one numerical value, and the numerical value corresponds to the image label; and the first determining unit is used for determining the image label of the target image according to the second recognition result.
Further, the apparatus further comprises: the first processing unit is used for performing low-frequency filtering processing on the image labels in the image sample set to obtain a first image label set before processing each image in the image set to be recognized by using the image label recognition model; the second processing unit is used for performing relevance processing on the image tags in the first image tag set to obtain a second image tag set; a first obtaining unit, configured to determine, according to the second image tag set, image tags included in the images in the image sample set, and obtain a third image tag set, where the third image tag set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and the generating unit is used for performing learning training according to the third image label set and the image sample set to generate an image label identification model.
Further, the apparatus comprises: the second acquisition unit is used for acquiring all image tags in the verification image set before carrying out low-frequency filtering processing on the image tags in the image sample set to obtain a verification set image tag set; a second determining unit, configured to determine an image tag that does not appear in the image tag of the image sample set in the verification set image tag set, to obtain an image tag subset to be expanded; and the expansion unit is used for expanding the image tags of the image sample set and the image sample set according to the image tag subset to be expanded to obtain the image tags of the image sample set and the image sample set after expansion.
In order to achieve the above object, according to another aspect of the present application, there is provided an image tag identification method including: receiving a service calling request sent by a client, wherein the service calling request carries an image tag for identifying a target image; responding to the service calling request, calling an image tag identification model to identify a target image to obtain a first identification result, wherein the image tag identification model is trained by machine learning by using multiple groups of data, and the data in the multiple groups of data comprises: the image and the value corresponding to the image label included in the image; and outputting the first recognition result, wherein the first recognition result is used for determining an image label of the target image.
Further, after outputting the first recognition result, the method further includes: converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag; and determining the image label of the target image according to the second recognition result.
In order to achieve the above object, according to another aspect of the present application, there is provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the method for identifying an image tag according to any one of the above items.
In order to achieve the above object, according to another aspect of the present application, there is provided a processor for executing a program, wherein the program executes to execute the image tag identification method according to any one of the above items.
In the embodiment of the present invention, a pre-trained image tag recognition model is adopted to perform image tag recognition on a target image, and the target image is recognized by using the image tag recognition model to obtain a first recognition result, wherein the image tag recognition model is trained by using multiple sets of data through machine learning, and the data in the multiple sets of data includes: the image and the value corresponding to the image label included in the image; converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag; and determining the image label of the target image according to the second recognition result. The purpose of improving the identification accuracy of the image label is achieved, the technical effect of quickly and accurately identifying the image label is achieved, and the technical problem of low accuracy of identifying the image label is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a computer terminal according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for identifying an image tag according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative image tag identification method according to an embodiment of the present invention;
FIG. 4 is a flowchart of model construction of a method for identifying image tags according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an image tag identification apparatus according to a second embodiment of the present invention;
fig. 6 is a flowchart of an image tag identification method according to a third embodiment of the present invention; and
fig. 7 is a block diagram of an alternative computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, there is provided an image tag identification method embodiment. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that shown here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing an image tag identification method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the image tag identification method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the image tag identification method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
Under the operating environment, the application provides an image tag identification method as shown in fig. 2. Fig. 2 is a flowchart of an image tag identification method according to a first embodiment of the present invention.
Step S101, identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by using multiple groups of data through machine learning, and the data in the multiple groups of data comprises: the image and the value corresponding to the image tag included in the image.
The image label identification model is generated by training, with a machine learning algorithm, an image set annotated with image labels, and the target image is then identified by using this model. An image generally contains a plurality of concepts, that is, a plurality of labels. The embodiment of the application is based on multi-label training and learning, and the data set used for training comprises a plurality of images and the numerical values corresponding to the image labels contained in each image. For example, the label of the i-th image in the training sample set can be expressed as the vector l_i = [c_i,1, c_i,2, c_i,3, ..., c_i,j, ..., c_i,n], where n is the total number of labels; when the i-th image contains the j-th label, c_i,j is set to 1 and the remaining entries are set to 0.
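A minimal sketch of this multi-label encoding, written in Python with hypothetical tag names purely for illustration, might look as follows:

    # Hypothetical tag vocabulary (n = 4) and the tags annotated on the i-th image
    all_tags = ["tumor", "fracture", "effusion", "normal"]
    image_tags = {"fracture", "effusion"}

    # c_i,j = 1 when the i-th image contains the j-th tag, otherwise 0
    l_i = [1 if tag in image_tags else 0 for tag in all_tags]
    print(l_i)  # [0, 1, 1, 0]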
Optionally, in an identification method of an image tag provided according to an embodiment of the present application, before processing each image in an image set to be identified using an image tag identification model, the method further includes: carrying out low-frequency filtering processing on image labels in the image sample set to obtain a first image label set; performing relevance processing on image tags in the first image tag set to obtain a second image tag set; determining image labels included in the images in the image sample set according to the second image label set to obtain a third image label set, wherein the third image label set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and performing learning training according to the third image label set and the image sample set to generate an image label identification model.
In this scheme, considering that a large number of low-frequency concepts exist in the data samples, that the distribution differences among concepts are obvious, and that the learning samples are insufficient to cover the whole concept set, the existing medical images are first analyzed and processed: the distribution of the concept data is counted and the low-frequency concepts are filtered out. Then, based on an association rule learning method, the internal relations among different concepts are searched, so that high-frequency concepts are bound together and the concept recognition task is simplified. On the basis of the image labels processed by the low-frequency filtering and the association rules, the concepts contained in each image of the training sample set are then determined in turn and expressed with the corresponding numerical values. The sample image label set is formed by the concepts contained in each of the images in the sample image set; it consists of a plurality of subsets, each subset corresponding to one image in the sample image set. By filtering the low-frequency image labels in the sample image set in advance and processing the relevance between concepts, the embodiment of the application reduces the distribution differences between data samples, discards the noise labels, and improves the recognition accuracy of the model.
Optionally, in an identification method of an image tag provided according to an embodiment of the present application, before performing low-frequency filtering processing on an image tag in an image sample set, the method includes: acquiring all image tags in a verification image set to obtain a verification set image tag set; determining image tags which do not appear in the image tags of the image sample set in the verification set image tag set to obtain an image tag subset to be expanded; and carrying out expansion processing on the image tags and the image sample set of the image sample set according to the image tag subset to be expanded to obtain the image tags and the image sample set of the image sample set after the expansion processing.
To meet the requirements of model training and testing, in the embodiment of the application the existing images are randomly divided into a training image set and a verification image set, where the training image set is used for model training and the verification image set is used for model testing. Since the image labels included in each image may differ, and in order that the training image set covers as many label categories as possible so that the model generated from it can identify image labels of all categories, the image sample set, that is, the training image set, must be expanded before it is processed. The image labels in the image sample set and in the verification image set are obtained, and the image labels that exist in the verification set label set but not among the labels of the image sample set are determined; these labels are defined as the image label subset to be expanded. To enable the trained model to identify image labels of all categories, the image labels of the image sample set and the image sample set itself are expanded according to this subset, so that model training performed with the expanded image sample set and its labels can be more accurate.
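The label subset to be expanded is simply the set difference between the verification-set labels and the sample-set labels; a short Python sketch with made-up tag names illustrates the idea:

    # Assumed example data: tags observed in each set
    sample_set_tags = {"fracture", "effusion", "normal"}                  # image sample set (training)
    verification_set_tags = {"fracture", "effusion", "normal", "tumor"}   # verification image set

    # Image tag subset to be expanded: present in the verification set but absent from the sample set
    tags_to_expand = verification_set_tags - sample_set_tags
    print(tags_to_expand)  # {'tumor'}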
Optionally, in the identification method of an image tag according to an embodiment of the present application, performing expansion processing on the image tag of the image sample set and the image sample set according to the subset of the image tag to be expanded includes: determining an image to be expanded according to the image tag subset to be expanded; carrying out random transformation processing on an image to be expanded; and adding the image to be expanded after the transformation processing into the image sample set.
In the above scheme, after the image label subset to be expanded is obtained, the images to be expanded are determined as all images in the verification image set that contain a label from that subset. To avoid having data identical to the verification image set inside the image sample set, these images cannot be added directly; instead they are first subjected to random transformation processing, such as mirroring and rotation, and the processed images are then added to the image sample set to complete its expansion. The data of the verification image set remains unchanged, preserving the accuracy of the data.
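As a sketch of the random transformation step, one possible implementation uses Pillow; the library choice and the transformation parameters are assumptions, since the text only mentions mirroring and rotation:

    import random
    from PIL import Image

    def randomly_transform(image_path):
        img = Image.open(image_path)
        if random.random() < 0.5:
            img = img.transpose(Image.FLIP_LEFT_RIGHT)   # mirror transformation
        img = img.rotate(random.uniform(-15, 15))        # random rotation, angle range assumed
        return img                                       # add the result to the image sample set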
Optionally, in the method for identifying an image tag according to an embodiment of the present application, the performing low-frequency filtering processing on the image tag in the image sample set includes: counting the times of the image labels of the image sample set appearing in the image sample set; and filtering the image labels with the times lower than a first preset threshold value from the image labels of the image sample set.
In the above scheme, after the expansion of the image sample set is completed, low-frequency filtering is performed on its image labels to reduce the interference of low-frequency labels on model recognition. The total number of occurrences of each label category in the sample image label set is counted, and the labels whose count is lower than a preset threshold are filtered out of the set.
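A minimal Python sketch of this low-frequency filtering, with a made-up threshold value and made-up annotations:

    from collections import Counter

    # Hypothetical per-image label lists from the image sample set
    sample_labels = [["fracture"], ["fracture", "effusion"], ["rare_finding"]]
    counts = Counter(tag for tags in sample_labels for tag in tags)

    FIRST_PRESET_THRESHOLD = 2                            # assumed value, not given in the text
    first_image_tag_set = {tag for tag, n in counts.items() if n >= FIRST_PRESET_THRESHOLD}
    print(first_image_tag_set)  # {'fracture'}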
Optionally, in the method for identifying image tags according to the first embodiment of the present application, performing relevance processing on image tags in the first image tag set to obtain the second image tag set includes: sequentially calculating the support degree and the credibility among the image labels in the first image label set; and merging the image labels with the support degree greater than a second preset threshold and the reliability degree greater than a third preset threshold to obtain a second image label set.
In the above solution, association rule mining is used to search the first image label set for the relationships between all concepts. The association rule over concept sets X and Y is written as X → Y, where X ⊆ N, Y ⊆ N and X ∩ Y = ∅, N being the overall concept item set. Support is defined as the probability that the item set (X, Y) appears in the overall concept item set N, and Confidence represents the probability of obtaining Y according to the association rule X → Y when the concept set X occurs, that is, the frequency with which concept Y appears in the samples containing X. With σ denoting the number of occurrences of a given item subset:

Support(X → Y) = σ(X ∪ Y) / N

Confidence(X → Y) = σ(X ∪ Y) / σ(X)

According to these definitions of the support and confidence between concept Y and concept X, the support and confidence between the concepts in the first image label set are calculated in turn, and, according to the preset thresholds, the image labels whose support is greater than the second preset threshold and whose confidence is greater than the third preset threshold are merged to obtain the second image label set. For example, the first image label set is defined as C = {c_1, c_2, c_3, ..., c_M}, and the set of image labels included in each image of the image sample set is defined as I = {i_1, i_2, i_3, ..., i_N}, where each i_N ⊆ C, c_M represents an image label of a certain category, and i_N represents the set of concepts corresponding to one image. The support and confidence between every pair of concepts in the set C are calculated in turn according to the formulas defined above, the concept subsets whose support is greater than 0.02 and whose confidence is greater than 0.9 are screened out, and each selected concept subset is recombined into a new concept C_g and mapped back onto the original data set. The image labels that satisfy the relevance condition are thereby merged, so that when the resulting second image label set is later used for model training, the training can be more accurate.
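A small Python sketch of this association-rule screening follows; the per-image tag sets are invented for illustration, and checking the confidence of a rule in either direction is an assumption, since the text does not fix the direction:

    from itertools import combinations

    images = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]   # hypothetical tag sets I
    N = len(images)

    def sigma(item_set):
        # number of images whose tag set contains item_set
        return sum(1 for tags in images if item_set <= tags)

    def support(x, y):
        return sigma(x | y) / N

    def confidence(x, y):
        return sigma(x | y) / sigma(x) if sigma(x) else 0.0

    all_tags = set().union(*images)
    merged_pairs = [
        (x, y)
        for x, y in combinations(sorted(all_tags), 2)
        if support({x}, {y}) > 0.02
        and max(confidence({x}, {y}), confidence({y}, {x})) > 0.9
    ]
    print(merged_pairs)  # [('a', 'b')] -> recombined into a new concept C_g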
Optionally, in an image tag identification method provided according to an embodiment of the present application, before performing learning training according to a third image tag set and an image sample set to generate an image tag identification model, the method includes: randomly overturning and adjusting the size of the image in the image sample set according to a preset probability; and clipping the processed image.
In this scheme, before learning training is performed on the third image label set and the image sample set, the medical images are preprocessed through data enhancement techniques to optimize the recognition effect of the trained model. A specific method may be as follows: randomly flip each image horizontally or vertically with a certain probability p, resize the original image at different scales using bilinear interpolation, and crop all the transformed images to complete the preprocessing of the images.
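A sketch of this preprocessing pipeline using torchvision; the library choice, the flip probability p and the output sizes are assumptions:

    from torchvision import transforms

    p = 0.5  # assumed flip probability
    preprocess = transforms.Compose([
        transforms.RandomHorizontalFlip(p=p),
        transforms.RandomVerticalFlip(p=p),
        transforms.Resize(256, interpolation=transforms.InterpolationMode.BILINEAR),
        transforms.RandomCrop(224),       # crop the transformed image
        transforms.ToTensor(),
    ])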
Step S102, converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image label.
The image label recognition model used here can be obtained through convolutional neural network training. During training of the convolutional neural network, a ResNet-101 model pre-trained on the ImageNet data set is used as the backbone of the multi-label classification model. The image is first preprocessed to fit the input of the network (224 x 224), a feature vector is output via the feed-forward network, binary cross entropy is used as the loss function of the model, and the fully connected layer is followed by a sigmoid activation function that calculates the probability of each label category; for example, a probability greater than 0.5 indicates that the input image belongs to that label category. As shown in fig. 3, when the model is used to perform recognition analysis on a target image, the scalar results output by the last layer (the fully connected layer) of the model are converted into a preset numerical interval, for example the [0, 1] interval, by using the two-class cross entropy loss function; during the conversion a probability threshold of 0.5 may be set, with values greater than the threshold converted to 1 and values less than the threshold converted to 0.
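The following PyTorch sketch reflects this setup at a high level; the number of label categories, the use of BCEWithLogitsLoss (which combines the sigmoid with binary cross entropy) and the exact API calls are assumptions, not the patent's literal implementation:

    import torch
    import torch.nn as nn
    from torchvision import models

    num_labels = 50                                            # assumed number of tag categories
    model = models.resnet101(pretrained=True)                  # ImageNet-pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_labels)     # multi-label classification head

    criterion = nn.BCEWithLogitsLoss()                         # binary cross entropy on the logits

    def predict(images):                                       # images: (B, 3, 224, 224) tensor
        logits = model(images)                                 # output of the fully connected layer
        probs = torch.sigmoid(logits)                          # converted into the [0, 1] interval
        return (probs > 0.5).int()                             # 1 = label present, 0 = label absent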
Fig. 3 is a schematic diagram of the image label identification method: the target image (such as a medical radiology image) is preprocessed and analyzed by the trained image label identification model, and a sigmoid classifier determines, from the output of the model's fully connected layer, whether a label of a given category is contained in the target image.
And step S103, determining an image label of the target image according to the second recognition result.
After the target image is analyzed by the image label identification model and the analysis result is processed by the classifier, the value corresponding to each category of image label in the target image is obtained: if the value is 1, it is determined that an image label of that category exists in the target image; if the value is 0, it is determined that no image label of that category exists in the target image.
Through the steps from S101 to S103, the image label of the target image can be automatically and rapidly identified, and the purpose of improving the identification accuracy of the image label is achieved, so that the technical effect of rapidly and accurately identifying the image label is realized, and the technical problem of low accuracy of identifying the image label is solved.
Fig. 4 is a flow chart of constructing the image label recognition model. A high-frequency image label set is screened out by statistical analysis of the existing image data set, while low-frequency image labels are discarded, and the internal relations among different image labels are obtained by an association rule learning method, so that similar image labels are clustered and the label sets are integrated. Meanwhile, the image sample set is supplemented, and the images are enhanced and normalized. Finally, the processed image sample set and its image labels are trained with a convolutional neural network to generate the image label recognition model. Compared with a CNN-RNN processing scheme, the technical solution provided by the embodiment of the application better weakens the influence of noise data on the model and improves the accuracy of image label identification.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is also provided an apparatus for implementing the method for identifying an image tag, as shown in fig. 5, the apparatus including: an identification unit 501, a conversion unit 502 and a first determination unit 503.
The recognizing unit 501 is configured to recognize a target image by using an image tag recognition model, and obtain a first recognition result, where the image tag recognition model is trained by machine learning using multiple sets of data, and the data in the multiple sets of data includes: the image and the value corresponding to the image label included in the image;
a converting unit 502, configured to convert a numerical value in the first recognition result to a preset numerical value interval to obtain a second recognition result, where the first recognition result includes at least one numerical value, and the numerical value corresponds to the image tag;
a first determining unit 503, configured to determine an image label of the target image according to the second recognition result.
In summary, the image tag identification apparatus provided in the embodiment of the present application uses a pre-trained image tag identification model to identify the target image: the identifying unit 501 identifies the target image by using the image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by machine learning using multiple sets of data, and the data in the multiple sets of data includes: the image and the value corresponding to the image label included in the image; the converting unit 502 converts the numerical values in the first identification result into a preset numerical interval to obtain a second identification result, wherein the first identification result comprises at least one numerical value and each numerical value corresponds to an image tag; and the first determining unit 503 determines the image tag of the target image based on the second identification result. The purpose of improving the identification accuracy of the image tag is thereby achieved, the technical effect of quickly and accurately identifying image tags is realized, and the technical problem of low accuracy in identifying image tags is solved.
Optionally, in the identification apparatus for an image tag provided in an embodiment of the present application, the apparatus further includes: the first processing unit is used for performing low-frequency filtering processing on the image labels in the image sample set to obtain a first image label set before processing each image in the image set to be recognized by using the image label recognition model; the second processing unit is used for performing relevance processing on the image tags in the first image tag set to obtain a second image tag set; the first obtaining unit is used for determining image tags included in the images in the image sample set according to the second image tag set to obtain a third image tag set, wherein the third image tag set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and the generating unit is used for performing learning training according to the third image label set and the image sample set to generate an image label identification model.
Optionally, in the identification apparatus for an image tag provided in an embodiment of the present application, the apparatus includes: the second acquisition unit is used for acquiring all image tags in the verification image set before carrying out low-frequency filtering processing on the image tags in the image sample set to obtain a verification set image tag set; the second determining unit is used for determining the image tags which do not appear in the image tags of the image sample set in the verification set image tag set to obtain an image tag subset to be expanded; and the expansion unit is used for expanding the image tags and the image sample set of the image sample set according to the image tag subset to be expanded to obtain the image tags and the image sample set of the expanded image sample set.
Optionally, in the apparatus for identifying an image tag provided in an embodiment of the present application, the expansion unit includes: the first determining module is used for determining the image to be expanded according to the image tag subset to be expanded; the first processing module is used for carrying out random transformation processing on the image to be expanded; and the second processing module is used for adding the image to be expanded after the transformation processing into the image sample set.
Optionally, in the identification apparatus for an image tag provided in an embodiment of the present application, the first processing unit further includes: the counting module is used for counting the times of the image labels of the image sample set appearing in the image sample set; and the third processing module is used for filtering the image labels with the times lower than the first preset threshold value from the image labels of the image sample set.
Optionally, in the identification apparatus for an image tag provided in an embodiment of the present application, the second processing unit includes: the calculation module is used for sequentially calculating the support degree and the credibility among the image labels in the first image label set; and the fourth processing module is used for merging the image labels with the support degree greater than the second preset threshold and the reliability degree greater than the third preset threshold to obtain a second image label set.
Optionally, in the identification apparatus for an image tag provided in an embodiment of the present application, the apparatus includes: the third processing unit is used for randomly overturning and resizing the images in the image sample set according to a preset probability before learning training is carried out according to the third image label set and the image sample set and an image label recognition model is generated; and the fourth processing unit is used for cutting the processed image.
It should be noted here that the above-mentioned identifying unit 501, converting unit 502 and first determining unit 503 correspond to steps S101 to S103 in embodiment 1, and the three units are the same as the examples and application scenarios realized by the corresponding steps, but are not limited to what is disclosed in the above-mentioned first embodiment. It should be noted that the modules described above as part of the apparatus may be run in the computer terminal 10 provided in the first embodiment.
Example 3
In the operating environment of the first embodiment, the present application provides an image tag identification method as shown in fig. 6. Fig. 6 is a flowchart of an image tag identification method according to a third embodiment of the present invention.
Step S601, receiving a service calling request sent by a client, wherein the service calling request carries an image tag for identifying a target image;
step S602, in response to the service invocation request, invoking an image tag recognition model to recognize a target image, so as to obtain a first recognition result, where the image tag recognition model is trained through machine learning by using multiple sets of data, where the data in the multiple sets of data includes: the image and the value corresponding to the image label included in the image;
step S603, outputting the first recognition result, where the first recognition result is used to determine an image tag of the target image.
The image label of the target image is identified on the server through the service calling request, so that the aim of improving the identification accuracy of the image label is fulfilled, the technical effect of quickly and accurately identifying the image label is achieved, and the technical problem of low accuracy of identifying the image label is solved.
Optionally, after outputting the first recognition result, the method further includes: converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag; and determining the image label of the target image according to the second recognition result.
The image label recognition model can be obtained through convolutional neural network training. During training of the convolutional neural network, a ResNet-101 model pre-trained on the ImageNet data set is used as the backbone of the multi-label classification model. The image is first preprocessed to fit the input of the network (224 x 224), a feature vector is output via the feed-forward network, binary cross entropy is used as the loss function of the model, and the fully connected layer is followed by a sigmoid activation function that calculates the probability of each label category; for example, a probability greater than 0.5 indicates that the input image belongs to that label category. When the model is used to perform recognition analysis on the target image, the scalar results output by the last layer (the fully connected layer) of the model are converted into a preset numerical interval, for example the [0, 1] interval, by using the two-class cross entropy loss function; during the conversion a probability threshold of 0.5 may be set, with values greater than the threshold converted to 1 and values less than the threshold converted to 0. After the value corresponding to each category of image tag in the target image is obtained, if the value is 1, it is determined that an image tag of that category exists in the target image, and if the value is 0, it is determined that no image tag of that category exists in the target image.
By the scheme, the image label of the target image can be automatically and quickly identified, and the aim of improving the identification accuracy of the image label is fulfilled, so that the technical effect of quickly and accurately identifying the image label is realized, and the technical problem of low accuracy of identifying the image label is solved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 4
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the method for identifying an image tag of an application program: identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by using multiple groups of data through machine learning, and the data in the multiple groups of data comprises: the image and the value corresponding to the image label included in the image; converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag; and determining the image label of the target image according to the second recognition result.
The computer terminal may execute the program code of the following steps in the method for identifying an image tag of an application program: before processing each image in the set of images to be recognized using the image tag recognition model, the method further comprises: carrying out low-frequency filtering processing on image labels in the image sample set to obtain a first image label set; performing relevance processing on image tags in the first image tag set to obtain a second image tag set; determining image labels included in the images in the image sample set according to the second image label set to obtain a third image label set, wherein the third image label set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and performing learning training according to the third image label set and the image sample set to generate an image label identification model.
The computer terminal may execute the program code of the following steps in the method for identifying an image tag of an application program: before the image labels in the image sample set are subjected to low-frequency filtering processing, the method comprises the following steps: acquiring all image tags in a verification image set to obtain a verification set image tag set; determining image tags which do not appear in the image tags of the image sample set in the verification set image tag set to obtain an image tag subset to be expanded; and performing expansion processing on the image tags of the image sample set and the image sample set according to the subset of the image tags to be expanded to obtain the image tags of the image sample set and the image sample set after the expansion processing.
The computer terminal may execute the program code of the following steps in the method for identifying an image tag of an application program: the expanding processing of the image tags of the image sample set and the image sample set according to the image tag subset to be expanded comprises the following steps: determining an image to be expanded according to the image tag subset to be expanded; carrying out random transformation processing on the image to be expanded; adding the image to be expanded after the transformation processing into the image sample set.
The computer terminal may execute the program code of the following steps in the method for identifying an image tag of an application program: the low-frequency filtering processing of the image labels in the image sample set comprises the following steps: counting the number of times that the image labels of the image sample set appear in the image sample set; and filtering the image labels with the times lower than a first preset threshold value from the image labels of the image sample set.
The computer terminal may execute the program code of the following steps in the method for identifying an image tag of an application program: performing relevance processing on the image tags in the first image tag set to obtain a second image tag set, wherein the obtaining of the second image tag set comprises: sequentially calculating the support degree and the credibility among the image labels in the first image label set; and combining the image labels with the support degree larger than a second preset threshold and the reliability degree larger than a third preset threshold to obtain a second image label set.
The computer terminal may execute the program code of the following steps in the method for identifying an image tag of an application program: before learning training is performed according to the third image label set and the image sample set to generate the image label recognition model, the method comprises: randomly flipping the images in the image sample set according to a preset probability and adjusting their size; and cropping the processed images.
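A minimal sketch of this preprocessing using Pillow; the flip probability, resize target, and crop size are illustrative assumptions rather than values prescribed by this embodiment.

```python
import random
from PIL import Image, ImageOps

def preprocess(path, flip_prob=0.5, size=(256, 256), crop=224):
    """Randomly flip the image with a preset probability, resize it, and then
    take a random crop of the resized image."""
    img = Image.open(path).convert("RGB")
    if random.random() < flip_prob:
        img = ImageOps.mirror(img)               # random horizontal flip
    img = img.resize(size)                       # adjust the size
    left = random.randint(0, size[0] - crop)     # choose the crop position
    top = random.randint(0, size[1] - crop)
    return img.crop((left, top, left + crop, top + crop))
```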
Optionally, fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 7, the computer terminal may include one or more processors and a memory (only one is shown in fig. 7).
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the image tag identification method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implements the image tag identification method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application programs stored in the memory through the transmission device to execute the following steps: identifying a target image by using an image tag identification model to obtain a first recognition result, wherein the image tag identification model is trained through machine learning using multiple groups of data, and each group of data comprises an image and the values corresponding to the image labels included in that image; converting the numerical values in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value and each numerical value corresponds to an image tag; and determining the image label of the target image according to the second recognition result.
Optionally, the processor may further execute the program code of the following steps: before processing each image in the set of images to be recognized using the image tag recognition model, the method further comprises: carrying out low-frequency filtering processing on image labels in the image sample set to obtain a first image label set; performing relevance processing on image tags in the first image tag set to obtain a second image tag set; determining image labels included in the images in the image sample set according to the second image label set to obtain a third image label set, wherein the third image label set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and performing learning training according to the third image label set and the image sample set to generate an image label identification model.
Optionally, the processor may further execute the program code of the following steps: before the low-frequency filtering processing is performed on the image labels in the image sample set, the method comprises: acquiring all image tags in a verification image set to obtain a verification set image tag set; determining, in the verification set image tag set, the image tags that do not appear among the image tags of the image sample set, to obtain an image tag subset to be expanded; and expanding the image sample set and its image tags according to the image tag subset to be expanded, to obtain the expanded image sample set and its image tags.
Optionally, the processor may further execute the program code of the following steps: expanding the image sample set and its image tags according to the image tag subset to be expanded comprises: determining the images to be expanded according to the image tag subset to be expanded; performing random transformation processing on the images to be expanded; and adding the transformed images to the image sample set.
Optionally, the processor may further execute the program code of the following steps: the low-frequency filtering processing of the image labels in the image sample set comprises: counting the number of times each image label of the image sample set appears in the image sample set; and filtering out, from the image labels of the image sample set, the image labels whose counts are lower than a first preset threshold.
Optionally, the processor may further execute the program code of the following steps: performing relevance processing on the image tags in the first image tag set to obtain a second image tag set comprises: sequentially calculating the support and confidence between the image tags in the first image tag set; and merging the image tags whose support is greater than a second preset threshold and whose confidence is greater than a third preset threshold, to obtain the second image tag set.
Optionally, the processor may further execute the program code of the following steps: before learning training is performed according to the third image label set and the image sample set to generate the image label recognition model, the method comprises: randomly flipping the images in the image sample set according to a preset probability and adjusting their size; and cropping the processed images.
The embodiment of the invention provides a scheme of an image tag identification method. A pre-trained image tag identification model is used to identify a target image and obtain a first recognition result, wherein the image tag identification model is trained through machine learning using multiple groups of data, and each group of data comprises an image and the values corresponding to the image labels included in that image; the numerical values in the first recognition result are converted into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value and each numerical value corresponds to an image tag; and the image label of the target image is determined according to the second recognition result. In this way, the identification accuracy of image labels is improved, the technical effect of quickly and accurately identifying image labels is achieved, and the technical problem of low accuracy in identifying image labels is solved.
It can be understood by those skilled in the art that the structure shown in fig. 7 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like; fig. 7 does not limit the structure of the above electronic device. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 7, or have a different configuration from that shown in fig. 7.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 5
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the image tag identification method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: identifying a target image by using an image tag identification model to obtain a first recognition result, wherein the image tag identification model is trained through machine learning using multiple groups of data, and each group of data comprises an image and the values corresponding to the image labels included in that image; converting the numerical values in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value and each numerical value corresponds to an image tag; and determining the image label of the target image according to the second recognition result.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: before processing each image in the set of images to be recognized using the image tag recognition model, the method further comprises: carrying out low-frequency filtering processing on image labels in the image sample set to obtain a first image label set; performing relevance processing on image tags in the first image tag set to obtain a second image tag set; determining image labels included in the images in the image sample set according to the second image label set to obtain a third image label set, wherein the third image label set is composed of at least one subset, and the subset corresponds to the images in the image sample set; and performing learning training according to the third image label set and the image sample set to generate an image label identification model.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: before the low-frequency filtering processing is performed on the image labels in the image sample set, the method comprises: acquiring all image tags in a verification image set to obtain a verification set image tag set; determining, in the verification set image tag set, the image tags that do not appear among the image tags of the image sample set, to obtain an image tag subset to be expanded; and expanding the image sample set and its image tags according to the image tag subset to be expanded, to obtain the expanded image sample set and its image tags.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: expanding the image sample set and its image tags according to the image tag subset to be expanded comprises: determining the images to be expanded according to the image tag subset to be expanded; performing random transformation processing on the images to be expanded; and adding the transformed images to the image sample set.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the low-frequency filtering processing of the image labels in the image sample set comprises: counting the number of times each image label of the image sample set appears in the image sample set; and filtering out, from the image labels of the image sample set, the image labels whose counts are lower than a first preset threshold.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: performing relevance processing on the image tags in the first image tag set to obtain a second image tag set comprises: sequentially calculating the support and confidence between the image tags in the first image tag set; and merging the image tags whose support is greater than a second preset threshold and whose confidence is greater than a third preset threshold, to obtain the second image tag set.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: before learning training is performed according to the third image label set and the image sample set to generate the image label recognition model, the method comprises: randomly flipping the images in the image sample set according to a preset probability and adjusting their size; and cropping the processed images.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. A method for identifying an image tag, the method comprising:
identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by using multiple groups of data through machine learning, and the data in the multiple groups of data comprises: the image and the value corresponding to the image label included in the image;
converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag;
and determining the image label of the target image according to the second recognition result.
2. The method of claim 1, wherein prior to processing each image in the set of images to be recognized using the image tag recognition model, the method further comprises:
carrying out low-frequency filtering processing on image labels in the image sample set to obtain a first image label set;
performing relevance processing on image tags in the first image tag set to obtain a second image tag set;
determining image labels included in the images according to the second image label set to obtain a third image label set, wherein the third image label set is composed of at least one subset, and the subset corresponds to the images in the image sample set;
and performing learning training according to the third image label set and the image sample set to generate an image label identification model.
3. The method of claim 2, wherein prior to the low frequency filtering of the image labels in the set of image samples, the method comprises:
acquiring all image tags in a verification image set to obtain a verification set image tag set;
determining image tags which do not appear in the image tags of the image sample set in the verification set image tag set to obtain an image tag subset to be expanded;
and performing expansion processing on the image tags of the image sample set and the image sample set according to the subset of the image tags to be expanded to obtain the image tags of the image sample set and the image sample set after the expansion processing.
4. The method of claim 3, wherein the expanding the image tag of the image sample set and the image sample set according to the subset of the image tags to be expanded comprises:
determining an image to be expanded according to the image tag subset to be expanded;
carrying out random transformation processing on the image to be expanded;
adding the image to be expanded after the transformation processing into the image sample set.
5. The method of claim 2, wherein the low-frequency filtering the image labels in the set of image samples comprises:
counting the number of times that the image labels of the image sample set appear in the image sample set;
and filtering out, from the image labels of the image sample set, the image labels whose counts are lower than a first preset threshold.
6. The method of claim 2, wherein performing relevance processing on image tags in the first set of image tags to obtain a second set of image tags comprises:
sequentially calculating the support and confidence between the image labels in the first image label set;
and merging the image labels whose support is greater than a second preset threshold and whose confidence is greater than a third preset threshold to obtain a second image label set.
7. The method of claim 2, wherein prior to performing learning training to generate an image tag recognition model from the third set of image tags and the set of image samples, the method comprises:
randomly flipping the images in the image sample set according to a preset probability and adjusting their size;
and cropping the processed images.
8. An apparatus for recognizing an image tag, the apparatus comprising:
the identification unit is used for identifying a target image by using an image tag identification model to obtain a first identification result, wherein the image tag identification model is trained by machine learning by using a plurality of groups of data, and the data in the plurality of groups of data comprises: the image and the value corresponding to the image label included in the image;
the conversion unit is used for converting numerical values in the first identification result into a preset numerical value interval to obtain a second identification result, wherein the first identification result comprises at least one numerical value, and the numerical value corresponds to the image label;
and the first determining unit is used for determining the image label of the target image according to the second recognition result.
9. The apparatus of claim 8, further comprising:
the first processing unit is used for performing low-frequency filtering processing on the image labels in the image sample set to obtain a first image label set before processing each image in the image set to be recognized by using the image label recognition model;
the second processing unit is used for performing relevance processing on the image tags in the first image tag set to obtain a second image tag set;
a first obtaining unit, configured to determine, according to the second image tag set, image tags included in the images in the image sample set, and obtain a third image tag set, where the third image tag set is composed of at least one subset, and the subset corresponds to the images in the image sample set;
and the generating unit is used for performing learning training according to the third image label set and the image sample set to generate an image label identification model.
10. The apparatus of claim 9, wherein the apparatus comprises:
the second acquisition unit is used for acquiring all image tags in the verification image set before carrying out low-frequency filtering processing on the image tags in the image sample set to obtain a verification set image tag set;
a second determining unit, configured to determine an image tag that does not appear in the image tag of the image sample set in the verification set image tag set, to obtain an image tag subset to be expanded;
and the expansion unit is used for expanding the image tags of the image sample set and the image sample set according to the image tag subset to be expanded to obtain the image tags of the image sample set and the image sample set after expansion.
11. An image tag identification method, comprising:
receiving a service calling request sent by a client, wherein the service calling request carries an image tag for identifying a target image;
responding to the service calling request, calling an image tag identification model to identify a target image to obtain a first identification result, wherein the image tag identification model is trained by machine learning by using multiple groups of data, and the data in the multiple groups of data comprises: the image and the value corresponding to the image label included in the image;
and outputting the first recognition result, wherein the first recognition result is used for determining an image label of the target image.
12. The method of claim 11, wherein after outputting the first recognition result, the method further comprises:
converting the numerical value in the first recognition result into a preset numerical value interval to obtain a second recognition result, wherein the first recognition result comprises at least one numerical value, and the numerical value corresponds to the image tag;
and determining the image label of the target image according to the second recognition result.
13. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device where the storage medium is located is controlled to execute the image tag identification method according to any one of claims 1 to 7.
14. A processor, characterized in that the processor is configured to execute a program, wherein the program executes the method for identifying an image tag according to any one of claims 1 to 7.
CN202011004911.3A 2020-09-22 2020-09-22 Image tag identification method and device Pending CN114255363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011004911.3A CN114255363A (en) 2020-09-22 2020-09-22 Image tag identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011004911.3A CN114255363A (en) 2020-09-22 2020-09-22 Image tag identification method and device

Publications (1)

Publication Number Publication Date
CN114255363A true CN114255363A (en) 2022-03-29

Family

ID=80789685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011004911.3A Pending CN114255363A (en) 2020-09-22 2020-09-22 Image tag identification method and device

Country Status (1)

Country Link
CN (1) CN114255363A (en)

Similar Documents

Publication Publication Date Title
US10936919B2 (en) Method and apparatus for detecting human face
CN110210513B (en) Data classification method and device and terminal equipment
CN108319888B (en) Video type identification method and device and computer terminal
WO2019062081A1 (en) Salesman profile formation method, electronic device and computer readable storage medium
CN110782318A (en) Marketing method and device based on audio interaction and storage medium
CN113283446B (en) Method and device for identifying object in image, electronic equipment and storage medium
KR101835333B1 (en) Method for providing face recognition service in order to find out aging point
CN110765882B (en) Video tag determination method, device, server and storage medium
CN109408058B (en) Front-end auxiliary development method and device based on machine learning
CN111401339B (en) Method and device for identifying age of person in face image and electronic equipment
US20210166058A1 (en) Image generation method and computing device
CN113065609B (en) Image classification method, device, electronic equipment and readable storage medium
CN114419363A (en) Target classification model training method and device based on label-free sample data
CN115222443A (en) Client group division method, device, equipment and storage medium
CN111784665A (en) OCT image quality assessment method, system and device based on Fourier transform
CN113191478A (en) Training method, device and system of neural network model
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN112464924A (en) Method and device for constructing training set
CN115860835A (en) Advertisement recommendation method, device and equipment based on artificial intelligence and storage medium
CN114255363A (en) Image tag identification method and device
CN113837836A (en) Model recommendation method, device, equipment and storage medium
CN114581177A (en) Product recommendation method, device, equipment and storage medium
CN114357263A (en) Method and device for processing multi-modal information of target object and storage medium
CN114332522A (en) Image identification method and device and construction method of residual error network model
CN115408599A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination