CN111027600A - Image category prediction method and device - Google Patents

Image category prediction method and device

Info

Publication number
CN111027600A
Authority
CN (China)
Prior art keywords
prediction, category, image, classified, class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911164132.7A
Other languages
Chinese (zh)
Other versions
CN111027600B (en)
Inventor
闫桂霞
王瑞琛
王晓利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911164132.7A
Publication of CN111027600A
Application granted
Publication of CN111027600B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiment of the application provides an image category prediction method and device. An image to be classified is acquired; the image is classified through a first classification network to obtain a reference category of the image; the image is also classified through a second classification network to obtain a prediction category set of the image, wherein the prediction category set comprises a plurality of prediction categories arranged according to the prediction probabilities corresponding to the prediction categories; a target arrangement position of the reference category in the prediction category set is determined; the arrangement order of the prediction categories is adjusted based on the target arrangement position to obtain an adjusted prediction category set; and the category prediction result of the image to be classified is acquired based on the adjusted prediction category set, so that the accuracy of image category prediction can be effectively improved.

Description

Image category prediction method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for predicting an image category.
Background
Nowadays, image category prediction techniques are widely used in various industries. The image category prediction technology is an image processing method for quantitatively analyzing an image and classifying the image according to feature information reflected by the image.
In the current image category prediction technology, an image of a known category is generally adopted to train a neural network model, and then the trained neural network model is used to classify the image to obtain a prediction result of the image category.
Disclosure of Invention
The invention aims to provide an image category prediction method and device, which can effectively improve the accuracy of image category prediction.
The embodiment of the application provides an image category prediction method, which comprises the following steps:
acquiring an image to be classified;
classifying the image to be classified through a first classification network to obtain a reference class of the image to be classified;
classifying the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories;
determining a target arrangement position of the reference category in the prediction category set;
adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set;
and acquiring a category prediction result of the image to be classified based on the adjusted prediction category set.
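The claimed steps can be sketched compactly in plain Python. In the sketch below, the function name, the mapping lookup, and the move-to-front policy are illustrative assumptions for exposition, not the patent's exact procedure:

```python
def adjust_prediction_set(reference_category, prediction_set, category_map,
                          preset_position=0):
    """Reorder `prediction_set` (categories sorted by descending prediction
    probability) by moving the prediction category that the image category
    mapping set associates with `reference_category` to `preset_position`.
    Names and the move-to-front policy are assumptions for illustration."""
    target = category_map.get(reference_category)
    if target is None or target not in prediction_set:
        return list(prediction_set)          # no mapping match: keep original order
    adjusted = [c for c in prediction_set if c != target]
    adjusted.insert(preset_position, target) # promote the corroborated category
    return adjusted

# Usage: the first network outputs reference category "Box"; the second
# network's ranked guesses get reordered so the mapped category leads.
preds = ["bird", "carton", "glass box"]
mapping = {"Box": "carton"}
adjusted = adjust_prediction_set("Box", preds, mapping)
# adjusted[0] is now "carton"
```

The head of the adjusted set then serves as the category prediction result, which is the sense in which the reference category "corrects" the second network's ranking.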
Correspondingly, an embodiment of the present application further provides an image category prediction apparatus, including:
an acquisition unit for acquiring an image to be classified;
the first classification unit is used for classifying the image to be classified through a first classification network to obtain a reference class of the image to be classified;
the second classification unit is used for classifying the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories;
a determining unit, configured to determine a target arrangement position of the reference category in the prediction category set;
an adjusting unit, configured to adjust the arrangement order of the prediction categories based on the target arrangement position, to obtain an adjusted prediction category set;
and the result acquisition unit is used for acquiring the class prediction result of the image to be classified based on the adjusted prediction class set.
Optionally, in some embodiments, the determining unit includes a set acquiring subunit, a category acquiring subunit, a location determining subunit, and a determining subunit;
the set obtaining subunit is configured to obtain an image category mapping set, where the image category mapping set includes a mapping relationship between a preset reference category of a first classification network and a preset prediction category of a second classification network;
the category obtaining subunit is configured to obtain, based on the image category mapping set, a target prediction category corresponding to the reference category;
the position determining subunit is configured to determine an arrangement position of the target prediction category in the prediction category set;
the determining subunit is configured to determine, according to the arrangement position, a target arrangement position of the reference category in the prediction category set.
Optionally, in some embodiments, the image classification device further includes a construction unit;
the building unit is specifically configured to obtain a preset reference category set of a first classification network, where the preset reference category set includes a plurality of preset reference categories;
acquiring a preset prediction category set of a second classification network, wherein the preset prediction category set comprises a plurality of preset prediction categories;
and establishing a mapping relation between the preset reference category and the preset prediction category to obtain an image category mapping set.
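A minimal sketch of building such an image category mapping set, assuming the cross-network correspondences are supplied as label pairs (the `links` input format is a hypothetical convention, since the patent does not specify how the relationships are provided):

```python
def build_category_mapping(reference_categories, prediction_categories, links):
    """Build the image category mapping set: each preset reference category of
    the first network is linked to a preset prediction category of the second
    network. Pairs whose labels are not in the preset sets are ignored
    (an assumed validation step)."""
    ref_set, pred_set = set(reference_categories), set(prediction_categories)
    mapping = {}
    for ref, pred in links:
        if ref in ref_set and pred in pred_set:
            mapping[ref] = pred
    return mapping
```

Storing the mapping as a dictionary keyed by reference category makes the later lookup of the target prediction category a constant-time operation.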
Optionally, in some embodiments, the adjusting unit includes a mode determining subunit and an adjusting subunit;
the mode determining subunit is configured to determine a sequential adjustment mode and a preset position corresponding to the sequential adjustment mode according to the target arrangement position;
and the adjusting subunit is configured to adjust the target prediction category to the preset position in the prediction category set by using the sequential adjustment manner, so as to obtain an adjusted prediction category set.
Optionally, in some embodiments, the second classification unit includes a feature extraction subunit and an analysis subunit;
the feature extraction subunit is configured to perform feature extraction on the image to be classified through the second classification network to obtain local feature information of the image to be classified;
the analysis subunit is configured to classify the image to be classified through the second classification network based on the local feature information, so as to obtain a prediction category set of the image to be classified.
Optionally, in some embodiments, the image classification apparatus further includes a network training unit;
the network training unit is used for acquiring a plurality of image samples marked with real categories;
and training a preset classification network based on the real classification to obtain the second classification network.
Optionally, in some embodiments, the network training unit includes a sample feature extraction subunit, a sample analysis subunit, a loss calculation subunit, and a training subunit;
the sample feature extraction subunit is configured to perform feature extraction on the image sample to obtain local feature information of the image sample;
the sample analysis subunit is configured to classify the image sample based on the local feature information of the image sample to obtain a prediction category set of the image sample, where the prediction category set of the image sample includes multiple sample prediction categories;
the loss calculating subunit is configured to calculate a category loss between each sample prediction category and the real category to obtain a plurality of category losses;
and the training subunit is configured to train the preset classification network based on the plurality of class losses to obtain a second classification network.
Optionally, in some embodiments, the loss calculating subunit includes an adjusting module and a calculating module;
the adjusting module is configured to adjust a loss weight corresponding to each sample prediction category when the real category exists in the plurality of sample prediction categories;
and the calculating module is used for calculating the category loss between each sample prediction category and the real category based on the loss weight corresponding to the sample prediction category.
Optionally, in some embodiments, the plurality of sample prediction categories are sorted according to prediction probabilities corresponding to the sample prediction categories, and the adjusting module is specifically configured to:
determining the loss weight corresponding to each sample prediction category according to the arrangement position of the real category;
determining the arrangement position of the real category according to the arrangement sequence of the sample prediction categories;
and adjusting the loss weight corresponding to each sample prediction category according to the arrangement position of the real category.
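The rank-dependent weighting can be sketched as follows. The linear discount `(rank + 1) / n` is an assumed scheme chosen for illustration; the patent only states that the loss weights are adjusted according to the arrangement position of the real category:

```python
def loss_weights_from_rank(sample_predictions, real_category):
    """Assign a loss weight to each sample prediction category based on the
    rank of the ground-truth (real) category in the probability-sorted list.
    The linear discount below is an illustrative assumption."""
    n = len(sample_predictions)
    weights = [1.0] * n
    if real_category in sample_predictions:
        rank = sample_predictions.index(real_category)  # 0 = top-ranked
        # When the true class already sits near the top of the ranking,
        # discount every category loss (smaller rank -> smaller weight).
        discount = (rank + 1) / n
        weights = [w * discount for w in weights]
    return weights
```

Under this assumed scheme, a sample whose true class is already ranked first contributes little loss, while a sample whose true class is absent from the prediction set keeps full weight.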
Optionally, in some embodiments, the training subunit is specifically configured to:
selecting a target category loss meeting a preset number from the plurality of category losses;
and training the preset classification network based on the target class loss to obtain a second classification network.
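Selecting a preset number of target category losses can be sketched as below. Keeping the largest losses (a hard-example-mining style choice) is an assumption; the patent only states that a preset number of target losses is selected:

```python
def select_top_losses(category_losses, preset_number):
    """Select a preset number of target category losses from the computed
    category losses. Sorting descending and keeping the largest values is
    an illustrative policy, not necessarily the patent's exact criterion."""
    return sorted(category_losses, reverse=True)[:preset_number]
```

Training then back-propagates only the selected losses, which, under this assumed policy, focuses parameter updates on the predictions the network currently gets most wrong.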
Optionally, in some embodiments, the image category prediction apparatus further includes a blockchain storage unit;
and the blockchain storage unit is used for storing the class prediction result of the image to be classified into a blockchain.
Accordingly, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored, and when executed by a processor, the instructions implement the steps in the image category prediction method provided in any of the embodiments of the present application.
Accordingly, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps in the image category prediction method provided in any of the embodiments of the present application when executing the program.
The method and the device can acquire an image to be classified; classify the image to be classified through a first classification network to obtain a reference class of the image to be classified; classify the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, wherein the prediction category set comprises a plurality of prediction categories arranged according to the prediction probabilities corresponding to the prediction categories; determine a target arrangement position of the reference category in the prediction category set; adjust the arrangement order of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; and acquire the category prediction result of the image to be classified based on the adjusted prediction category set. Because the arrangement order of the prediction categories is adjusted based on the reference category in the embodiment of the application, the accuracy of the ranking within the prediction category set can be effectively improved, thereby effectively improving the accuracy of image category prediction.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flowchart of an image category prediction method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an image annotation operation provided in an embodiment of the present application;
FIG. 3 is another schematic diagram of an image annotation operation provided in an embodiment of the present application;
FIG. 4 is another schematic diagram of an image class prediction method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a visual image annotation tool provided by an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a classification flow of an image class prediction tool according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a training process of a classification network according to an embodiment of the present application;
FIG. 8 is a flowchart of modifying a loss function provided by an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of an image category prediction apparatus according to an embodiment of the present application;
fig. 18 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an image category prediction method and device, wherein the image category prediction device can be integrated in a server, and the server can comprise a background server, and the like.
For example, taking the example that the image category prediction device is integrated in a server, the server may acquire an image to be classified; classifying the image to be classified through a first classification network to obtain a reference class of the image to be classified; classifying the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories; determining a target arrangement position of the reference category in the prediction category set; adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; and acquiring a category prediction result of the image to be classified based on the adjusted prediction category set.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
In some embodiments, the present application will be described from the perspective of an image class prediction apparatus, which may be specifically integrated in a server.
As shown in fig. 1, an image category prediction method is provided, which may be executed by a server, and the specific flow may be as follows:
101. and acquiring an image to be classified.
The image to be classified may have various representations, for example, in some embodiments, it may be a complete image, which may include a background and an object, or for example, it may be a part of an image, such as an object in an image.
In some embodiments, in order to improve the efficiency of image category prediction, an image labeling tool may be used to perform labeling operation on an image, and the image to be classified is obtained according to the labeling operation, for example, the image labeling tool may be used to label a plurality of extreme points on the image, and then an area surrounded by line segments connected between the extreme points is determined as the image to be classified; for another example, an image labeling tool may be used to label a plurality of extreme points on the image, and then the image is processed according to a preset algorithm, so as to outline the image to be classified.
In some embodiments, in order to obtain a more accurate image to be classified, the contour of the image to be classified may be adjusted based on the operation of the image labeling tool by the labeling person, for example, the contour modification tool provided on the image labeling tool may be used for adjustment, or for example, the modification function provided on the image labeling tool may be operated, and then the image labeling tool may adjust the contour according to a preset modification algorithm; the preset correction algorithm may be various, for example, a deep learning algorithm, and may be specifically set according to a service requirement.
Because the image area to be classified can be selected according to the business requirements, for example, a full image or a local image can be selected as the image to be classified, the business requirements can be met, and the efficiency of image category prediction can be effectively improved.
For example, an image labeling tool may be used to perform a labeling operation on an object to be classified in an image, and the image may then be cut based on the points of the labeling operation to obtain the image to be classified. As shown in fig. 2, a plurality of extreme points "P1", "P2", "P3" and "P4" may be labeled on a "cart" in the image; the bounding-box (Bounding Box, Bbox) region "A1" determined by the line segments connecting the extreme points is taken as the image to be classified, and the image labeling tool then segments the whole image according to the Bbox to obtain the image to be classified.
For another example, an image annotation tool may be used to perform an annotation operation on an object to be classified in an image. As shown in fig. 3, a plurality of extreme points "P5", "P6", "P7" and "P8" may be annotated on a "cart" in the image; the annotation tool may then outline the contour "A2" of the "cart" based on a preset algorithm. An annotator may adjust the contour of the "cart" with a contour modification tool, or may invoke a modification function on the annotation tool so that it adjusts the frame of the "cart" using a preset modification algorithm, thereby obtaining a more accurate image to be classified.
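Determining the Bbox enclosing annotated extreme points, as in the "cart" examples, reduces to taking coordinate extrema. A minimal sketch follows; the `(x_min, y_min, x_max, y_max)` convention and the `clamp_bbox` helper are illustrative assumptions:

```python
def bbox_from_extreme_points(points):
    """Axis-aligned bounding box (Bbox) enclosing the annotated extreme
    points. `points` is a list of (x, y) pairs; returns
    (x_min, y_min, x_max, y_max) — an assumed coordinate convention."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def clamp_bbox(image_size, bbox):
    """Clamp the Bbox to the image extent so the segmented region stays in
    bounds. `image_size` is (width, height); a hypothetical helper."""
    w, h = image_size
    x0, y0, x1, y1 = bbox
    return (max(0, x0), max(0, y0), min(w, x1), min(h, y1))
```

The annotation tool would then crop the full image to this clamped Bbox to produce the image to be classified.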
102. And classifying the image to be classified through a first classification network to obtain a reference class of the image to be classified.
In some embodiments, in order to effectively improve the classification efficiency, the reference category may be in the form of a reference category label, where the reference category label may include at least one of a chinese character, a number, a binary character, and a letter, for example, the reference category label may be a "Box" reference category label, and for example, the reference category label may also be a "carton" reference category label, and for example, the reference category label may also be a category label of a vehicle, a pedestrian, a building, and the like. The specific label can be set according to actual operation.
For example, the image to be classified may be subjected to feature extraction by a feature extraction layer in the classification network, and then the feature extraction result is processed by a feature classification layer (i.e., a full connection layer), and then the classification result is obtained.
The first classification network may take a variety of forms; for example, it may be a Convolutional Neural Network (CNN), a Residual Network (ResNet), or Inception-V3 (a convolutional neural network structure).
Before the image to be classified is classified through the first classification network, the first classification network may be obtained in multiple ways. For example, it may be preset by a developer, or obtained through pre-training; for instance, the first classification network may be obtained by pre-training a classification network with 1000-class standard labels based on ImageNet (an image network, a large visual database for visual object recognition research). For another example, the first classification network may be obtained by pre-training on other data sets, such as the MNIST data set (Modified National Institute of Standards and Technology database, a data set for handwriting recognition) or the CIFAR-10 data set (Canadian Institute For Advanced Research-10 data set, a data set for computer vision), and may be specifically set according to the requirements of practical applications.
The network parameters of the first classification network can be set according to the requirements of actual application.
Because the preset classification network can be trained by using the data set to obtain the first classification network, the accuracy of the first classification network can be effectively improved based on the abundance of the existing data set, so that the reference value of the reference category is higher, and the accuracy of image category prediction can also be effectively improved.
For example, a ResNet network pre-trained on ImageNet 1000-class standard labels may be used for classification. The ResNet network may include a convolution feature extraction layer and a full connection layer; the image to be classified is convolved by the convolution feature extraction layer, and the result of the convolution processing is then processed by the full connection layer, so as to obtain the reference category of the image to be classified.
For another example, the image to be classified may be classified by an Inception-V3 network pre-trained on ImageNet 1000-class standard labels. The Inception-V3 network may include a convolution feature extraction layer and a full connection layer; the image to be classified is convolved by the convolution feature extraction layer, and the result of the convolution processing is then processed by the full connection layer, so as to obtain the reference category of the image to be classified.
103. Classifying the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories.
In some embodiments, in order to effectively improve the efficiency of classification, the prediction category may be in the form of a prediction category tag, where the prediction category tag may include at least one of a chinese character, a number, a binary character, and a letter, for example, the prediction category tag may be a prediction category tag of "Bird", for example, the prediction category tag may also be a prediction category tag of "glass box", and for example, the prediction category tag may also be a category tag of a vehicle, a pedestrian, a building, and the like. The specific label can be set according to actual operation.
In some embodiments, the prediction probability may be obtained when the image to be classified is classified by the second classification network based on the local feature information, and specifically may include:
analyzing the local characteristic information by adopting the second classification network to obtain global characteristic information;
and classifying based on the global characteristic information to obtain a plurality of prediction categories and prediction probabilities corresponding to the prediction categories.
For example, in some embodiments, the second classification network may include a convolution feature extraction layer and a full connection layer, wherein the convolution feature extraction layer is configured to extract local feature information from the image to be classified, and the full connection layer is configured to combine and analyze the local feature information to obtain global feature information, and then predict the category of the image to be classified based on the global feature information to obtain a prediction result, which may be, for example, a plurality of prediction categories; the prediction result is then processed by a normalized exponential function (also referred to as the Softmax function) to obtain the prediction probability corresponding to each prediction category.
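The Softmax step just described, and the assembly of the probability-sorted prediction category set, can be sketched in a few lines (the `ranked_prediction_set` helper name is an illustrative assumption):

```python
import math

def softmax(logits):
    """Normalized exponential (Softmax): turns the full connection layer's
    raw class scores into prediction probabilities that sum to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ranked_prediction_set(class_names, logits):
    """Pair each category with its probability and sort descending — this
    yields the 'prediction category set arranged by prediction probability'."""
    probs = softmax(logits)
    return sorted(zip(class_names, probs), key=lambda cp: cp[1], reverse=True)
```

Subtracting the maximum logit before exponentiating is a standard guard against floating-point overflow and does not change the resulting probabilities.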
For example, in some embodiments, feature extraction may be performed on the image to be classified to obtain local feature information, and then the local feature information is analyzed to obtain the prediction category set of the image to be classified, which may specifically include:
performing feature extraction on the image to be classified through the second classification network to obtain local feature information of the image to be classified;
classifying the images to be classified through the second classification network based on the local feature information to obtain a prediction class set of the images to be classified.
The local feature information is information of a region which is different from other regions in the image to be classified; the local feature information generally describes a region in the image to be classified, and the region has high distinguishability; generally has the following characteristics: repeatability, distinguishability, accuracy, effectiveness (number of features, efficiency of feature extraction), robustness (stability, invariance).
The second classification network may include a convolution layer, a pooling layer, and a non-linear mapping layer (also called an activation function). The convolution layer acts on a local image region of the image to be classified through a convolution kernel of a certain size to obtain feature information of that local region; the pooling layer then implements a down-sampling operation, which helps ensure feature invariance, reduces feature dimensionality, and prevents overfitting to a certain extent; finally, to increase the non-linear capacity of the whole classification network, the result is processed through the non-linear mapping layer, yielding the local feature information of the image to be classified.
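The three layer types just described can be illustrated with a toy, unvectorized sketch (real networks use optimized tensor libraries; the 2x2 pooling window and ReLU activation are assumed choices):

```python
def conv2d(image, kernel):
    """Minimal valid 2-D convolution (no padding, stride 1): slides `kernel`
    over `image` (both lists of lists) to produce local feature responses."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w)] for i in range(h)]

def max_pool(fmap, size=2):
    """Max pooling: the down-sampling step that keeps the strongest local
    response in each size x size window."""
    h, w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(w)] for i in range(h)]

def relu(fmap):
    """Non-linear mapping (activation) applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]
```

Stacking conv → activation → pooling repeatedly is what produces the local feature information the full connection layer later classifies.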
For example, in some embodiments, the second Classification network may include a full connection layer, the full connection layer has a function of a "classifier", and Classification (Classification) of the local feature information is realized based on the full connection layer, so that the prediction category set of the image to be classified may be obtained.
The network parameters of the second classification network can be set according to actual requirements.
Before classifying the image to be classified by the second classification network, the second classification network may have multiple obtaining manners, for example, a developer may preset the second classification network, and for example, in some embodiments, in order to improve accuracy of image class prediction, the preset classification network may be trained based on a sample of a known real class to obtain the second classification network, which may specifically include:
collecting a plurality of image samples marked with real categories;
and training a preset classification network based on the real categories to obtain the second classification network.
In some embodiments, for example, images may be selected as samples from an existing image database of known real categories, or training may be performed on images of known real categories collected in the actual service; the choice may be made according to actual requirements.
The number of image samples may be adjusted according to the required prediction accuracy; for example, in some embodiments 200 image samples may be selected, while in other embodiments 1999 image samples may be selected, as determined by actual requirements.
For example, in some embodiments, the preset classification network may itself be a classification network trained based on image samples of known real categories. For example, the classification network may include a feature extraction layer and a full connection layer: when performing transfer learning training, the network parameters of all feature extraction layers may be frozen and only the full connection layer trained, yielding the preset classification network. Alternatively, the network parameters of part of the feature extraction layers may be frozen while the remaining feature extraction layers and the full connection layer are trained; the specific training mode may be set according to actual requirements.
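As a hedged illustration of freezing the feature-extraction layers during transfer learning, the toy sketch below (all weights, shapes, and the learning rate are invented for the example) updates only the full connection layer while leaving the backbone untouched:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a "pre-trained" feature extractor and a new classifier head.
W_feat = np.abs(rng.standard_normal((8, 4)))  # frozen feature-extraction weights
W_head = np.zeros((4, 3))                     # trainable full connection layer

def train_step(x, y_onehot, lr=0.1):
    """One training step that updates only the head; no gradient is ever
    applied to W_feat, which is what freezing the layers amounts to."""
    global W_head
    feats = np.maximum(x @ W_feat, 0)          # frozen backbone + ReLU
    logits = feats @ W_head
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Softmax cross-entropy gradient with respect to the head only.
    W_head -= lr * np.outer(feats, probs - y_onehot)
```

In a deep-learning framework the same effect is obtained by disabling gradient updates on the backbone parameters; only the idea of partial freezing is shown here.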
For example, in some embodiments, the second classification network may be trained based on a class loss between a sample prediction class and a real class of an image sample, and specifically may include:
performing feature extraction on the image sample to obtain local feature information of the image sample;
classifying the image samples based on the local feature information of the image samples to obtain a prediction category set of the image samples, wherein the prediction category set of the image samples comprises a plurality of sample prediction categories;
calculating the class loss between each sample prediction class and the real class to obtain a plurality of class losses;
and training the preset classification network based on the plurality of class losses to obtain a second classification network.
For example, the preset classification network is a classification network obtained through transfer learning and may include a feature extraction layer and a full connection layer, where the network parameters of the feature extraction layer are frozen and the full connection layer is trained. The preset classification network performs feature extraction on the image sample to obtain its local feature information, then classifies the image sample based on that local feature information to obtain a plurality of prediction categories. The category loss between each sample prediction category and the real category is then calculated to obtain a plurality of category losses, and finally the preset classification network is trained based on these category losses to obtain the second classification network.
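The per-sample loss computation described above can be sketched as follows. The text does not fix the exact loss form, so the per-class loss used here (negative log-probability for the class matching the real category, raw probability as a penalty otherwise) is only one plausible assumption:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def ranked_predictions(logits, labels):
    """Return class labels and probabilities sorted by descending probability,
    i.e. the arranged sample prediction category set."""
    probs = softmax(np.asarray(logits, dtype=float))
    order = np.argsort(-probs)
    return [labels[i] for i in order], probs[order]

def per_class_losses(ranked_labels, ranked_probs, real_label):
    """One loss per sample prediction category: negative log-probability when
    the category matches the real category, confidence in a wrong category
    otherwise (an assumed, illustrative choice of loss)."""
    losses = []
    for label, p in zip(ranked_labels, ranked_probs):
        if label == real_label:
            losses.append(-np.log(p))
        else:
            losses.append(p)
    return losses
```

The resulting list of category losses is what the subsequent selection and weighting steps operate on.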
For example, in some embodiments, a certain number of category losses may be selected according to the requirement of prediction accuracy to train the preset classification network to obtain the second classification network, and specifically, the method may include:
selecting a target category loss meeting a preset number from the plurality of category losses;
and training the preset classification network based on the target class loss to obtain a second classification network.
For example, when the sample prediction category set includes 10 prediction categories, the category loss between each of the 10 sample prediction categories and the real category is calculated; since the service requirement only calls for improving the precision of the top 3 prediction categories, the category losses of the top 3 prediction categories are selected to train the preset classification network, thereby obtaining the second classification network.
From the above, since a certain number of category losses can be selected according to the required prediction accuracy to train the preset network, the requirements of the actual image category prediction service can be effectively met.
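Selecting only a preset number of target category losses, as described above, can be as simple as slicing the ranked loss list; the aggregation into a single training objective below is an assumed detail not fixed by the text:

```python
def select_target_losses(class_losses, preset_number=3):
    """Keep the losses of the first preset_number ranked predictions,
    matching a business requirement that only top-N accuracy matters."""
    return class_losses[:preset_number]

def training_objective(class_losses, preset_number=3):
    """Sum the selected target losses into one scalar to back-propagate."""
    return sum(select_target_losses(class_losses, preset_number))
```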
For example, in some embodiments, the loss weight corresponding to each sample prediction category may be adjusted according to the real category, and then the category loss is calculated, which may specifically include:
when the real category exists in the plurality of sample prediction categories, adjusting the loss weight corresponding to each sample prediction category;
and calculating the category loss between each sample prediction category and the real category based on the loss weight corresponding to the sample prediction category.
In some embodiments, the plurality of sample prediction categories are ordered according to the prediction probabilities corresponding to the sample prediction categories, and then the method for adjusting the loss weight corresponding to each sample prediction category may be various, for example, the method for adjusting the loss weight corresponding to each sample prediction category according to the real category may specifically include:
determining the arrangement position of the real category according to the arrangement sequence of the sample prediction categories;
and adjusting the loss weight corresponding to each sample prediction category according to the arrangement position of the real category.
For example, in some embodiments, the weight to be adjusted may be determined according to the prediction accuracy actually required. In practical applications a developer may expect the accuracy of the first N prediction categories to be high, and thus need the real category to appear among the first N prediction categories when training the preset classification network. When the arrangement position of the real category is not within the first N positions, the loss weights corresponding to the sample prediction categories are adjusted to increase the likelihood of the real category appearing within the first N positions; when the arrangement position of the real category is within the first N positions, the normal weight can be kept to maintain the accuracy of prediction.
For example, classifying an image sample of known real category may yield the sample prediction categories "chicken", "duck", "cow" and "goose", arranged according to their prediction probabilities, while the real category of the image sample is "goose". According to the arrangement order of the sample prediction categories, the arrangement position of "goose" is the fourth position. Since operational requirements call for improving the accuracy of the first three prediction results, and the real category "goose" is not predicted among the first three sample prediction categories, the corresponding loss weight needs to be adjusted: for example, it may be given N times the normal weight so that the real category can appear among the first three sample prediction categories. The loss weight can be set according to actual needs.
Therefore, the method for adjusting the loss weight corresponding to each sample prediction category can effectively improve the hit rate in the prediction category set, can determine the numerical value of the loss weight according to the requirement of actual service, and can effectively meet the requirement of operation precision.
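The rank-based weight adjustment in the "goose" example might look like the sketch below; the boost factor stands in for the "N times" weight that the text leaves open:

```python
def adjust_loss_weights(ranked_labels, real_label, top_n=3, boost=5.0):
    """If the real category is present but not within the first top_n
    predictions, boost its loss weight so training pushes it toward the
    top; otherwise keep normal (uniform) weights. boost is illustrative."""
    weights = [1.0] * len(ranked_labels)
    if real_label in ranked_labels:
        pos = ranked_labels.index(real_label)
        if pos >= top_n:
            weights[pos] = boost
    return weights
```

Each weight would then multiply the corresponding category loss before back-propagation.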
104. And determining the target arrangement position of the reference category in the prediction category set.
For example, in some embodiments, a target prediction category corresponding to the reference category may be obtained according to the image category mapping set, and then the target arrangement position is determined according to the target prediction category, which may specifically include:
acquiring an image category mapping set, wherein the image category mapping set comprises a mapping relation between a preset reference category of a first classification network and a preset prediction category of a second classification network;
acquiring a target prediction category corresponding to the reference category based on the image category mapping set;
determining the arrangement position of the target prediction category in the prediction category set;
and determining the target arrangement position of the reference category in the prediction category set according to the arrangement position.
The manner of obtaining the image category mapping set may be various, for example, the image category mapping set may be obtained from a database of a server; for another example, an image category mapping set preset by a developer may be acquired.
For example, the image category mapping set may be preset by a developer, and in some embodiments, the mapping relationship between the preset reference category of the first classification network and the preset prediction category of the second classification network may be constructed and set, which specifically includes:
acquiring a preset reference category set of a first classification network, wherein the preset reference category set comprises a plurality of preset reference categories;
acquiring a preset prediction category set of a second classification network, wherein the preset prediction category set comprises a plurality of preset prediction categories;
and establishing a mapping relation between the preset reference category and the preset prediction category to obtain an image category mapping set.
In some embodiments, the preset reference category set may take the form of a preset reference category label data set, and the preset reference category set of the first classification network may be acquired in various ways, for example, preset by a developer. The preset reference category set may have various representations: for example, it may be the ImageNet 1000-class standard label set (ImageNet is a large visual database for visual object recognition research), or the MNIST data set (Modified National Institute of Standards and Technology database, a data set for handwritten digit recognition), which may be set according to actual requirements.
In some embodiments, the preset prediction category set may take the form of a preset prediction category label data set, and the preset prediction category set of the second classification network may be acquired in various ways, for example, preset by a developer. The preset prediction category set may also have various representations: for example, it may be the ImageNet 1000-class standard label set, or the CIFAR-10 data set (Canadian Institute For Advanced Research 10-class data set, a data set for computer vision), which may be set according to actual requirements.
The mapping relationship between the preset reference category and the preset prediction category may be established in various ways, for example, in some embodiments, the mapping relationship may be established according to an inclusion relationship between the preset reference category and the preset prediction category.
For example, the preset reference categories are "apple" and "banana" and the preset prediction category is "fruit". Since, by common understanding, "fruit" includes both "apple" and "banana", the mapping relationships between "apple" and "fruit" and between "banana" and "fruit" can be established according to this inclusion relationship.
For another example, the preset reference category is "carton" and "glass box", the preset prediction category is "box", and since the "carton" and the "glass box" need to be summarized to the "box" category according to the actual business requirements, the mapping relationship between the "carton", "glass box" and "box" can be established according to the inclusion relationship.
For example, the reference category is "car", and the prediction categories, arranged according to their respective prediction probabilities, are "bicycle", "motorcycle" and "automobile". The image category mapping set is then obtained, from which it can be determined that the target prediction category corresponding to "car" is "automobile". Since the arrangement position of "automobile" in the prediction category set is the third position, the target arrangement position of the reference category "car" in the prediction category set is the third position.
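Step 104's lookup can be sketched with a plain dictionary standing in for the image category mapping set (the dictionary representation is an assumption; the patent only requires that a mapping relation exists):

```python
def build_category_mapping(pairs):
    """Image category mapping set: reference category of the first
    classification network -> prediction category of the second network."""
    return dict(pairs)

def target_position(reference, ranked_predictions, mapping):
    """Return the 1-based arrangement position of the reference category's
    mapped target prediction category, or None when no mapping or no
    matching prediction category exists."""
    target = mapping.get(reference)
    if target is None or target not in ranked_predictions:
        return None
    return ranked_predictions.index(target) + 1
```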
Because the first classification network is based on the preset reference category when classifying, and the second classification network is based on the preset prediction category when classifying, the mapping relation between the preset reference category and the preset prediction category is established to obtain the image category mapping set, so that the classification data sets used by a plurality of classification networks can be effectively combined, the generalization capability of the classification networks is effectively improved, and the prediction accuracy is further effectively improved.
105. And adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set.
For example, in some embodiments, in order to meet the requirement of actual service, a sequential adjustment manner may be determined according to the target arrangement position, and then the sequential adjustment manner is adopted to adjust the prediction category set, which may specifically include:
determining a sequential adjustment mode and a preset position corresponding to the sequential adjustment mode according to the target arrangement position;
and in the prediction category set, adjusting the target prediction category to the preset position by adopting the sequential adjustment mode to obtain an adjusted prediction category set.
For example, in some embodiments, the order adjustment manner may be determined according to the first N prediction category positions that actually need to be considered. For example, when the actual business only needs to consider the order of the first three prediction categories in the prediction category set, the order adjustment manner and its corresponding preset position may be determined according to whether or not the target arrangement position is within the first three positions. This can be set according to actual requirements.
For example, in some embodiments, the adjustment manner may be to move the target prediction category to a preset position while the remaining prediction categories keep their relative order; as another example, the adjustment manner may be to swap the target prediction category with the prediction category at the preset position while the remaining prediction categories keep their original positions. The specific adjustment manner and preset position can be set according to actual requirements.
For example, if the target prediction category is "automobile" and the prediction category set includes "bicycle", "motorcycle" and "automobile", the target arrangement position is the third position. When the actual business demand only needs to consider the order of the first few prediction categories in the prediction category set, then according to the third position of "automobile", the adjustment manner may be determined as moving the target prediction category to the preset position while the remaining prediction categories keep their relative order. With the preset position being the first position, "automobile" is adjusted to the first position, and the adjusted prediction category set is "automobile", "bicycle" and "motorcycle".
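Both adjustment manners described above (move to a preset position keeping relative order, or swap with the category already there) can be sketched as follows, here with the first position as the preset position:

```python
def move_to_front(ranked, position):
    """Move the prediction at the given 1-based position to first place,
    keeping the relative order of the remaining predictions."""
    ranked = list(ranked)
    item = ranked.pop(position - 1)
    return [item] + ranked

def swap_with_front(ranked, position):
    """Alternative adjustment: swap the target with the first-place
    prediction, leaving all other positions untouched."""
    ranked = list(ranked)
    ranked[0], ranked[position - 1] = ranked[position - 1], ranked[0]
    return ranked
```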
106. And acquiring a category prediction result of the image to be classified based on the adjusted prediction category set.
For example, in some embodiments, in order to improve the accuracy of the class prediction result, the adjusted prediction class set may be used as the class prediction result of the image to be classified.
For example, when the adjusted prediction category set is "box", "bird" or "dog", the prediction category set may be used as the category prediction result of the image to be classified.
In some embodiments, when the target arrangement position corresponding to the reference category cannot be determined, the prediction category set itself may be used as the category prediction result in order to preserve prediction accuracy. For example, when the prediction category set is "banana", "apple", "grape" and "hami melon" and the reference category is "automobile", neither the target prediction category corresponding to "automobile" nor its target arrangement position can be determined in the prediction category set, so the category prediction result remains "banana", "apple", "grape" and "hami melon".
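Combining this fallback with the position lookup and reordering gives a compact end-to-end sketch of steps 104 to 106 (the dictionary mapping is an assumed representation of the image category mapping set):

```python
def category_prediction(ranked, reference, mapping):
    """Return the adjusted prediction category set when the reference
    category maps to a target prediction category present in the set;
    otherwise fall back to the unadjusted prediction category set."""
    target = mapping.get(reference)
    if target is None or target not in ranked:
        return list(ranked)  # target arrangement position undeterminable
    adjusted = list(ranked)
    adjusted.remove(target)
    return [target] + adjusted
```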
In some embodiments, after obtaining the class prediction result of the image to be classified based on the adjusted prediction class set, in order to improve the adaptability of the image class prediction method, the method may further include:
and storing the class prediction result of the image to be classified into a block chain.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The block chain underlying platform can comprise processing modules such as user management, basic service, intelligent contract and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, including public and private key generation and maintenance (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (authority management); under authorization, it can supervise and audit the transactions of certain real identities and provide rule configuration for risk control (risk-control audit). The basic service module is deployed on all block chain node devices and is used to verify the validity of service requests and record valid requests to storage after consensus is reached; for a new service request, the basic service first performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication) after encryption, and records it for storage. The intelligent contract module is responsible for registering, issuing, triggering and executing contracts; developers can define contract logic through a programming language, issue it to the block chain (contract registration), and, according to the logic of the contract clauses, call keys or other events to trigger execution and complete the contract logic; the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract setting and cloud adaptation during product release, as well as visual output of real-time states during product operation, such as alarms, monitoring of network conditions, and monitoring of node device health status.
The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.
As can be seen from the above, the image to be classified can be acquired in the embodiment of the present application; classifying the images to be classified through a first classification network to obtain reference classes of the images to be classified; classifying the images to be classified through a second classification network to obtain a prediction category set of the images to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories; determining a target arrangement position of the reference category in the prediction category set; adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; the class prediction result of the image to be classified is obtained based on the adjusted prediction class set, and the classification accuracy of the prediction class set can be effectively improved due to the fact that the arrangement sequence of the prediction classes is adjusted based on the reference classes in the embodiment of the application, and therefore the accuracy of the image class prediction is effectively improved.
The method of the embodiment of the present application is described above by taking an image category prediction apparatus as an example.
According to the method described in the above embodiment, the following description will be made in further detail by taking an example that the image category prediction apparatus is specifically integrated in a server, which may include a background server or the like.
As shown in fig. 4, an image category prediction method is provided, and the specific flow may be as follows:
201. the server obtains the image to be classified.
The server may acquire the image to be classified in various manners; for example, the server may acquire the image to be classified from a set of images collected by an operator, or acquire an image to be classified produced by an annotator's operations in an image annotation tool.
For example, in some embodiments, as shown in fig. 5, the image annotation tool may take the form of a visual image annotation tool, which may include an image display box and an image processing toolbar. The image display box may be used to display the image being annotated as well as the processing applied to it. The image processing toolbar may comprise an annotation tool button, a contour modification tool button, a modification function button and the like. The annotation tool button can be used by an annotator to annotate extreme points on the image, after which the contour of the image to be classified, outlined by the visual image annotation tool according to the extreme points, is displayed in the image display box. The contour modification tool can be used by the annotator to adjust the contour so as to obtain a more accurate image to be classified. The modification function button can be clicked by the annotator so that the visual image annotation tool adjusts the contour according to a preset algorithm; this also yields a more accurate image to be classified while reducing the annotator's workload, thereby effectively improving the efficiency of image category prediction.
For example, the image includes a scene background "sky" and an object "car", the annotating person performs an operation of annotating extreme points on the "car" in the image, then the image annotation tool determines a region to be classified in the image based on a region surrounded by line segments connected between the extreme points, and takes the region to be classified as an image to be classified, and then the server acquires the image to be classified through the image annotation tool.
For another example, the image includes a scene background "sky" and an object "puppy", the annotating person uses the entire image as the image to be classified by the image annotating tool, and then the server obtains the image to be classified by the image annotating tool.
202. And the server classifies the image to be classified through a first classification network to obtain a reference class of the image to be classified.
For example, the image to be classified is an automobile, and the first classification network is an Inception-V3 network pre-trained on the ImageNet 1000-class standard labels. The server performs convolution processing on the image to be classified through the convolution layers of the pre-trained Inception-V3 classification network, then processes the convolution result through the full connection layer, and obtains the reference class label of the image to be classified as "car".
For another example, the image to be classified includes a scene background "sky" and an object "puppy", the first classification network is a ResNet network pre-trained based on ImageNet 1000-type standard labels, and then the server classifies the image to be classified by the pre-trained ResNet network to obtain a reference class label of the image to be classified as "puppy".
203. And the server classifies the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories.
In some embodiments, before the server classifies the image to be classified through the second classification network, the server may train a preset classification network based on image samples of known real categories to obtain the second classification network.
For example, the preset network is an Inception-V3 network trained by transfer learning, and may include feature extraction layers and a full connection layer. When performing the transfer learning training, about 200 images of known real categories are selected as samples; the network parameters of all feature extraction layers are frozen, and the full connection layer is trained.
Next, the preset network is trained based on the image samples of known real categories to obtain the second classification network. During training, the image samples are classified by the preset network to obtain a plurality of sample prediction categories, arranged according to their corresponding prediction probabilities. Since the service requirement is to improve the accuracy of the first 3 prediction categories, when the real category does not belong to the first 3 prediction categories, the loss weight corresponding to each prediction category is adjusted to increase the likelihood that the real category is predicted within the first three prediction categories.
Then, the class loss between each prediction category and the real category is calculated based on the loss weight corresponding to each prediction category. Because the service requirement is to improve the accuracy of the first 3 prediction categories, the class losses of the first 3 positions are selected for back propagation, and the preset classification network is trained to obtain the second classification network.
For example, the image to be classified is a "car", the preset classification network is an Inception-V3 network trained by transfer learning, and the second classification network is the preset classification network further trained with class losses. Through the second classification network, the server obtains the class labels of the prediction category set of the image to be classified as "motorcycle", "bicycle", "automobile" and "tricycle".
For another example, the image to be classified includes a scene background "sky" and an object "puppy", the preset classification network is a ResNet network trained through migration learning, the second classification network is a preset classification network trained through class loss, and the server obtains a prediction class set of the image to be classified as "piglet", "kitten", "lamb", and "puppy" through the second classification network.
204. The server determines a target ranking position of the reference category in the set of predicted categories.
For example, the image to be classified is an "automobile", the reference category label of the image to be classified is "car" obtained based on the first classification network, the prediction category set of the image to be classified is the category label of "motorcycle", "bicycle", "automobile" or "tricycle" obtained based on the second classification network, the server obtains the image category mapping set, the image category mapping set comprises the mapping relation between "car" and "automobile", the server can obtain that the target prediction category label corresponding to "car" is "automobile" according to the image category mapping set, wherein the arrangement position of "automobile" is the third position, and the server can obtain that the target arrangement position of "car" in the prediction category set is the third position.
For another example, the image to be classified includes a scene background "sky" and an object "puppy", a reference category label of the image to be classified is obtained as "puppy" based on the first classification network, a prediction category set of the image to be classified is obtained as a category label of "piglet", "kitten", "lamb", and "puppy" based on the second classification network, the server may obtain an image category mapping set, and a target prediction category label corresponding to the "puppy" is obtained as "puppy" according to the image category mapping set, where an arrangement position of the "puppy" is a fourth position, and a target arrangement position of the "puppy" in the prediction category set is obtained as a fourth position by the server.
205. And the server adjusts the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set.
For example, the image to be classified is an automobile: the reference category label "car" is obtained based on the first classification network, and the prediction category set, with category labels "motorcycle", "bicycle", "automobile" and "tricycle", is obtained based on the second classification network. The target arrangement position of "car" in the prediction category set is the third position, and the actual business requirement only needs to consider the order of the first three prediction categories in the prediction category set. According to the third position of "automobile", the server may determine the adjustment manner as moving the target prediction category to the preset position while the remaining prediction categories keep their relative order. With the preset position being the first position, the server adjusts "automobile" to the first position, and the adjusted prediction category set is "automobile", "motorcycle", "bicycle" and "tricycle".
For another example, the image to be classified includes a scene background "sky" and an object "puppy". A reference category label "puppy" of the image to be classified is obtained based on the first classification network, and a prediction category set of the image to be classified, containing the category labels "piglet", "kitten", "lamb", and "puppy", is obtained based on the second classification network. The server can obtain that the target arrangement position of "puppy" in the prediction category set is the fourth position. According to this fourth position, the server can determine that the adjustment mode for "puppy" is to replace the target prediction category with the prediction category at the preset position while the remaining prediction categories keep their original positions for sorting, where the preset position is the first position. The server exchanges the category labels "puppy" and "piglet", and the replaced prediction category set can be obtained as "puppy", "kitten", "lamb", and "piglet".
206. The server acquires the category prediction result of the image to be classified based on the adjusted prediction category set.
For example, the image to be classified is "car", and the adjusted prediction category set includes "car", "motorcycle", "bicycle", and "tricycle". The server then takes the adjusted prediction category set as the category prediction result of the image to be classified, that is, the category prediction result is: the image to be classified may be a "car", "motorcycle", "bicycle", or "tricycle", where the probability of "car" is the greatest.
For another example, the image to be classified includes a scene background "sky" and an object "puppy", and the adjusted prediction category set includes "puppy", "kitten", "lamb", and "piglet". The server then takes the adjusted prediction category set as the category prediction result of the image to be classified, that is, the category prediction result is: the image to be classified may be a "puppy", "kitten", "lamb", or "piglet", where the probability of "puppy" is the greatest.
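Steps 204 to 206 above can be sketched as follows. This is a minimal illustration only, assuming a swap-style adjustment to the first position and a dictionary-based image category mapping set; the function and variable names are not from the embodiment.

```python
def predict_category(reference, prediction_set, category_mapping):
    # Resolve the reference category to the target prediction category label
    # used by the second classification network (step 204).
    target = category_mapping.get(reference)
    if target is None or target not in prediction_set:
        # Fallback: keep the original set when no target position can be found.
        return list(prediction_set)
    adjusted = list(prediction_set)
    pos = adjusted.index(target)                             # target arrangement position
    adjusted[0], adjusted[pos] = adjusted[pos], adjusted[0]  # swap with the first position (step 205)
    return adjusted                                          # adjusted set as result (step 206)

mapping = {"puppy": "puppy"}                                 # identity mapping in this example
result = predict_category("puppy", ["piglet", "kitten", "lamb", "puppy"], mapping)
print(result)  # → ['puppy', 'kitten', 'lamb', 'piglet']
```

A mapping that resolves to no member of the prediction category set simply returns the set unchanged, matching the fallback behavior described below.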
In some embodiments, when the target arrangement position of the target prediction category cannot be determined in the prediction category set, the prediction category set may be used as a category prediction result in order to ensure the accuracy of prediction.
In some embodiments, after the server obtains the category prediction result of the image to be classified, the server may further display the category prediction result of the image to be classified through a visual image annotation tool, where the visual image annotation tool may include a category display area.
For example, the image to be classified is "car", and the server obtains the category prediction result: the image to be classified may be a "car", "bicycle", or "motorcycle". The category prediction results "car", "bicycle", and "motorcycle" may then be displayed in the category display area of the visual image annotation tool.
For another example, the image to be classified includes a scene background "sky" and an object "puppy", and the server obtains the category prediction result: the image to be classified may be a "dog", "cat", or "pig". The category prediction results "dog", "cat", and "pig" may then be displayed in the category display area of the visual image annotation tool.
In some embodiments, after the server obtains the category prediction result of the image to be classified, in order to improve the stability of the image category prediction, the category prediction result may further be stored in a blockchain.
From the above, in the embodiment of the present application, the server may acquire an image to be classified; the server then classifies the image to be classified through a first classification network to obtain a reference category of the image to be classified; the server then classifies the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, where the prediction category set includes a plurality of prediction categories arranged according to their corresponding prediction probabilities; the server then determines a target arrangement position of the reference category in the prediction category set; the server adjusts the arrangement order of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; and finally, the server acquires a category prediction result of the image to be classified based on the adjusted prediction category set. Because the arrangement order of the prediction categories is adjusted based on the reference category, the embodiment of the present application can effectively improve the classification accuracy of the prediction category set and further effectively improve the accuracy of image category prediction.
The method of the embodiment of the present application is described above by taking an image category prediction apparatus as an example.
The method is described below by taking as an example that the image category prediction apparatus is specifically integrated in an image category prediction tool, which may include a segmentation module and a classification module.
The segmentation module can be used for enabling a user to segment an image according to requirements to obtain a region to be classified in the image, so that the classification module classifies the region to be classified as the image to be classified, for example, when the whole image is classified according to business requirements, the whole image is taken as the region to be classified in the segmentation module; for another example, when the local images need to be classified according to the business requirements, the local images are obtained by operating in the segmentation module, and the local images are used as the regions to be classified.
The segmentation module may have a plurality of expression forms, and in some embodiments, the segmentation module may be in a mode of a visual image annotation tool, and the visual image annotation tool may acquire an annotation operation of a user on an image, so as to obtain a region to be classified in the image.
The labeling operation may take various forms. For example, a plurality of extreme points may be labeled on an image, and the area enclosed by the line segments connecting the extreme points (a bounding box, Bbox) is determined as the region to be classified; for another example, a plurality of line segments may be labeled on the image, and the enclosed area formed by connecting the line segments is determined as the region to be classified.
For example, the user marks four extreme points on the image, and then determines the region surrounded by the connection between the four extreme points as the region to be classified.
The classification module can be used for classifying the regions to be classified to obtain prediction categories; the prediction category may take a variety of forms, for example, the prediction category may take the form of a category label, such as a category label for a vehicle, pedestrian, building, etc. The setting can be specifically carried out according to actual requirements.
In some embodiments, in order to improve the classification accuracy of the classification module, the classification module may include a trained classification network and an untrained classification network, and then the region to be classified may be classified by the trained classification network and the untrained classification network, respectively, to obtain a corresponding prediction class set and a reference class, and the prediction class set is adjusted by the reference class, to obtain a comprehensive prediction class.
The classification network may take various forms. For example, the classification network may be a Convolutional Neural Network (CNN); for another example, the classification network may be a Residual Network (ResNet); for another example, the classification network may be Inception-V3 (a convolutional neural network structure).
For example, in some embodiments, the classification network may be trained in a transfer learning manner; for example, the residual network may be trained by transfer learning to obtain a migration-trained residual network. Transfer learning, as used herein, refers to a deep learning training method that applies knowledge or patterns learned in one domain or task to a different but related domain or problem. The specific training method can be set according to actual requirements.
As shown in fig. 6, the following describes the classification process of the classification module in the image classification prediction tool.
(1) Unified classes
In order to adjust the prediction category set by the reference category and thereby obtain a comprehensive prediction category, the category data set to be migrated in training needs to be aligned, before training, with the category data set used by the untrained classification network in classification.
The category data set may be the category set of images used by the classification network in classification, and the category data set may take multiple forms. For example, the category data set may include multiple category labels, and the category labels may take various forms, such as category labels for vehicles, pedestrians, buildings, and the like. This can be specifically set according to actual requirements.
For example, in some embodiments, according to the accuracy requirement of image category prediction, a mapping hierarchical relationship may be constructed based on a category data set used when an untrained classification network performs classification, then, by sorting the category data set to be migrated in training, an inclusion relationship between the sorted category data set and the mapping hierarchical relationship is determined, and an image category mapping set is constructed according to the inclusion relationship, so as to achieve unification of category labels, where the image category mapping set includes mapping relationships between categories.
The mapping hierarchical relationship is used to resolve the inclusion relationships among category labels. For example, when the category data set includes the category labels "fruit", "apple", and "snow pear", and in the actual business situation both "apple" and "snow pear" belong to the category "fruit", mapping relationships among "apple", "snow pear", and "fruit" are established, so as to resolve the inclusion relationship among the category labels.
For example, when the accuracy requirement of the image category prediction in the actual business is low, the category label of "chicken" may be mapped to the category label of "vertebrate" in the mapping hierarchical relationship, and for example, when the accuracy requirement of the image category prediction in the actual business is high, the category label of "chicken" may be mapped to the category label of "bird" in the mapping hierarchical relationship, which may be specifically set according to the actual requirement.
For example, the preset classification network may be a classification network pre-trained on the ImageNet 1000-class standard labels. In order to synthesize the prediction category labels of the preset classification network and of the classification network after transfer learning training, the category labels to be migrated in the actual business scene need to be aligned with the ImageNet 1000-class standard labels to unify the category labels. During alignment, a mapping hierarchical relationship may first be established using the synsets (synonym sets) of the ImageNet 1000-class standard labels to resolve the inclusion relationships among category labels; for example, category labels such as "glass box" and "takeout box" may be generalized into a total category label "box". Then, the category labels to be migrated in the actual business scene are collated; for example, the collated labels include "paper box". Next, the inclusion relationship between the collated category data set and the mapping hierarchical relationship is determined according to the accuracy requirement of image category prediction, and the image category mapping set is constructed accordingly. For example, for an actual business, the category label "paper box" in the collated category data set and the total category label "box" in the mapping hierarchy belong to the same category, that is, their inclusion relationship is that "box" includes "paper box", so a mapping relationship between "paper box" and "box" can be constructed.
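The label-unification step above can be illustrated with a small sketch. The hierarchy, the business labels, and the substring-matching heuristic below are all illustrative assumptions, not the actual ImageNet synset data or the embodiment's implementation:

```python
# Hypothetical hierarchy: total category label -> member labels from the standard label set.
hierarchy = {
    "box": ["glass box", "takeout box"],
    "bird": ["chicken", "duck"],
}

def build_mapping(business_labels, hierarchy):
    """Map each business label to a total category label in the hierarchy."""
    mapping = {}
    for label in business_labels:
        for parent, children in hierarchy.items():
            # A business label maps to a parent when it names the parent itself,
            # is one of the parent's members, or contains the parent term
            # (e.g. "paper box" contains "box").
            if label == parent or label in children or parent in label:
                mapping[label] = parent
    return mapping

print(build_mapping(["duck", "paper box"], hierarchy))
# → {'duck': 'bird', 'paper box': 'box'}
```

In practice the inclusion relationship would be resolved through the synset hierarchy rather than string containment; the sketch only shows the resulting mapping-set structure.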
For another example, when a category label in the actual business scene is included in the ImageNet 1000-class standard labels, for example, the category label "duck" in the actual business scene is included in the "birds" series of category labels in the ImageNet 1000 standard labels, that is, the "birds" series of category labels includes "duck", the category labels can be unified as the "birds" category label and entered into the synset-level mapping.
(2) Training and prediction of classification networks
(2.1) training of classification networks
As shown in fig. 7, when performing migration learning training on a preset classification network, a certain number of actual service images of known classes need to be collected as sample images to avoid the situation that the migration learning effect is poor due to only a few sample images in some classes, the sample images are labeled with real class labels, and then, the migration learning is performed based on the sample images.
The number of sample images in each category should be kept substantially balanced, and the specific number can be set according to the precision requirement of the actual business; for example, the number of sample images in each category can be set to 200, or, for another example, to 198.
The preset classification network may include a convolution feature extraction layer and a full connection layer, where the convolution feature extraction layer may be used to extract feature information in an image, and the full connection layer may be used to classify the image based on the feature information.
For example, after forward training of the convolution feature extraction layer and the fully connected layer, the modified loss function is used to calculate the category loss between the prediction category and the real category, and back propagation is then performed based on the category loss to train the preset classification network. For another example, some of the convolutional layers may be frozen while the remaining convolutional layers and all fully connected layers are trained; the loss is then calculated using the modified loss function (Loss Function), and the preset classification network is trained by back propagation based on the loss.
For example, in some embodiments, in order to improve the accuracy of the prediction category and make the prediction category more conform to the actual service requirement, the loss function may be modified according to the classification accuracy of the service requirement, as shown in fig. 8, which specifically includes:
classifying the regions to be classified through a preset classification network after forward training to obtain a prediction classification set, wherein the prediction classification set comprises a plurality of prediction classification labels;
performing normalization processing (Softmax) on the plurality of prediction category labels to obtain a prediction score corresponding to each prediction category label;
when the prediction category labels corresponding to the top N prediction scores do not contain the ground-truth (Ground Truth) category label, a loss weight is given so as to improve the prediction probability of the ground-truth category label during back propagation (backward); otherwise, a normal weight is given.
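The three steps above can be sketched as follows, assuming a cross-entropy base loss; the weight values and the function name are illustrative assumptions rather than the embodiment's actual implementation:

```python
import numpy as np

def modified_loss(logits, true_idx, n=3, miss_weight=4.0, normal_weight=1.0):
    """Weighted cross-entropy: a larger weight is applied when the
    ground-truth label falls outside the top-N predictions."""
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                   # Softmax normalization (step 2)
    top_n = np.argsort(probs)[::-1][:n]       # indices of the top-N prediction scores
    weight = normal_weight if true_idx in top_n else miss_weight  # step 3
    return weight * -np.log(probs[true_idx])  # weighted cross-entropy loss

logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
# True class ranked first -> normal weight; true class outside the top 3 -> loss
# is scaled by miss_weight, pushing back propagation to promote that label.
```

The scaled loss makes a top-N miss on the ground-truth label more costly, which is the relaxed training standard discussed below.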
For example, when performing the transfer learning training, the convolution feature extraction layer may be frozen and the fully connected layer trained. When the first N positions of the category label prediction result obtained by the classification network under training do not include the ground-truth (Ground Truth) category label, a loss weight may be given in order to improve the accuracy of the first N prediction categories; for example, N times the normal weight may be given. Otherwise, a normal weight is given to ensure the accuracy of the trained classification network; for example, a weight of 1 may be given, where N is a natural number.
This method of modifying the loss function relaxes the training standard and can effectively improve the accuracy of the first N prediction category labels. Compared with classification networks that only output a single prediction category label, the classification network obtained by training with the modified loss function has higher accuracy and better meets actual business requirements.
In the above method of training the preset classification network by transfer learning to obtain the trained classification network, the category labels in the actual business are collated and integrated with the category label set used when the untrained classification network was built, which can effectively improve the generalization capability of the trained classification network. In addition, the loss function is modified and the modified loss function is used to train the classification network, so that under the same training data scale the modified loss function can effectively improve the accuracy of the prediction categories, thereby effectively improving the accuracy of the prediction categories obtained by the trained classification network.
(2.2) prediction of trained Classification network
After the to-be-classified area of the image is obtained through the segmentation module, the trained classification network can be adopted to classify the to-be-classified area, and a prediction classification set of the to-be-classified area is obtained.
In some embodiments, the prediction category set may include a plurality of prediction category labels, and the plurality of prediction category labels may be arranged according to a preset rule, for example, may be arranged according to a prediction probability corresponding to the prediction category label; for another example, the occurrence frequency of the prediction categories of the actual service scene may be counted according to a preset algorithm, and then the prediction categories may be arranged according to the statistical result. The specific arrangement rule can be set according to the requirements of practical application.
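As a minimal illustration of arranging prediction category labels by their prediction probabilities (the helper name and the probability values are assumptions):

```python
def ranked_labels(probs):
    """Arrange prediction category labels by descending prediction probability.

    probs: dict mapping a category label to its predicted probability.
    """
    return [label for label, _ in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)]

print(ranked_labels({"bicycle": 0.2, "car": 0.7, "motorcycle": 0.1}))
# → ['car', 'bicycle', 'motorcycle']
```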
For example, when the region to be classified of the image is a "car" in an actual business scene, the region to be classified is classified using the classification network after transfer learning training to obtain a prediction category set, and the prediction category set may include the prediction category labels "car", "bicycle", and "motorcycle".
(3) Prediction of untrained classification networks
After the region to be classified of the image is obtained through the segmentation module, the region to be classified can be classified by the untrained classification network to obtain the reference category of the region to be classified.
The untrained classification network may take various forms. For example, the classification network may be a Convolutional Neural Network (CNN); for another example, the classification network may be a Residual Network (ResNet); for another example, the classification network may be Inception-V3 (a convolutional neural network structure).
For example, the region to be classified of the image is a "car"; in an actual business scene, the region to be classified is classified by the untrained classification network, and the reference category label "car" can be obtained.
(4) Comprehensive results
In the above steps, the categories are unified through step (1), and a mapping relationship is constructed between the category data set used by the untrained classification network in classification (i.e., the preset reference category data set) and the category data set used by the trained classification network (i.e., the preset prediction category data set), so as to obtain the image category mapping set. After the prediction category set and the reference category of the region to be classified are obtained in steps (2) and (3), in order to improve the accuracy of image category prediction, the reference category and the prediction category set need to be synthesized, and the prediction category set is modified based on the reference category to obtain a more accurate classification result, which may specifically include:
acquiring an image category mapping set, and acquiring a target prediction category corresponding to a reference category based on the image category mapping set, wherein the image category mapping set comprises a mapping relation between a preset reference category of an untrained classification network and a preset prediction category of a trained classification network;
determining a target arrangement position of the target prediction category in the prediction category set;
adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; and obtaining a category prediction result of the area to be classified according to the adjusted prediction category set.
The manner of obtaining the image category mapping set may be various, for example, the image category mapping set may be obtained from a database of a server; for another example, an image category mapping set preset by a developer may be acquired.
For example, in some embodiments, a corresponding preset position may be determined according to the target arrangement position, and then the target prediction category is adjusted to the preset position, which may specifically include:
determining an adjusting mode and a preset position corresponding to the adjusting mode based on the target arrangement position; and adjusting the target prediction category to the preset position by adopting the adjusting mode to obtain an adjusted prediction category set.
For example, in some embodiments, the adjustment manner may be determined according to an order of first N-bit prediction categories that need to be considered in an actual demand, for example, when the actual business demand only needs to consider the order of the first three prediction categories in the prediction category set, the adjustment manner and the preset position corresponding to the adjustment manner may be determined according to whether the target arrangement position is the first three bits or not. The setting can be specifically carried out according to actual requirements.
For example, the adjustment mode may be to move the target prediction category to the preset position; for example, when the preset position is the first position and the target arrangement position is the third position, the target prediction category may be adjusted to the first position, and the remaining prediction categories are rearranged while keeping their original relative order. For another example, the adjustment mode may be to replace the target prediction category with the prediction category at the preset position; for example, when the preset position is the first position and the target arrangement position is the fourth position, the target prediction category and the prediction category at the first position are exchanged, and the other prediction categories keep their original positions. The specific adjustment mode and preset position can be set according to actual requirements.
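The two adjustment modes described above can be sketched as follows; the function names are assumptions, and the label lists reuse the examples from this document:

```python
def move_to_position(categories, target, preset=0):
    """Move the target prediction category to the preset position;
    the remaining categories keep their relative order."""
    adjusted = [c for c in categories if c != target]
    adjusted.insert(preset, target)
    return adjusted

def swap_with_position(categories, target, preset=0):
    """Exchange the target prediction category with the category at the
    preset position; all other categories keep their original positions."""
    adjusted = list(categories)
    pos = adjusted.index(target)
    adjusted[preset], adjusted[pos] = adjusted[pos], adjusted[preset]
    return adjusted

cars = ["bicycle", "car", "motorcycle", "scooter", "tricycle"]
print(move_to_position(cars, "car"))      # → ['car', 'bicycle', 'motorcycle', 'scooter', 'tricycle']

pets = ["piglet", "kitten", "lamb", "puppy"]
print(swap_with_position(pets, "puppy"))  # → ['puppy', 'kitten', 'lamb', 'piglet']
```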
For example, the region to be classified of the image is a "car". When classification is performed by the classification network after transfer learning training, a prediction category set is obtained, and the prediction category set includes a plurality of prediction category labels: "bicycle", "car", "motorcycle", "scooter", "tricycle", … , where the prediction category labels are arranged according to their respective prediction probabilities. When classification is performed by the untrained classification network, the reference category label "small car" is obtained. Because "small car" and "car" have a mapping relationship in the image category mapping set constructed according to the actual business requirements, that is, "small car" belongs to the "car" category, the target prediction category label corresponding to the "small car" category label can be determined to be "car"; and because "car" is arranged at the second position in the prediction category set, the target arrangement position can be obtained as the second position;
then, an adjustment mode and the preset position corresponding to the adjustment mode are determined according to the target arrangement position: the adjustment mode may be to move the target prediction category to the preset position while the remaining prediction categories keep their relative order, and the preset position may be the first position. Since the actual business requirements only need to consider the arrangement order of the first three prediction categories in the prediction category set, the adjustment mode is determined according to whether the target arrangement position is within the first three positions. The target arrangement position is the second position, which is within the first three positions, so the "car" category label can be adjusted to the first position with the remaining positions unchanged, and the adjusted prediction categories are finally "car", "bicycle", "motorcycle", "scooter", "tricycle", … ;
then, according to the adjusted prediction category set, the category prediction result of the region to be classified of the image is "car", "bicycle", "motorcycle", "scooter", "tricycle", … .
In some embodiments, when the target arrangement position of the target prediction category cannot be determined in the prediction category set, the prediction category set may be used as the category prediction result in order to ensure the accuracy of the prediction. For example, when the prediction category set is "banana", "apple", "grape", "hami melon" and the reference category is "automobile", neither the target prediction category corresponding to "automobile" nor its target arrangement position can be determined in the prediction category set; in order to ensure the accuracy of the prediction, the category prediction result is that the image may be a "banana", "apple", "grape", or "hami melon".
In some embodiments, for convenience of use, when the classification result is obtained it may be displayed on the visual image annotation tool, and the display manner may vary. For example, in some embodiments, in order to suit the user's usage habits, the classification results corresponding to the top three prediction probabilities may be sorted in order and displayed at the top of the list of the visual image annotation tool, and the remaining classification results may then be displayed in the rest of the list in a fixed relative order.
It can be seen that the untrained classification network is trained on the basis of a large number of existing data sets, and its classification effect on images of known classes is better than that of the trained classification network; conversely, the trained classification network has a better classification effect on images of unknown classes than the untrained classification network. By adjusting the arrangement order of the prediction categories according to the target arrangement position corresponding to the reference category, the classification effects of the untrained and trained classification networks can be effectively combined, thereby enhancing the generalization capability of the classification networks and further improving the classification accuracy.
In order to better implement the image category prediction method provided by the embodiments of the present application, in some embodiments, an image category prediction apparatus is also provided, and the image category prediction apparatus may be applied to a server. The meaning of the noun is the same as that in the image category prediction method, and the details of the implementation can be referred to the description in the method embodiment.
In some embodiments, an image category prediction apparatus is further provided, which may be specifically integrated in a server, as shown in fig. 9, and may include an obtaining unit 301, a first classifying unit 302, a second classifying unit 303, a determining unit 304, an adjusting unit 305, and a result obtaining unit 306, specifically as follows:
an acquiring unit 301, configured to acquire an image to be classified;
a first classification unit 302, configured to classify the image to be classified through a first classification network to obtain a reference class of the image to be classified;
a second classification unit 303, configured to classify the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, where the prediction category set includes multiple prediction categories, and the multiple prediction categories are arranged according to prediction probabilities corresponding to the prediction categories;
a determining unit 304, configured to determine a target arrangement position of the reference category in the prediction category set;
an adjusting unit 305, configured to adjust the arrangement order of the prediction categories based on the target arrangement position, so as to obtain an adjusted prediction category set;
a result obtaining unit 306, configured to obtain a class prediction result of the image to be classified based on the adjusted prediction class set.
Optionally, in some embodiments, as shown in fig. 10, the determining unit 304 includes a set acquiring subunit 3041, a category acquiring subunit 3042, a position determining subunit 3043, a determining subunit 3044;
the set obtaining subunit 3041, configured to obtain an image category mapping set, where the image category mapping set includes a mapping relationship between a preset reference category of a first classification network and a preset prediction category of a second classification network;
the category obtaining subunit 3042, configured to obtain, based on the image category mapping set, a target prediction category corresponding to the reference category;
the position determining subunit 3043, configured to determine an arrangement position of the target prediction category in the prediction category set;
the determining subunit 3044 is configured to determine a target arrangement position of the reference category in the prediction category set according to the arrangement position.
Optionally, in some embodiments, as shown in fig. 11, the image classification apparatus further includes a construction unit 307;
the constructing unit 307 is specifically configured to obtain a preset reference category set of the first classification network, where the preset reference category set includes a plurality of preset reference categories; acquiring a preset prediction category set of a second classification network, wherein the preset prediction category set comprises a plurality of preset prediction categories; and establishing a mapping relation between the preset reference category and the preset prediction category to obtain an image category mapping set.
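A minimal sketch of what the constructing unit 307 might do — pairing each preset reference category with its preset prediction categories. The pair list and class names are made up for the example:

```python
def build_mapping(pairs):
    """pairs: iterable of (preset reference category, preset prediction
    category) relations. Returns the image category mapping set as a dict
    from each coarse class to its fine-grained classes."""
    mapping = {}
    for ref, pred in pairs:
        mapping.setdefault(ref, []).append(pred)
    return mapping

mapping = build_mapping([("dog", "husky"), ("dog", "beagle"), ("cat", "tabby")])
# mapping == {"dog": ["husky", "beagle"], "cat": ["tabby"]}
```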
Optionally, in some embodiments, as shown in fig. 12, the adjusting unit 305 includes a mode determining subunit 3051, an adjusting subunit 3052;
the mode determination subunit 3051 is configured to determine, according to the target arrangement position, a sequential adjustment mode and a preset position corresponding to the sequential adjustment mode;
the adjusting subunit 3052 is configured to, in the prediction category set, adjust the target prediction category to the preset position in the sequential adjustment manner, so as to obtain an adjusted prediction category set.
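The sequential adjustment described for subunits 3051–3052 amounts to moving the target prediction category to a preset position. A sketch, assuming the preset position is the front of the list (the patent does not fix a particular position):

```python
def promote(predictions, target_position, preset_position=0):
    """Move the prediction at target_position to preset_position while
    keeping the relative order of the remaining predictions."""
    adjusted = list(predictions)
    adjusted.insert(preset_position, adjusted.pop(target_position))
    return adjusted

promote(["tabby", "husky", "beagle"], 1)
# -> ["husky", "tabby", "beagle"]
```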
Optionally, in some embodiments, as shown in fig. 13, the second classification unit 303 includes a feature extraction subunit 3031 and an analyzing subunit 3032;
the feature extraction subunit 3031 is configured to perform feature extraction on the image to be classified through the second classification network to obtain local feature information of the image to be classified;
the analyzing subunit 3032 is configured to classify the image to be classified through the second classification network based on the local feature information, so as to obtain a prediction class set of the image to be classified.
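The patent does not name the second network's architecture; the sketch below only shows the final step the analyzing subunit needs — turning raw class scores into a prediction category set sorted by descending probability:

```python
import math

def prediction_set(logits, class_names):
    """Softmax over raw classification scores, then classes sorted by
    descending probability: the 'prediction category set'."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sorted(
        zip(class_names, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )

ranked = prediction_set([1.0, 3.0, 2.0], ["tabby", "husky", "beagle"])
# highest-probability class first: "husky", then "beagle", then "tabby"
```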
Optionally, in some embodiments, as shown in fig. 14, the image classification apparatus further includes a network training unit 308;
the network training unit 308 is configured to collect a plurality of image samples labeled with real categories; and train a preset classification network based on the real categories to obtain the second classification network.
Optionally, in some embodiments, as shown in fig. 15, the network training unit 308 includes a sample feature extraction subunit 3081, a sample analysis subunit 3082, a loss calculation subunit 3083, and a training subunit 3084;
the sample feature extraction subunit 3081 is configured to perform feature extraction on the image sample to obtain local feature information of the image sample;
the sample analysis subunit 3082 is configured to classify the image sample based on the local feature information of the image sample to obtain a prediction category set of the image sample, where the prediction category set of the image sample includes a plurality of sample prediction categories;
the loss calculating subunit 3083, configured to calculate a class loss between each sample prediction class and the true class, to obtain a plurality of class losses;
the training subunit 3084 is configured to train the preset classification network based on the plurality of class losses, so as to obtain a second classification network.
Optionally, in some embodiments, as shown in fig. 16, the loss calculation subunit 3083 includes an adjustment module 30831, a calculation module 30832;
the adjusting module 30831, configured to adjust a loss weight corresponding to each sample prediction category when the true category exists in the plurality of sample prediction categories;
the calculating module 30832 is configured to calculate a class loss between each sample prediction class and the real class based on the loss weight corresponding to the sample prediction class.
Optionally, in some embodiments, the multiple sample prediction categories are sorted according to the prediction probabilities corresponding to the sample prediction categories, and the adjusting module 30831 is specifically configured to:
determining the arrangement position of the real category according to the arrangement sequence of the sample prediction categories;
and adjusting the loss weight corresponding to each sample prediction category according to the arrangement position of the real category.
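One possible form of this rank-dependent weight adjustment, sketched in Python. The linear scheme (weight 0 when the real category is already top-1, growing as it drops down the list) is an illustrative assumption; the patent only states that the weight depends on the real category's arrangement position:

```python
def adjust_loss_weights(sample_categories, true_category, base=1.0):
    """sample_categories is sorted by descending prediction probability.
    When the true category appears in it, scale every loss weight by its
    rank; otherwise keep the base weight. The linear rank/n scaling is an
    assumption, not the patent's wording."""
    n = len(sample_categories)
    if true_category not in sample_categories:
        return [base] * n
    rank = sample_categories.index(true_category)   # 0 = highest probability
    return [base * rank / n] * n
```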
Optionally, in some embodiments, the training subunit 3084 is specifically configured to:
selecting a preset number of target category losses from the plurality of category losses;
and training the preset classification network based on the target class loss to obtain a second classification network.
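The selection step can be sketched as below. That the *largest* losses are the ones kept (a hard-example-mining style choice) is an assumption; the patent only says a preset number of target losses is selected:

```python
def select_target_losses(losses, preset_number):
    """Keep only the preset number of largest class losses for training.
    Keeping the largest (hardest) losses is an illustrative assumption."""
    return sorted(losses, reverse=True)[:preset_number]

select_target_losses([0.1, 0.9, 0.5, 0.3], 2)
# -> [0.9, 0.5]
```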
Optionally, in some embodiments, as shown in fig. 17, the image category prediction apparatus further includes a blockchain storage unit 309;
the blockchain storage unit 309 is configured to store the class prediction result of the image to be classified into a blockchain.
As can be seen from the above, the image category prediction apparatus according to the embodiment of the present application may acquire an image to be classified by the acquisition unit 301; then, the first classification unit 302 classifies the image to be classified through a first classification network to obtain a reference class of the image to be classified; then, the second classification unit 303 classifies the image to be classified through a second classification network to obtain a prediction category set of the image to be classified, where the prediction category set includes multiple prediction categories, and the multiple prediction categories are arranged according to prediction probabilities corresponding to the prediction categories; the determining unit 304 determines a target arrangement position of the reference category in the prediction category set; the adjusting unit 305 adjusts the arrangement order of the prediction categories based on the target arrangement position, and obtains an adjusted prediction category set; the result obtaining unit 306 obtains a class prediction result of the image to be classified based on the adjusted prediction class set.
In addition, an embodiment of the present application further provides a computer device, as shown in fig. 18, which shows a schematic structural diagram of the computer device according to the embodiment of the present application, and specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configurations illustrated in the figures are not meant to be limiting of computer devices and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the computer device, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 via a power management system, so that functions of managing charging, discharging, and power consumption are implemented via the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads an executable file corresponding to a process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:
acquiring an image to be classified; classifying the images to be classified through a first classification network to obtain reference classes of the images to be classified; classifying the images to be classified through a second classification network to obtain a prediction category set of the images to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories; determining a target arrangement position of the reference category in the prediction category set; adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; and acquiring a category prediction result of the image to be classified based on the adjusted prediction category set.
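The steps above can be sketched end to end. Both networks are represented by stand-in callables, and all class names and the mapping shape are hypothetical — `first_net` returns one coarse reference category, `second_net` returns (category, probability) pairs sorted by descending probability:

```python
def predict_category(image, first_net, second_net, mapping):
    """End-to-end sketch of the claimed two-network method: classify with
    both networks, then move any fine-grained class mapped from the coarse
    reference class to the front of the prediction category set."""
    reference = first_net(image)
    predictions = [cat for cat, _ in second_net(image)]
    for target in mapping.get(reference, []):
        if target in predictions:
            predictions.remove(target)
            predictions.insert(0, target)
    # the adjusted set's first element serves as the class prediction result
    return predictions[0], predictions

# toy stand-ins for the two classification networks
first_net = lambda img: "dog"
second_net = lambda img: [("tabby", 0.5), ("husky", 0.3), ("beagle", 0.2)]
top, ranked = predict_category(None, first_net, second_net, {"dog": ["husky"]})
# top == "husky"; ranked == ["husky", "tabby", "beagle"]
```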
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application further provides a computer-readable storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the image category prediction methods provided in the present application. For example, the instructions may perform the steps of:
acquiring an image to be classified; classifying the images to be classified through a first classification network to obtain reference classes of the images to be classified; classifying the images to be classified through a second classification network to obtain a prediction category set of the images to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories; determining a target arrangement position of the reference category in the prediction category set; adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set; and acquiring a category prediction result of the image to be classified based on the adjusted prediction category set.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any image category prediction method provided in the embodiments of the present application, the beneficial effects that can be achieved by any image category prediction method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The foregoing has described in detail the image category prediction method, apparatus, computer device, and computer-readable storage medium provided by the embodiments of the present application. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. An image class prediction method, comprising:
acquiring an image to be classified;
classifying the images to be classified through a first classification network to obtain reference classes of the images to be classified;
classifying the images to be classified through a second classification network to obtain a prediction category set of the images to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories;
determining a target arrangement position of the reference category in the prediction category set;
adjusting the arrangement sequence of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set;
and acquiring a category prediction result of the image to be classified based on the adjusted prediction category set.
2. The image class prediction method according to claim 1, wherein determining the target arrangement position of the reference class in the prediction class set comprises:
acquiring an image category mapping set, wherein the image category mapping set comprises a mapping relation between a preset reference category of a first classification network and a preset prediction category of a second classification network;
acquiring a target prediction category corresponding to the reference category based on the image category mapping set;
determining the arrangement position of the target prediction category in the prediction category set;
and determining the target arrangement position of the reference category in the prediction category set according to the arrangement position.
3. The image category prediction method according to claim 2, further comprising:
acquiring a preset reference category set of a first classification network, wherein the preset reference category set comprises a plurality of preset reference categories;
acquiring a preset prediction category set of a second classification network, wherein the preset prediction category set comprises a plurality of preset prediction categories;
and establishing a mapping relation between the preset reference category and the preset prediction category to obtain an image category mapping set.
4. The image category prediction method according to claim 1, wherein adjusting an arrangement order of the prediction categories based on the target arrangement position to obtain an adjusted prediction category set comprises:
determining a sequence adjustment mode and a preset position corresponding to the sequence adjustment mode according to the target arrangement position;
and in the prediction category set, adjusting the target prediction category to the preset position by adopting the sequential adjustment mode to obtain an adjusted prediction category set.
5. The image class prediction method according to claim 1, wherein classifying the image to be classified through a second classification network to obtain a prediction class set of the image to be classified comprises:
performing feature extraction on the image to be classified through the second classification network to obtain local feature information of the image to be classified;
classifying the images to be classified through the second classification network based on the local feature information to obtain a prediction class set of the images to be classified.
6. The image class prediction method according to claim 1, wherein before classifying the image to be classified by the second classification network, the method further comprises:
collecting a plurality of image samples marked with real categories;
and training a preset classification network based on the real categories to obtain the second classification network.
7. The image class prediction method according to claim 6, wherein training a preset classification network based on the real class to obtain the second classification network comprises:
performing feature extraction on the image sample to obtain local feature information of the image sample;
classifying the image samples based on the local feature information of the image samples to obtain a prediction category set of the image samples, wherein the prediction category set of the image samples comprises a plurality of sample prediction categories;
calculating the class loss between each sample prediction class and the real class to obtain a plurality of class losses;
and training the preset classification network based on the plurality of class losses to obtain a second classification network.
8. The image class prediction method of claim 7, wherein calculating a class loss between each sample prediction class and the true class comprises:
when the real category exists in the plurality of sample prediction categories, adjusting the loss weight corresponding to each sample prediction category;
and calculating the category loss between each sample prediction category and the real category based on the loss weight corresponding to the sample prediction category.
9. The image class prediction method of claim 8, wherein the plurality of sample prediction classes are ordered according to their corresponding prediction probabilities;
when the true class exists in the plurality of sample prediction classes, adjusting the loss weight corresponding to each sample prediction class, including:
determining the arrangement position of the real category according to the arrangement sequence of the sample prediction categories;
and adjusting the loss weight corresponding to each sample prediction category according to the arrangement position of the real category.
10. The method according to claim 7, wherein training the predetermined classification network based on the plurality of class losses to obtain a second classification network comprises:
selecting a preset number of target category losses from the plurality of category losses;
and training the preset classification network based on the target class loss to obtain a second classification network.
11. The image class prediction method according to claim 1, further comprising, after obtaining the class prediction result of the image to be classified based on the adjusted prediction class set:
and storing the class prediction result of the image to be classified into a blockchain.
12. An image category prediction device comprising:
an acquisition unit for acquiring an image to be classified;
the first classification unit is used for classifying the images to be classified through a first classification network to obtain reference classes of the images to be classified;
the second classification unit is used for classifying the images to be classified through a second classification network to obtain a prediction category set of the images to be classified, wherein the prediction category set comprises a plurality of prediction categories which are arranged according to prediction probabilities corresponding to the prediction categories;
a determining unit, configured to determine a target arrangement position of the reference category in the prediction category set;
an adjusting unit, configured to adjust the arrangement order of the prediction categories based on the target arrangement position, to obtain an adjusted prediction category set;
and the result acquisition unit is used for acquiring the class prediction result of the image to be classified based on the adjusted prediction class set.
CN201911164132.7A 2019-11-25 2019-11-25 Image category prediction method and device Active CN111027600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911164132.7A CN111027600B (en) 2019-11-25 2019-11-25 Image category prediction method and device


Publications (2)

Publication Number Publication Date
CN111027600A true CN111027600A (en) 2020-04-17
CN111027600B CN111027600B (en) 2021-03-23

Family

ID=70206484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911164132.7A Active CN111027600B (en) 2019-11-25 2019-11-25 Image category prediction method and device

Country Status (1)

Country Link
CN (1) CN111027600B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582372A (en) * 2020-05-09 2020-08-25 西交利物浦大学 Image classification method, model, storage medium and electronic device
CN112000809A (en) * 2020-09-29 2020-11-27 迪爱斯信息技术股份有限公司 Incremental learning method and device for text categories and readable storage medium
CN112906811A (en) * 2021-03-09 2021-06-04 西安电子科技大学 Automatic classification method for images of engineering vehicle-mounted equipment based on Internet of things architecture
CN113610138A (en) * 2021-08-02 2021-11-05 典基网络科技(上海)有限公司 Image classification and identification method and device based on deep learning model and storage medium
CN114418036A (en) * 2022-03-28 2022-04-29 浙江所托瑞安科技集团有限公司 Method, device and storage medium for testing and training performance of neural network

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
CN102622607A (en) * 2012-02-24 2012-08-01 河海大学 Remote sensing image classification method based on multi-feature fusion
CN103345645A (en) * 2013-06-27 2013-10-09 复旦大学 Commodity image category forecasting method based on online shopping platform
WO2018035805A1 (en) * 2016-08-25 2018-03-01 Intel Corporation Coupled multi-task fully convolutional networks using multi-scale contextual information and hierarchical hyper-features for semantic image segmentation
CN109376684A (en) * 2018-11-13 2019-02-22 广州市百果园信息技术有限公司 A kind of face critical point detection method, apparatus, computer equipment and storage medium
CN109582782A (en) * 2018-10-26 2019-04-05 杭州电子科技大学 A kind of Text Clustering Method based on Weakly supervised deep learning
CN109871896A (en) * 2019-02-26 2019-06-11 北京达佳互联信息技术有限公司 Data classification method, device, electronic equipment and storage medium
CN109919241A (en) * 2019-03-15 2019-06-21 中国人民解放军国防科技大学 Hyperspectral unknown class target detection method based on probability model and deep learning
CN109934293A (en) * 2019-03-15 2019-06-25 苏州大学 Image-recognizing method, device, medium and obscure perception convolutional neural networks
CN110163262A (en) * 2019-04-26 2019-08-23 深圳市腾讯计算机系统有限公司 Model training method, method for processing business, device, terminal and storage medium
CN110188754A (en) * 2019-05-29 2019-08-30 腾讯科技(深圳)有限公司 Image partition method and device, model training method and device
CN110209916A (en) * 2018-02-05 2019-09-06 高德软件有限公司 A kind of point of interest image recommendation method and device
CN110321920A (en) * 2019-05-08 2019-10-11 腾讯科技(深圳)有限公司 Image classification method, device, computer readable storage medium and computer equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAMES P. CANNING et al.: "Network Classification and Categorization", arXiv:1709.04481v1 [cs.SI] *
ZHANG Yi: "Commodity Image Category Prediction Method for Online Shopping Platforms", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582372A (en) * 2020-05-09 2020-08-25 西交利物浦大学 Image classification method, model, storage medium and electronic device
CN112000809A (en) * 2020-09-29 2020-11-27 迪爱斯信息技术股份有限公司 Incremental learning method and device for text categories and readable storage medium
CN112906811A (en) * 2021-03-09 2021-06-04 西安电子科技大学 Automatic classification method for images of engineering vehicle-mounted equipment based on Internet of things architecture
CN112906811B (en) * 2021-03-09 2023-04-18 西安电子科技大学 Automatic classification method for images of engineering vehicle-mounted equipment based on Internet of things architecture
CN113610138A (en) * 2021-08-02 2021-11-05 典基网络科技(上海)有限公司 Image classification and identification method and device based on deep learning model and storage medium
CN114418036A (en) * 2022-03-28 2022-04-29 浙江所托瑞安科技集团有限公司 Method, device and storage medium for testing and training performance of neural network
CN114418036B (en) * 2022-03-28 2022-06-21 浙江所托瑞安科技集团有限公司 Method, device and storage medium for testing and training performance of neural network

Also Published As

Publication number Publication date
CN111027600B (en) 2021-03-23

Similar Documents

Publication Publication Date Title
CN111027600B (en) Image category prediction method and device
CN113822494B (en) Risk prediction method, device, equipment and storage medium
EP3985578A1 (en) Method and system for automatically training machine learning model
CN107967575B (en) Artificial intelligence platform system for artificial intelligence insurance consultation service
CN110196908A (en) Data classification method, device, computer installation and storage medium
CN107168992A (en) Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence
WO2021208696A1 (en) User intention analysis method, apparatus, electronic device, and computer storage medium
CN112231485B (en) Text recommendation method and device, computer equipment and storage medium
CN111368926B (en) Image screening method, device and computer readable storage medium
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN112487794B (en) Industry classification method, device, terminal equipment and storage medium
CN110347840A (en) Complain prediction technique, system, equipment and the storage medium of text categories
CN106537423A (en) Adaptive featurization as service
CN114238573B (en) Text countercheck sample-based information pushing method and device
CN111368911A (en) Image classification method and device and computer readable storage medium
CN112883990A (en) Data classification method and device, computer storage medium and electronic equipment
CN112287656A (en) Text comparison method, device, equipment and storage medium
CN115577698A (en) Data and text processing system and method based on machine learning
Gao et al. An improved XGBoost based on weighted column subsampling for object classification
CN112364912A (en) Information classification method, device, equipment and storage medium
CN116861924A (en) Project risk early warning method and system based on artificial intelligence
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
CN114529191A (en) Method and apparatus for risk identification
CN113297378A (en) Text data labeling method and system, electronic equipment and storage medium
CN113570286B (en) Resource allocation method and device based on artificial intelligence, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40022553

Country of ref document: HK

GR01 Patent grant