CN113627455A - Image category determination method and device


Info

Publication number
CN113627455A
Authority
CN
China
Prior art keywords
category
image
class
vector
vectors
Prior art date
Legal status
Pending
Application number
CN202010386765.9A
Other languages
Chinese (zh)
Inventor
赵康
徐盈辉
潘攀
张迎亚
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010386765.9A
Publication of CN113627455A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present specification provide an image category determination method and apparatus. The image category determination method includes: receiving an image sample set, where the image sample set includes image samples and image labels corresponding to the image samples; determining an initial image category set corresponding to the image samples based on the image labels, and constructing a category graph corresponding to the image samples based on the initial image category set; and determining the target image categories corresponding to the image samples according to the category graph.

Description

Image category determination method and device
Technical Field
The embodiments of the present specification relate to the technical field of machine learning, and in particular to an image category determination method. One or more embodiments of the present specification also relate to a classification model training method, an image category determination apparatus, a classification model training apparatus, a computing device, and a computer-readable storage medium.
Background
In recent years, advances in deep learning and the explosive growth of data sets have produced a wave of breakthroughs in the field of artificial intelligence. With this trend, large-scale classification involving an enormous number of classes has become an important task.
Large-scale classification, however, presents many new challenges, among which the computational difficulty of training may be the most prominent. In particular, current classifiers typically employ a deep network architecture consisting of a series of convolutional layers for feature extraction followed by a softmax layer that maps the feature representation to a response for each class. The parameter size of this softmax layer is proportional to the number of classes. When training such a classifier, for each mini-batch of samples, the responses of all relevant classes are computed by taking the dot product between the class-specific weights and the extracted features.
However, the above algorithm faces two significant difficulties when there are a large number of classes: first, the parameter size may exceed the memory capacity, especially when the network is trained on GPUs of limited capacity; second, the computational cost increases substantially, even to a prohibitive level.
Therefore, there is an urgent need for a category determination method that allows training with the softmax layer to be accelerated without losing training accuracy.
Disclosure of Invention
In view of this, the present specification provides an image category determination method. One or more embodiments of the present disclosure relate to a classification model training method, an image class determination apparatus, a classification model training apparatus, a computing device, and a computer-readable storage medium, which are used to solve technical problems of the related art.
According to a first aspect of embodiments of the present specification, there is provided an image category determination method including:
receiving an image sample set, wherein the image sample set comprises image samples and image labels corresponding to the image samples;
determining an initial image category set corresponding to the image sample based on the image label, and constructing a category graph corresponding to the image sample based on the initial image category set;
and determining the target image category corresponding to the image sample according to the category graph.
According to a second aspect of embodiments of the present specification, there is provided a classification model training method, including:
receiving an image sample set, wherein the image sample set comprises image samples and image labels corresponding to the image samples;
determining an initial image category set corresponding to the image sample based on the image label, and constructing a category graph corresponding to the image sample based on the initial image category set;
determining a target image category corresponding to the image sample according to the category graph;
training a classification model based on the image sample set and the target image category to obtain a trained classification model.
According to a third aspect of embodiments herein, there is provided an image category determination apparatus including:
a first image sample receiving module configured to receive an image sample set, wherein the image sample set includes image samples and image labels corresponding to the image samples;
a first graph construction module configured to determine an initial image category set corresponding to the image sample based on the image label, and construct a category graph corresponding to the image sample based on the initial image category set;
the first category determination module is configured to determine a target image category corresponding to the image sample according to the category graph.
According to a fourth aspect of embodiments herein, there is provided a classification model training apparatus including:
a second image sample receiving module configured to receive an image sample set, wherein the image sample set includes image samples and image labels corresponding to the image samples;
a second graph construction module configured to determine an initial image category set corresponding to the image sample based on the image label, and construct a category graph corresponding to the image sample based on the initial image category set;
the second category determination module is configured to determine a target image category corresponding to the image sample according to the category graph;
a model training module configured to train a classification model based on the image sample set and the target image category to obtain a trained classification model.
According to a fifth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions that, when executed by the processor, perform the steps of the image class determination method or the classification model training method.
According to a sixth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the image class determination method or the steps of the classification model training method.
One embodiment of the present specification provides an image category determination method and an image category determination apparatus. The image category determination method includes: receiving an image sample set, where the image sample set includes image samples and image labels corresponding to the image samples; determining an initial image category set corresponding to the image samples based on the image labels, and constructing a category graph corresponding to the image samples based on the initial image category set; and determining the target image categories corresponding to the image samples according to the category graph. By constructing the category graph from the normalized initial image category set, the method can efficiently determine, for each mini-batch of image samples, the small set of neighboring target image categories, adaptively adjust the allocation of computing resources through the k-nearest-neighbor classification algorithm, greatly improve the training performance of a model in large-scale classification, and save computing cost.
Drawings
Fig. 1 is an exemplary diagram of a specific application scenario of a category determination method provided in an embodiment of the present specification;
fig. 1a is an architectural diagram of a machine learning model applied to a specific class determination method according to an embodiment of the present specification;
FIG. 2 is a flowchart illustrating a process of a method for determining a category according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of using a ring structure to implement the transfer of category vectors between different computing nodes and the similarity calculation in a category determination method provided in an embodiment of the present specification;
FIG. 4 is a flowchart of a process of a classification model training method according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a category determining apparatus according to an embodiment of the present specification;
FIG. 6 is a schematic structural diagram of a classification model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a "first" can also be referred to as a "second" and, similarly, a "second" can also be referred to as a "first" without departing from the scope of one or more embodiments of the present specification. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
kNN: the k-nearest-neighbor classification algorithm; English abbreviation: kNN; English full name: k-nearest neighbor.
FC layer: fully connected layer. In a typical classification task model, the last layer of the deep learning network is an FC layer, and the number of its outputs equals the number of classes.
softmax: a loss used for classification tasks, usually placed after the last module of the deep learning network, i.e., the FC layer.
CUDA: the GPU parallel computing platform on which the vector inner-product (matrix multiplication) programs in this specification are executed.
In the present specification, an image category determination method is provided. One or more embodiments of the present specification relate to a classification model training method, an image class determination apparatus, a classification model training apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Referring to fig. 1, fig. 1 illustrates an exemplary view of a specific application scenario of an image category determination method provided in an embodiment of the present specification.
The application scenario of fig. 1 includes a terminal and a server. Specifically, a user sends an image sample set to the server through the terminal, where the image sample set includes image samples and image labels; after receiving the image samples and the image labels, the server obtains an initial image category set through the image labels, then constructs a category graph corresponding to the image samples based on the initial image category set, and finally obtains the target image categories corresponding to the image samples according to the category graph.
Referring to fig. 1a, fig. 1a is a schematic diagram of an architecture of a machine learning model to which a category determination method is applied according to one or more embodiments of the present disclosure.
The model architecture in FIG. 1a includes an input layer, a feature extraction layer (i.e., convolutional layer), a fully-connected layer, a softmax layer, and an output layer.
In specific implementation, the input layer receives image samples in the acquired image sample set and image labels corresponding to the image samples, and then inputs the image samples into the feature extraction layer;
the feature extraction layer performs feature extraction on the image samples to obtain image sample features, and sends the image sample features and the image labels corresponding to them to the fully connected layer;
the full connection layer determines an initial class set corresponding to the image sample features through tags, L2 normalization is carried out on the image sample features and an initial class (W in FIG. 1 a) in the initial class set, a complete k neighbor Graph (i 2i in FIG. 1 a) is constructed based on the result of the initial class normalization, then after classes in the Graph are added or deleted through the adjacent classes corresponding to the image sample features, a target class set (selected W in FIG. 1 a) corresponding to the image sample features is finally obtained, and the image sample features and the corresponding target class set are sent to the softmax layer;
the softmax layer receives image sample characteristics transmitted by the full-connection layer and a corresponding target category set, outputs a plurality of vectors with the same dimension as the target category, performs error measurement on the vectors through a cross entropy loss function, and outputs the vectors; specifically, the value in the vector is the probability value of each image sample belonging to each category of target, and the cross entropy loss function is a measure of how good the probability obtained through the softmax layer is, and the lower the cross entropy is, the better the probability value is proved.
In practical applications, the machine learning model to which the category determination method is applied runs on a GPU-based deep learning software system, and the system includes at least one GPU.
As can be seen from fig. 1a, the class determination method provided in the embodiment of the present specification is mainly applied to a full connection layer of a machine learning model to achieve acquisition of a target class corresponding to training data, and a specific implementation manner of the class determination method may be referred to in the following embodiments.
Referring to fig. 2, fig. 2 is a flowchart illustrating a category determination method according to an embodiment of the present disclosure, including the following steps:
step 202: a sample set of images is received.
The image sample set comprises image samples and image labels corresponding to the image samples.
In practice, the image sample includes, but is not limited to, pictures, videos, and the like.
In a specific implementation, in order to reduce the amount of data processed per device and increase processing speed, the image sample set is distributed to multiple GPUs for parallel processing, and the processing steps for the image samples assigned to each GPU are the same.
Specifically, the image sample set includes a plurality of image samples and an image label corresponding to each image sample, where the label is an object to be predicted, such as a future price of a commodity displayed in the image, an animal species displayed in the image, and the like, and in a case where the image sample includes an animal, the corresponding image label can be understood as an animal species displayed in the image, and the like.
For example, in the case where the image samples are pictures, receiving the image sample set may be understood as receiving an image sample set that includes a plurality of pictures and an image label corresponding to each picture.
Step 204: an initial image category set corresponding to the image sample is determined based on the image label, and a category chart corresponding to the image sample is constructed based on the initial image category set.
Specifically, after the image sample set is received, the initial image category set corresponding to each image sample is determined based on the image labels corresponding to that image sample. For example, if the image sample corresponds to image labels such as tiger, wild goose and peach blossom, then based on these labels it can be determined that the initial image categories corresponding to the image sample include a first category: tiger, a second category: wild goose, and a third category: peach blossom; these initial image categories form the initial image category set corresponding to the image sample.
In a specific implementation, the constructing the category graph corresponding to the image sample based on the initial image category set includes:
preprocessing the initial image categories in the initial image category set to obtain an image category vector set of the initial image categories;
and constructing a category graph corresponding to the image sample according to the image category vector set.
Specifically, preprocessing the initial image categories in the initial image category set may be understood as performing L2 normalization on the initial image categories in the initial image category set to obtain a category vector for each initial image category, where the category vectors of all the initial image categories form the image category vector set.
The L2 normalization of the initial image categories vectorizes each initial image category according to its features, so that every initial image category in the image category vector set has a category vector of the same dimension. The category vectors obtained after L2 normalization allow the machine learning model to converge quickly during training without losing accuracy, and after L2-norm normalization the Euclidean distance between any two initial image categories can be computed directly.
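The equivalence mentioned above can be illustrated with a short sketch, assuming PyTorch and small illustrative tensor sizes: after L2 normalization the squared Euclidean distance between two unit-norm vectors is 2 - 2·⟨w_i, w_j⟩, so the whole distance computation reduces to one matrix multiplication.

```python
# Sketch: L2-normalize the category vectors; for unit-norm rows,
# ||w_i - w_j||^2 = 2 - 2 * <w_i, w_j>, so all pairwise Euclidean distances
# reduce to a single matrix multiplication (the sizes here are illustrative).
import torch
import torch.nn.functional as F

W = torch.randn(10000, 128)          # assumed: 10000 initial categories, 128-dim vectors
W = F.normalize(W, p=2, dim=1)       # L2 normalization, every row now has unit norm

inner = W @ W.t()                    # (N, N) inner products
sq_dist = 2.0 - 2.0 * inner          # squared Euclidean distances between all pairs
```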
In a specific implementation, the constructing a category graph corresponding to the image sample according to the image category vector set includes:
determining k neighboring category vectors of each category vector in the image category vector set by means of linear search, where k is a positive integer;
and constructing a category graph corresponding to the image sample based on the k neighboring category vectors of each category vector in the image category vector set.
Linear search means computing, for each category vector, its distance to every other category vector in the image category vector set; for example, if there are 1000 category vectors in the set, each category vector is compared in turn with the other 999 category vectors to determine its k neighboring category vectors.
Specifically, k may be set according to practical applications, and is not limited herein, for example, 10, 12, or 15 may be set.
In practical applications, the k neighboring category vectors of each category vector in the image category vector set are determined by linear search, that is, they are computed exhaustively by comparing each category vector against all the others, and the category graph corresponding to the image sample is then constructed based on the k neighboring category vectors of each category vector in the image category vector set. Searching for the k neighboring category vectors by linear search avoids missing any neighboring category vector and guarantees the accuracy of the neighbors found for each category vector.
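A minimal sketch of such a brute-force (linear) search, assuming PyTorch and L2-normalized category vectors, is shown below; the chunking is only there to avoid materializing the full N × N distance matrix and is an implementation choice of the example, not of the patent.

```python
# Sketch of brute-force (linear) k-nearest-neighbour search over L2-normalized
# category vectors; chunking is only an implementation choice of this example.
import torch

def knn_linear_search(W, k, chunk=1024):
    """W: (N, D) L2-normalized category vectors. Returns (N, k) neighbour indices."""
    neighbours = []
    for start in range(0, W.size(0), chunk):
        q = W[start:start + chunk]                    # (c, D) query rows
        sq_dist = 2.0 - 2.0 * (q @ W.t())             # (c, N), valid for unit-norm rows
        # k + 1 because every vector is its own nearest neighbour (distance 0)
        _, idx = sq_dist.topk(k + 1, dim=1, largest=False)
        neighbours.append(idx[:, 1:])                 # drop the self-match
    return torch.cat(neighbours, dim=0)
```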
Since constructing the category graph corresponding to the image sample is a time-consuming process, the embodiments of the present specification build the graph over several iterations, without occupying additional GPU computing resources, by reusing the GPUs used for training to construct the category graph. The specific implementation is as follows:
the determining k neighboring class vectors of each class vector in the image class vector set by means of a linear search comprises:
distributing the category vectors in the image category vector set to different computing nodes based on a preset distribution mode;
and sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes to obtain k adjacent category vectors of each category vector in the image category vector set.
For convenience of understanding, in the embodiments of the present specification, the preset allocation manner is described by taking average allocation as an example.
In a specific implementation, the category vectors in the image category vector set are first distributed to different computing nodes in an evenly distributed manner; then the similarity between the category vectors assigned to each computing node and the category vectors assigned to the other computing nodes is computed in turn, and the k neighboring category vectors corresponding to each category vector in the image category vector set are determined based on the similarity. The different computing nodes can be understood as different GPUs, namely the GPUs that process the distributed image samples; at this time the training process of the machine learning model on the GPUs is suspended, so that the GPUs used for training are reused to construct the category graph, avoiding the occupation of additional GPU computing resources.
In practical applications, because the category vectors are stored on different computing nodes in a distributed manner, a ring structure can be used to transfer the category vectors between the computing nodes, which avoids exhausting the GPU memory by transferring all the category vectors to a single computing node. The specific implementation is as follows:
the sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes comprises:
calculating Euclidean distances among all first class vectors distributed on the ith calculation node, wherein i is a positive integer;
sending the first category vectors distributed on the ith computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each first category vector;
receiving third category vectors distributed on the i-1 th computing node, and computing Euclidean distances between each first category vector distributed on the i-th computing node and each third category vector;
sending the third category vectors distributed on the (i-1) th computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each third category vector;
until the Euclidean distance between the category vector distributed on each computing node and the category vectors distributed on other computing nodes is calculated, and the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes is determined according to the Euclidean distance.
Specifically, taking the ith computing node as an example, the computation of the k neighboring category vectors of the category vectors allocated on each computing node is described in detail below.
First, the ith computing node calculates the first class vectors allocated to itself, and calculates the euclidean distance between each class vector and other class vectors, for example, the first class vectors allocated to the ith computing node include a1, a2, and a3, and then calculates the euclidean distance between each first class vector allocated to the ith computing node, that is, calculates the euclidean distances between a1 and a2 and a3, between a2 and a1 and a3, and between a3 and a1 and a2, respectively.
Then the first category vectors allocated on the ith computing node are sent to the (i + 1)th computing node, and the Euclidean distance between each second category vector allocated on the (i + 1)th computing node and each first category vector is calculated, where the (i + 1)th computing node is the next computing node after the ith computing node. Continuing the example above, if the second category vectors allocated on the (i + 1)th computing node include b1, b2 and b3, the first category vectors a1, a2 and a3 are sent to the (i + 1)th computing node, and the Euclidean distances between the first category vectors a1, a2, a3 and the second category vectors b1, b2, b3 are calculated; combined with the distances already computed on the ith computing node, a1 now has distances to a2, a3, b1, b2 and b3, a2 has distances to a1, a3, b1, b2 and b3, and so on;
Meanwhile, when the first category vectors allocated on the ith computing node are sent to the (i + 1)th computing node, the ith computing node receives the third category vectors allocated on the (i - 1)th computing node and calculates the Euclidean distance between each first category vector allocated on the ith computing node and each third category vector, where the (i - 1)th computing node is the previous computing node of the ith computing node. Continuing the example above, if the third category vectors allocated on the (i - 1)th computing node include c1, c2 and c3, the third category vectors c1, c2 and c3 are sent to the ith computing node, and the Euclidean distances between the third category vectors c1, c2, c3 and the first category vectors a1, a2, a3 are calculated, until the Euclidean distances between each category vector on the ith computing node and the other category vectors seen so far are obtained;
Finally, the ith computing node sends the received third category vectors allocated on the (i - 1)th computing node to the (i + 1)th computing node, and the Euclidean distances between each second category vector allocated on the (i + 1)th computing node and each third category vector are calculated. Continuing the example above, the third category vectors c1, c2 and c3 are sent to the (i + 1)th computing node, and the Euclidean distances between the third category vectors c1, c2, c3 and the second category vectors b1, b2, b3 are calculated;
until the Euclidean distance between the category vectors allocated on each computing node and the category vectors allocated on all the other computing nodes has been calculated, and the similarity between each category vector allocated on each computing node and the category vectors allocated on the other computing nodes is determined according to the Euclidean distance; that is, the smaller the Euclidean distance, the higher the similarity. In this case, the Euclidean distances between each category vector and the other category vectors may be sorted in ascending order, and the top k category vectors, i.e. those with the smallest distances, are taken as the neighboring category vectors of that category vector.
To show more clearly how the ring structure implements the transfer of category vectors between different computing nodes and the similarity calculation, refer to fig. 3; fig. 3 is a schematic structural diagram of using a ring structure to implement the transfer of category vectors between different computing nodes and the similarity calculation in a category determination method provided in one or more embodiments of the present specification.
In fig. 3, each id corresponds to one compute node, i.e. one GPU, and each GPU has different category vectors stored thereon.
For example, suppose the category vector stored on the GPU with id 1 is a, the category vector stored on the GPU with id 2 is b, the category vector stored on the GPU with id 3 is c, the category vector stored on the GPU with id 0 is d, and the category vector stored on the GPU with id T-1 is f; the transfer of category vectors between different computing nodes and the similarity calculation are described below.
First iteration: each GPU calculates the Euclidean distances between the category vectors stored on it. Second iteration: the category vector a stored on the GPU with id 1 is transferred to the GPU with id 2, the category vector b stored on the GPU with id 2 is transferred to the GPU with id 3, the category vector d stored on the GPU with id 0 is transferred to the GPU with id 1, and so on; the Euclidean distances between the transferred category vectors and the category vectors stored on each GPU are then calculated. Note that when a GPU passes category vectors on, the category vectors it originally stores always remain on it. Third iteration: the category vector d received on the GPU with id 1 is transferred to the GPU with id 2, the category vector a received on the GPU with id 2 is transferred to the GPU with id 3, and so on, until every GPU has seen the category vectors of all the other GPUs.
It can be simply understood that, when the ring structure is used to transfer category vectors between GPUs, each computing node keeps a copy of its own category vectors and passes the copy along through the other computing nodes; when a node receives its own copied category vectors back, the whole iteration process is complete and the transfer ends.
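The ring transfer described above can be sketched roughly as follows, assuming PyTorch with torch.distributed initialized, equal-sized shards on every GPU, and point-to-point send/recv support in the backend; the function name and buffer handling are illustrative assumptions, not the patented implementation.

```python
# Rough sketch of the ring transfer: every GPU keeps its own shard of category
# vectors and, over world_size - 1 steps, forwards the shard it just received to
# the next GPU while accumulating distances against its local shard.
import torch
import torch.distributed as dist

def ring_pairwise_distances(local_w):
    rank, world_size = dist.get_rank(), dist.get_world_size()
    send_buf = local_w.clone()
    recv_buf = torch.empty_like(local_w)
    dists = [2.0 - 2.0 * (local_w @ local_w.t())]                 # local shard vs. itself
    for _ in range(world_size - 1):
        req = dist.isend(send_buf, dst=(rank + 1) % world_size)   # pass shard onward
        dist.recv(recv_buf, src=(rank - 1) % world_size)          # take shard from previous GPU
        req.wait()
        dists.append(2.0 - 2.0 * (local_w @ recv_buf.t()))        # local shard vs. received shard
        send_buf, recv_buf = recv_buf.clone(), send_buf           # forward what was just received
    return torch.cat(dists, dim=1)   # columns ordered by arrival, one block per shard
```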
In order to further accelerate graph construction, when the k neighboring category vectors of each category vector in the category vector set are determined through the ring-structured computation, the category vectors are converted to half precision so that Tensor Core accelerated matrix multiplication can be used to calculate the Euclidean distances between category vectors. The specific implementation is as follows:
before sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes, the method further includes:
and performing first precision conversion on the category vectors distributed on each computing node.
In practical applications, the standard precision for machine learning model training is float32; to increase the calculation speed, the precision of the category vectors allocated to each computing node can be converted from standard precision to half precision, namely float16.
In order not to lose precision after converting the category vectors to half precision, the set of recalled neighboring category vectors of each category vector is first enlarged, and full-precision calculation is then performed on the recalled neighboring category vectors. The specific implementation is as follows:
the sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes to obtain k neighboring category vectors of each category vector in the image category vector set includes:
based on the converted first precision, sequentially calculating first similarity between each category vector distributed on each computing node and category vectors distributed on other computing nodes;
obtaining k × n neighboring category vectors of each category vector in the image category vector set, wherein n is a positive number;
converting each category vector and the corresponding k × n neighboring category vectors into a second precision, and calculating a second similarity between each category vector and the corresponding k × n neighboring category vectors based on the converted second precision;
determining k neighboring class vectors for each class vector in the set of image class vectors based on the second similarity.
Specifically, the above-mentioned ring structure is used to calculate the first similarity between each half-precision category vector allocated on each computing node and the half-precision category vectors allocated on the other computing nodes, and k × n neighboring category vectors of each half-precision category vector are then obtained based on the first similarity, where n is set according to the practical application. For example, n may be 1.1, that is, k × 1.1 neighboring category vectors of each half-precision category vector are obtained from the first similarity; in this way the set of recalled neighboring category vectors of each half-precision category vector is enlarged.
Each half-precision category vector and its k × n neighboring category vectors are then converted to full precision, namely float32; the second similarity between each float32 category vector and its k × n neighboring category vectors is calculated, the k × n neighboring category vectors are sorted in descending order of the second similarity, and the top k neighboring category vectors of each category vector are taken as its final neighboring category vectors.
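A hedged sketch of this two-stage recall, assuming PyTorch on a GPU (the float16 matrix multiplication is what allows Tensor Core acceleration), might look as follows; the amplification factor n and the function name are illustrative.

```python
# Sketch of the two-stage search: recall k * n candidates in float16, then
# re-rank the candidates in float32. Intended for GPU tensors.
import torch

def two_stage_knn(W, k, n=1.1):
    m = int(k * n)                                        # enlarged recall size
    W16 = W.half()
    coarse = 2.0 - 2.0 * (W16 @ W16.t())                  # coarse fp16 distances
    _, cand = coarse.topk(m + 1, dim=1, largest=False)
    cand = cand[:, 1:]                                    # drop the self-match, (N, m)

    W32 = W.float()
    gathered = W32[cand]                                  # (N, m, D) candidate vectors
    fine = ((W32.unsqueeze(1) - gathered) ** 2).sum(-1)   # exact fp32 distances
    _, order = fine.topk(k, dim=1, largest=False)
    return torch.gather(cand, 1, order)                   # (N, k) final neighbours
```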
In the embodiments of the present specification, when calculating the similarity of each category vector in the image category vector set and determining its k neighboring category vectors, brute-force linear search is used to guarantee the accuracy of the k neighboring category vectors obtained for each category vector; a ring structure is used to transfer the category vectors between different GPUs, and the k neighboring category vectors of each category vector are obtained over the high-bandwidth GPU links, so that the category graph can be constructed within a reasonable time. In order not to occupy additional GPU computing resources, the category graph is constructed by reusing the GPUs used for training (training is suspended at this time). In addition, since the Euclidean distance and the inner product between category vectors are equivalent after L2 normalization, and the inner product on CUDA is a matrix multiplication, the Euclidean distance computation can be reduced to a single matrix-multiplication op, and the Euclidean distances between each category vector and the other category vectors are computed in this way. To further accelerate graph construction, the category vectors are converted to half precision during the actual computation so that Tensor Core accelerated matrix multiplication can be used; to avoid losing precision when recalling the neighboring category vectors of each category vector, the set of recalled neighboring category vectors is first enlarged, standard float32 computation is then performed between each category vector and its recalled neighboring category vectors, and the final k neighboring category vectors of each category vector are determined.
In a specific implementation, since training cannot proceed while the category graph is being constructed, the graph is built only once after each epoch (one pass over the training set); for example, constructing a k-nearest-neighbor category graph over 100 million classes on 128 V100 GPUs takes about 1.5 h.
In another embodiment of the present specification, after the constructing the category graph corresponding to the image sample based on the k neighboring category vectors of each category vector in the image category vector set, the method further includes:
and screening the category vectors in the category graphs based on the image sample characteristics of the image samples stored in each computing node, and respectively storing the screened category graphs to the corresponding computing nodes.
Specifically, each GPU is allocated a part of the image samples, and feature extraction is then performed on these image samples through the feature extraction layer on each GPU to obtain the image sample features.
Each GPU also has a part of category vectors stored in a distributed manner, so that the category vectors in the category graph can be screened based on the image sample characteristics of the image samples stored in each compute node, and the specific implementation manner is as follows:
the filtering the category vectors in the category graph based on the image sample features stored in each compute node comprises:
determining an image label corresponding to the image sample feature stored in each computing node;
determining a category vector corresponding to the image sample feature based on the label;
and matching the category vectors corresponding to the image sample features with the category vectors in the category graph, and deleting the category vectors in the category graph that have no matching relation.
Specifically, firstly, an image label corresponding to the image sample feature stored in each GPU is determined, then a category vector corresponding to the label is determined based on the label, then the category vector corresponding to the label is matched with the category vector in the category graph, and the category vector in the category graph without a matching relationship is deleted.
For example, the image labels corresponding to the image sample features stored in the computing node 1 are label 1 and label 2, the category vector corresponding to the label 1 is category vector 1, the category vector corresponding to the label 2 is category vector 2, the category graph has k neighboring category vectors of the category vector 1 and the category vector 1, k neighboring category vectors of the category vector 2 and the category vector 2, and k neighboring category vectors of the category vector 3 and the category vector 3, at this time, after the category vector corresponding to the label is matched with the category vector in the category graph, it is determined that the category vector 3 in the category graph has no matching relationship with both the label 1 and the label 2, at this time, the k neighboring category vectors of the category vector 3 and the category vector 3 are deleted; and stores the class graph with the class vector 3 and the k neighboring class vectors of the class vector 3 deleted on the computation node 1.
It is also determined whether the k neighboring category vectors of category vector 1 and of category vector 2 are stored on computing node 1; the neighboring category vectors that are not present on computing node 1 are deleted from the graph.
In practical applications, the category graphs stored on each computing node are compressed in the above manner, so that the category graph can be stored without occupying a large amount of GPU memory.
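The per-node compression described above can be sketched as follows; this is an illustrative Python example, not the patented implementation, and it assumes the full graph is available as a plain dictionary and that the class ids present on a node are known.

```python
# Illustrative compression of the graph on one computing node: keep only the rows
# whose category appears on this node, and within each kept row keep only the
# neighbours that are also stored on this node. The dict layout is an assumption.
def compress_graph_for_node(knn_graph, local_class_ids):
    """knn_graph: {class_id: [neighbour class ids]}; local_class_ids: ids on this node."""
    local = set(int(c) for c in local_class_ids)
    compressed = {}
    for cls, neighbours in knn_graph.items():
        if cls not in local:
            continue                                  # this row can never be queried here
        compressed[cls] = [n for n in neighbours if n in local]   # drop non-local neighbours
    return compressed
```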
Step 206: The target image category corresponding to the image sample is determined according to the category graph.
Specifically, the category graph stores the k neighboring category vectors corresponding to each category vector; each image sample can determine its corresponding category vector through its image label and then find the corresponding k neighboring category vectors based on that category vector. Therefore, all the neighboring category vectors corresponding to all the image samples can be found based on the category graph.
After the category graph is compressed, the number K of neighboring category vectors of each category vector in the graph is no longer the same; to ensure that the corresponding target image categories can still be quickly obtained from the category graph for the training data, the method is implemented by the following steps:
the determining a target image category corresponding to the image sample based on the category graph comprises:
and determining the target image category corresponding to the image sample in the category graph based on a preset function.
In a specific implementation, although the category graph construction process is implemented entirely on the GPUs, and training is also implemented entirely on the GPUs, another problem arises: the storage of the Graph (i.e., the category graph). With one hundred million classes and K = 1000, the Graph size is 100 million × 1000, about 372 GB, and during training each node (i.e., each GPU) needs to query the complete Graph to select the active classes (i.e., the target training classes of the image samples). That is, each node would need the complete Graph, which occupies far too much memory to be stored on every node.
In order to make full use of the GPUs to train the machine learning model, two methods can be adopted to load the Graph into the GPU memory (e.g., the 32 GB memory of a V100):
1) Compressing the Graph: although a complete Graph is conceptually required on each node, the w (i.e., the category vectors) on each node are local, which means that a w not stored on a node can never be selected by the mini-batches (image samples) on that node. Therefore, the Graph on each node can delete the w not on that node; assuming 128 cards, the Graph can on average be compressed to 372 GB / 128 ≈ 2.9 GB;
2) Fast access: after the Graph is compressed, the K (i.e., the number of neighboring category vectors) of each W in the Graph is no longer the same; for example, a 100-million-by-K two-dimensional tensor becomes a contiguously stored one-dimensional tensor. A problem then arises: the mini-batch cannot quickly obtain W_active (i.e., the target training categories) from the Graph. To solve this, a kernel function can be added at the bottom layer of PyTorch to complete fast access to the compressed Graph: the K value of each W after compression is counted first, and the K values before the corresponding position of W are accumulated in another tensor (the accumulated result is the offset of that W in the compressed Graph). During training, different threads in the kernel concurrently look up the offsets in the Graph of all x (image sample features) in the mini-batch, so the active classes can be fetched quickly.
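The offset-based storage and lookup can be sketched in PyTorch as follows; in the patent the lookup loop runs as concurrent threads inside a custom CUDA kernel, whereas the example below expresses the same idea in plain Python for clarity, with all names being assumptions.

```python
# Sketch of the offset-based storage: flatten the per-class neighbour lists
# (whose lengths now differ) into one contiguous 1-D tensor plus per-class
# offsets, so a mini-batch can fetch its active classes with one lookup per label.
import torch

def build_compressed_graph(rows):
    """rows: list of 1-D LongTensors, rows[c] = neighbour ids kept for class c."""
    counts = torch.tensor([len(r) for r in rows])
    offsets = torch.cumsum(counts, dim=0) - counts      # exclusive prefix sum = start offsets
    flat = torch.cat(rows)                              # contiguous 1-D storage
    return flat, offsets, counts

def active_classes_for_batch(labels, flat, offsets, counts):
    pieces = []
    for y in labels.tolist():                           # one lookup per label in the mini-batch
        s = int(offsets[y])
        pieces.append(flat[s:s + int(counts[y])])
    return torch.unique(torch.cat(pieces + [labels]))   # neighbours plus the true classes
```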
The neighboring category vectors corresponding to different image samples may be repeated, and in order to guarantee the training speed, the number of target training categories in the target image category set of the image samples is limited. The specific implementation is as follows:
the determining a target image category corresponding to the image sample based on the category graph comprises:
determining a real category of the image sample based on an image label corresponding to the image sample;
determining, based on a preset function, the target categories corresponding to the category vectors adjacent to the real category of the image sample in the category graph, and determining the corresponding target image categories based on these target categories.
Specifically, the image labels corresponding to the image samples are the real categories of the image samples; fast access to the compressed Graph is completed based on the kernel function, the target categories corresponding to the category vectors neighboring the real categories of all the image samples are determined, and the corresponding target image categories are then determined based on these target categories.
In specific implementation, the determining the corresponding target image category based on the target category includes:
de-duplicating the target category;
judging whether the number of the de-duplicated target categories is greater than or equal to a preset number threshold;
if so, deleting a first preset number of the de-duplicated target categories to form the target image categories;
and if not, selecting, from the category graph, a second preset number of target categories corresponding to category vectors that are not adjacent to the real categories of the image samples, and adding them to the de-duplicated target categories to form the target image categories.
The preset number threshold may be set according to practical applications, and is not limited herein, for example, the preset number threshold is 1000, 10000, and the like.
In practical applications, fast access to the compressed Graph is completed based on the kernel function, and the target categories corresponding to the category vectors neighboring the real categories of all the image samples are determined; the target categories corresponding to the neighboring category vectors of different image samples may be repeated. For example, if the target categories corresponding to the category vectors neighboring the real category of image sample 1 are category 1 and category 2, and the target categories corresponding to the category vectors neighboring the real category of image sample 2 are category 1 and category 3, then the target category 1 of image sample 1 and the target category 1 of image sample 2 are repeated; in this case the target categories may be de-duplicated and one instance of category 1 deleted.
In practical applications, in order to ensure the training result, the number of target training classes corresponding to a batch of image samples is limited, for example, 1000 target training classes are required.
In this case, after the target categories are de-duplicated, it is judged whether the number of de-duplicated target categories is greater than or equal to 1000. If so, some target categories are deleted at random so that the number of target training categories is 1000, forming the target image categories; specifically, the first preset number is the amount by which the number of target categories exceeds the preset number threshold. If not, target categories corresponding to category vectors that have no neighbor relation with the real categories of the image samples are selected from the category graph and added to the de-duplicated target categories so that the number of target training categories is 1000, forming the target image categories; specifically, the second preset number is the preset number threshold minus the current number of target categories.
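A sketch of this de-duplication and padding step, assuming PyTorch and a fixed target count of 1000, is given below; note that the real method always keeps the ground-truth classes of the mini-batch, which the random truncation here does not enforce.

```python
# Sketch: deduplicate the gathered classes, randomly drop the surplus, or pad
# with random non-neighbour classes so exactly target_count classes are returned.
import torch

def select_target_classes(neighbour_classes, num_classes_total, target_count=1000):
    active = torch.unique(neighbour_classes)
    if active.numel() >= target_count:
        keep = torch.randperm(active.numel())[:target_count]      # drop the surplus at random
        return active[keep]
    mask = torch.ones(num_classes_total, dtype=torch.bool)        # pad with unused classes
    mask[active] = False
    candidates = torch.nonzero(mask, as_tuple=False).squeeze(1)
    extra = candidates[torch.randperm(candidates.numel())[:target_count - active.numel()]]
    return torch.cat([active, extra])
```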
In a specific implementation, the category determination method provided in the embodiments of the present specification proceeds as follows: during training, L2 normalization is first performed on x (the image sample features extracted for the mini-batch) and on W; a complete K-nearest-neighbor Graph, e.g., Graph(N x K), is then constructed from all the (L2-normalized) W; based on this Graph, the active classes of the mini-batch are quickly obtained through the labels of the image samples in the mini-batch: W_active = [list_y(1); ...; list_y(M)], where list_y(j) is the set of K nearest neighbors of W_y(j). Since W is normalized, the W corresponding to a sample's label is always the first entry of the corresponding Graph row and is therefore always selected. After W_active is de-duplicated, the number of remaining W is compared with M (the required number of active classes), and classes are added or removed accordingly to obtain the target image categories of the mini-batch. By adaptively adjusting the allocation of computing resources, the training performance of the machine learning model in large-scale classification can be greatly improved and the computing cost saved.
In another embodiment of the present specification, before receiving the image sample set, the method further includes:
displaying an image sample input interface for a user based on a call request of the user;
receiving an image sample set input by the user based on the image sample input interface. Or
Before the receiving the image sample set, the method further comprises:
receiving a call request sent by a user, wherein the call request carries an image sample set.
Therefore, the image category determination method provided in the embodiments of the present specification can also be provided as an image category determination service for other callers to use: when the caller does not have professional expertise, the service can be provided through an input interface; when the caller does have professional expertise, the service can be invoked directly based on the caller's call request.
The category determination method provided in one or more embodiments of the present specification is applied to a machine learning model. When the machine learning model has a very large number of classes, the method, based on a kNN-Graph active class selection algorithm, a distributed GPU graph construction algorithm, and a Graph storage and access optimization algorithm, allows the machine learning model to be trained on GPUs of limited capacity without increasing the computational cost. In addition, the kNN softmax to which the category determination method is applied is implemented entirely on GPUs and executes efficiently; with L2 normalization and lossless Graph construction it achieves almost the same accuracy as the original softmax at the 1M, 10M and 100M scales, see Table 1.
TABLE 1
Methods             1M        10M       100M
Selective Softmax   86.39%    79.02%    71.98%
MACH                80.11%    71.34%    59.82%
KNN Softmax         87.46%    80.99%    74.54%
Full Softmax        87.43%    81.01%    74.52%
Referring to fig. 4, fig. 4 is a flowchart illustrating a classification model training method according to an embodiment of the present disclosure, including the following steps:
step 402: receiving an image sample set, wherein the image sample set comprises image samples and image labels corresponding to the image samples.
Step 404: an initial image category set corresponding to the image sample is determined based on the image label, and a category chart corresponding to the image sample is constructed based on the initial image category set.
Step 406: and determining the target image category corresponding to the image sample according to the category chart.
Step 408: training a classification model based on the image sample set and the target image category to obtain the classification model.
In the embodiments of the present specification, the target image categories corresponding to the image samples are first determined based on steps 402 to 406, and the classification model is then trained quickly based on the feature vectors of the image samples and the obtained target image categories, while the accuracy of the classification model is ensured.
For the process of determining the target image categories corresponding to the image samples according to steps 402 to 406, reference may be made to the above embodiments, which is not repeated here.
In the embodiments of the present specification, after the target image categories of the image sample set are obtained based on the image category determination method, the image sample set and the corresponding target image categories may be cached for the classification model. Specifically, a unique identifier is set in the classification model for each image sample in the image sample set, and each image sample carrying its unique identifier, together with its corresponding target image categories, is cached in a cache created for the classification model. When these image samples are subsequently used for model training, training can be carried out directly based on the image samples and the corresponding target image categories, without screening the target image categories of the image samples again with the image category method, which saves model training time and improves the user experience.
In the embodiments of the present specification, in the classification model training method, after the image sample set is received, the corresponding initial image category set is determined according to the image labels corresponding to the image samples; a category graph corresponding to the image samples is then constructed from the normalized initial image category set; the target image categories corresponding to the image samples, whose number is smaller than that of the initial image category set, are then determined based on the category graph; and finally the classification model is trained based on the image sample set and the target image categories, so that a trained classification model can be obtained quickly and accurately.
Corresponding to the above method embodiment, the present specification further provides an image category determining apparatus embodiment, and fig. 5 shows a schematic structural diagram of an image category determining apparatus provided in an embodiment of the present specification.
As shown in fig. 5, the apparatus includes:
a first image sample receiving module 1102 configured to receive an image sample set, wherein the image sample set includes image samples and image labels corresponding to the image samples;
a first graph construction module 1104 configured to determine an initial set of image categories to which the image samples correspond based on the image labels, and construct a category graph to which the image samples correspond based on the initial set of image categories;
a first category determining module 1106 configured to determine a target image category corresponding to the image sample according to the category chart.
Optionally, the first chart constructing module 1104 is further configured to:
preprocessing the initial image category in the initial image category set to obtain an image category vector set of the initial image category;
and constructing a category chart corresponding to the image sample according to the image category vector set.
Optionally, the first chart constructing module 1104 is further configured to:
determining k neighbor class vectors of each class vector in the image class vector set in a linear search mode, wherein k is a positive integer;
and constructing a category chart corresponding to the image sample based on k adjacent category vectors of each category vector in the image category vector set.
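The linear-search construction described above can be sketched as a brute-force computation of all pairwise distances between class vectors, keeping the k closest neighbors of each one. This is a minimal single-machine illustration; names are not from the patent text.

```python
# Brute-force (linear search) kNN over class vectors: compare every class vector
# with every other one and keep the k nearest. Illustrative sketch only.
import numpy as np

def knn_graph_linear_search(class_vectors, k):
    """class_vectors: (C, d) array. Returns (C, k) neighbor indices per class."""
    sq = (class_vectors ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * class_vectors @ class_vectors.T
    np.fill_diagonal(dist, np.inf)                   # a class is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]           # k smallest distances per row
```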
Optionally, the first chart constructing module 1104 is further configured to:
distributing the category vectors in the image category vector set to different computing nodes based on a preset distribution mode;
and sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes to obtain k adjacent category vectors of each category vector in the image category vector set.
Optionally, the first chart constructing module 1104 is further configured to:
calculating Euclidean distances among all first class vectors distributed on the ith calculation node, wherein i is a positive integer;
sending the first category vectors distributed on the ith computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each first category vector;
receiving third category vectors distributed on the i-1 th computing node, and computing Euclidean distances between each first category vector distributed on the i-th computing node and each third category vector;
sending the third category vectors distributed on the (i-1) th computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each third category vector;
until the Euclidean distance between the category vector distributed on each computing node and the category vectors distributed on other computing nodes is calculated, and the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes is determined according to the Euclidean distance.
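The step-by-step exchange described above, in which each computing node forwards the class vectors it currently holds to the next node while receiving a block from the previous node, amounts to a ring-style all-pairs distance computation. Below is a minimal single-process simulation of that pattern, with node-to-node communication replaced by list shuffling; all names are assumptions for illustration, not the patented implementation.

```python
# Single-process simulation of the ring exchange: each "compute node" holds one
# block of class vectors, computes distances within its own block, then repeatedly
# passes the block it currently holds to the next node until every node has seen
# every block. Illustrative sketch only.
import numpy as np

def pairwise_sq_dist(a, b):
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def ring_all_pairs_distances(blocks):
    """blocks: list of (n_i, d) arrays, one per simulated compute node.
    Returns, per node, distances from its own vectors to all vectors."""
    p = len(blocks)
    results = [[None] * p for _ in range(p)]
    for i in range(p):                                       # distances within each node's own block
        results[i][i] = pairwise_sq_dist(blocks[i], blocks[i])
    held = list(blocks)
    for step in range(1, p):                                 # p-1 ring steps
        held = [held[(i - 1) % p] for i in range(p)]         # receive the block from node i-1
        for i in range(p):
            src = (i - step) % p                             # original owner of the received block
            results[i][src] = pairwise_sq_dist(blocks[i], held[i])
    return [np.concatenate(r, axis=1) for r in results]      # (n_i, total) per node
```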
Optionally, the apparatus further includes:
and the precision conversion module is configured to perform first precision conversion on the category vectors distributed on each computing node.
Optionally, the first chart constructing module 1104 is further configured to:
based on the converted first precision, sequentially calculating first similarity between each category vector distributed on each computing node and category vectors distributed on other computing nodes;
obtaining k × n neighboring category vectors of each category vector in the image category vector set, wherein n is a positive number;
converting each category vector and the corresponding k × n neighboring category vectors into a second precision, and calculating a second similarity between each category vector and the corresponding k × n neighboring category vectors based on the converted second precision;
determining k neighboring class vectors for each class vector in the set of image class vectors based on the second similarity.
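One way to read the two-precision scheme above is as a coarse neighbor search at a reduced first precision that returns k × n candidates per class vector, followed by exact re-ranking of those candidates at a higher second precision. The sketch below uses float16 and float32 as stand-ins for the first and second precision; it is an illustration under that assumption, not the patented implementation.

```python
# Two-stage kNN: coarse candidate search at low precision, exact re-ranking at
# higher precision. Names and precision choices are illustrative assumptions.
import numpy as np

def two_stage_knn(class_vectors, k, n=2):
    low = class_vectors.astype(np.float16)                       # first (reduced) precision
    d_low = ((low[:, None, :] - low[None, :, :]) ** 2).sum(-1).astype(np.float32)
    np.fill_diagonal(d_low, np.inf)
    candidates = np.argsort(d_low, axis=1)[:, :k * n]            # k*n coarse candidates per vector
    high = class_vectors.astype(np.float32)                      # second (higher) precision
    neighbors = np.empty((len(high), k), dtype=np.int64)
    for i, cand in enumerate(candidates):
        d_high = ((high[cand] - high[i]) ** 2).sum(-1)           # re-rank candidates exactly
        neighbors[i] = cand[np.argsort(d_high)[:k]]
    return neighbors
```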
Optionally, the apparatus further includes:
and the screening module is configured to screen the category vectors in the category graphs based on the image sample characteristics of the image samples stored in each computing node, and store the screened category graphs to the corresponding computing nodes respectively.
Optionally, the screening module is further configured to:
determining an image label corresponding to the image sample feature stored in each computing node;
determining a category vector corresponding to the image sample feature based on the label;
and matching the category vector corresponding to the image sample characteristic with the category vector in the category chart, and deleting the category vector without matching relation in the category chart.
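A minimal reading of the screening step above: each computing node keeps only the entries of the category graph whose class matches an image label of a sample stored on that node and deletes the rest. The sketch below assumes the graph is a mapping from class id to neighbor class ids; names are illustrative.

```python
# Keep only the graph entries whose class appears among this node's sample labels.
def screen_class_graph(class_graph, node_sample_labels):
    """class_graph: dict class_id -> list of neighbor class ids.
    node_sample_labels: labels of the image samples stored on this compute node."""
    local_classes = set(node_sample_labels)
    return {c: neighbors for c, neighbors in class_graph.items() if c in local_classes}
```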
Optionally, the first category determining module 1106 is further configured to:
and determining a target image category corresponding to the image sample in the category chart based on a preset function.
Optionally, the first category determining module 1106 is further configured to:
determining a real category of the image sample based on an image label corresponding to the image sample;
determining a target class corresponding to a class vector adjacent to a real class of the image sample in the class chart based on a preset function, and determining a corresponding target image class based on the target class.
Optionally, the first category determining module 1106 is further configured to:
de-duplicating the target category;
judging whether the number of the target categories after the duplication removal is larger than or equal to a preset number threshold value or not,
if so, deleting the first preset number of the de-duplicated target classes to form target image classes,
and if not, selecting a second preset number of target categories from the de-duplicated target categories, and adding the second preset number of target categories to the de-duplicated target categories to form target image categories.
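Following the de-duplication and threshold logic described above, the target image categories can be assembled roughly as follows: collect the neighbor classes of the sample's real classes from the category graph, remove duplicates, truncate when the preset threshold is exceeded, and otherwise top the set up to the desired size. The padding source and all names below are assumptions for illustration.

```python
# Assemble target categories from graph neighbors of the true classes, with
# de-duplication and a preset size threshold. Illustrative sketch only.
def build_target_categories(true_classes, class_graph, threshold, pad_pool):
    targets = []
    for c in true_classes:                            # neighbor classes of each real class
        targets.extend(class_graph.get(c, []))
    deduped = list(dict.fromkeys(targets))            # de-duplicate, preserving order
    if len(deduped) >= threshold:
        return deduped[:threshold]                    # drop the excess beyond the threshold
    extra = [c for c in pad_pool if c not in deduped] # otherwise pad with additional classes
    return deduped + extra[: threshold - len(deduped)]
```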
Optionally, the apparatus further includes:
the interface display module is configured to display an image sample input interface for a user based on a call request of the user;
a sample receiving module configured to receive a set of image samples input by the user based on the image sample input interface.
Optionally, the apparatus further includes:
the system comprises a calling request receiving module and a calling request sending module, wherein the calling request receiving module is configured to receive a calling request sent by a user, and the calling request carries an image sample set.
The above is an illustrative scheme of the image category determination apparatus of this embodiment. It should be noted that the technical solution of the image category determination apparatus and the technical solution of the image category determination method belong to the same concept, and details that are not described in the technical solution of the image category determination apparatus can be found in the description of the technical solution of the image category determination method.
Corresponding to the above method embodiment, the present specification further provides an embodiment of a classification model training device, and fig. 6 shows a schematic structural diagram of a classification model training device provided in an embodiment of the present specification.
As shown in fig. 6, the apparatus includes:
a second image sample receiving module 1202 configured to receive an image sample set, wherein the image sample set includes image samples and image labels corresponding to the image samples;
a second graph constructing module 1203, configured to determine an initial image category set corresponding to the image sample based on the image label, and construct a category graph corresponding to the image sample based on the initial image category set;
a second category determination module 1204, configured to determine a target image category corresponding to the image sample according to the category chart;
a model training module 1206 configured to train a classification model based on the set of image samples and the target image class to obtain the classification model.
The above is a schematic scheme of the classification model training apparatus of this embodiment. It should be noted that the technical solution of the classification model training apparatus and the technical solution of the classification model training method belong to the same concept, and details that are not described in detail in the technical solution of the classification model training apparatus can be referred to the description of the technical solution of the classification model training method.
FIG. 7 illustrates a block diagram of a computing device 1700, according to one embodiment of the present description. Components of the computing device 1700 include, but are not limited to, memory 1710 and a processor 1720. Processor 1720 is coupled to memory 1710 via bus 1730, and database 1750 is used to store data.
Computing device 1700 also includes access device 1740, access device 1740 enabling computing device 1700 to communicate via one or more networks 1760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 1740 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the aforementioned components of computing device 1700 and other components not shown in FIG. 7 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1700 may also be a mobile or stationary server.
The processor 1720 is configured to execute computer-executable instructions and, when executing the computer-executable instructions, to perform the steps of the image category determination method or the steps of the classification model training method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the image category determining method or the classification model training method, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the image category determining method or the classification model training method.
An embodiment of the present specification further provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image category determination method or implement the steps of the classification model training method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the image category determining method or the classification model training method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the image category determining method or the classification model training method.
Another embodiment of the present specification further provides a category determination method, including:
receiving a training sample data set, wherein the training sample data set comprises training sample data and a label corresponding to the training sample data;
determining an initial training class set corresponding to the training sample data based on the label;
preprocessing an initial training category in the initial training category set to obtain a category vector set of the initial training category;
constructing a category chart corresponding to the training sample data according to the category vector set;
and determining a target training class set corresponding to the training sample data based on the class chart.
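The preprocessing step listed above turns each initial training class into a class vector. Since L2 normalization is mentioned elsewhere in the specification, the following sketch assumes that each class is represented by its classifier weight vector and that preprocessing normalizes these vectors to unit length; this is an assumption for illustration, not a statement of the patented method.

```python
# Assumed preprocessing: L2-normalize one weight vector per initial training class
# to obtain the class vector set used for graph construction. Illustrative only.
import numpy as np

def preprocess_class_vectors(class_weight_matrix):
    """class_weight_matrix: (C, d) array, one row per initial training class."""
    norms = np.linalg.norm(class_weight_matrix, axis=1, keepdims=True)
    return class_weight_matrix / np.maximum(norms, 1e-12)   # unit-length class vectors
```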
Optionally, the constructing a category chart corresponding to the training sample data according to the category vector set includes:
determining k neighboring category vectors of each category vector in the category vector set in a linear search mode, wherein k is a positive integer;
and constructing a class chart corresponding to the training sample data based on k adjacent class vectors of each class vector in the class vector set.
Optionally, the determining k neighboring class vectors of each class vector in the class vector set by a linear search includes:
distributing the category vectors in the category vector set to different computing nodes based on a preset distribution mode;
and sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes to obtain k adjacent category vectors of each category vector in the category vector set.
Optionally, the sequentially calculating the similarity between each category vector allocated to each computing node and category vectors allocated to other computing nodes includes:
calculating Euclidean distances among all first class vectors distributed on the ith calculation node, wherein i is a positive integer;
sending the first category vectors distributed on the ith computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each first category vector;
receiving third category vectors distributed on the i-1 th computing node, and computing Euclidean distances between each first category vector distributed on the i-th computing node and each third category vector;
sending the third category vectors distributed on the (i-1) th computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each third category vector;
until the Euclidean distance between the category vector distributed on each computing node and the category vectors distributed on other computing nodes is calculated, and the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes is determined according to the Euclidean distance.
Optionally, before the sequentially calculating the similarity between each category vector allocated on each computing node and the category vectors allocated on other computing nodes, the method further includes:
and performing first precision conversion on the category vectors distributed on each computing node.
Optionally, the sequentially calculating the similarity between each category vector allocated on each computing node and category vectors allocated on other computing nodes to obtain k neighboring category vectors of each category vector in the category vector set includes:
based on the converted first precision, sequentially calculating first similarity between each category vector distributed on each computing node and category vectors distributed on other computing nodes;
obtaining k × n neighboring category vectors of each category vector in the category vector set, wherein n is a positive number;
converting each category vector and the corresponding k × n neighboring category vectors into a second precision, and calculating a second similarity between each category vector and the corresponding k × n neighboring category vectors based on the converted second precision;
determining k neighboring class vectors for each class vector in the set of class vectors based on the second similarity.
Optionally, after the constructing the class chart corresponding to the training sample data based on the k neighboring class vectors of each class vector in the class vector set, the method further includes:
and screening the class vectors in the class diagram based on the training sample characteristics of the training sample data stored in each computing node, and respectively storing the screened class diagrams to the corresponding computing nodes.
Optionally, the screening the category vectors in the category graph based on the training sample features stored in each computing node includes:
determining a label corresponding to the training sample feature stored in each computing node;
determining a category vector corresponding to the training sample features based on the label;
and matching the class vector corresponding to the training sample characteristic with the class vector in the class diagram, and deleting the class vector without matching relation in the class diagram.
Optionally, the determining, based on the class chart, a target training class set corresponding to the training sample data includes:
and determining a target training class set corresponding to the training sample data in the class diagram based on a preset function.
Optionally, the determining, based on the class chart, a target training class set corresponding to the training sample data includes:
determining the real category of the training sample data based on the label corresponding to the training sample data;
and determining a target class corresponding to a class vector adjacent to the real class of the training sample data in the class diagram based on a preset function, and determining a corresponding target training class set based on the target class.
Optionally, the determining a corresponding target training class set based on the target class includes:
de-duplicating the target category;
judging whether the number of the target categories after the duplication removal is larger than or equal to a preset number threshold value or not,
if so, deleting the first preset number of the de-duplicated target classes to form a target training class set,
if not, selecting a second preset number of target categories from the de-duplicated target categories, and adding the second preset number of target categories to the de-duplicated target categories to form a target training category set.
Optionally, the training sample data comprises images.
The class determination method provided in the embodiments of the present specification may be applied to any large-scale classification task. Without changing the precision obtained from the training samples, the method reduces the number of classes that need to be considered for each training sample and thereby speeds up model training. The training samples include, but are not limited to, images, videos, or commodities. For example, when the training sample is an image, the target image categories of the image can be obtained by the class determination method; when the training sample is a video, the target video label categories of the video can be obtained by the class determination method; and when the training sample is a commodity, the target commodity label categories of the commodity can be obtained by the class determination method. The small number of accurate target training classes obtained in this way can then be used to train machine learning models such as classification models or recognition models quickly and accurately, so that the trained machine learning models can be better used in specific applications.
One embodiment of the present specification implements a class determination method comprising: receiving a training sample data set; determining an initial training class set corresponding to the training sample data based on the labels corresponding to the training sample data in the training sample data set; preprocessing the initial training classes in the initial training class set to obtain a class vector set of the initial training classes; constructing a class chart corresponding to the training sample data according to the class vector set; and determining a target training class set corresponding to the training sample data based on the class chart. By constructing the class chart from the normalized class vector set, the method can effectively determine, for each item of training sample data, a small target training class set adjacent to that training sample data; and by adaptively adjusting the allocation of computing resources through the k nearest neighbor algorithm, the training performance of a model in large-scale classification can be greatly improved while computational cost is saved.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (19)

1. An image category determination method, comprising:
receiving an image sample set, wherein the image sample set comprises image samples and image labels corresponding to the image samples;
determining an initial image category set corresponding to the image sample based on the image label, and constructing a category chart corresponding to the image sample based on the initial image category set;
and determining the target image category corresponding to the image sample according to the category chart.
2. The image category determination method of claim 1, the constructing a category chart corresponding to the image sample based on the initial set of image categories comprising:
preprocessing the initial image category in the initial image category set to obtain an image category vector set of the initial image category;
and constructing a category chart corresponding to the image sample according to the image category vector set.
3. The method of claim 2, the constructing a category graph corresponding to the image sample from the set of image category vectors comprising:
determining k neighbor class vectors of each class vector in the image class vector set in a linear search mode, wherein k is a positive integer;
and constructing a category chart corresponding to the image sample based on k adjacent category vectors of each category vector in the image category vector set.
4. The method of claim 3, the determining k neighbor class vectors for each class vector in the set of image class vectors by way of a linear search comprising:
distributing the category vectors in the image category vector set to different computing nodes based on a preset distribution mode;
and sequentially calculating the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes to obtain k adjacent category vectors of each category vector in the image category vector set.
5. The method of claim 4, wherein said sequentially calculating the similarity between the respective class vector assigned to each compute node and the class vectors assigned to other compute nodes comprises:
calculating Euclidean distances among all first class vectors distributed on the ith calculation node, wherein i is a positive integer;
sending the first category vectors distributed on the ith computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each first category vector;
receiving third category vectors distributed on the i-1 th computing node, and computing Euclidean distances between each first category vector distributed on the i-th computing node and each third category vector;
sending the third category vectors distributed on the (i-1) th computing node to the (i + 1) th computing node, and calculating Euclidean distances between each second category vector distributed on the (i + 1) th computing node and each third category vector;
until the Euclidean distance between the category vector distributed on each computing node and the category vectors distributed on other computing nodes is calculated, and the similarity between each category vector distributed on each computing node and the category vectors distributed on other computing nodes is determined according to the Euclidean distance.
6. The method of claim 4, before sequentially calculating the similarity between each class vector allocated on each computing node and the class vectors allocated on other computing nodes, further comprising:
and performing first precision conversion on the category vectors distributed on each computing node.
7. The method of claim 6, wherein the sequentially calculating the similarity between each class vector allocated on each computing node and the class vectors allocated on other computing nodes to obtain k neighboring class vectors of each class vector in the image class vector set comprises:
based on the converted first precision, sequentially calculating first similarity between each category vector distributed on each computing node and category vectors distributed on other computing nodes;
obtaining k × n neighboring category vectors of each category vector in the image category vector set, wherein n is a positive number;
converting each category vector and the corresponding k × n neighboring category vectors into a second precision, and calculating a second similarity between each category vector and the corresponding k × n neighboring category vectors based on the converted second precision;
determining k neighboring class vectors for each class vector in the set of image class vectors based on the second similarity.
8. The method of claim 4, further comprising, after constructing the class graph corresponding to the image sample based on the k neighboring class vectors of the respective class vectors in the set of image class vectors:
and screening the category vectors in the category graphs based on the image sample characteristics of the image samples stored in each computing node, and respectively storing the screened category graphs to the corresponding computing nodes.
9. The method of claim 8, the filtering category vectors in the category graph based on the image sample features stored in each compute node comprising:
determining an image label corresponding to the image sample feature stored in each computing node;
determining a category vector corresponding to the image sample feature based on the label;
and matching the category vector corresponding to the image sample characteristic with the category vector in the category chart, and deleting the category vector without matching relation in the category chart.
10. The method of claim 9, the determining a target image category to which the image sample corresponds based on the category graph comprising:
and determining a target image category corresponding to the image sample in the category chart based on a preset function.
11. The method of claim 10, the determining a target image category to which the image sample corresponds based on the category graph comprising:
determining a real category of the image sample based on an image label corresponding to the image sample;
determining a target class corresponding to a class vector adjacent to a real class of the image sample in the class chart based on a preset function, and determining a corresponding target image class based on the target class.
12. The method of claim 11, the determining a corresponding target image class based on the target class comprising:
de-duplicating the target category;
judging whether the number of the target categories after the duplication removal is larger than or equal to a preset number threshold value or not,
if so, deleting the first preset number of the de-duplicated target classes to form target image classes,
and if not, selecting a second preset number of target categories from the de-duplicated target categories, and adding the second preset number of target categories to the de-duplicated target categories to form target image categories.
13. The image category determination method of claim 1, further comprising, prior to receiving the sample set of images:
displaying an image sample input interface for a user based on a call request of the user;
receiving an image sample set input by the user based on the image sample input interface.
14. The image category determination method of claim 1, further comprising, prior to receiving the sample set of images:
receiving a call request sent by a user, wherein the call request carries an image sample set.
15. A classification model training method, comprising:
receiving an image sample set, wherein the image sample set comprises image samples and image labels corresponding to the image samples;
determining an initial image category set corresponding to the image sample based on the image label, and constructing a category chart corresponding to the image sample based on the initial image category set;
determining a target image category corresponding to the image sample according to the category chart;
training a classification model based on the image sample set and the target image category to obtain the classification model.
16. An image category determination apparatus comprising:
a first image sample receiving module configured to receive an image sample set, wherein the image sample set includes image samples and image labels corresponding to the image samples;
a first graph construction module configured to determine an initial image category set corresponding to the image sample based on the image label, and construct a category graph corresponding to the image sample based on the initial image category set;
the first category determination module is configured to determine a target image category corresponding to the image sample according to the category chart.
17. A classification model training apparatus comprising:
a second image sample receiving module configured to receive an image sample set, wherein the image sample set includes image samples and image labels corresponding to the image samples;
a second graph construction module configured to determine an initial image category set corresponding to the image sample based on the image label, and construct a category graph corresponding to the image sample based on the initial image category set;
the second category determination module is configured to determine a target image category corresponding to the image sample according to the category chart;
a model training module configured to train a classification model based on the image sample set and the target image class to obtain the classification model.
18. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, wherein the processor implements the steps of the image class determination method according to any one of claims 1 to 14 or the classification model training method according to claim 15 when executing the computer-executable instructions.
19. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the image class determination method of any one of claims 1 to 14 or carry out the steps of the classification model training method of claim 15.
CN202010386765.9A 2020-05-09 2020-05-09 Image category determination method and device Pending CN113627455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010386765.9A CN113627455A (en) 2020-05-09 2020-05-09 Image category determination method and device

Publications (1)

Publication Number Publication Date
CN113627455A true CN113627455A (en) 2021-11-09

Family

ID=78377531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010386765.9A Pending CN113627455A (en) 2020-05-09 2020-05-09 Image category determination method and device

Country Status (1)

Country Link
CN (1) CN113627455A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103189836A (en) * 2010-08-30 2013-07-03 国际商业机器公司 Method for classification of objects in a graph data stream
CN102750289A (en) * 2011-04-19 2012-10-24 富士通株式会社 Tag group classifying method and equipment as well as data mixing method and equipment
CN102298606A (en) * 2011-06-01 2011-12-28 清华大学 Random walking image automatic annotation method and device based on label graph model
CN103559504A (en) * 2013-11-04 2014-02-05 北京京东尚科信息技术有限公司 Image target category identification method and device
US10013436B1 (en) * 2014-06-17 2018-07-03 Google Llc Image annotation based on label consensus
CN104657718A (en) * 2015-02-13 2015-05-27 武汉工程大学 Face recognition method based on face image feature extreme learning machine
CN108351985A (en) * 2015-06-30 2018-07-31 亚利桑那州立大学董事会 Method and apparatus for large-scale machines study
CN105303195A (en) * 2015-10-20 2016-02-03 河北工业大学 Bag-of-word image classification method
CN105469095A (en) * 2015-11-17 2016-04-06 电子科技大学 Vehicle model identification method based on pattern set histograms of vehicle model images
CN108304882A (en) * 2018-02-07 2018-07-20 腾讯科技(深圳)有限公司 A kind of image classification method, device and server, user terminal, storage medium
CN108629373A (en) * 2018-05-07 2018-10-09 苏州大学 A kind of image classification method, system, equipment and computer readable storage medium
CN111125422A (en) * 2019-12-13 2020-05-08 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
J. MA等: "Semi-Supervised Classification With Graph Structure Similarity and Extended Label Propagation", IEEE ACCESS, 30 April 2019 (2019-04-30), pages 58010 - 58022 *
WANG JIN et al.: "An incremental hypernetwork-based multi-label classification method", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), vol. 31, no. 04, 31 August 2019 (2019-08-31), pages 538 - 549 *

Similar Documents

Publication Publication Date Title
CN110807495B (en) Multi-label classification method, device, electronic equipment and storage medium
CN110188227B (en) Hash image retrieval method based on deep learning and low-rank matrix optimization
CN109711422B (en) Image data processing method, image data processing device, image data model building method, image data model building device, computer equipment and storage medium
CN110210560B (en) Incremental training method, classification method and device, equipment and medium of classification network
CN111382868B (en) Neural network structure searching method and device
CN110752028A (en) Image processing method, device, equipment and storage medium
CN110046249A (en) Training method, classification method, system, equipment and the storage medium of capsule network
CN112508094A (en) Junk picture identification method, device and equipment
CN112348081A (en) Transfer learning method for image classification, related device and storage medium
CN113326930A (en) Data processing method, neural network training method, related device and equipment
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN109523016B (en) Multi-valued quantization depth neural network compression method and system for embedded system
CN113127667A (en) Image processing method and device, and image classification method and device
US20200151518A1 (en) Regularized multi-metric active learning system for image classification
CN112132279A (en) Convolutional neural network model compression method, device, equipment and storage medium
CN112257855A (en) Neural network training method and device, electronic equipment and storage medium
Gu et al. No-reference image quality assessment with reinforcement recursive list-wise ranking
CN110717405A (en) Face feature point positioning method, device, medium and electronic equipment
CN112668718B (en) Neural network training method, device, electronic equipment and storage medium
CN116629375A (en) Model processing method and system
CN116362294A (en) Neural network searching method and device and readable storage medium
CN113627455A (en) Image category determination method and device
CN114155388B (en) Image recognition method and device, computer equipment and storage medium
KR20210038027A (en) Method for Training to Compress Neural Network and Method for Using Compressed Neural Network
CN115908882A (en) Picture clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination