WO2020083073A1 - Method, system, device and storage medium for multi-label classification of non-motor vehicle images

Publication number
WO2020083073A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification
network model
motor vehicle
training
layer
Application number
PCT/CN2019/111320
Other languages
English (en)
French (fr)
Inventor
谢晓汶
陈燕娟
黑光月
周延培
陈曲
周峰
孙新
章勇
曹李军
陈卫东
Original Assignee
苏州科达科技股份有限公司
Application filed by 苏州科达科技股份有限公司
Publication of WO2020083073A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the technical field of image processing, and in particular to a method, system, device and storage medium for multi-label classification of non-motor vehicle images.
  • Convolutional neural networks can learn image feature extraction automatically and perform well in most computer vision tasks.
  • Conventional methods can use multiple multi-class classification networks to recognize multiple category labels in an image, but the computational cost of this approach grows linearly with the number of classification attributes in the application, so its efficiency is very low.
  • the purpose of this application is to provide a method, system, equipment and storage medium for multi-label classification of non-motor vehicle images.
  • Using one classification network model realizes multi-attribute classification of non-motor vehicles, is convenient to train, and achieves high classification accuracy.
  • An embodiment of the present application provides a multi-label classification method for non-motor vehicle images.
  • the label of the non-motor vehicle image includes classification results of multiple attributes.
  • the method includes the following steps:
  • the classification network model includes a feature extraction layer and a plurality of classification units corresponding to the attributes one-to-one;
  • the feature extraction layer of the classification network model extracts the features in the tested non-motor vehicle image
  • Multiple classification units of the classification network model respectively calculate the classification results of each attribute according to the extracted features
  • the multiple classification units of the classification network model respectively calculate the classification results of each attribute based on the extracted features, including the following steps:
  • a plurality of classification units of the classification network model respectively calculate the probability that the tested non-motor vehicle image belongs to each category in the corresponding attribute, and select the category with the highest probability as the classification result of the corresponding attribute.
  • the feature extraction layer includes at least one convolutional layer and at least one pooling layer, and the classification unit is a softmax layer.
  • a first fully connected layer and multiple branch fully connected layers are further provided between the feature extraction layer and the classification units, and the branch fully connected layers correspond one-to-one to the classification units;
  • the output of the feature extraction layer is connected to the plurality of branch fully connected layers after passing through the first fully connected layer, and the output of each branch fully connected layer is connected to the corresponding classification unit.
  • the output of the first fully connected layer is input to a second fully connected layer through a dropout layer, and the second fully connected layer is connected to the plurality of branch fully connected layers through a dropout layer.
  • the classification network model is trained using the following steps:
  • the training set includes training image data and label data corresponding to each training image, the label data includes an image path and a category of the training image in each attribute;
  • the training set is input into the classification network model for iterative training, the weighted sum of the loss values of each classification unit is used as the loss of the classification network, and iterative training is performed until the model converges;
  • the method further includes the following steps:
  • the inputting the training set into the classification network model for iterative training includes the following steps:
  • the training set is used to iteratively train the classification network model, and after every i training iterations the learning rate is multiplied by k to give the learning rate for subsequent iterations, where i is a preset number of iterations between learning rate adjustments, k is a preset learning rate adjustment coefficient, and k < 1;
  • An embodiment of the present application also provides a non-motor vehicle image multi-label classification system, which is applied to the non-motor vehicle image multi-label classification method.
  • the system includes:
  • the image input module is used to input the tested non-motor vehicle image into the trained classification network model, the classification network model includes a feature extraction layer and a plurality of classification units corresponding to the attributes one-to-one;
  • a feature extraction module for extracting features in the tested non-motor vehicle image using the feature extraction layer of the classification network model
  • An image classification module which is used to calculate the classification results of each attribute according to the extracted features by using multiple classification units of the classification network model;
  • the result output module is used to combine the classification results of various attributes as a label of the tested non-motor vehicle image.
  • An embodiment of the present application also provides a non-motor vehicle image multi-label classification device, including:
  • a memory in which executable instructions of the processor are stored
  • the processor is configured to execute the steps of the non-motor vehicle image multi-label classification method by executing the executable instructions.
  • An embodiment of the present application further provides a computer-readable storage medium for storing a program, which, when executed, implements the steps of the multi-label classification method for non-motor vehicle images.
  • the non-motor vehicle image multi-label classification method, system, device and storage medium provided by the present application have the following advantages:
  • This application uses deep learning to extract features and combine multiple classification units.
  • Using one classification network model achieves multi-attribute classification of non-motor vehicles. It is convenient to train and has high classification accuracy, thus solving the prior-art problem that using multiple image classification models involves cumbersome steps and low efficiency.
  • FIG. 1 is a flowchart of a non-motor vehicle image multi-label classification method according to an embodiment of the present application
  • FIG. 2 is an example diagram of an annotation style of non-motor vehicle images of a training set according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a classification network model according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a multi-label classification system for non-motor vehicle images according to an embodiment of the present application
  • FIG. 5 is a schematic structural diagram of a multi-label classification device for non-motor vehicle images according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a computer storage medium according to an embodiment of the present application.
  • an embodiment of the present application provides a non-motor vehicle image multi-label classification method.
  • the label of the non-motor vehicle image includes classification results of multiple attributes.
  • the method includes the following steps:
  • S100 Input the tested non-motor vehicle image into the trained classification network model, the classification network model includes a feature extraction layer and a plurality of classification units corresponding to the attributes one-to-one;
  • S200 The feature extraction layer of the classification network model extracts features in the tested non-motor vehicle image
  • S300 A plurality of classification units of the classification network model respectively calculate the classification results of each attribute according to the extracted features
  • The attributes may be preset attributes that are not mutually exclusive with one another.
  • for example, the attributes may include power type, applicable gender, vehicle use, whether there is a parasol, etc.
  • by power type, non-motor vehicles can be divided into electric vehicles, motorcycles, bicycles, etc.
  • by applicable gender, they can be divided into male vehicles and female vehicles
  • by vehicle use, they can be divided into takeaway vehicles, courier vehicles, non-working vehicles, etc.; the parasol attribute indicates whether a parasol is present
  • the label of a non-motor vehicle can be a combination of the classification results of multiple attributes.
  • for example, the label of a non-motor vehicle can be: electric vehicle, female vehicle, non-working vehicle, with parasol.
  • this application combines feature extraction through deep learning and multiple classification units, and adopts a classification network model to realize non-motor vehicle multi-attribute classification, which is convenient for training and has high classification accuracy.
  • the multiple classification units of the classification network model respectively calculate the classification result of each attribute based on the extracted features, including the following steps:
  • a plurality of classification units of the classification network model respectively calculate the probability that the tested non-motor vehicle image belongs to each category in the corresponding attribute, and select the category with the highest probability as the classification result of the corresponding attribute.
  • the classification unit can use a softmax layer.
  • the trained softmax layer takes as input the feature vector extracted by the feature extraction layer and outputs a T × 1 vector.
  • the value of T is the number of categories under the attribute, and each value in the T × 1 vector is the probability that the feature belongs to the corresponding category of that attribute.
  • the category with the highest probability is selected as the classification result of the classification unit.
  • the selected category is the one to which the tested non-motor vehicle image most likely belongs under this attribute, so the most probable classification result for the tested image is obtained.
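The behaviour of one classification unit described above can be illustrated with a minimal sketch. It assumes the unit receives T raw scores from its branch fully connected layer; the function names and the example scores are hypothetical and do not come from the application.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_attribute(scores):
    """Return (index of the most probable category, probability vector)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Hypothetical raw scores for one attribute with T = 3 categories.
best, probs = classify_attribute([2.0, 0.5, -1.0])
```

Here `probs` plays the role of the T × 1 vector and `best` is the position of the selected category.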
  • the feature extraction layer includes at least one convolutional layer and at least one pooling layer
  • the classification unit is a softmax layer.
  • a first fully connected layer and multiple branch fully connected layers are also provided between the feature extraction layer and the classification units, and the branch fully connected layers correspond one-to-one to the classification units;
  • the output of the feature extraction layer passes through the first fully connected layer and is then connected to the plurality of branch fully connected layers, and the output of each branch fully connected layer is connected to the corresponding classification unit.
  • the output of the first fully connected layer may be input to a second fully connected layer through a dropout layer, and the second fully connected layer is connected to the plurality of branch fully connected layers through a dropout layer.
  • Each convolutional layer in a convolutional neural network is composed of several convolutional units, and the parameters of each convolutional unit are optimized by a back-propagation algorithm.
  • the purpose of the convolution operation is to extract different features of the input.
  • the first convolutional layer may only extract low-level features such as edges, lines, and corners; deeper layers of the network can iteratively extract more complex features from these low-level features.
  • the pooling layer, also called the sampling layer, immediately follows the convolutional layer and is likewise composed of multiple feature maps. Each of its feature maps corresponds to one feature map of the preceding layer, so the number of feature maps is unchanged.
  • the pooling layer aims to obtain spatially invariant features by reducing the resolution of the feature maps.
  • the pooling layer performs secondary feature extraction: each of its neurons applies a pooling operation to its local receptive field.
  • Common pooling methods include max pooling, which takes the largest value in the local receptive field; average pooling, which averages all the values in the local receptive field; and stochastic pooling. In this example, max pooling is mainly used.
  • the reasonable cooperation of the convolutional layer and the pooling layer can better extract the characteristics of the tested non-motor vehicle images to further improve the accuracy of classification by the classification unit.
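The difference between max pooling and average pooling can be seen in a small sketch. The 2 × 2 window, the non-overlapping stride, and the example feature map are assumptions made for illustration; the application does not state the pooling window or stride.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping size x size pooling over a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            # Values inside the local window for this output position.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled

# Hypothetical 4 x 4 feature map.
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
max_pooled = pool2d(fmap, mode="max")  # [[4, 2], [2, 8]]
avg_pooled = pool2d(fmap, mode="avg")  # [[2.5, 1.0], [1.25, 6.5]]
```

Both variants halve the resolution of the feature map while leaving the number of feature maps unchanged.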
  • the specific application of each layer will be further introduced in the specific examples shown in FIGS. 2 and 3 below.
  • the classification network model is trained using the following steps:
  • the training set includes training image data and label data corresponding to each training image, the label data includes image path and the category of the training image in each attribute;
  • the training image data may be previously acquired non-motor vehicle images whose classification results are known; a label is added to each training image according to the known classification results, and the training images and labels are added together to the training set used to train the classification network model;
  • the training set is input into the classification network model for iterative training, the weighted sum of the loss values of the classification units is used as the loss of the classification network, and iterative training is performed until the model converges; since this loss takes into account the loss value of every classification unit, the total loss keeps decreasing over the iterations, which also improves the recognition accuracy of the classification unit corresponding to each attribute;
  • the converged classification network model can be used to classify and recognize the tested non-motor vehicle images in this application. In use, correctly recognized non-motor vehicle images can be selected and added to the training set, which continuously enriches the training set, and the classification network model can be retrained periodically so that its recognition accuracy keeps improving during use.
  • Random initialization is performed on multiple branch fully connected layers and multiple classification units of the constructed classification network model.
  • the random initialization may use normal distribution to initialize the weights, but the application is not limited thereto.
  • the inputting the training set into the classification network model for iterative training includes the following steps:
  • the training set is used to iteratively train the classification network model, and after every i training iterations the learning rate is multiplied by k to give the learning rate for subsequent iterations, where i is a preset number of iterations between learning rate adjustments, k is a preset learning rate adjustment coefficient, and k < 1;
  • the learning rate is gradually decayed, a new balance is found between training time and loss reduction, and the training time is appropriately controlled by setting the maximum number of iterations;
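The step decay of the learning rate can be written in closed form as a short sketch. The function name is hypothetical, and the check that k is positive is an added assumption (the text only requires k < 1). The example values 0.001, 0.9 and 1,000 are the ones given in Step 3.3 of the example embodiment.

```python
def step_learning_rate(base_lr, k, i, iteration):
    """Learning rate in effect at `iteration` when it is multiplied by k every i iterations."""
    if not 0 < k < 1:
        raise ValueError("k must satisfy 0 < k < 1")
    return base_lr * k ** (iteration // i)

lr_start = step_learning_rate(0.001, 0.9, 1000, 0)
lr_before_first_step = step_learning_rate(0.001, 0.9, 1000, 999)
lr_after_first_step = step_learning_rate(0.001, 0.9, 1000, 1000)
lr_after_two_steps = step_learning_rate(0.001, 0.9, 1000, 2500)
```

With these values the learning rate stays at 0.001 for the first 1,000 iterations, then drops to 0.0009, then to 0.00081, and so on.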
  • the non-motor vehicle image multi-label classification includes the following steps:
  • Step 1 Collation of the non-motor vehicle multi-label data set. Step 1 corresponds to the step of obtaining the training set in the training process of the above classification network model; specifically:
  • Step 1.1 Obtain a large number of non-motor vehicle training images from real scenes, and number the images;
  • Step 1.2 Design the annotation information of the images, dividing it by attribute, with each attribute represented by a one-digit annotation field.
  • Each attribute contains multiple categories, which are represented by the numbers 1 to N.
  • the annotation format is: [image path] [attribute category code of feature 1] [attribute category code of feature 2] ...; for example, FIG. 2 shows an annotation of a non-motor vehicle image that contains annotation information for five attributes;
  • Step 1.3 Annotate all non-motor vehicle images in this way, and organize the annotation information and image information into corresponding data sets.
  • the data sets are in LMDB format.
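A minimal sketch of reading one annotation line in the format of Step 1.2 is shown below. It assumes the fields are separated by whitespace and that the codes are integers starting at 1; the function name, the example path, and the example codes are hypothetical.

```python
def parse_label_line(line, num_attributes):
    """Split '[image path] [code 1] ... [code N]' into (path, [codes])."""
    # Split from the right so that an image path containing spaces stays intact.
    parts = line.rsplit(None, num_attributes)
    if len(parts) != num_attributes + 1:
        raise ValueError("expected an image path followed by %d codes" % num_attributes)
    path, codes = parts[0], [int(p) for p in parts[1:]]
    if any(code < 1 for code in codes):
        raise ValueError("category codes start at 1")
    return path, codes

# Hypothetical annotation line with five attribute codes.
path, codes = parse_label_line("images/000001.jpg 1 2 3 1 2", num_attributes=5)
```

Converting such lines into an LMDB data set, as in Step 1.3, is outside the scope of this sketch.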
  • Step 2 The non-motor vehicle multi-label classification network model is constructed.
  • the constructed classification network model is shown in FIG. 3.
  • Step 2 describes in detail how the model is constructed from the five convolutional layers, three pooling layers, and fully connected layers used in the above embodiment. The specific construction process is as follows:
  • Step 2.1 The training network consists of an input layer, five convolutional layers, three pooling layers, 2 + N fully connected layers, and N softmax layers, where N is the number of non-motor vehicle attributes determined.
  • Step 2.3 The pooling layer is followed by the second convolutional layer, then a ReLU (Rectified Linear Unit) activation function, then the second batch normalization layer, then the second pooling layer, which uses max pooling;
  • Step 2.4 The second pooling layer is followed by three convolutional layers, in the order: third convolutional layer, ReLU activation function, fourth convolutional layer, ReLU activation function, fifth convolutional layer, ReLU activation function;
  • Step 2.5 The third pooling layer follows the previous step and is followed by the first fully connected layer; the first fully connected layer is followed by a ReLU activation function and a dropout layer, and then by the second fully connected layer. The first and second fully connected layers have the same settings; each node of a fully connected layer is connected to all nodes of the previous layer so as to combine the features extracted earlier.
  • Step 2.6 The second fully connected layer is followed by N branch fully connected layers, namely branch fully connected layer 1, branch fully connected layer 2, ..., branch fully connected layer N. Each branch is followed by a corresponding softmax layer that serves as its classification unit (classification unit 1, classification unit 2, ..., classification unit N), and the output of each classification unit is the final category for the corresponding attribute.
  • each neuron in the fully connected layer is fully connected with all the neurons in the previous layer.
  • the fully connected layer can integrate the class-discriminative local information from the convolutional or pooling layers.
  • the ReLU function is used as the activation function of each neuron in the fully connected layer.
  • the output of each branch fully connected layer is passed to a softmax layer for classification.
  • the computation of the softmax layer can be understood as normalization. For example, if there are x picture categories, the output of the softmax layer is an x-dimensional vector. The first value in the vector is the probability that the current picture belongs to the first category, the second value is the probability that it belongs to the second category, and so on; the elements of the x-dimensional vector sum to 1.
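The layer ordering of Steps 2.1 to 2.6 can be summarized as a list of layer names, which makes the layer counts of Step 2.1 easy to check. This is only a sketch: the layer names are hypothetical, the first convolution block is an assumption that mirrors Step 2.3 (Step 2.2 does not appear in this text), and placing a ReLU and a dropout layer after the second fully connected layer is inferred from the statement that both fully connected layers have the same settings.

```python
def build_layer_sequence(num_attributes):
    """Return (shared trunk layer names, per-attribute branch layer names)."""
    trunk = ["input"]
    # Blocks 1 and 2: convolution, ReLU, normalization, max pooling.
    # Block 1 is an ASSUMPTION mirroring Step 2.3.
    for n in (1, 2):
        trunk += ["conv%d" % n, "relu", "norm%d" % n, "pool%d" % n]
    # Step 2.4: three convolutions, each followed by ReLU.
    for n in (3, 4, 5):
        trunk += ["conv%d" % n, "relu"]
    # Step 2.5: third pooling, then two fully connected layers
    # (ReLU and dropout after fc2 are an ASSUMPTION, see lead-in).
    trunk += ["pool3", "fc1", "relu", "dropout", "fc2", "relu", "dropout"]
    # Step 2.6: one branch fully connected layer and one softmax per attribute.
    branches = [["fc_branch%d" % n, "softmax%d" % n]
                for n in range(1, num_attributes + 1)]
    return trunk, branches

trunk, branches = build_layer_sequence(num_attributes=4)
```

For N = 4 attributes this yields five convolutional layers, three pooling layers, 2 + 4 fully connected layers, and four softmax layers, matching Step 2.1.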
  • Step 3 Non-motor vehicle multi-label classification network training
  • Step 3 corresponds to the step, in the training of the classification network model described above, of inputting the training set into the classification network model for iterative training: during training, the weighted sum of the loss values of the classification units is used as the loss of the classification network, and iterative training is performed until the model converges.
  • the process of the specific step three is as follows:
  • Step 3.1 According to the LMDB data set compiled in Step 1, first calculate the mean file of the training data set, save it as a .binaryproto file format, and specify the location of the binary mean file in the training network;
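What a mean file holds can be sketched as an element-wise average over the training images. The sketch treats each image as a flat list of pixel values and assumes the mean is later subtracted from each input, which is the common use of such a file but is not stated in the text. It does not read LMDB or write the .binaryproto format, and all names and values are hypothetical.

```python
def mean_image(images):
    """Element-wise mean of equally sized images given as flat lists."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]

def subtract_mean(image, mean):
    """Center one image by subtracting the data-set mean (assumed use)."""
    return [p - m for p, m in zip(image, mean)]

# Three hypothetical training images with three pixel values each.
train = [[0, 10, 20], [10, 20, 30], [20, 30, 40]]
mean = mean_image(train)
centered = subtract_mean(train[0], mean)
```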
  • Step 3.2 Using the fine-tuning training method, use the weight file trained on the public ImageNet data set to initialize the weights of some layers of the current network, and randomly initialize the other layers.
  • Random initialization can use normal distribution to initialize the weights, but this application is not limited to this;
  • Step 3.3 Set the batch size, initial learning rate and maximum number of iterations for training the classification network model: the batch size batch_size is 32, the initial learning rate is 0.001, and the maximum number of iterations is 200,000. The learning rate is modified during training using the step method.
  • the learning rate is multiplied by 0.9 after every 1,000 iterations.
  • training uses the stochastic gradient descent algorithm, and the network model is saved once every 10,000 iterations.
  • the stochastic gradient descent algorithm can speed up the update of the weights and biases of each layer; "stochastic" here means that the samples are randomly shuffled for each iteration.
  • the shuffling can effectively reduce the bias in parameter updates caused by sample ordering.
  • at every step, each parameter is updated by subtracting its gradient. For large-scale machine learning tasks, the stochastic gradient descent algorithm performs very well.
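The shuffle-then-update pattern of stochastic gradient descent can be shown on a toy problem. The sketch fits a straight line rather than a neural network, scales each gradient by a fixed learning rate, and uses hypothetical names and values throughout; it illustrates the update rule only and is not the training procedure of the application.

```python
import random

def sgd_fit_line(samples, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by SGD, reshuffling the samples on every pass."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # random sample order for each pass
        for x, y in data:
            error = (w * x + b) - y
            # Gradients of 0.5 * error**2 with respect to w and b,
            # subtracted from the parameters after scaling by lr.
            w -= lr * error * x
            b -= lr * error
    return w, b

# Noise-free samples of the line y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = sgd_fit_line(samples)
```

On this noise-free data the parameters approach w = 2 and b = 1.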
  • the loss weight of each softmax layer is 1 / N, but this application is not limited to this.
  • the distribution of the weight values can be set as needed.
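The weighted sum of the per-unit losses can be sketched as follows. Cross-entropy is assumed as the loss of each softmax unit, which the text does not state explicitly; the function names, the probabilities, and the zero-based target indices are hypothetical.

```python
import math

def cross_entropy(probs, target_index):
    """Loss of one softmax unit: negative log-probability of the true category."""
    return -math.log(probs[target_index])

def total_loss(unit_probs, targets, weights=None):
    """Weighted sum of the per-unit losses; each weight defaults to 1/N."""
    n = len(unit_probs)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * cross_entropy(p, t)
               for w, p, t in zip(weights, unit_probs, targets))

# Two hypothetical classification units (N = 2) for one training image.
unit_probs = [[0.7, 0.2, 0.1], [0.4, 0.6]]
targets = [0, 1]  # zero-based index of the true category per attribute
loss = total_loss(unit_probs, targets)
```

Passing an explicit `weights` list corresponds to setting the distribution of weight values as needed.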
  • the saved network model is used as the pre-trained model for the next round of iterative training, so that training can continue without losing model data during the training process; after training reaches the maximum number of iterations, it is determined whether the loss has fallen below the preset threshold. If so, training ends; otherwise training continues until the loss falls below the preset threshold, at which point training has converged and ends.
  • Step 4 Non-motor vehicle multi-label image classification.
  • Steps 1 to 3 are the preparation process of the non-motor vehicle multi-label image classification network model.
  • Step 4 corresponds to the above steps S100 to S400, that is, the steps in which the trained classification network model classifies the image.
  • the process of step four is as follows:
  • Step 4.1 Send the pre-processed test data to the trained classification network model to extract non-motor vehicle image features, which corresponds to steps S100 and S200;
  • Step 4.2 Send the extracted non-motor vehicle image features to the softmax layers, which output, for each of the N attributes, the probability that the features belong to each category of that attribute; take the category with the highest probability as the classification result of each attribute, and combine the N groups of classification results into a final output category list, which corresponds to steps S300 and S400.
  • the classification results of a non-motor vehicle image under multiple attributes are thus obtained at one time, with no need to build a separate classification model for each attribute, which achieves fast multi-label classification of non-motor vehicle images.
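Combining the per-attribute results into one label, as in Step 4.2, can be sketched as follows. The attribute and category names are taken from the examples earlier in the description; the probabilities, the dictionary layout, and the function name are hypothetical.

```python
def combine_labels(attribute_probs, categories):
    """Pick the most probable category per attribute and combine the results."""
    label = {}
    for attribute, probs in attribute_probs.items():
        names = categories[attribute]
        best = max(range(len(probs)), key=probs.__getitem__)
        label[attribute] = names[best]
    return label

categories = {
    "power type": ["electric vehicle", "motorcycle", "bicycle"],
    "applicable gender": ["male vehicle", "female vehicle"],
    "vehicle use": ["takeaway vehicle", "courier vehicle", "non-working vehicle"],
}
# Hypothetical softmax outputs of three classification units for one image.
attribute_probs = {
    "power type": [0.80, 0.15, 0.05],
    "applicable gender": [0.30, 0.70],
    "vehicle use": [0.10, 0.20, 0.70],
}
label = combine_labels(attribute_probs, categories)
```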
  • An embodiment of the present application also provides a non-motor vehicle image multi-label classification system, which is applied to the non-motor vehicle image multi-label classification method.
  • the system includes:
  • the image input module M100 is used to input the tested non-motor vehicle image into the trained classification network model, and the classification network model includes a feature extraction layer and a plurality of classification units corresponding to the attributes in one-to-one correspondence;
  • the feature extraction module M200 is used to extract the features in the test image using the feature extraction layer of the classification network model
  • the image classification module M300 is used to calculate the classification results of each attribute according to the extracted features by using multiple classification units of the classification network model;
  • the result output module M400 is used to combine the classification results of various attributes as a label of the tested non-motor vehicle image.
  • the present application combines the deep learning feature extraction layer in the feature extraction module M200 and multiple classification units in the image classification module M300, and adopts a classification network model to realize non-motor vehicle multi-attribute classification, which is convenient for training and has high classification accuracy.
  • An embodiment of the present application further provides a non-motor vehicle image multi-label classification device, including a processor and a memory in which executable instructions of the processor are stored, wherein the processor is configured to perform the steps of the multi-label classification method for non-motor vehicle images by executing the executable instructions.
  • the electronic device 600 according to this embodiment of the present application is described below with reference to FIG. 5.
  • the electronic device 600 shown in FIG. 5 is only an example, and should not bring any limitation to the functions and use scope of the embodiments of the present application.
  • the electronic device 600 is represented in the form of a general-purpose computing device.
  • Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting different system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
  • the storage unit stores a program code
  • the program code can be executed by the processing unit 610, so that the processing unit 610 executes the steps according to various exemplary embodiments of the present application described in the above non-motor vehicle image multi-label classification method section of this specification.
  • the processing unit 610 may perform the steps shown in FIG.
  • the storage unit 620 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 6201 and / or a cache storage unit 6202, and may further include a read-only storage unit (ROM) 6203.
  • the storage unit 620 may further include a program / utility tool 6204 having a set of (at least one) program modules 6205.
  • program modules 6205 include but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the bus 630 may be one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may be performed through an input/output (I/O) interface 650.
  • the electronic device 600 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and / or a public network, such as the Internet) through a network adapter 660.
  • the network adapter 660 can communicate with other modules of the electronic device 600 through the bus 630.
  • An embodiment of the present application further provides a computer-readable storage medium for storing a program, which, when executed, implements the steps of the multi-label classification method for non-motor vehicle images.
  • various aspects of the present application may also be implemented in the form of a program product, which includes program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to various exemplary embodiments of the present application described in the above non-motor vehicle image multi-label classification method section of this specification.
  • a program product 800 for implementing the above method according to an embodiment of the present application may use a portable compact disk read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of the present application is not limited to this.
  • the readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination of the above. More specific examples of readable storage media (non-exhaustive list) include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • the computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • the readable storage medium may also be any readable medium other than the readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operation of the present application can be written in any combination of one or more programming languages, which include object-oriented programming languages such as Java, C ++, etc., and also include the conventional procedural formula Programming language-such as "C" language or similar programming language.
  • the program code may be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server To execute.
  • the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, using Internet service provision Business to connect via the Internet).
  • In summary, compared with the prior art, the multi-label classification method, system, device, and storage medium for non-motor vehicle images provided by the present application have the following advantages:
  • The present application combines deep-learning feature extraction with multiple classification units, so a single classification network model is enough to classify non-motor vehicles under multiple attributes.
  • Training is convenient and classification accuracy is high, which solves the prior-art problem that using multiple image classification models involves cumbersome steps and low efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

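The present application provides a multi-label classification method, system, device, and storage medium for non-motor vehicle images, in which the label of a non-motor vehicle image comprises classification results for multiple attributes. The method includes the following steps: inputting a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes; extracting, by the feature extraction layer of the classification network model, features from the test image; computing, by the multiple classification units of the classification network model, the classification result of each attribute from the extracted features; and merging the classification results of all attributes as the label of the non-motor vehicle image under test. The present application achieves multi-attribute classification of non-motor vehicles with a single classification network model, with convenient training and high classification accuracy.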

Description

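Multi-label classification method, system, device, and storage medium for non-motor vehicle images
Technical Field
The present application relates to the technical field of image processing, and in particular to a multi-label classification method, system, device, and storage medium for non-motor vehicle images.
Background
As a representative deep learning method, the convolutional neural network can learn image feature extraction automatically and performs well in most computer vision tasks. In some application fields, however, such as non-motor vehicle images, it is often necessary to output class labels for a target image under multiple attributes. The conventional approach uses several multi-class networks to recognize the several class labels, but the running time of the algorithm then grows linearly with the number of classification attributes in the application, which is very inefficient.
Summary
In view of the problems in the prior art, the purpose of the present application is to provide a multi-label classification method, system, device, and storage medium for non-motor vehicle images that achieves multi-attribute classification of non-motor vehicles with a single classification network model, with convenient training and high classification accuracy.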
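An embodiment of the present application provides a multi-label classification method for non-motor vehicle images, in which the label of a non-motor vehicle image comprises classification results for multiple attributes, the method including the following steps:
inputting a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes;
extracting, by the feature extraction layer of the classification network model, features from the non-motor vehicle image under test;
computing, by the multiple classification units of the classification network model, the classification result of each attribute from the extracted features; and
merging the classification results of all attributes as the label of the non-motor vehicle image under test.
Optionally, computing the classification result of each attribute from the extracted features by the multiple classification units of the classification network model includes the following step:
each classification unit of the classification network model computes the probability that the non-motor vehicle image under test belongs to each class of its corresponding attribute, and the class with the highest probability is selected as the classification result for that attribute.
Optionally, the feature extraction layer includes at least one convolutional layer and at least one pooling layer, and each classification unit is a softmax layer.
Optionally, a first fully connected layer and multiple branch fully connected layers are further arranged between the feature extraction layer and the classification units; the branch fully connected layers are in one-to-one correspondence with the classification units; the output of the feature extraction layer passes through the first fully connected layer and is then connected to the branch fully connected layers; and the output of each branch fully connected layer is connected to its corresponding classification unit.
Optionally, the output of the first fully connected layer is fed through a dropout layer into a second fully connected layer, and the second fully connected layer is connected through a dropout layer to the branch fully connected layers.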
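Optionally, the classification network model is trained by the following steps:
constructing a classification network model that includes a feature extraction layer and multiple classification units, the classification units being in one-to-one correspondence with the attributes of non-motor vehicle images;
obtaining a training set that includes training image data and label data corresponding to each training image, the label data including the image path and the class of the training image under each attribute;
inputting the training set into the classification network model for iterative training, taking the weighted sum of the loss values of the classification units as the loss of the classification network, and training iteratively until the model converges; and
saving the trained classification network model.
Optionally, after constructing the classification network model that includes a feature extraction layer and multiple classification units, the method further includes the following steps:
initializing the feature extraction layer and the first fully connected layer of the constructed classification network model with a weight file trained on the public ImageNet dataset; and
randomly initializing the branch fully connected layers and the classification units of the constructed classification network model.
Optionally, inputting the training set into the classification network model for iterative training includes the following steps:
setting the batch size, initial learning rate, and maximum number of iterations for training the classification network model;
iteratively training the classification network model with the training set, multiplying the learning rate by k after every i training iterations to obtain the learning rate for subsequent iterations, where i is the preset period (in iterations) for adjusting the learning rate and k is the preset learning-rate adjustment coefficient, with k < 1;
after training reaches the maximum number of iterations, determining whether the loss value of the classification network model is below a preset threshold;
if so, the iterative training is complete;
otherwise, continuing to train the classification network model iteratively with the training set until its loss value falls below the preset threshold.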
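An embodiment of the present application further provides a multi-label classification system for non-motor vehicle images, applied to the above multi-label classification method for non-motor vehicle images, the system including:
an image input module, configured to input a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes;
a feature extraction module, configured to extract features from the non-motor vehicle image under test using the feature extraction layer of the classification network model;
an image classification module, configured to compute the classification result of each attribute from the extracted features using the multiple classification units of the classification network model; and
a result output module, configured to merge the classification results of all attributes as the label of the non-motor vehicle image under test.
An embodiment of the present application further provides a multi-label classification device for non-motor vehicle images, including:
a processor; and
a memory storing instructions executable by the processor;
wherein the processor is configured to perform the steps of the above multi-label classification method for non-motor vehicle images by executing the executable instructions.
An embodiment of the present application further provides a computer-readable storage medium for storing a program which, when executed, implements the steps of the above multi-label classification method for non-motor vehicle images.
The multi-label classification method, system, device, and storage medium for non-motor vehicle images provided by the present application have the following advantages:
The present application combines deep-learning feature extraction with multiple classification units, so a single classification network model is enough to classify non-motor vehicles under multiple attributes. Training is convenient and classification accuracy is high, which solves the prior-art problem that using multiple image classification models involves cumbersome steps and low efficiency.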
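Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multi-label classification method for non-motor vehicle images according to an embodiment of the present application;
Fig. 2 is an example of the annotation style for non-motor vehicle images in the training set according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of the classification network model according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a multi-label classification system for non-motor vehicle images according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a multi-label classification device for non-motor vehicle images according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a computer storage medium according to an embodiment of the present application.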
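Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present application will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, so their repeated description is omitted.
As shown in Fig. 1, an embodiment of the present application provides a multi-label classification method for non-motor vehicle images, in which the label of a non-motor vehicle image comprises classification results for multiple attributes, the method including the following steps:
S100: inputting a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes;
S200: extracting, by the feature extraction layer of the classification network model, features from the non-motor vehicle image under test;
S300: computing, by the multiple classification units of the classification network model, the classification result of each attribute from the extracted features;
S400: merging the classification results of all attributes as the label of the non-motor vehicle image under test.
The attributes may be preset attributes that are not mutually exclusive with one another. For example, the attributes may include power type, intended gender, vehicle use, whether a sunshade is fitted, and so on. Correspondingly, under the power type attribute a non-motor vehicle may be classed as an electric bike, a motorcycle, a bicycle, and so on; intended gender may be divided into men's and women's vehicles; vehicle use may be divided into food-delivery vehicle, courier vehicle, non-work vehicle, and so on; and the sunshade attribute may be divided into with sunshade and without sunshade. The label of a non-motor vehicle may be a combination of the classification results of the attributes, for example: electric bike, women's vehicle, non-work vehicle, with sunshade.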
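Thus, the present application combines deep-learning feature extraction with multiple classification units, so a single classification network model is enough to classify non-motor vehicles under multiple attributes, with convenient training and high classification accuracy.
In this embodiment, computing the classification result of each attribute from the extracted features by the multiple classification units of the classification network model includes the following step:
Each classification unit of the classification network model computes the probability that the non-motor vehicle image under test belongs to each class of its corresponding attribute, and the class with the highest probability is selected as the classification result for that attribute. A classification unit may be a softmax layer. The trained softmax layer takes as input the feature vector extracted by the feature extraction layer and outputs a T×1 vector, where T is the number of classes under the attribute and each value in the vector is the probability that the features belong to one class under that attribute. The class with the highest probability is selected as the classification result of the classification unit; the selected class is the one to which the non-motor vehicle image under test most likely belongs under that attribute, which yields the most probable classification result for the image.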
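The per-attribute decision described above can be illustrated with a minimal Python sketch. It assumes the classification unit receives one raw score per class from the layer before it; the function names and the example scores are hypothetical and do not come from the application.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_attribute(scores):
    """Return (index of the most probable class, T-element probability vector)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs

# Hypothetical scores for three power-type classes
# (electric bike, motorcycle, bicycle).
best, probs = classify_attribute([2.0, 0.5, -1.0])
```

Here `probs` plays the role of the T×1 vector and `best` is the position of the selected class within it.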
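In this embodiment, the feature extraction layer includes at least one convolutional layer and at least one pooling layer, and each classification unit is a softmax layer. A first fully connected layer and multiple branch fully connected layers are further arranged between the feature extraction layer and the classification units; the branch fully connected layers are in one-to-one correspondence with the classification units; the output of the feature extraction layer passes through the first fully connected layer and is then connected to the branch fully connected layers; and the output of each branch fully connected layer is connected to its corresponding classification unit. Further, the output of the first fully connected layer may be fed through a dropout layer into a second fully connected layer, which is connected through a dropout layer to the branch fully connected layers. Each convolutional layer in a convolutional neural network consists of several convolution units, and the parameters of each unit are optimized by the backpropagation algorithm. The purpose of the convolution operation is to extract different features of the input: the first convolutional layer may extract only low-level features such as edges, lines, and corners, while deeper networks iteratively extract more complex features from the low-level ones. A pooling layer, also called a subsampling layer, follows a convolutional layer and likewise consists of multiple feature maps; each of its feature maps corresponds to one feature map of the preceding layer, so the number of feature maps is unchanged. The pooling layer aims to obtain spatially invariant features by reducing the resolution of the feature maps. It acts as a second stage of feature extraction, with each of its neurons applying a pooling operation to a local receptive field. Common pooling methods include max pooling, which takes the largest value in the local receptive field; mean pooling, which averages all values in the local receptive field; and stochastic pooling. This example mainly uses max pooling.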
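As an illustration of max pooling on a single feature map, here is a small Python sketch. The 2×2 window, the stride of 2, and the sample values are assumptions chosen for the example; the application does not state the pooling parameters.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over one feature map given as a list of rows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the local receptive field and keep its largest value.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]
pooled = max_pool2d(feature_map)  # [[6, 2], [2, 9]]
```

The 4×4 map becomes a 2×2 map: resolution is halved in each direction while one input map still yields exactly one output map.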
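A suitable combination of convolutional and pooling layers therefore extracts the features of the non-motor vehicle image under test more effectively, further improving the accuracy of the classification units. The use of each layer is described further in the specific example shown in Fig. 2 and Fig. 3 below.
In this embodiment, the classification network model is trained by the following steps:
constructing a classification network model that includes a feature extraction layer and multiple classification units, the classification units being in one-to-one correspondence with the attributes of non-motor vehicle images;
obtaining a training set that includes training image data and label data corresponding to each training image, the label data including the image path and the class of the training image under each attribute; the training image data may be non-motor vehicle images collected in advance that have already been classified and whose classification results are known; labels are added to each training image according to the known classification results, and the training images and labels are added to the training set together to train the classification network model;
inputting the training set into the classification network model for iterative training, taking the weighted sum of the loss values of the classification units as the loss of the classification network, and training iteratively until the model converges; because the loss takes the loss value of every classification unit into account, the steady decrease of the total loss during iteration simultaneously improves the recognition accuracy of the classification unit for each attribute;
saving the trained classification network model. The converged model can then be used to classify non-motor vehicle images under test according to the present application. During use, accurately recognized non-motor vehicle images may be added to the training set to keep enriching it, and the classification network model may be retrained periodically, so that recognition accuracy keeps improving during use.
In this embodiment, after constructing the classification network model that includes a feature extraction layer and multiple classification units, the method further includes the following steps:
initializing the feature extraction layer and the first fully connected layer of the constructed classification network model with a weight file trained on the public ImageNet dataset; because mature weight files for convolutional, pooling, and fully connected layers trained on ImageNet are already available, this embodiment can use them directly to initialize the classification network model of the present application, which greatly reduces training time;
randomly initializing the branch fully connected layers and the classification units of the constructed classification network model; the random initialization may draw the weights from a normal distribution, although the present application is not limited to this.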
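The two initialization paths above can be sketched as follows. This is only an illustration under stated assumptions: the layer names, the shapes, the standard deviation of 0.01, and the in-memory `pretrained` dictionary are hypothetical, whereas a real setup would load weights from an actual ImageNet-trained file.

```python
import random

def init_weights(layer_shapes, pretrained, std=0.01, seed=0):
    """Copy pretrained weights where available; otherwise draw from N(0, std^2)."""
    rng = random.Random(seed)
    weights = {}
    for name, (rows, cols) in layer_shapes.items():
        if name in pretrained:
            # Shared layers: reuse the pretrained values (copied, not aliased).
            weights[name] = [row[:] for row in pretrained[name]]
        else:
            # Branch layers: random initialization from a normal distribution.
            weights[name] = [[rng.gauss(0.0, std) for _ in range(cols)]
                             for _ in range(rows)]
    return weights

# Hypothetical layer names and tiny shapes, for illustration only.
shapes = {"fc1": (2, 3), "branch_fc_1": (3, 2), "branch_fc_2": (2, 2)}
pretrained = {"fc1": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}
weights = init_weights(shapes, pretrained)
```

Only the layers missing from `pretrained` start from random values, which mirrors the split between the shared layers and the attribute-specific branches.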
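In this embodiment, inputting the training set into the classification network model for iterative training includes the following steps:
setting the batch size, initial learning rate, and maximum number of iterations for training the classification network model;
iteratively training the classification network model with the training set, multiplying the learning rate by k after every i training iterations to obtain the learning rate for subsequent iterations, where i is the preset period (in iterations) for adjusting the learning rate and k is the preset learning-rate adjustment coefficient, with k < 1;
During model training, a relatively suitable learning rate is chosen to balance training speed against loss. The training loss may nevertheless stop decreasing after falling to a certain level; lowering the learning rate appropriately at that point can reduce the loss further, but a lower learning rate lengthens training. This embodiment therefore decays the learning rate step by step to strike a new balance between training time and loss reduction, and bounds training time by setting a maximum number of iterations;
after training reaches the maximum number of iterations, determining whether the loss value of the classification network model is below a preset threshold;
if so, the classification network model has converged and the iterative training is complete;
otherwise, continuing to train the classification network model iteratively with the training set until its loss value falls below the preset threshold, that is, until the classification network model converges.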
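The learning-rate schedule and the stopping rule in the steps above can be sketched in Python as follows. The function names and the toy `toy_step` stand-in for a real training iteration are assumptions for illustration; note also that the loop, like the rule it illustrates, only ends if the loss eventually drops below the threshold.

```python
def step_lr(base_lr, k, period, iteration):
    """Learning rate in effect at `iteration`: multiplied by k every `period` iterations."""
    return base_lr * k ** (iteration // period)

def train(train_step, base_lr, k, period, max_iters, loss_threshold):
    """Run at least max_iters iterations, then continue until loss < loss_threshold."""
    iteration = 0
    while True:
        loss = train_step(step_lr(base_lr, k, period, iteration))
        iteration += 1
        if iteration >= max_iters and loss < loss_threshold:
            return iteration, loss

# Toy stand-in for one training iteration: gradient descent on loss = w**2.
state = {"w": 1.0}

def toy_step(lr):
    state["w"] -= lr * 2.0 * state["w"]
    return state["w"] ** 2

iterations, final_loss = train(toy_step, base_lr=0.1, k=0.5, period=10,
                               max_iters=5, loss_threshold=1e-3)
```

In this toy run the loss is still above the threshold when `max_iters` is reached, so training continues past it, which is the "otherwise" branch of the rule.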
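A specific example of the multi-label classification method for non-motor vehicle images of the present application is described below with reference to Fig. 2 and Fig. 3. In this example, the multi-label classification of non-motor vehicle images includes the following steps:
Step one: preparing the non-motor vehicle multi-label dataset. Step one corresponds to the step of obtaining the training set in the training process described above, and proceeds as follows:
Step 1.1: obtain a large number of non-motor vehicle training images from real-world scenes and number the images;
Step 1.2: design the annotation information for the images. The annotation is divided by attribute, with each attribute represented by a one-digit annotation field; each attribute contains several classes, represented by the numbers 1 to N. The annotation format is: [image path][class code for feature 1][class code for feature 2]..., as in the example of Fig. 2, which shows an annotation covering five attributes of a non-motor vehicle image;
Step 1.3: annotate all non-motor vehicle images with the above annotation method, and organize the label information and the image information into corresponding datasets. Both datasets are in LMDB format.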
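A reader for the annotation format of step 1.2 might look like the sketch below. It assumes the fields are separated by whitespace and that the image path contains no spaces; the separator is not stated in the text, and the sample path and codes are made up for illustration.

```python
def parse_annotation(line, num_attributes):
    """Split one annotation line into (image path, list of class codes)."""
    fields = line.split()
    if len(fields) != num_attributes + 1:
        raise ValueError(
            f"expected 1 path and {num_attributes} codes, got {len(fields)} fields")
    path = fields[0]
    codes = [int(field) for field in fields[1:]]  # one class code per attribute
    return path, codes

# Hypothetical annotation line with five attribute codes.
path, codes = parse_annotation("images/000001.jpg 1 2 3 1 2", num_attributes=5)
```

Checking the field count up front catches lines where an attribute was left unlabeled before they reach training.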
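Step two: constructing the non-motor vehicle multi-label classification network model. The constructed model is shown in Fig. 3. Step two details how the five convolutional layers, three pooling layers, and fully connected layers used in the above embodiment are built; the model is constructed as follows:
Step 2.1: during training the network consists of an input layer, five convolutional layers, three pooling layers, 2+N fully connected layers, and N softmax layers, where N is the number of non-motor vehicle attributes that have been defined.
Step 2.2: the data input layer is followed by the first convolutional layer and a ReLU activation function; the ReLU is followed by the first batch normalization layer, which is followed by the first pooling layer, using max pooling;
Step 2.3: the pooling is followed by the second convolutional layer, then a ReLU activation function, then the second batch normalization layer, then the second pooling layer, which uses max pooling;
In deep neural networks, the rectified linear unit (ReLU) is commonly used as the neuron activation function. A model made sparse by ReLU can better mine relevant features and fit the training data.
Step 2.4: the second pooling layer is followed by three consecutive convolutional layers in this order: third convolutional layer, ReLU activation function, fourth convolutional layer, ReLU activation function, fifth convolutional layer, ReLU activation function;
Step 2.5: the previous step is followed by the third pooling layer, then the first fully connected layer, which is followed by a ReLU activation function and a dropout layer, and then the second fully connected layer; the first and second fully connected layers are configured identically. Every node of a fully connected layer is connected to all nodes of the previous layer, so as to combine the features extracted earlier.
Step 2.6: the second fully connected layer is followed by N branch fully connected layers, namely branch fully connected layer 1, branch fully connected layer 2, ..., branch fully connected layer N. Each branch fully connected layer is followed by a corresponding softmax layer serving as its classification unit, namely classification unit 1, classification unit 2, ..., classification unit N, and the output of each classification unit is the final class output for the corresponding attribute of the image.
In the present application, the convolutional and pooling layers are followed by two fully connected layers and N branch fully connected layers. Each neuron in a fully connected layer is connected to all neurons of the preceding layer. Fully connected layers can integrate the class-discriminative local information from the convolutional or pooling layers. To improve the performance of the convolutional neural network, each neuron of the fully connected layers uses ReLU as its activation function. The output of each branch fully connected layer is passed to a softmax layer for classification. The softmax layer can be understood as a normalization: if images are currently divided into x classes, the output of the softmax layer is an x-dimensional vector whose first value is the probability that the current image belongs to the first class, whose second value is the probability that it belongs to the second class, and so on; the elements of the vector sum to 1.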
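The branch layout of step 2.6, where one shared feature vector feeds N independent fully connected layers with different output sizes, can be sketched in plain Python. The weights, biases, and the two-element shared feature below are hypothetical values chosen so the arithmetic is easy to follow.

```python
def linear(weights, bias, x):
    """Fully connected layer: one output per weight row, each row dotted with x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def branch_outputs(shared_feature, branches):
    """Feed the same shared feature vector into every branch layer."""
    return [linear(weights, bias, shared_feature) for weights, bias in branches]

# Hypothetical output of the second fully connected layer.
shared = [1.0, 2.0]
branches = [
    # Branch 1: an attribute with three classes.
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    # Branch 2: an attribute with two classes.
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
]
scores = branch_outputs(shared, branches)  # [[1.0, 2.0, 3.0], [-1.0, 2.5]]
```

Each inner list holds the raw scores that the matching softmax layer would normalize; the branches share their input but not their parameters.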
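Step three: training the non-motor vehicle multi-label classification network. Step three corresponds to the step of inputting the training set into the classification network model for iterative training described above. During training, the weighted sum of the loss values of the classification units is taken as the loss of the classification network, and training iterates until the model converges. Step three proceeds as follows:
Step 3.1: from the LMDB dataset prepared in step one, first compute the mean file of the training dataset, save it in .binaryproto format, and specify the location of the binary mean file in the training network;
Step 3.2: use fine-tuning: initialize the weights of some layers of the current network from a weight file trained on the public ImageNet dataset and initialize the other layers randomly. As described above, in this example the weight file trained on ImageNet initializes the convolutional layers, pooling layers, first fully connected layer, and second fully connected layer, while the branch fully connected layers and classification units are initialized randomly; the random initialization may draw the weights from a normal distribution, although the present application is not limited to this;
Step 3.3: when setting the batch size, initial learning rate, and maximum number of iterations for training the classification network model, set the batch size batch_size to 32, the initial learning rate to 0.001, and the maximum number of iterations to 200000; adjust the learning rate during training with the step policy, multiplying it by 0.9 after every 1000 iterations; train with the stochastic gradient descent algorithm; and save the network model once every 10000 iterations. Stochastic gradient descent speeds up the updating of each layer's weights and biases. "Stochastic" here means that the samples are randomly shuffled in every iteration; shuffling effectively reduces the cancellation of parameter updates between samples. In the most basic stochastic gradient descent algorithm, each parameter is updated at every step by subtracting its gradient. For large-scale machine learning tasks, stochastic gradient descent performs very well.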
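The shuffle-then-update pattern of stochastic gradient descent can be shown on a one-parameter toy problem. This sketch is a simplification under stated assumptions: it updates after every single sample rather than after a batch of 32, uses a fixed learning rate of 0.05, and fits made-up data generated from y = 3x.

```python
import random

def sgd_epoch(w, samples, lr, rng):
    """One pass over freshly shuffled samples, updating w after each sample."""
    order = list(samples)
    rng.shuffle(order)  # a new random order for every pass
    for x, y in order:
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
        w -= lr * grad                # subtract the gradient scaled by the learning rate
    return w

rng = random.Random(0)
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # generated from y = 3x
w = 0.0
for _ in range(50):
    w = sgd_epoch(w, samples, lr=0.05, rng=rng)
```

After these passes `w` is very close to 3, whatever order the shuffles produce.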
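During training, training samples and labels are fed into the initialized network, and the weighted sum of the loss values of the softmax layers is computed as the final loss. The weight of each softmax layer is 1/N, although the present application is not limited to this and the weights may be assigned as needed. Through repeated forward propagation and backpropagation, the loss keeps decreasing during training until the maximum number of iterations is reached;
After each training run completes, the saved network model is used as the pretrained model for the next round of iterative training, which avoids losing model data during training, and training continues. After training reaches the maximum number of iterations, whether the loss is below the preset threshold is checked; if so, training ends; otherwise training continues until the loss falls below the preset threshold, and training ends once it has converged.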
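The combined loss used in step three can be written out as a short Python sketch. It assumes each softmax layer's loss is the usual cross-entropy of the true class, which the text does not spell out; the probability vectors and targets are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Loss of one softmax layer: negative log-probability of the true class."""
    return -math.log(probs[target])

def total_loss(per_attribute_probs, targets, weights=None):
    """Weighted sum of the per-attribute losses; weights default to 1/N each."""
    n = len(per_attribute_probs)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * cross_entropy(p, t)
               for w, p, t in zip(weights, per_attribute_probs, targets))

# Two attributes: softmax outputs and the index of the true class for each.
probs = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]
loss = total_loss(probs, targets)
```

Passing an explicit `weights` list lets some attributes count more than others, which is the adjustment the text allows for.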
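Step four: classifying non-motor vehicle multi-label images. Steps one through three prepare the classification network model for non-motor vehicle multi-label images, while step four corresponds to steps S100 to S400 above, that is, classifying images with the trained classification network model. Step four proceeds as follows:
Step 4.1: feed the preprocessed test data into the trained classification network model and extract the non-motor vehicle image features; this corresponds to steps S100 and S200;
Step 4.2: feed the extracted non-motor vehicle image features into the softmax layers, which output, for each of the N attributes, the probability that the features belong to each class of that attribute; take the class with the highest probability under each attribute as that attribute's classification result, and merge the N classification results into the final output class list; this corresponds to steps S300 and S400.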
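The merging in step 4.2 can be sketched as below, using the example attributes named earlier in the description. The dictionary layout, the attribute keys, and the probability values are assumptions for illustration; the application only specifies that the N results are merged into one output list.

```python
def merge_labels(probabilities, classes):
    """Pick the most probable class of every attribute and merge into one label."""
    label = {}
    for attribute, probs in probabilities.items():
        best = max(range(len(probs)), key=lambda i: probs[i])
        label[attribute] = classes[attribute][best]
    return label

classes = {
    "power type": ["electric bike", "motorcycle", "bicycle"],
    "intended gender": ["men's vehicle", "women's vehicle"],
    "vehicle use": ["food-delivery vehicle", "courier vehicle", "non-work vehicle"],
    "sunshade": ["with sunshade", "without sunshade"],
}
# Hypothetical softmax outputs for one test image.
probabilities = {
    "power type": [0.7, 0.2, 0.1],
    "intended gender": [0.4, 0.6],
    "vehicle use": [0.1, 0.2, 0.7],
    "sunshade": [0.9, 0.1],
}
label = merge_labels(probabilities, classes)
```

For these inputs the merged label is electric bike, women's vehicle, non-work vehicle, with sunshade, obtained from a single pass through one model.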
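Because of the structure of the classification network model used in the present application, the classification results of a non-motor vehicle image under multiple attributes are obtained in one pass, with no need to build a separate classification model for each attribute. The steps above therefore achieve fast multi-label classification of non-motor vehicle images.
An embodiment of the present application further provides a multi-label classification system for non-motor vehicle images, applied to the above multi-label classification method for non-motor vehicle images, the system including:
an image input module M100, configured to input a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes;
a feature extraction module M200, configured to extract features from the test image using the feature extraction layer of the classification network model;
an image classification module M300, configured to compute the classification result of each attribute from the extracted features using the multiple classification units of the classification network model;
a result output module M400, configured to merge the classification results of all attributes as the label of the non-motor vehicle image under test.
Thus, by combining the deep-learning feature extraction layer in the feature extraction module M200 with the multiple classification units in the image classification module M300, the present application achieves multi-attribute classification of non-motor vehicles with a single classification network model, with convenient training and high classification accuracy.
An embodiment of the present application further provides a multi-label classification device for non-motor vehicle images, including a processor and a memory storing instructions executable by the processor, wherein the processor is configured to perform the steps of the above multi-label classification method for non-motor vehicle images by executing the executable instructions.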
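Those skilled in the art will understand that aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software aspects, which may be referred to collectively here as a "circuit", "module", or "system".
An electronic device 600 according to this implementation of the present application is described below with reference to Fig. 5. The electronic device 600 shown in Fig. 5 is only an example and should not limit the functions or scope of use of the embodiments of the present application in any way.
As shown in Fig. 5, the electronic device 600 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610), a display unit 640, and so on.
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps according to the various exemplary implementations of the present application described in the multi-label classification method section of this specification. For example, the processing unit 610 may perform the steps shown in Fig. 1.
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 6201 and/or a cache unit 6202, and may further include a read-only memory (ROM) unit 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router or modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650. The electronic device 600 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. The network adapter 660 may communicate with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used with the electronic device 600, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.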
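An embodiment of the present application further provides a computer-readable storage medium for storing a program which, when executed, implements the steps of the above multi-label classification method for non-motor vehicle images. In some possible implementations, aspects of the present application may also be implemented as a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary implementations of the present application described in the multi-label classification method section of this specification.
Referring to Fig. 6, a program product 800 for implementing the above method according to an implementation of the present application is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. The program product of the present application is not limited to this, however. In this document, a readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of these. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the foregoing.
The program code for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, over the Internet through an Internet service provider).
In summary, compared with the prior art, the multi-label classification method, system, device, and storage medium for non-motor vehicle images provided by the present application have the following advantages:
The present application combines deep-learning feature extraction with multiple classification units, so a single classification network model is enough to classify non-motor vehicles under multiple attributes. Training is convenient and classification accuracy is high, which solves the prior-art problem that using multiple image classification models involves cumbersome steps and low efficiency.
The above is a further detailed description of the present application in connection with specific preferred implementations, and the specific implementation of the present application should not be regarded as limited to these descriptions. For a person of ordinary skill in the art to which the present application belongs, several simple deductions or substitutions may be made without departing from the concept of the present application, and all of them should be regarded as falling within the scope of protection of the present application.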

Claims (11)

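  1. A multi-label classification method for non-motor vehicle images, characterized in that the label of a non-motor vehicle image comprises classification results for multiple attributes, the method including the following steps:
    inputting a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes;
    extracting, by the feature extraction layer of the classification network model, features from the non-motor vehicle image under test;
    computing, by the multiple classification units of the classification network model, the classification result of each attribute from the extracted features; and
    merging the classification results of all attributes as the label of the non-motor vehicle image under test.
  2. The multi-label classification method for non-motor vehicle images according to claim 1, characterized in that computing the classification result of each attribute from the extracted features by the multiple classification units of the classification network model includes the following step:
    each classification unit of the classification network model computes the probability that the non-motor vehicle image under test belongs to each class of its corresponding attribute, and the class with the highest probability is selected as the classification result for that attribute.
  3. The multi-label classification method for non-motor vehicle images according to claim 1, characterized in that the feature extraction layer includes at least one convolutional layer and at least one pooling layer, and each classification unit is a softmax layer.
  4. The multi-label classification method for non-motor vehicle images according to claim 1, characterized in that a first fully connected layer and multiple branch fully connected layers are further arranged between the feature extraction layer and the classification units; the branch fully connected layers are in one-to-one correspondence with the classification units; the output of the feature extraction layer passes through the first fully connected layer and is then connected to the branch fully connected layers; and the output of each branch fully connected layer is connected to its corresponding classification unit.
  5. The multi-label classification method for non-motor vehicle images according to claim 4, characterized in that the output of the first fully connected layer is fed through a dropout layer into a second fully connected layer, and the second fully connected layer is connected through a dropout layer to the branch fully connected layers.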
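  6. The multi-label classification method for non-motor vehicle images according to claim 4, characterized in that the classification network model is trained by the following steps:
    constructing a classification network model that includes a feature extraction layer and multiple classification units, the classification units being in one-to-one correspondence with the attributes of non-motor vehicle images;
    obtaining a training set that includes training image data and label data corresponding to each training image, the label data including the image path and the class of the training image under each attribute;
    inputting the training set into the classification network model for iterative training, taking the weighted sum of the loss values of the classification units as the loss of the classification network, and training iteratively until the model converges; and
    saving the trained classification network model.
  7. The multi-label classification method for non-motor vehicle images according to claim 6, characterized in that, after constructing the classification network model that includes a feature extraction layer and multiple classification units, the method further includes the following steps:
    initializing the feature extraction layer and the first fully connected layer of the constructed classification network model with a weight file trained on the public ImageNet dataset; and
    randomly initializing the branch fully connected layers and the classification units of the constructed classification network model.
  8. The multi-label classification method for non-motor vehicle images according to claim 7, characterized in that inputting the training set into the classification network model for iterative training includes the following steps:
    setting the batch size, initial learning rate, and maximum number of iterations for training the classification network model;
    iteratively training the classification network model with the training set, multiplying the learning rate by k after every i training iterations to obtain the learning rate for subsequent iterations, where i is the preset period (in iterations) for adjusting the learning rate and k is the preset learning-rate adjustment coefficient, with k < 1;
    after training reaches the maximum number of iterations, determining whether the loss value of the classification network model is below a preset threshold;
    if so, the iterative training is complete;
    otherwise, continuing to train the classification network model iteratively with the training set until its loss value falls below the preset threshold.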
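  9. A multi-label classification system for non-motor vehicle images, characterized in that it is applied to the multi-label classification method for non-motor vehicle images according to any one of claims 1 to 8, the system including:
    an image input module, configured to input a non-motor vehicle image under test into a trained classification network model, the classification network model comprising a feature extraction layer and multiple classification units in one-to-one correspondence with the attributes;
    a feature extraction module, configured to extract features from the non-motor vehicle image under test using the feature extraction layer of the classification network model;
    an image classification module, configured to compute the classification result of each attribute from the extracted features using the multiple classification units of the classification network model; and
    a result output module, configured to merge the classification results of all attributes as the label of the non-motor vehicle image under test.
  10. A multi-label classification device for non-motor vehicle images, characterized in that it includes:
    a processor; and
    a memory storing instructions executable by the processor;
    wherein the processor is configured to perform the steps of the multi-label classification method for non-motor vehicle images according to any one of claims 1 to 8 by executing the executable instructions.
  11. A computer-readable storage medium for storing a program, characterized in that the program, when executed, implements the steps of the multi-label classification method for non-motor vehicle images according to any one of claims 1 to 8.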
PCT/CN2019/111320 2018-10-23 2019-10-15 非机动车图像多标签分类方法、系统、设备及存储介质 WO2020083073A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811240000.3 2018-10-23
CN201811240000.3A CN109325547A (zh) 2018-10-23 2018-10-23 非机动车图像多标签分类方法、系统、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020083073A1 true WO2020083073A1 (zh) 2020-04-30

Family

ID=65262678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/111320 WO2020083073A1 (zh) 2018-10-23 2019-10-15 非机动车图像多标签分类方法、系统、设备及存储介质

Country Status (2)

Country Link
CN (1) CN109325547A (zh)
WO (1) WO2020083073A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325547A (zh) * 2018-10-23 2019-02-12 苏州科达科技股份有限公司 非机动车图像多标签分类方法、系统、设备及存储介质
CN109947940B (zh) * 2019-02-15 2023-09-05 平安科技(深圳)有限公司 文本分类方法、装置、终端及存储介质
CN111368931B (zh) * 2020-03-09 2023-11-17 第四范式(北京)技术有限公司 确定图像分类模型的学习率的方法
CN111783574B (zh) * 2020-06-17 2024-02-23 李利明 膳食图像识别方法、装置以及存储介质
CN111898475A (zh) * 2020-07-10 2020-11-06 浙江大华技术股份有限公司 非机动车的状态估计方法及装置、存储介质、电子装置
CN111737521B (zh) * 2020-08-04 2020-11-24 北京微播易科技股份有限公司 一种视频分类方法和装置
CN112115880A (zh) * 2020-09-21 2020-12-22 成都数之联科技有限公司 基于多标签学习的船舶污染监测方法及系统及装置及介质
CN112598076B (zh) * 2020-12-29 2023-09-19 北京易华录信息技术股份有限公司 一种机动车属性识别方法及系统
CN112446439B (zh) * 2021-01-29 2021-04-23 魔视智能科技(上海)有限公司 深度学习模型动态分支选择的推理方法及系统
CN113313079B (zh) * 2021-07-16 2021-11-12 深圳市安软科技股份有限公司 一种车辆属性识别模型的训练方法、系统及相关设备
CN114429638B (zh) * 2022-04-06 2022-07-08 四川省大数据中心 一种施工图审查管理系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203330A (zh) * 2016-07-08 2016-12-07 西安理工大学 一种基于卷积神经网络的车辆分类方法
CN108256498A (zh) * 2018-02-01 2018-07-06 上海海事大学 一种基于EdgeBoxes和FastR-CNN的非机动车辆目标检测方法
CN109325547A (zh) * 2018-10-23 2019-02-12 苏州科达科技股份有限公司 非机动车图像多标签分类方法、系统、设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654066A (zh) * 2016-02-02 2016-06-08 北京格灵深瞳信息技术有限公司 一种车辆识别方法及装置
US10776664B2 (en) * 2016-03-15 2020-09-15 Imra Europe S.A.S. Method for classification of unique/rare cases by reinforcement learning in neural networks
CN107330396B (zh) * 2017-06-28 2020-05-19 华中科技大学 一种基于多属性和多策略融合学习的行人再识别方法
CN107886073B (zh) * 2017-11-10 2021-07-27 重庆邮电大学 一种基于卷积神经网络的细粒度车辆多属性识别方法

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651668A (zh) * 2020-05-06 2020-09-11 上海晶赞融宣科技有限公司 用户画像的标签生成方法及装置、存储介质、终端
CN111651668B (zh) * 2020-05-06 2023-06-09 上海晶赞融宣科技有限公司 用户画像的标签生成方法及装置、存储介质、终端
CN113688840A (zh) * 2020-05-19 2021-11-23 武汉Tcl集团工业研究院有限公司 图像处理模型的生成方法、处理方法、存储介质及终端
CN111582409B (zh) * 2020-06-29 2023-12-26 腾讯科技(深圳)有限公司 图像标签分类网络的训练方法、图像标签分类方法及设备
CN111582409A (zh) * 2020-06-29 2020-08-25 腾讯科技(深圳)有限公司 图像标签分类网络的训练方法、图像标签分类方法及设备
CN111832580A (zh) * 2020-07-22 2020-10-27 西安电子科技大学 结合少样本学习与目标属性特征的sar目标识别方法
CN111832580B (zh) * 2020-07-22 2023-07-28 西安电子科技大学 结合少样本学习与目标属性特征的sar目标识别方法
CN112001258A (zh) * 2020-07-27 2020-11-27 上海东普信息科技有限公司 物流货车准时到站识别方法、装置、设备和存储介质
CN112001258B (zh) * 2020-07-27 2023-07-11 上海东普信息科技有限公司 物流货车准时到站识别方法、装置、设备和存储介质
CN112287751B (zh) * 2020-09-21 2024-05-07 深圳供电局有限公司 励磁涌流识别方法、装置、计算机设备和存储介质
CN112287751A (zh) * 2020-09-21 2021-01-29 深圳供电局有限公司 励磁涌流识别方法、装置、计算机设备和存储介质
CN112070093A (zh) * 2020-09-22 2020-12-11 网易(杭州)网络有限公司 生成图像分类模型的方法、图像分类方法、装置和设备
CN112508078A (zh) * 2020-12-02 2021-03-16 携程旅游信息技术(上海)有限公司 图像多任务多标签识别方法、系统、设备及介质
CN112541542A (zh) * 2020-12-11 2021-03-23 第四范式(北京)技术有限公司 多分类样本数据的处理方法、装置及计算机可读存储介质
CN112541542B (zh) * 2020-12-11 2023-09-29 第四范式(北京)技术有限公司 多分类样本数据的处理方法、装置及计算机可读存储介质
CN112651438A (zh) * 2020-12-24 2021-04-13 世纪龙信息网络有限责任公司 多类别图像的分类方法、装置、终端设备和存储介质
CN113610766A (zh) * 2021-07-12 2021-11-05 北京阅视智能技术有限责任公司 显微图像分析方法、装置、存储介质及电子设备
CN113408482A (zh) * 2021-07-13 2021-09-17 杭州联吉技术有限公司 一种训练样本的生成方法及生成装置
CN113408482B (zh) * 2021-07-13 2023-10-10 杭州联吉技术有限公司 一种训练样本的生成方法及生成装置
CN113673583A (zh) * 2021-07-30 2021-11-19 浙江大华技术股份有限公司 一种图像识别方法、识别网络的训练方法及相关装置
CN114445884A (zh) * 2022-01-04 2022-05-06 深圳数联天下智能科技有限公司 训练多目标检测模型的方法、检测方法及相关装置
CN114445884B (zh) * 2022-01-04 2024-04-30 深圳数联天下智能科技有限公司 训练多目标检测模型的方法、检测方法及相关装置
CN114612681A (zh) * 2022-01-30 2022-06-10 西北大学 基于gcn的多标签图像分类方法、模型构建方法及装置
CN114638787B (zh) * 2022-02-23 2024-03-22 青岛海信网络科技股份有限公司 检测非机动车是否挂牌的方法及电子设备
CN114638787A (zh) * 2022-02-23 2022-06-17 青岛海信网络科技股份有限公司 检测非机动车是否挂牌的方法及电子设备
CN116091867B (zh) * 2023-01-12 2023-09-29 北京邮电大学 一种模型训练、图像识别方法、装置、设备及存储介质
CN116091867A (zh) * 2023-01-12 2023-05-09 北京邮电大学 一种模型训练、图像识别方法、装置、设备及存储介质
CN117496275A (zh) * 2023-12-29 2024-02-02 深圳市软盟技术服务有限公司 基于类增学习的深度图像分类网络训练方法、电子设备及存储介质
CN117496275B (zh) * 2023-12-29 2024-04-02 深圳市软盟技术服务有限公司 基于类增学习的深度图像分类网络训练方法、电子设备及存储介质

Also Published As

Publication number Publication date
CN109325547A (zh) 2019-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19877341

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19877341

Country of ref document: EP

Kind code of ref document: A1