CN114092920A - Model training method, image classification method, device and storage medium

Info

Publication number: CN114092920A
Application number: CN202210051986.XA
Authority: CN (China)
Prior art keywords: image, target, clustering, sample, size
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN114092920B (en)
Inventor: 李德辉
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Filing: Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202210051986.XA
Publications: CN114092920A (application), CN114092920B (grant)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques with fixed number of clusters, e.g. K-means clustering
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training method that can be applied to fields such as maps, navigation, Internet of Vehicles, vehicle-road collaboration, intelligent driving services, and instant messaging. Application scenarios include at least various terminals, such as mobile phones, computers, and vehicle-mounted terminals. The method includes: acquiring the size feature of each first image sample in an original sample set; clustering the original sample set to obtain N clustering results and the cluster center of each clustering result; scaling the sizes of the first image samples in each clustering result according to its cluster center to obtain a target sample set; and updating the model parameters of an image classification model according to the target sample sets and the class labels of the first image samples. The application also provides an image classification method, an image classification apparatus, and a storage medium. The method and apparatus can satisfy the requirement of uniform sample size in batch training while avoiding severe image deformation, thereby improving the accuracy of model training.

Description

Model training method, image classification method, device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model training method, an image classification method, an apparatus, and a storage medium.
Background
Artificial Intelligence (AI) covers Computer Vision (CV), and CV technology is a major research area of AI. CV techniques can be applied to image classification tasks, image recognition tasks, image retrieval tasks, and the like; in an image classification task, the class to which an image belongs can be determined using a trained classification model.
Currently, the mainstream approach to the image classification task is to use a Convolutional Neural Network (CNN). A CNN is typically trained in batches, iteratively calculating the loss on each batch of samples and updating the network parameters by error back-propagation. The image samples within each batch must have the same dimensions; therefore, all image samples need to be resized to a fixed size before training.
The inventor finds that the existing scheme has at least the following problem: although resizing image samples to a fixed size satisfies the requirement of batch training, when the aspect ratios of the image samples differ greatly, the images are prone to severe deformation, which increases the difficulty of model training. For the three image samples shown in fig. 1, for example, any fixed size will cause significant distortion of some of the image samples.
Disclosure of Invention
The embodiment of the application provides a model training method, an image classification method, an apparatus, and a storage medium. The method and the apparatus can satisfy the requirement of uniform sample size in batch training while, to a certain extent, avoiding severe image deformation, thereby improving the accuracy of model training.
In view of the above, an aspect of the present application provides a method for model training, including:
obtaining the size characteristics of each first image sample in an original sample set, wherein the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
clustering the original sample set according to the size characteristics of each first image sample to obtain N clustering results and a clustering center of each clustering result, wherein each clustering result comprises at least one first image sample, and N is an integer greater than 1 and less than M;
for each clustering result, carrying out scaling processing on the size of a first image sample in the clustering result according to the clustering center of the clustering result to obtain a target sample set corresponding to the clustering result;
and updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
Another aspect of the present application provides an image classification method, including:
acquiring a target image;
and obtaining a classification result of the target image through an image classification model based on the target image, wherein the image classification model is obtained by training by adopting the model training method provided by the aspects.
Another aspect of the present application provides a model training apparatus, including:
the device comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring the size characteristics of each first image sample in an original sample set, the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
the clustering module is used for clustering the original sample set according to the size characteristics of each first image sample to obtain N clustering results and a clustering center of each clustering result, wherein each clustering result comprises at least one first image sample, and N is an integer greater than 1 and less than M;
the processing module is used for carrying out scaling processing on the size of the first image sample in the clustering result according to the clustering center of the clustering result aiming at each clustering result to obtain a target sample set corresponding to the clustering result;
and the training module is used for updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
an obtaining module, specifically configured to obtain an original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size feature of the first image sample according to the height value and the width value of the first image sample, wherein the size feature is represented as a two-dimensional vector.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
the clustering module is specifically used for randomly generating N initial clustering centers;
calculating the distance between each first image sample and each initial clustering center according to the size characteristics of each first image sample and the N initial clustering centers;
dividing an original sample set into N clustering clusters according to the distance between each first image sample and each initial clustering center;
and calculating the clustering center of each clustering cluster in the N clustering clusters until the clustering center convergence condition is met, and determining the N clustering results and the clustering center of each clustering result according to the clustering center of each clustering cluster.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
an obtaining module, specifically configured to obtain an original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size proportion of the first image sample according to the height value and the width value of the first image sample;
and for each first image sample, taking the size proportion of the first image sample as the size characteristic of the first image sample.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
the clustering module is specifically used for generating N clustering results according to the size characteristics of each first image sample and N size proportion intervals, wherein each size proportion interval corresponds to one clustering result;
acquiring a central size proportion aiming at each size proportion interval;
and regarding each size proportion interval, taking the center size proportion as the clustering center of the clustering result.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
the processing module is specifically used for taking the clustering center of the clustering result as a target size aiming at each clustering result, wherein the target size comprises a target height value and a target width value;
and for each clustering result, scaling the height value of the first image sample in the clustering result to a target height value, and scaling the width value of the first image sample in the clustering result to a target width value to obtain a target sample set corresponding to the clustering result, wherein the target sample set comprises at least one second image sample, and the second image sample is the first image sample after size adjustment.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
the processing module is specifically used for taking the clustering center of the clustering result as a target size aiming at each clustering result, wherein the target size comprises a target height value and a target width value;
aiming at each clustering result, if the clustering result meets a clustering grouping condition, taking a clustering center of the clustering result as a target size, wherein the target size comprises a target height value and a target width value;
generating a target size proportion according to the target size aiming at each clustering result;
generating K target sub-sizes according to the target size proportion aiming at each clustering result, wherein each target sub-size comprises a target sub-height value and a target sub-width value;
and aiming at each clustering result, acquiring a target sample set corresponding to the clustering result according to the K target sub-sizes, wherein the target sample set comprises the K target sample sub-sets, and each second image sample in the same target sample sub-set has the same target sub-size.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
and the processing module is further used for determining that the clustering result meets the clustering grouping condition if the number of the first image samples in the clustering result is greater than or equal to the number threshold value aiming at each clustering result.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
and the processing module is further used for determining that the clustering result meets the clustering grouping condition if the size proportion of the first image samples in the clustering result is larger than or equal to the first size proportion or the size proportion of the first image samples in the clustering result is smaller than or equal to the second size proportion.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
the training module is specifically used for generating a training sample set according to the target sample set corresponding to each clustering result;
based on each second image sample in the training sample set, obtaining a class distribution vector corresponding to each second image sample in the training sample set through an image classification model;
and updating the model parameters of the image classification model by adopting a loss function according to the class distribution vector corresponding to each second image sample in the training sample set and the class label of the first image sample corresponding to each second image sample in the training sample set.
In one possible design, in a first implementation of another aspect of an embodiment of the present application,
the training module is specifically used for acquiring a first feature map through a convolution module included in the image classification model based on each second image sample in the training sample set;
for each second image sample in the training sample set, based on a first feature map corresponding to the second image sample, obtaining a second feature map through T residual convolution modules included in the image classification model, wherein T is an integer greater than or equal to 1;
for each second image sample in the training sample set, acquiring a target feature vector through a pooling layer included in the image classification model based on a second feature map corresponding to the second image sample;
and aiming at each second image sample in the training sample set, obtaining a category distribution vector through a full connection layer included in the image classification model based on a target feature vector corresponding to the second image sample.
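For illustration only, the following is a minimal PyTorch-style sketch of the forward pass described above: a convolution module producing the first feature map, T residual convolution modules producing the second feature map, a pooling layer producing the target feature vector, and a fully connected layer producing the class distribution vector. The patent does not specify layer sizes; the channel count, kernel sizes, and T = 4 here are illustrative assumptions, and adaptive average pooling is chosen so that batches with different image sizes (one batch per clustering result) can pass through the same network.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """One residual convolution module (illustrative structure)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class ImageClassifier(nn.Module):
    def __init__(self, num_classes: int, channels: int = 64, T: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)  # first feature map
        self.blocks = nn.Sequential(
            *[ResidualConvBlock(channels) for _ in range(T)])       # second feature map
        self.pool = nn.AdaptiveAvgPool2d(1)   # target feature vector, size-agnostic
        self.fc = nn.Linear(channels, num_classes)  # class distribution vector

    def forward(self, x):
        x = self.stem(x)
        x = self.blocks(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)
```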
In one possible design, in a first implementation manner of another aspect of the embodiment of the present application, the model training apparatus further includes a sending module;
the obtaining module is further used for updating model parameters of the image classification model according to the target sample set corresponding to each clustering result and the category label of each first image sample in the original sample set, and then generating a verification sample set according to the target sample set corresponding to each clustering result;
the obtaining module is further used for obtaining a category distribution vector corresponding to each second image sample in the verification sample set through an image classification model based on each second image sample in the verification sample set;
the obtaining module is further configured to determine a classification result corresponding to each second image sample in the verification sample set according to the class distribution vector corresponding to each second image sample in the verification sample set;
the obtaining module is further used for determining the model accuracy according to the classification result corresponding to each second image sample in the verification sample set and the class label of the first image sample corresponding to each second image sample in the verification sample set;
and the sending module is used for sending the model parameters of the image classification model to at least one terminal device if the model accuracy is greater than or equal to the accuracy threshold.
Another aspect of the present application provides an image classification apparatus, including:
the acquisition module is used for acquiring a target image;
and the classification module is used for acquiring a classification result of the target image through an image classification model based on the target image, wherein the image classification model is obtained by adopting the model training method provided by the aspects.
In one possible design, in a first implementation manner of another aspect of the embodiment of the present application, the image classification apparatus further includes a control module;
the acquisition module is specifically used for acquiring a target video frame acquired at the current moment through the image acquisition device when a target vehicle runs;
based on the target video frame, intercepting a target image from the target video frame through an image detection model;
and the control module is used for controlling the running state of the target vehicle according to a preset vehicle control strategy according to the classification result of the target image after the classification result of the target image is obtained through the image classification model based on the target image.
In one possible design, in a first implementation manner of another aspect of the embodiment of the present application, the image classification apparatus further includes a sending module, a receiving module, and an updating module;
the sending module is used for sending the classification result of the target image to the server after the classification result of the target image is obtained through the image classification model based on the target image, so that the server determines the model accuracy of the image classification model according to the classification result of the target image and the class label labeled for the target image, and if the model accuracy is smaller than the accuracy threshold, the model parameters are updated to obtain the target model parameters;
the receiving module is used for receiving the target model parameters sent by the server;
and the updating module is used for updating the model parameters of the image classification model into target model parameters.
Another aspect of the present application provides a computer device, comprising: a memory, a processor, and a bus system;
wherein, the memory is used for storing programs;
a processor for executing the program in the memory, the processor for performing the above-described aspects of the method according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, a model training method is provided. First, the size feature of each first image sample in an original sample set is obtained, where each first image sample is labeled with a corresponding category label. Then, the original sample set is clustered according to the size feature of each first image sample, yielding N clustering results and the cluster center of each clustering result. On this basis, for each clustering result, the sizes of the first image samples in that result are scaled according to its cluster center, producing a corresponding target sample set. Finally, the model parameters of the image classification model are updated according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set. In this way, a clustering algorithm divides the original sample set into multiple clustering results according to size features, so that image samples belonging to the same clustering result have more similar size features. The image samples in each clustering result are then adaptively resized based on that result's cluster center, which satisfies the requirement of uniform sample size within a batch during training while largely avoiding severe image deformation, thereby improving the accuracy of model training.
Drawings
FIG. 1 is a diagram illustrating the resizing of an image sample based on different fixed sizes in the prior art;
FIG. 2 is a block diagram of an embodiment of a model training system;
FIG. 3 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of generating a plurality of clustering results based on a clustering algorithm in the embodiment of the present application;
FIG. 5 is a schematic diagram of generating a plurality of clustering results based on a size ratio interval in an embodiment of the present application;
FIG. 6 is a schematic diagram of a plurality of target sample sets in an embodiment of the present application;
FIG. 7 is a schematic illustration of a plurality of target sample subsets according to an embodiment of the present application;
FIG. 8 is a diagram illustrating a network structure of an image classification model according to an embodiment of the present application;
FIG. 9 is a flowchart illustrating an image classification method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an autopilot scenario in an embodiment of the subject application;
FIG. 11 is a diagram illustrating an embodiment of a server updating model parameters;
FIG. 12 is a schematic view of a model training apparatus according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an image classification apparatus according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a server in an embodiment of the present application;
fig. 15 is a schematic structural diagram of a terminal device in the embodiment of the present application.
Detailed Description
The embodiment of the application provides a model training method, an image classification method, an apparatus, and a storage medium. The method and the apparatus can satisfy the requirement of uniform sample size in batch training while, to a certain extent, avoiding severe image deformation, thereby improving the accuracy of model training.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The image recognition technology is the basis of practical technologies such as stereoscopic vision, motion analysis, data fusion and the like, and can be widely applied to many fields such as navigation, map and terrain registration, natural resource analysis, weather forecast, environment monitoring, physiological lesion research and the like.
Illustratively, in an autonomous driving scenario, respective decisions to stop or travel are generated based on the identified traffic light type (e.g., red light, green light, left-turn red light, right-turn green light, etc.).
Illustratively, in the advertisement detection scene, according to the identified advertisement type (for example, food advertisement, electric appliance advertisement or clothes advertisement, etc.), a corresponding display mode is selected.
Image recognition technology belongs to a branch of Computer Vision (CV) technology. CV is a science that studies how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to perform machine vision tasks such as identification, tracking, and measurement of targets, and further performs image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, CV studies related theories and techniques in an attempt to build AI systems that can acquire information from images or multidimensional data. CV technologies generally include image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
CV technology is an important field of Artificial Intelligence (AI). AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, AI is an integrated technique of computer science that attempts to understand the essence of intelligence and produces a new intelligent machine that can react in a manner similar to human intelligence. AI is to study the design principles and implementation methods of various intelligent machines, so that the machine has the functions of perception, reasoning and decision making. The AI technology is a comprehensive subject, and relates to the field of extensive technology, both hardware level technology and software level technology. The AI base technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technologies, operating/interactive systems, mechatronics, and the like. The AI software technology mainly includes several directions, such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
With the research and progress of the AI technology, the AI technology is researched and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, etc., and it is believed that with the development of the technology, the AI technology will be applied in more fields and exert more and more important values.
In order to train an image classification model with a better classification effect, the present application provides a model training method, which is applied to the model training system shown in fig. 2. As shown in the figure, the model training system includes a server and a terminal device, and a client is deployed on the terminal device, where the client may run on the terminal device in the form of a browser or in the form of an independent Application (APP); the specific form of the client is not limited here. The server involved in the application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data, and artificial intelligence platforms. The terminal device may be a smart phone, a tablet computer, a notebook computer, a palm computer, a personal computer, a wearable device, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, an aircraft, and the like, but is not limited thereto. The application can be applied to various scenarios, including but not limited to cloud technology, AI, intelligent traffic, and assisted driving.
The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. The number of servers and terminal devices is not limited. The scheme provided by the application can be independently completed by the terminal device, can also be independently completed by the server, and can also be completed by the cooperation of the terminal device and the server, so that the application is not particularly limited.
Based on this, the server trains the image classification model and keeps the trained model parameters local. In one implementation, the server may issue the latest model parameters to each terminal device. In another implementation, the model parameters are downloaded from the server by the terminal device, so as to update the local image classification model.
With reference to fig. 3, a method for training a model in the present application will be described below, and an embodiment of the method for training a model in the present application includes:
110. obtaining the size characteristics of each first image sample in an original sample set, wherein the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
in one or more embodiments, the model training device obtains an original sample set including M first image samples, where the first image samples are typically images cropped directly from the original image, and thus the first image samples have the original image size. Thus, a corresponding size feature may be generated from the image size of each first image sample. At the same time, for each first image sample, a corresponding category label also needs to be marked.
Specifically, take the classification of traffic light images as an example. On this basis, the category labels of the first image samples include, but are not limited to, "round red light," "round green light," "left-turn red light," "right-turn red light," and "yellow light." In practical applications, more category labels may be used, such as "traffic sign" and "ground road marking," which are not limited here.
It should be noted that the model training apparatus in the present application may be deployed in a server or a terminal device, or in a system composed of a server and a terminal device, and is not limited herein.
120. Clustering the original sample set according to the size characteristics of each first image sample to obtain N clustering results and a clustering center of each clustering result, wherein each clustering result comprises at least one first image sample, N is an integer greater than 1 and less than M;
in one or more embodiments, the model training apparatus clusters all the first image samples in the original sample set by using a clustering algorithm according to a preset clustering number (i.e., N value) and a size characteristic of each first image sample, so as to obtain N clustering results, where each clustering result has a clustering center.
It should be noted that the clustering algorithms employed in the present application include, but are not limited to, the K-means clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Ordering Points To Identify the Clustering Structure (OPTICS), hierarchical clustering, and the like, which are not limited here.
130. For each clustering result, carrying out scaling processing on the size of a first image sample in the clustering result according to the clustering center of the clustering result to obtain a target sample set corresponding to the clustering result;
in one or more embodiments, after obtaining the N clustering centers, the model training apparatus may respectively perform a scaling (resize) process on sizes of the first image samples in each clustering result, so as to obtain a target sample set corresponding to the clustering result. Each target sample set comprises at least one second image sample, and the second image sample is the first image sample after being subjected to size adjustment.
140. And updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
In one or more embodiments, the model training apparatus constructs and trains an image classification model.
Specifically, the target sample set corresponding to one clustering result may be regarded as one batch; the second image samples within such a target sample set have the same size. Samples belonging to the same batch are stitched into a four-dimensional matrix, i.e., (batch size × number of image channels × image height × image width). Meanwhile, since each second image sample is a resized first image sample, the second image samples and the first image samples have a one-to-one correspondence. On this basis, the class label of a first image sample can be directly used as the class label of the corresponding second image sample, so that the model parameters of the image classification model can be updated.
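As a hedged illustration of this batching scheme (the helper names and tensor conventions below are assumptions, not taken from the patent), the following sketch stacks each target sample set into one (batch size × channels × height × width) tensor and runs the per-batch loss and back-propagation update described above:

```python
import torch
import torch.nn as nn

def make_batches(target_sample_sets):
    """target_sample_sets: one list per clustering result; each entry is an
    (image_tensor, class_label) pair, and every image tensor within a result
    shares the same (C, H, W) shape after the scaling step."""
    batches = []
    for cluster in target_sample_sets:
        images = torch.stack([img for img, _ in cluster])   # (B, C, H, W)
        labels = torch.tensor([lbl for _, lbl in cluster])  # (B,)
        batches.append((images, labels))
    return batches

def train_epoch(model, batches, optimizer, loss_fn=nn.CrossEntropyLoss()):
    for images, labels in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # loss of this batch of samples
        loss.backward()                        # error back-propagation
        optimizer.step()                       # update model parameters
```

Because image sizes differ only across batches, never within one, each stacked tensor is rectangular and the model (e.g., one with adaptive pooling) can consume all batches unchanged.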
In the embodiment of the application, a model training method is provided. In this method, a clustering algorithm divides the original sample set into multiple clustering results according to size features, so that image samples belonging to the same clustering result have more similar size features. The image samples in each clustering result are then adaptively resized based on that result's cluster center, which satisfies the requirement of uniform sample size within a batch during training while largely avoiding severe image deformation, thereby improving the accuracy of model training.
Optionally, on the basis of the respective embodiments corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the obtaining a size characteristic of each first image sample in the original sample set may specifically include:
acquiring an original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size feature of the first image sample according to the height value and the width value of the first image sample, wherein the size feature is represented as a two-dimensional vector.
In one or more embodiments, a manner of clustering based on two-dimensional features is presented. As can be seen from the foregoing embodiments, it is first necessary to acquire an original image, and then mark out a target object (e.g., a traffic light, a traffic sign, a ground road sign, etc.) in the original image with a target frame, thereby cropping out an image block containing the target object as a first image sample.
Specifically, the description will be given taking any one first image sample in the original sample set as an example. Based on this, a height value and a width value of the first image sample are obtained, wherein both the height value and the width value can be expressed as a pixel size. For example, the height value of the first image sample is 1000, and the width value is 300, thereby obtaining the size characteristic of the first image sample as (1000,300). It can be seen that the size features can be represented as a two-dimensional vector.
It is understood that the size feature of this first image sample could also be expressed as (300, 1000); what matters is that the size feature expression is unified across all first image samples, i.e., all (height value, width value) or all (width value, height value). In this application, size features are uniformly expressed as (height value, width value); however, this should not be construed as limiting the present application.
Secondly, in the embodiment of the application, a clustering mode based on two-dimensional features is provided. This takes into account that it is the combination of height and width (i.e., the aspect ratio), rather than the height value or the width value alone, that accurately characterizes the appearance of a sample. Therefore, using the height value and width value of the first image sample as its size feature describes the first image sample more accurately and improves the reliability of sample clustering.
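A minimal sketch of extracting this two-dimensional size feature, assuming the first image samples are available as image files read with Pillow (an assumption; the patent does not prescribe a library):

```python
from PIL import Image

def size_features(image_paths):
    """Return the (height, width) size feature of each first image sample."""
    features = []
    for path in image_paths:
        with Image.open(path) as img:
            width, height = img.size          # Pillow reports (width, height)
            features.append((height, width))  # unified as (height value, width value)
    return features
```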
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in this embodiment of the present application, clustering the original sample set according to the size characteristic of each first image sample to obtain N clustering results and a clustering center of each clustering result, which may specifically include:
randomly generating N initial clustering centers;
calculating the distance between each first image sample and each initial clustering center according to the size characteristics of each first image sample and the N initial clustering centers;
dividing an original sample set into N clustering clusters according to the distance between each first image sample and each initial clustering center;
and calculating the clustering center of each clustering cluster in the N clustering clusters until the clustering center convergence condition is met, and determining the N clustering results and the clustering center of each clustering result according to the clustering center of each clustering cluster.
In one or more embodiments, a way of clustering based on the k-means clustering algorithm is presented. As can be seen from the foregoing embodiment, given a preset number of clusters (i.e., the value of N), the height values and width values of the first image samples can be clustered using the k-means clustering algorithm. The k-means clustering algorithm is an unsupervised clustering method that only requires the number of clusters to be specified. The larger the number of clusters, the finer the size granularity and the smaller the deformation introduced into the image samples.
Specifically, given an original sample set with M first image samples, i.e., {X1, X2, X3, …, XM}, N initial cluster centers are randomly generated, i.e., {A1, A2, A3, …, AN}. For each first image sample, the distance to each initial cluster center is calculated, so each first image sample has N distances. Each first image sample is then assigned to the category corresponding to the nearest initial cluster center. After all M first image samples have been assigned, N clusters are obtained, and the cluster center of each cluster is recalculated.
This process is repeated until the cluster center convergence condition is met; the N clusters obtained by the final iteration are taken as the N clustering results, and the cluster center of each clustering result is obtained. It should be noted that, in one case, the convergence condition is considered satisfied when the number of iterations reaches a threshold. In another case, it is considered satisfied when the change in the clustering error falls below a change threshold.
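As a non-authoritative sketch, this clustering step can be reproduced with scikit-learn's KMeans rather than the hand-rolled loop above; the cluster count N is the caller's preset choice, and the library handles the convergence conditions internally:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_by_size(features, n_clusters=3):
    """features: list of (height, width) size features, one per first image sample."""
    X = np.asarray(features, dtype=float)               # shape (M, 2)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    return km.labels_, km.cluster_centers_              # cluster index per sample, N centers
```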
For example, for convenience of understanding, please refer to fig. 4, fig. 4 is a schematic diagram illustrating that a plurality of clustering results are generated based on a clustering algorithm in the embodiment of the present application, and as shown in the figure, assuming that N is set to be 3, after 13 first image samples are clustered, 3 clustering results can be obtained. The clustering result 1 includes 5 first image samples, the clustering result 2 includes 5 first image samples, and the clustering result 3 includes 3 first image samples.
Thirdly, the embodiment of the application provides a clustering mode based on the k-means clustering algorithm. The k-means clustering algorithm has a good clustering effect on numerical data, its result is independent of the input order of the data, and it converges quickly, so it is well suited to the clustering scenario in this application.
Optionally, on the basis of the respective embodiments corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the obtaining a size characteristic of each first image sample in the original sample set may specifically include:
acquiring an original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size proportion of the first image sample according to the height value and the width value of the first image sample;
and for each first image sample, taking the size proportion of the first image sample as the size characteristic of the first image sample.
In one or more embodiments, a manner of clustering based on individual features is presented. As can be seen from the foregoing embodiments, it is first necessary to acquire an original image, and then mark out a target object (e.g., a traffic light, a traffic sign, a ground road sign, etc.) in the original image with a target frame, thereby cropping out an image block containing the target object as a first image sample.
Specifically, the description will be given taking any one first image sample in the original sample set as an example. Based on this, a height value and a width value of the first image sample are obtained, where both can be expressed in pixels. For example, if the height value of the first image sample is 1000 and the width value is 300, the size ratio of the first image sample is 10/3. The size ratio of the first image sample is then directly taken as its size feature. It can be seen that the size feature can be represented as a single value.
It is understood that the size ratio of a first image sample could also be expressed as 3/10; what matters is that the size feature expression is unified across all first image samples, i.e., all height/width or all width/height. In this application, size ratios are uniformly expressed as height/width; however, this should not be construed as limiting the present application.
Secondly, in the embodiment of the application, a clustering mode based on a single feature is provided. This takes into account that it is the aspect ratio, rather than the height value or the width value alone, that accurately characterizes the appearance of a sample. Therefore, using the size ratio of the first image sample as its size feature describes the first image sample more accurately and improves the reliability of sample clustering.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in this embodiment of the present application, clustering the original sample set according to the size characteristic of each first image sample to obtain N clustering results and a clustering center of each clustering result, which may specifically include:
generating N clustering results according to the size characteristics of each first image sample and N size proportion intervals, wherein each size proportion interval corresponds to one clustering result;
acquiring a central size proportion aiming at each size proportion interval;
and regarding each size proportion interval, taking the center size proportion as the clustering center of the clustering result.
In one or more embodiments, a manner of clustering based on size ratio intervals is presented. As can be seen from the foregoing embodiments, a plurality of size ratio sections can be preset, and the first image sample can be divided into corresponding size ratio sections according to the size characteristics (i.e., the size ratio) of the first image sample.
Specifically, for ease of understanding, please refer to fig. 5, which is a schematic diagram of generating a plurality of clustering results based on size ratio intervals in the embodiment of the present application. As shown in the figure, assume the samples are divided into 3 size ratio intervals according to their size ratios; see Table 1, which illustrates the size ratio intervals and the corresponding clustering results.
TABLE 1
(Table 1 is reproduced as an image in the original publication; its contents are not shown here.)
As can be seen, taking first image sample No. 1 as an example, assume that its height value is 1000 and its width value is 920, i.e., its size feature is 1000/920. On this basis, first image sample No. 1, with size feature 1000/920, belongs to size ratio interval 2. Meanwhile, the center size ratio can be used as the cluster center of the clustering result; taking size ratio interval 2 as an example, the corresponding center size ratio is 10/9, i.e., the cluster center is 10/9.
Thirdly, in the embodiment of the present application, a method of clustering based on size ratio intervals is provided. In this way, a plurality of size ratio intervals can be user-defined, and clustering is completed by directly assigning each first image sample to its corresponding size ratio interval based on its size ratio, which reduces the number of clustering operations and improves clustering efficiency.
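A minimal sketch of this interval-based clustering follows; the interval boundaries and center ratios are illustrative assumptions (the patent leaves them user-defined), and only interval 2's center ratio of 10/9 comes from the example above:

```python
def bin_by_ratio(features, boundaries=(0.75, 1.5), centers=(0.5, 10/9, 2.0)):
    """features: list of (height, width) pairs. Returns one interval index per
    sample plus the center size ratio (cluster center) of each interval."""
    bins = []
    for h, w in features:
        ratio = h / w                               # size feature: height/width
        idx = sum(ratio >= b for b in boundaries)   # 0, 1, or 2 -> intervals 1..3
        bins.append(idx)
    return bins, centers
```

With these assumed boundaries, sample No. 1 above (ratio 1000/920 ≈ 1.087) falls into the middle interval, whose center ratio 10/9 serves as its cluster center.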
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in this application embodiment, for each clustering result, according to a clustering center of the clustering result, scaling the size of the first image sample in the clustering result to obtain a target sample set corresponding to the clustering result, which may specifically include:
aiming at each clustering result, taking the clustering center of the clustering result as a target size, wherein the target size comprises a target height value and a target width value;
and for each clustering result, scaling the height value of the first image sample in the clustering result to a target height value, and scaling the width value of the first image sample in the clustering result to a target width value to obtain a target sample set corresponding to the clustering result, wherein the target sample set comprises at least one second image sample, and the second image sample is the first image sample after size adjustment.
In one or more embodiments, a manner of scaling an image sample based on a target size is presented. As can be seen from the foregoing embodiments, if the size features of the first image sample are represented as two-dimensional vectors (i.e., height values and width values), the cluster center obtained after clustering is also represented as one two-dimensional vector.
Specifically, assume that N is set to 3 and there are 13 first image samples; on this basis, after clustering the 13 first image samples, 3 clustering results can be obtained. Assume that clustering result 1 includes 5 first image samples, clustering result 2 includes 5 first image samples, and clustering result 3 includes 3 first image samples. The cluster center (i.e., the target size) of clustering result 1 is (300, 1000), where the target height value is 300 and the target width value is 1000. The cluster center (i.e., the target size) of clustering result 2 is (400, 400), where the target height value is 400 and the target width value is 400. The cluster center (i.e., the target size) of clustering result 3 is (1000, 300), where the target height value is 1000 and the target width value is 300.
Based on this, please refer to fig. 6 for easy understanding, fig. 6 is a schematic diagram of a plurality of target sample sets in the embodiment of the present application, as shown in fig. 6 (a), the 5 first image samples belong to the clustering result 1, and therefore, it is required to scale the height values of the 5 first image samples to the target height value (i.e., 300) and scale the width values to the target width value (i.e., 1000). As shown in fig. 6 (B), the 5 first image samples belong to the clustering result 2, and therefore, it is necessary to scale the height values of the 5 first image samples to the target height value (i.e., 400) and scale the width values to the target width value (i.e., 400). As shown in fig. 6 (C), the 3 first image samples belong to the clustering result 3, and therefore, it is necessary to scale the height values of the 3 first image samples to the target height value (i.e., 1000) and scale the width values to the target width value (i.e., 300).
Thus, a target sample set corresponding to each clustering result is obtained, and the target sample set includes at least one first image sample after size adjustment, that is, includes at least one second image sample. It can be seen that the second image samples belonging to the same set of target samples have the same size (i.e. the same height value and width value), whereas the second image samples belonging to different sets of target samples have different sizes (i.e. at least one of different height values or width values).
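For illustration (Pillow-based, an assumption), the per-cluster scaling step might look like the following sketch, where each first image sample in a cluster is resized to the cluster center's target height and width:

```python
from PIL import Image

def rescale_cluster(images, cluster_center):
    """images: Pillow images in one clustering result.
    cluster_center: (target height, target width), possibly fractional."""
    target_h, target_w = (int(round(v)) for v in cluster_center)
    # Pillow's resize takes (width, height); output is the second image samples
    return [img.resize((target_w, target_h)) for img in images]
```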
Secondly, in the embodiment of the present application, a manner of scaling the image samples based on a target size is provided. By directly using the cluster center as the training size for the data set, sizes are unified on one hand; on the other hand, the cluster center is highly representative, which helps achieve a better resizing effect.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in this application embodiment, for each clustering result, according to a clustering center of the clustering result, scaling the size of the first image sample in the clustering result to obtain a target sample set corresponding to the clustering result, which may specifically include:
aiming at each clustering result, taking the clustering center of the clustering result as a target size, wherein the target size comprises a target height value and a target width value;
aiming at each clustering result, if the clustering result meets a clustering grouping condition, taking a clustering center of the clustering result as a target size, wherein the target size comprises a target height value and a target width value;
generating a target size proportion according to the target size aiming at each clustering result;
generating K target sub-sizes according to the target size proportion aiming at each clustering result, wherein each target sub-size comprises a target sub-height value and a target sub-width value;
and aiming at each clustering result, acquiring a target sample set corresponding to the clustering result according to the K target sub-sizes, wherein the target sample set comprises the K target sample sub-sets, and each second image sample in the same target sample sub-set has the same target sub-size.
In one or more embodiments, a manner of scaling an image sample based on a target size scale is presented. As can be seen from the foregoing embodiments, if the size features of the first image sample are represented as two-dimensional vectors (i.e., height values and width values), the cluster center obtained after clustering is also represented as one two-dimensional vector.
Specifically, assume that N is set to 3 and there are 13 first image samples; after clustering the 13 first image samples, 3 clustering results are obtained. Assume that the clustering result 1 includes 5 first image samples, the clustering result 2 includes 5 first image samples, and the clustering result 3 includes 3 first image samples. The cluster center (i.e., the target size) of the clustering result 1 is (300, 1000), where the target height value is 300 and the target width value is 1000. The cluster center (i.e., the target size) of the clustering result 2 is (400, 400), where the target height value is 400 and the target width value is 400. The cluster center (i.e., the target size) of the clustering result 3 is (1000, 300), where the target height value is 1000 and the target width value is 300.
Based on this, for ease of understanding, take the clustering result 1, whose target size is (300, 1000), as an example; the target size ratio generated from this target size is 3/10 (height to width). If the clustering result 1 meets the clustering grouping condition, K target sub-sizes can be generated according to the target size ratio.
Referring to fig. 7, fig. 7 is a schematic diagram of a plurality of target sample subsets in the embodiment of the present application. As shown in fig. 7 (A), assume that one target sub-size generated based on the target size ratio is (300, 1000); the first image samples in the clustering result 1 that are close to the target sub-size (300, 1000) are scaled accordingly to obtain a target sample subset. The second image samples in this target sample subset all have the same target sub-size, that is, the target sub-height value is 300 and the target sub-width value is 1000.
Similarly, as shown in fig. 7 (B), assume that another target sub-size generated based on the target size ratio is (150,500), and thus the first image sample in the clustering result 1 close to the target sub-size (150,500) is scaled to obtain another target sample subset. Wherein the second image samples in the subset of target samples all have the same target sub-size, i.e. the target sub-height value is 150 and the target sub-width value is 500.
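As a rough illustration of this sub-size generation, the sketch below derives K target sub-sizes from the target size (300, 1000) by scaling it while keeping the 3/10 ratio, and assigns a first image sample to the closest sub-size by area. The scale factors and helper names are assumptions for illustration.

```python
def make_sub_sizes(target_size, scales):
    """Generate K target sub-sizes that all keep the target size ratio."""
    h, w = target_size
    return [(round(h * s), round(w * s)) for s in scales]

def nearest_sub_size(sample_size, sub_sizes):
    """Pick the target sub-size closest in area to a first image sample."""
    sh, sw = sample_size
    return min(sub_sizes, key=lambda hw: abs(hw[0] * hw[1] - sh * sw))

sub_sizes = make_sub_sizes((300, 1000), scales=[1.0, 0.5])  # [(300, 1000), (150, 500)]
print(nearest_sub_size((180, 560), sub_sizes))              # -> (150, 500)
```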
It should be noted that, if the clustering center is in a size ratio, K target sub-sizes may be directly constructed according to the size ratio, and then a target sample set corresponding to the clustering result is obtained according to the K target sub-sizes, which is not described herein again.
Secondly, in the embodiment of the present application, a manner of scaling the image sample based on the target size ratio is provided. In this manner, the clustering center is used as the training size ratio of the data set, and one or more target sub-sizes are then generated from that ratio according to actual requirements; on one hand, size unification can be achieved, and on the other hand, the target sample subsets can be constructed more flexibly.
Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 3, another optional embodiment provided in the embodiments of the present application may further include:
and for each clustering result, if the number of the first image samples in the clustering result is greater than or equal to a number threshold, determining that the clustering result meets a clustering grouping condition.
In one or more embodiments, a manner of determining whether a cluster grouping condition is satisfied based on a number of image samples is presented. As can be seen from the foregoing embodiments, the clustering results may include different numbers of first image samples, and therefore, the first image samples in the same clustering result may be further grouped.
Specifically, taking the number threshold of 15000 as an example, assuming that a certain clustering result includes 20000 first image samples, it is determined that the clustering result satisfies the clustering grouping condition. Assuming that a certain clustering result includes 10000 first image samples, it is determined that the clustering result does not satisfy the clustering grouping condition.
Thirdly, the embodiment of the present application provides a manner of judging whether the clustering grouping condition is met based on the number of image samples. When a clustering result includes a large number of first image samples, it can be divided into a plurality of target sample subsets, which helps avoid overfitting during training.
Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 3, another optional embodiment provided in the embodiments of the present application may further include:
and for each clustering result, if the size proportion of the first image samples in the clustering result is larger than or equal to the first size proportion, or the size proportion of the first image samples is smaller than or equal to the second size proportion, determining that the clustering result meets the clustering grouping condition.
In one or more embodiments, a manner of determining whether a cluster grouping condition is satisfied based on an image sample size span is presented. As can be seen from the foregoing embodiments, the clustering results may include the first image samples with larger size differences, and therefore, the first image samples in the same clustering result may be further grouped.
Specifically, taking the size ratio as (height value / width value) as an example, assume that the first size ratio is 300/1000 and the second size ratio is 200/1000. Based on this, if the size ratio of a certain first image sample is 350/1000, it is greater than the first size ratio, and therefore the clustering result to which that first image sample belongs satisfies the clustering grouping condition. Likewise, if the size ratio of a first image sample is 100/1000, it is smaller than the second size ratio, and therefore the clustering result to which that first image sample belongs satisfies the clustering grouping condition.
Thirdly, the embodiment of the present application provides a manner of judging whether the clustering grouping condition is met based on the size span of the image samples. When the first image samples in a clustering result span a large range of sizes, the clustering result can be divided into a plurality of target sample subsets, which helps avoid overfitting during training.
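The two clustering grouping conditions above can be combined into a single check, sketched below under the example thresholds (a number threshold of 15000, a first size ratio of 300/1000, and a second size ratio of 200/1000); the function name and defaults are illustrative.

```python
def satisfies_grouping_condition(samples, count_threshold=15000,
                                 first_ratio=300 / 1000, second_ratio=200 / 1000):
    """samples: list of (height, width) pairs from one clustering result."""
    if len(samples) >= count_threshold:
        return True  # quantity-based clustering grouping condition
    for h, w in samples:
        ratio = h / w
        if ratio >= first_ratio or ratio <= second_ratio:
            return True  # size-span-based clustering grouping condition
    return False
```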
Optionally, on the basis of the various embodiments corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the updating the model parameters of the image classification model according to the class label of each first image sample in the target sample set and the original sample set corresponding to each clustering result specifically may include:
generating a training sample set according to the target sample set corresponding to each clustering result;
based on each second image sample in the training sample set, obtaining a class distribution vector corresponding to each second image sample in the training sample set through an image classification model;
and updating the model parameters of the image classification model by adopting a loss function according to the class distribution vector corresponding to each second image sample in the training sample set and the class label of the first image sample corresponding to each second image sample in the training sample set.
In one or more embodiments, a manner of training an image classification model is presented. As can be seen from the foregoing embodiments, a proportion of the second image samples may be selected from the target sample set corresponding to each clustering result, for example, 80% of the second image samples are selected from each target sample set, and these second image samples together form the training sample set.
Specifically, second image samples of the same size may be selected from the training sample set to form one batch. Each second image sample in the batch is input to the image classification model, which outputs a class distribution vector for each sample. Based on this, the cross entropy loss between the class distribution vector output by the network and the real result (namely, the class label of the first image sample corresponding to the second image sample) can be calculated, and finally the model parameters of the image classification model are updated by error back-propagation.
Training continues until the preset number of training rounds is reached or the error falls to a preset value, at which point the trained image classification model is obtained.
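A minimal PyTorch-style sketch of this training loop is given below; PyTorch itself, the optimizer choice, and the hyperparameters are assumptions rather than part of the embodiment. One data loader per target sample (sub)set guarantees that every batch holds second image samples of a single size, and batches are drawn from the loaders alternately.

```python
import torch
import torch.nn as nn

def train(model, loaders, epochs=10, lr=1e-3):
    """loaders: one DataLoader per target sample (sub)set, so each batch
    contains second image samples of one size only."""
    criterion = nn.CrossEntropyLoss()       # cross entropy loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        # zip interleaves the sample sets (and stops at the shortest loader).
        for batches in zip(*loaders):
            for images, labels in batches:
                optimizer.zero_grad()
                logits = model(images)      # pre-softmax class scores
                loss = criterion(logits, labels)
                loss.backward()             # error back-propagation
                optimizer.step()
```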
Secondly, in the embodiment of the present application, a method for training an image classification model is provided. In this method, to prevent the image classification model from overfitting to image samples of a particular size, the batches are drawn alternately from the different training sample sets, thereby avoiding falling into a local optimum.
Optionally, on the basis of the various embodiments corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, based on each second image sample in the training sample set, obtaining, by the image classification model, a category distribution vector corresponding to each second image sample in the training sample set may specifically include:
aiming at each second image sample in the training sample set, based on the second image sample, obtaining a first feature map through a convolution module included in an image classification model;
for each second image sample in the training sample set, based on a first feature map corresponding to the second image sample, obtaining a second feature map through T residual convolution modules included in the image classification model, wherein T is an integer greater than or equal to 1;
for each second image sample in the training sample set, acquiring a target feature vector through a pooling layer included in the image classification model based on a second feature map corresponding to the second image sample;
and aiming at each second image sample in the training sample set, obtaining a category distribution vector through a full connection layer included in the image classification model based on a target feature vector corresponding to the second image sample.
In one or more embodiments, a manner of outputting a class distribution vector using an image classification model is presented. As can be seen from the foregoing embodiments, the image classification model may be a CNN model, and thus, the image classification model includes a plurality of convolution stages, each convolution stage containing a plurality of residual convolution modules. The following description will be made with reference to the drawings.
Specifically, for ease of understanding, please refer to fig. 8, which is a schematic diagram of the network structure of the image classification model in the embodiment of the present application. As shown in the figure, assume that the size of the second image sample is 3 * h * w, where 3 denotes the number of channels, h denotes the height value, and w denotes the width value. The second image sample is input to the convolution module included in the image classification model to obtain a first feature map of 64 * h/2 * w/2. The convolution module comprises a convolution layer and a maximum pooling layer, and the convolution kernel size is 7 * 7. The first feature map is then taken as the input of the residual convolution modules; the feature maps within the same stage are consistent in size, and the height and width are 1/2 of those of the feature map of the previous convolution stage. The dashed lines in fig. 8 represent downsampling followed by a residual connection. After the T residual convolution modules, a second feature map of 512 * h/32 * w/32 is obtained.
The second feature map is pooled into a target feature vector using a mean pooling layer. The mean pooling layer is followed by a fully connected layer, which maps the 512-dimensional target feature vector into a c-dimensional class distribution vector, where c denotes the number of classes. For example, if the class distribution vector is (0.8, 0.1, 0, 0.1), the class corresponding to 0.8 is the predicted classification result.
It can be understood that, in practical applications, the number of residual convolution modules can be flexibly adjusted; meanwhile, the image classification model may also use a lightweight network (such as MobileNet or ShuffleNet), which is not limited herein.
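A compact sketch of the network in fig. 8 follows, assuming a PyTorch ResNet-style layout with T = 3 residual convolution modules; the per-stage channel widths and block details are assumptions, with only the 7 * 7 convolution module, the 512-dimensional target feature vector, and the final fully connected layer taken from the description above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1, bias=False), nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, 1, 1, bias=False), nn.BatchNorm2d(cout))
        # Dashed-line path in fig. 8: downsampling before the residual addition.
        self.skip = (nn.Conv2d(cin, cout, 1, stride, bias=False)
                     if stride != 1 or cin != cout else nn.Identity())

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class ImageClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Convolution module: 7x7 conv plus max pooling, 3*h*w input, 64 channels out.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 7, 2, 3, bias=False), nn.BatchNorm2d(64),
            nn.ReLU(inplace=True), nn.MaxPool2d(3, 2, 1))
        # T = 3 residual convolution modules, halving height and width each stage.
        self.stages = nn.Sequential(
            ResidualBlock(64, 128, 2), ResidualBlock(128, 256, 2),
            ResidualBlock(256, 512, 2))
        self.pool = nn.AdaptiveAvgPool2d(1)    # mean pooling layer
        self.fc = nn.Linear(512, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.pool(self.stages(self.conv(x))).flatten(1)  # target feature vector
        return self.fc(x)  # logits; softmax yields the class distribution vector

model = ImageClassifier(num_classes=4)
probs = torch.softmax(model(torch.randn(1, 3, 224, 224)), dim=1)  # e.g. (0.8, 0.1, 0, 0.1)
```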
Thirdly, in the embodiment of the present application, a manner of outputting a category distribution vector by using an image classification model is provided. In the manner, the specific category of the object in the image can be determined by utilizing the image classification model, so that the feasibility and operability of the scheme are improved.
Optionally, on the basis of the various embodiments corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, after updating the model parameters of the image classification model according to the class label of each first image sample in the target sample set and the original sample set corresponding to each clustering result, the method may further include:
generating a verification sample set according to the target sample set corresponding to each clustering result;
based on each second image sample in the verification sample set, obtaining a category distribution vector corresponding to each second image sample in the verification sample set through an image classification model;
determining a classification result corresponding to each second image sample in the verification sample set according to the class distribution vector corresponding to each second image sample in the verification sample set;
determining the accuracy of the model according to the classification result corresponding to each second image sample in the verification sample set and the class label of the first image sample corresponding to each second image sample in the verification sample set;
and if the model accuracy is greater than or equal to the accuracy threshold, sending the model parameters of the image classification model to at least one terminal device.
In one or more embodiments, a manner of determining whether to deploy the model online using a verification sample set is presented. As can be seen from the foregoing embodiments, after the target sample set corresponding to each clustering result is obtained, each target sample set may be divided into a training sample set and a verification sample set according to a certain proportion. For example, 80% of the second image samples are selected from each target sample set to form the training sample set, and the remaining 20% form the verification sample set.
Specifically, assume that 3 clustering results are obtained after clustering and that each clustering result yields a target sample set, so second image samples can be selected from the different target sample sets for model verification. Assume the verification sample set has 1000 second image samples, each with a corresponding first image sample whose class label serves as the class label of the second image sample. The classification result determined from the class distribution vector is the predicted result. Assume that among the 1000 second image samples, the classification result of 950 matches the class label while that of the remaining 50 does not; the model accuracy of the image classification model is therefore 95%.
For example, assuming that the accuracy threshold is 98%, the model accuracy is less than the accuracy threshold at this time, and therefore, the model parameters of the image classification model need to be continuously updated.
Illustratively, assuming that the accuracy threshold is 90%, the model accuracy is greater than the accuracy threshold at this time, and therefore, the updating of the model parameters of the image classification model may be stopped, and the model parameters of the image classification model may be sent to the at least one terminal device.
It should be noted that the verification sample set is a sample set set aside separately during model training; it can be used to tune the hyper-parameters of the image classification model and to perform a preliminary evaluation of its capability. It is usually used to verify the generalization capability (e.g., accuracy and recall) of the current model during iterative training, thereby deciding whether to continue training.
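A sketch of the verification pass follows, again assuming PyTorch; with 950 of 1000 second image samples classified correctly, the function below would return 0.95.

```python
import torch

@torch.no_grad()
def model_accuracy(model, verification_loader):
    correct = total = 0
    for images, labels in verification_loader:
        # Classification result: position of highest probability per sample.
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# If model_accuracy(...) >= the accuracy threshold, the model parameters
# are sent to the terminal devices; otherwise training continues.
```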
Thirdly, in the embodiment of the present application, a manner of determining whether to deploy the model online using the verification sample set is provided. In this manner, the training sample set is used to train the image classification model, the verification sample set is used to evaluate the trained model, and an image classification model whose accuracy meets the requirement can be deployed online directly, giving the model higher reliability and accuracy in practical applications.
With reference to fig. 9, a method for classifying an image in the present application will be described below, and an embodiment of the image classification method in the present application includes:
210. the terminal equipment acquires a target image;
in one or more embodiments, the terminal device obtains a target image, where the target image may be an image block to be classified. Based on this, the aspect ratio of the target image is calculated, and the target image is resized to the size of the closest second image sample.
220. The terminal equipment obtains the classification result of the target image through an image classification model based on the target image, wherein the image classification model is obtained by adopting any one method in the embodiment.
In one or more embodiments, the terminal device uses the resized target image as the input of the image classification model, which outputs a class distribution vector; the position with the highest probability is taken as the classification result of the target image, and the probability value at that position is the confidence of the prediction.
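The terminal-side inference step can be sketched as follows, assuming PyTorch and torchvision; the list of training sizes reuses the three cluster centers from the earlier example, and all names are illustrative.

```python
import torch
import torchvision.transforms.functional as F

TRAIN_SIZES = [(300, 1000), (400, 400), (1000, 300)]  # (height, width) per cluster

def classify(model, image_tensor):  # image_tensor: 3 x h x w target image
    h, w = image_tensor.shape[1:]
    # Resize to the training size whose aspect ratio is closest.
    target = min(TRAIN_SIZES, key=lambda s: abs(s[0] / s[1] - h / w))
    resized = F.resize(image_tensor, list(target)).unsqueeze(0)
    probs = torch.softmax(model(resized), dim=1)  # class distribution vector
    confidence, label = probs.max(dim=1)          # highest-probability position
    return label.item(), confidence.item()
```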
It is understood that the image classification model is trained by the method described in the foregoing embodiments. That is, the image classification model may be trained based on Machine Learning (ML). ML is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how a computer can simulate or realize human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of AI, is the fundamental approach to making computers intelligent, and is applied throughout the various areas of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
In an embodiment of the application, a method for image classification is provided. Through the method, the objects in the images can be classified based on the trained image classification model, and specific basis is provided for subsequent processing.
Optionally, on the basis of each embodiment corresponding to fig. 9, in another optional embodiment provided in this embodiment, the acquiring, by the terminal device, the target image may specifically include:
when a target vehicle runs, the terminal equipment acquires a target video frame acquired at the current moment through the image acquisition device;
the terminal equipment intercepts a target image from a target video frame through an image detection model based on the target video frame;
after the terminal device obtains the classification result of the target image through the image classification model based on the target image, the method may further include:
and the terminal equipment controls the running state of the target vehicle according to the classification result of the target image and a preset vehicle control strategy.
In one or more embodiments, a manner of implementing vehicle control based on image classification is presented. As can be seen from the foregoing embodiments, in an automatic driving or assisted driving scenario, Augmented Reality (AR) navigation is generally used. AR navigation requires sensing road elements such as traffic lights, traffic signs, and ground road markings, where the image classification model is used to distinguish the categories of the various elements so as to resolve their specific meaning (for example, a red left-turn light).
Specifically, for ease of understanding, please refer to fig. 10, where fig. 10 is a schematic diagram of an automatic driving scenario in an embodiment of the present application, and as shown in the figure, when a target vehicle is driving, a video may be captured through an image capture device (i.e., a camera) installed on a vehicle data recorder, and based on this, each frame of image in the video may be analyzed. Taking a target video frame collected at the current moment as an example, the target video frame is taken as the input of an image detection model, and the image detection model outputs a bounding box. Then, the region selected by the bounding box is taken as a target image. Next, the target image is used as an input of the image classification model, and the class distribution vector is output by the image classification model. This makes it possible to know the classification result of the target image.
For example, assuming that the classification result of the target image is "green light", the preset vehicle control strategy may be to control the target vehicle to continue traveling.
For example, assuming that the classification result of the target image is "yellow light", the preset vehicle control strategy may be to control the target vehicle to decelerate to the stop line and to wait for the yellow light to turn into green light before driving.
For example, assuming that the classification result of the target image is "red light", the preset vehicle control strategy may be to control the target vehicle to brake in time.
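The traffic-light examples above amount to a lookup from classification result to preset vehicle control strategy, sketched below; a real system would route these decisions through the vehicle's planning and control stack, and the strings are illustrative.

```python
PRESET_VEHICLE_CONTROL_STRATEGY = {
    "green light": "continue driving",
    "yellow light": "decelerate to the stop line and wait for green",
    "red light": "brake in time",
}

def control_action(classification_result):
    # Fall back to a conservative action for unrecognized categories.
    return PRESET_VEHICLE_CONTROL_STRATEGY.get(classification_result, "slow down")
```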
It can be understood that automatic driving technology generally includes technologies such as high-precision maps, environment perception, behavior decision, path planning, and motion control, and autonomous driving technology has broad application prospects.
Secondly, in the embodiment of the application, a mode for realizing vehicle control based on image classification is provided. By the method, the road image collected in real time can be identified in the vehicle driving process, and the vehicle driving strategy is adjusted based on the identification result. Because the image classification model has a good classification effect, the safety of vehicle running can be improved in the automatic driving or auxiliary driving process.
Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 9, in another optional embodiment provided in this embodiment of the present application, after the terminal device obtains a classification result of the target image through the image classification model based on the target image, the method may further include:
the terminal equipment sends the classification result of the target image to the server, so that the server determines the model accuracy of the image classification model according to the classification result of the target image and the class label labeled for the target image, and if the model accuracy is smaller than an accuracy threshold, the model parameters are updated to obtain target model parameters;
the terminal equipment receives a target model parameter sent by the server;
and the terminal equipment updates the model parameters of the image classification model into target model parameters.
In one or more embodiments, a manner is described by which collected classification results are fed back to a server. According to the foregoing embodiment, after obtaining the classification result of the target image, the terminal device may upload the target image and the corresponding classification result to the server. Therefore, the server can receive the images and the classification results thereof sent by different terminal devices and continuously update the image classification model.
Specifically, referring to fig. 11, fig. 11 is a schematic diagram of the server updating model parameters in the embodiment of the present application. As shown in the figure, taking a vehicle-mounted terminal as the terminal device, the vehicle-mounted terminal may collect a video stream captured in real time and perform framing processing on it to obtain a plurality of image frames. Each image frame is then fed into the image detection model, and the region selected by the output bounding box is taken as a target image, so that at least one target image can be obtained.
Taking one target image as an example, the target image is input into the image classification model to obtain its classification result. The vehicle-mounted terminal uploads the target image and the corresponding classification result to the server, and background staff can periodically label some or all of the target images to obtain their labeled class labels. The consistency between the class labels of the target images and the reported classification results can then be calculated.
The labeled class labels are the accurate results, while the reported classification results are predictions. Assume a total of 1000 target images, of which 800 have classification results matching their class labels and the remaining 200 do not; the model accuracy of the image classification model is therefore 80%. Based on this, assuming the accuracy threshold is 90%, the model accuracy is below the threshold, so the server needs to update the model parameters of the image classification model to obtain the target model parameters. The server can then feed the updated target model parameters back to the vehicle-mounted terminal so that the vehicle-mounted terminal can update its image classification model.
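The server-side feedback loop can be sketched as follows; the retraining routine is passed in as a callable because the embodiment does not fix it, and all names are illustrative.

```python
def server_update(reported_results, class_labels, retrain_fn, accuracy_threshold=0.9):
    """reported_results / class_labels: parallel lists of predicted and
    labeled classes; retrain_fn: hypothetical routine that updates the
    model parameters and returns the target model parameters."""
    matches = sum(r == l for r, l in zip(reported_results, class_labels))
    accuracy = matches / len(class_labels)  # e.g. 800 / 1000 = 80%
    if accuracy < accuracy_threshold:
        return retrain_fn()                 # target model parameters for the terminals
    return None                             # accuracy is sufficient; no update needed
```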
Secondly, in the embodiment of the present application, a manner of feeding the collected classification results back to the server is provided. In this manner, the terminal device can report the recognized classification results to the server periodically or once a set quantity accumulates; on one hand, this provides more samples for model updating, and on the other hand, it allows the model's performance to be checked so that prediction accuracy is maintained.
Referring to fig. 12, fig. 12 is a schematic view of an embodiment of the model training device in the embodiment of the present application, and the model training device 30 includes:
an obtaining module 310, configured to obtain a size feature of each first image sample in an original sample set, where the original sample set includes M first image samples, each first image sample is labeled with a corresponding category label, and M is an integer greater than 1;
the clustering module 320 is configured to cluster the original sample set according to the size characteristic of each first image sample to obtain N clustering results and a clustering center of each clustering result, where each clustering result includes at least one first image sample, N is an integer greater than 1 and less than M;
the processing module 330 is configured to, for each clustering result, perform scaling processing on the size of the first image sample in the clustering result according to the clustering center of the clustering result to obtain a target sample set corresponding to the clustering result;
the training module 340 is configured to update a model parameter of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
In the embodiment of the application, a model training device is provided. By adopting the device, the original sample set can be divided into a plurality of clustering results according to the size characteristics by adopting a clustering algorithm, namely, the image samples belonging to the same clustering result are more similar in size characteristics. Based on the method, the image samples included in the clustering results are subjected to self-adaptive size adjustment respectively based on the clustering centers of all the clustering results, so that the requirement of sample size consistency in batch training can be met, the images can be prevented from being seriously deformed to a certain extent, and the accuracy of model training is improved.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
an obtaining module 310, specifically configured to obtain an original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size feature of the first image sample according to the height value and the width value of the first image sample, wherein the size feature is represented as a two-dimensional vector.
In the embodiment of the application, a model training device is provided. With the above apparatus, it is recognized that what precisely characterizes the appearance shape is the combination of height and width, not the height value or the width value alone. Therefore, using the height value and the width value of the first image sample as the size feature describes the first image sample more accurately and improves the reliability of sample clustering.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
a clustering module 320, specifically configured to randomly generate N initial clustering centers;
calculating the distance between each first image sample and each initial clustering center according to the size characteristics of each first image sample and the N initial clustering centers;
dividing an original sample set into N clustering clusters according to the distance between each first image sample and each initial clustering center;
and calculating the clustering center of each clustering cluster in the N clustering clusters until the clustering center convergence condition is met, and determining the N clustering results and the clustering center of each clustering result according to the clustering center of each clustering cluster.
In the embodiment of the application, a model training device is provided. With this apparatus, the k-means clustering algorithm, which has a good clustering effect on numerical data and whose result is independent of the input order of the data, fits the clustering scenario of the present application well and converges quickly.
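A minimal k-means sketch over the two-dimensional size features is given below, assuming NumPy; initial centers are drawn from the samples themselves (a common variant of random initialization), and in practice a library implementation such as scikit-learn's KMeans could be used instead.

```python
import numpy as np

def kmeans_sizes(features, n_clusters, iters=100, seed=0):
    """features: M x 2 array of (height, width) size features."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(iters):
        # Distance between every first image sample and every cluster center.
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)  # divide samples into N cluster clusters
        new_centers = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)])
        if np.allclose(new_centers, centers):  # convergence condition met
            break
        centers = new_centers
    return labels, centers
```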
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
an obtaining module 310, specifically configured to obtain an original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size proportion of the first image sample according to the height value and the width value of the first image sample;
and for each first image sample, taking the size proportion of the first image sample as the size characteristic of the first image sample.
In the embodiment of the application, a model training device is provided. With the above apparatus, it is recognized that what precisely characterizes the appearance shape is the aspect ratio, not the height value or the width value alone. Therefore, using the size proportion of the first image sample as the size feature describes the first image sample more accurately and improves the reliability of sample clustering.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the clustering module 320 is specifically configured to generate N clustering results according to the size features of each first image sample and according to N size ratio intervals, where each size ratio interval corresponds to one clustering result;
acquiring a central size proportion aiming at each size proportion interval;
and regarding each size proportion interval, taking the center size proportion as the clustering center of the clustering result.
In the embodiment of the application, a model training device is provided. By adopting the device, a plurality of size proportion intervals can be defined by users, and based on the size proportion of the first image sample, the clustering can be completed by directly dividing the size proportion intervals into the corresponding size proportion intervals, so that the clustering times are reduced, and the clustering efficiency is improved.
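Interval-based clustering reduces to a table lookup, sketched below; the interval boundaries and the center-ratio choice are illustrative, since the embodiment leaves them user-defined.

```python
SIZE_RATIO_INTERVALS = [(0.0, 0.5), (0.5, 1.5), (1.5, 10.0)]  # height/width ranges

def assign_interval(height, width):
    """Return the size ratio interval (the clustering result) containing the
    sample, plus the interval's center size proportion (the cluster center)."""
    ratio = height / width
    for low, high in SIZE_RATIO_INTERVALS:
        if low <= ratio < high:
            return (low, high), (low + high) / 2
    return None, None
```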
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the processing module 330 is specifically configured to, for each clustering result, use a clustering center of the clustering result as a target size, where the target size includes a target height value and a target width value;
and for each clustering result, scaling the height value of the first image sample in the clustering result to a target height value, and scaling the width value of the first image sample in the clustering result to a target width value to obtain a target sample set corresponding to the clustering result, wherein the target sample set comprises at least one second image sample, and the second image sample is the first image sample after size adjustment.
In the embodiment of the application, a model training device is provided. By adopting the device, the clustering center is directly used as the training size of the data set, on one hand, the size unification can be realized, and on the other hand, the clustering center has better representativeness, thereby being beneficial to achieving better size change effect.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the processing module 330 is specifically configured to, for each clustering result, if the clustering result meets a clustering grouping condition, use the clustering center of the clustering result as a target size, where the target size includes a target height value and a target width value;
generating a target size proportion according to the target size aiming at each clustering result;
generating K target sub-sizes according to the target size proportion aiming at each clustering result, wherein each target sub-size comprises a target sub-height value and a target sub-width value;
and aiming at each clustering result, acquiring a target sample set corresponding to the clustering result according to the K target sub-sizes, wherein the target sample set comprises the K target sample sub-sets, and each second image sample in the same target sample sub-set has the same target sub-size.
In the embodiment of the application, a model training device is provided. By adopting the device, the clustering center is used as the training size proportion of the data set, and then one or more target sub-sizes are generated based on the size proportion according to the actual requirement, so that on one hand, the size unification can be realized, and on the other hand, the target sample sub-set can be constructed more flexibly.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the processing module 330 is further configured to determine, for each clustering result, that the clustering result satisfies the clustering grouping condition if the number of the first image samples in the clustering result is greater than or equal to the number threshold.
In the embodiment of the application, a model training device is provided. By adopting the device, the clustering result can be divided into a plurality of target sample subsets under the condition that a large number of first image samples are included in the clustering result, so that the overfitting condition can be avoided in the training process.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the processing module 330 is further configured to determine, for each clustering result, that the clustering result satisfies the clustering grouping condition if the size ratio of the first image sample in the clustering result is greater than or equal to the first size ratio, or the size ratio of the first image sample is less than or equal to the second size ratio.
In the embodiment of the application, a model training device is provided. By adopting the device, the clustering result can be divided into a plurality of target sample subsets under the condition that the first image sample size span is large, so that the overfitting condition can be avoided in the training process.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the training module 340 is specifically configured to generate a training sample set according to the target sample set corresponding to each clustering result;
based on each second image sample in the training sample set, obtaining a class distribution vector corresponding to each second image sample in the training sample set through an image classification model;
and updating the model parameters of the image classification model by adopting a loss function according to the class distribution vector corresponding to each second image sample in the training sample set and the class label of the first image sample corresponding to each second image sample in the training sample set.
In the embodiment of the application, a model training device is provided. With this apparatus, to prevent the image classification model from overfitting to image samples of a particular size, the batches are drawn alternately from the different training sample sets, thereby avoiding falling into a local optimum.
Alternatively, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application,
the training module 340 is specifically configured to, for each second image sample in the training sample set, obtain, based on the second image sample, a first feature map through a convolution module included in the image classification model;
for each second image sample in the training sample set, based on a first feature map corresponding to the second image sample, obtaining a second feature map through T residual convolution modules included in the image classification model, wherein T is an integer greater than or equal to 1;
for each second image sample in the training sample set, acquiring a target feature vector through a pooling layer included in the image classification model based on a second feature map corresponding to the second image sample;
and aiming at each second image sample in the training sample set, obtaining a category distribution vector through a full connection layer included in the image classification model based on a target feature vector corresponding to the second image sample.
In the embodiment of the application, a model training device is provided. With the adoption of the device, the specific class of the object in the image can be determined by utilizing the image classification model, so that the feasibility and operability of the scheme are improved.
Optionally, on the basis of the embodiment corresponding to fig. 12, in another embodiment of the model training device 30 provided in the embodiment of the present application, the model training device 30 further includes a sending module 350;
the obtaining module 310 is further configured to update a model parameter of the image classification model according to the target sample set corresponding to each clustering result and the category label of each first image sample in the original sample set, and then generate a verification sample set according to the target sample set corresponding to each clustering result;
the obtaining module 310 is further configured to obtain, based on each second image sample in the verification sample set, a category distribution vector corresponding to each second image sample in the verification sample set through an image classification model;
the obtaining module 310 is further configured to determine, according to the class distribution vector corresponding to each second image sample in the verification sample set, a classification result corresponding to each second image sample in the verification sample set;
the obtaining module 310 is further configured to determine a model accuracy according to the classification result corresponding to each second image sample in the verification sample set and the category label of the first image sample corresponding to each second image sample in the verification sample set;
a sending module 350, configured to send the model parameters of the image classification model to at least one terminal device if the model accuracy is greater than or equal to the accuracy threshold.
In the embodiment of the application, a model training device is provided. By adopting the device, the training sample set is used for training the image classification model, the verification set is used for evaluating the trained image classification model, and the image classification model with the model accuracy meeting the requirement can be directly applied on line, so that the model has higher reliability and accuracy in practical application.
Referring to fig. 13, fig. 13 is a schematic diagram of an embodiment of an image classification apparatus in an embodiment of the present application, and the image classification apparatus 40 includes:
an obtaining module 410, configured to obtain a target image;
the classification module 420 is configured to obtain a classification result of the target image through an image classification model based on the target image, where the image classification model is obtained by training using the model training method provided in the foregoing embodiment.
In an embodiment of the application, an image classification device is provided. By adopting the device, the objects in the images can be classified based on the trained image classification model, and specific basis is provided for subsequent processing.
Optionally, on the basis of the embodiment corresponding to fig. 13, in another embodiment of the image classification device 40 provided in the embodiment of the present application, the image classification device 40 further includes a control module 430;
the obtaining module 410 is specifically configured to obtain, by an image collecting device, a target video frame collected at a current time when a target vehicle is running;
based on the target video frame, intercepting a target image from the target video frame through an image detection model;
and the control module 430 is configured to, after obtaining the classification result of the target image through the image classification model based on the target image, control the driving state of the target vehicle according to a preset vehicle control strategy according to the classification result of the target image.
In an embodiment of the application, an image classification device is provided. By adopting the device, the road image collected in real time can be identified in the vehicle running process, and the vehicle running strategy is adjusted based on the identification result. Because the image classification model has a good classification effect, the safety of vehicle running can be improved in the automatic driving or auxiliary driving process.
Optionally, on the basis of the embodiment corresponding to fig. 13, in another embodiment of the image classification device 40 provided in the embodiment of the present application, the image classification device 40 further includes a sending module 440, a receiving module 450, and an updating module 460;
the sending module 440 is configured to send the classification result of the target image to the server after the classification result of the target image is obtained through the image classification model based on the target image, so that the server determines the model accuracy of the image classification model according to the classification result of the target image and the category label labeled for the target image, and if the model accuracy is smaller than an accuracy threshold, update the model parameters to obtain target model parameters;
a receiving module 450, configured to receive the target model parameters sent by the server;
an updating module 460, configured to update the model parameters of the image classification model to the target model parameters.
In an embodiment of the application, an image classification device is provided. With this apparatus, the terminal device can report the recognized classification results to the server periodically or once a set quantity accumulates; on one hand, this provides more samples for model updating, and on the other hand, it allows the model's performance to be checked so that prediction accuracy is maintained.
In an embodiment of the present application, another model training apparatus is provided; the model training apparatus may be deployed in a server. For ease of understanding, please refer to fig. 14, which is a schematic structural diagram of a server provided in an embodiment of the present application. The server 500 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors), a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing an application program 542 or data 544. The memory 532 and the storage medium 530 may be transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530 and execute the series of instruction operations in the storage medium 530 on the server 500.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 14.
The embodiment of the present application further provides another model training device and an image classification device, which may be deployed in a terminal device. As shown in fig. 15, for convenience of description, only the parts related to the embodiment of the present application are shown; for undisclosed technical details, please refer to the method part of the embodiment of the present application. In the embodiment of the present application, a smartphone is taken as an example of the terminal device:
fig. 15 is a block diagram illustrating a partial structure of a smartphone related to a terminal device provided in an embodiment of the present application. Referring to fig. 15, the smart phone includes: radio Frequency (RF) circuitry 610, memory 620, input unit 630, display unit 640, sensor 650, audio circuitry 660, wireless fidelity (WiFi) module 670, processor 680, and power supply 690. Those skilled in the art will appreciate that the smartphone configuration shown in fig. 15 is not intended to be limiting and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
In the embodiment of the present application, the CPU 522 included in the server also has the following functions:
obtaining the size characteristics of each first image sample in an original sample set, wherein the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
clustering the original sample set according to the size characteristics of each first image sample to obtain N clustering results and a clustering center of each clustering result, wherein each clustering result comprises at least one first image sample, N is an integer greater than 1 and less than M;
for each clustering result, carrying out scaling processing on the size of a first image sample in the clustering result according to the clustering center of the clustering result to obtain a target sample set corresponding to the clustering result;
and updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
The following describes each component of the smartphone in detail with reference to fig. 15:
the RF circuit 610 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 680; in addition, the data for designing uplink is transmitted to the base station. In general, the RF circuit 610 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 610 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.
The memory 620 may be used to store software programs and modules, and the processor 680 may execute various functional applications and data processing of the smart phone by operating the software programs and modules stored in the memory 620. The memory 620 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the smartphone, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the smartphone. Specifically, the input unit 630 may include a touch panel 631 and other input devices 632. The touch panel 631, also referred to as a touch screen, may collect touch operations of a user (e.g., operations of the user on the touch panel 631 or near the touch panel 631 by using any suitable object or accessory such as a finger or a stylus) thereon or nearby, and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 631 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 680, and can receive and execute commands sent by the processor 680. In addition, the touch panel 631 may be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 630 may include other input devices 632 in addition to the touch panel 631. In particular, other input devices 632 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a mouse, a joystick, and the like.
The display unit 640 may be used to display information input by or provided to the user and various menus of the smartphone. The display unit 640 may include a display panel 641, and optionally, the display panel 641 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 631 can cover the display panel 641, and when the touch panel 631 detects a touch operation thereon or nearby, the touch panel is transmitted to the processor 680 to determine the type of the touch event, and then the processor 680 provides a corresponding visual output on the display panel 641 according to the type of the touch event. Although in fig. 15, the touch panel 631 and the display panel 641 are two separate components to implement the input and output functions of the smart phone, in some embodiments, the touch panel 631 and the display panel 641 may be integrated to implement the input and output functions of the smart phone.
The smartphone may also include at least one sensor 650, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 641 according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel 641 and/or the backlight when the smartphone is moved to the ear. As one type of motion sensor, the accelerometer may detect the magnitude of acceleration in each direction (generally, three axes) and, when stationary, the magnitude and direction of gravity; it may be used in applications that recognize the attitude of the smartphone (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor may also be configured on the smartphone, which will not be described here again.
The audio circuit 660, speaker 661, and microphone 662 provide an audio interface between the user and the smartphone. The audio circuit 660 can transmit the electrical signal converted from received audio data to the speaker 661, which converts it into a sound signal for output. Conversely, the microphone 662 converts a collected sound signal into an electrical signal, which the audio circuit 660 receives and converts into audio data; after the audio data is processed by the processor 680, it is sent through the RF circuit 610 to, for example, another smartphone, or output to the memory 620 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 670, the smartphone can help the user send and receive e-mail, browse web pages, and access streaming media, providing wireless broadband Internet access. Although fig. 15 shows the WiFi module 670, it is understood that the module is not an essential part of the smartphone and may be omitted as needed without changing the essence of the invention.
The processor 680 is the control center of the smartphone. It connects the various parts of the entire device using various interfaces and lines, and performs the smartphone's functions and processes its data by running or executing the software programs and/or modules stored in the memory 620 and by calling the data stored in the memory 620. Optionally, the processor 680 may include one or more processing units; optionally, it may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 680.
The smartphone also includes a power supply 690 (e.g., a battery) that supplies power to the various components. Optionally, the power supply is logically connected to the processor 680 via a power management system, which implements functions such as managing charging, discharging, and power consumption.
Although not shown, the smartphone may further include a camera, a Bluetooth module, and the like, which are not described here.
In this embodiment, the processor 680 included in the terminal device further has the following functions:
obtaining the size characteristic of each first image sample in an original sample set, wherein the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
clustering the original sample set according to the size characteristic of each first image sample to obtain N clustering results and the clustering center of each clustering result, wherein each clustering result comprises at least one first image sample, and N is an integer greater than 1 and less than M;
for each clustering result, scaling the sizes of the first image samples in the clustering result according to the clustering center of the clustering result, to obtain a target sample set corresponding to the clustering result;
and updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set. A minimal code sketch of this four-step pipeline is given below.
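By way of illustration only, the four steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the helper names, the use of scikit-learn's KMeans, and the value of NUM_CLUSTERS are all illustrative, and the samples are assumed to be PIL Image objects.

    import numpy as np
    from PIL import Image
    from sklearn.cluster import KMeans

    NUM_CLUSTERS = 4  # N in the description; must satisfy 1 < N < M

    def size_features(images):
        # one (height, width) size characteristic per first image sample
        return np.array([[im.height, im.width] for im in images], dtype=np.float32)

    def cluster_and_rescale(images):
        feats = size_features(images)
        km = KMeans(n_clusters=NUM_CLUSTERS, n_init=10).fit(feats)
        target_sets = {c: [] for c in range(NUM_CLUSTERS)}
        for im, label in zip(images, km.labels_):
            th, tw = [int(v) for v in km.cluster_centers_[label].round()]
            # scaling a first image sample to its cluster center yields a second image sample
            target_sets[label].append(im.resize((tw, th), Image.BILINEAR))
        return target_sets, km.cluster_centers_

Scaling each image only to the center of its own size cluster, rather than to one global size, keeps the per-image distortion small; that is the point of clustering before resizing.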
In this embodiment, the processor 680 included in the terminal device further has the following functions:
acquiring a target image;
and acquiring a classification result of the target image through an image classification model based on the target image. A minimal inference sketch follows.
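A correspondingly minimal inference sketch, assuming a trained PyTorch classifier and a preprocess callable; both names are placeholders, not identifiers from the patent.

    import torch

    def classify(model, preprocess, target_image):
        model.eval()
        with torch.no_grad():
            x = preprocess(target_image).unsqueeze(0)  # add a batch dimension
            logits = model(x)
            probs = torch.softmax(logits, dim=1)       # class distribution vector
        return int(probs.argmax(dim=1))                # index of the predicted class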
The steps performed by the terminal device in the above-described embodiment may be based on the terminal device configuration shown in fig. 15.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the methods described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
It should be understood that, when the above embodiments of the present application are applied to specific products or technologies, data relating to the first image samples, the second image samples, and the like may be collected, used, and processed only with user permission or consent, and in compliance with the relevant laws, regulations, and standards of the countries and regions concerned.
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only one kind of logical division, and other divisions are possible in practice; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, or the part of it that contributes beyond the prior art, may be embodied in whole or in part as a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

1. A method of model training, comprising:
obtaining the size characteristics of each first image sample in an original sample set, wherein the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
clustering the original sample set according to the size characteristics of each first image sample to obtain N clustering results and a clustering center of each clustering result, wherein each clustering result comprises at least one first image sample, and N is an integer greater than 1 and smaller than M;
for each clustering result, carrying out scaling processing on the size of a first image sample in the clustering result according to the clustering center of the clustering result to obtain a target sample set corresponding to the clustering result;
and updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
2. The method of claim 1, wherein obtaining the size characteristic of each first image sample in the original sample set comprises:
acquiring the original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size feature of the first image sample according to the height value and the width value of the first image sample, wherein the size feature is represented as a two-dimensional vector.
3. The method according to claim 2, wherein the clustering the original sample set according to the size characteristic of each first image sample to obtain N clustering results and a clustering center of each clustering result comprises:
randomly generating N initial clustering centers;
calculating the distance between each first image sample and each initial clustering center according to the size characteristic of each first image sample and the N initial clustering centers;
dividing the original sample set into N clustering clusters according to the distance between each first image sample and each initial clustering center;
and iteratively recalculating the clustering center of each of the N clustering clusters until a clustering center convergence condition is met, and determining the N clustering results and the clustering center of each clustering result according to the clustering centers of the clustering clusters.
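Claim 3 is the standard k-means iteration specialized to two-dimensional size features. A from-scratch sketch follows; the tolerance, iteration cap, and random seed are illustrative assumptions.

    import numpy as np

    def kmeans_sizes(feats, n_clusters, tol=1e-4, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # randomly generate N initial clustering centers from the samples
        centers = feats[rng.choice(len(feats), n_clusters, replace=False)]
        for _ in range(max_iter):
            # distance between each sample's size feature and each center
            dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)  # divide the samples into N clusters
            new_centers = np.array([
                feats[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(n_clusters)])
            if np.linalg.norm(new_centers - centers) < tol:  # convergence condition
                return labels, new_centers
            centers = new_centers
        return labels, centers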
4. The method of claim 1, wherein obtaining the size characteristic of each first image sample in the original sample set comprises:
acquiring the original sample set;
for each first image sample, acquiring a height value and a width value of the first image sample;
for each first image sample, generating a size proportion of the first image sample according to the height value and the width value of the first image sample;
and regarding the size proportion of each first image sample as the size characteristic of the first image sample.
5. The method according to claim 4, wherein the clustering the original sample set according to the size characteristic of each first image sample to obtain N clustering results and a clustering center of each clustering result comprises:
generating the N clustering results from N size proportion intervals according to the size characteristic of each first image sample, wherein each size proportion interval corresponds to one clustering result;
acquiring a central size proportion aiming at each size proportion interval;
and regarding each size proportion interval, taking the center size proportion as the clustering center of the clustering result.
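Claims 4 and 5 replace the two-dimensional (height, width) feature with the scalar aspect ratio h/w, and replace learned clusters with fixed ratio intervals whose midpoints act as the clustering centers. A sketch; the interval boundaries are illustrative assumptions.

    import numpy as np

    RATIO_EDGES = [0.5, 1.0, 2.0]  # boundaries of the N size proportion intervals

    def interval_clusters(heights, widths, edges=RATIO_EDGES):
        ratios = np.asarray(heights, dtype=float) / np.asarray(widths, dtype=float)
        labels = np.digitize(ratios, edges)  # interval index for each sample
        bounds = [ratios.min()] + list(edges) + [ratios.max()]
        # the central size proportion of each interval serves as its clustering center
        centers = [(bounds[i] + bounds[i + 1]) / 2 for i in range(len(bounds) - 1)]
        return labels, centers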
6. The method according to claim 1, wherein the scaling the size of the first image sample in the clustering result according to the clustering center of the clustering result for each clustering result to obtain the target sample set corresponding to the clustering result comprises:
regarding each clustering result, taking a clustering center of the clustering result as a target size, wherein the target size comprises a target height value and a target width value;
and for each clustering result, scaling the height value of a first image sample in the clustering result to a target height value, and scaling the width value of the first image sample in the clustering result to a target width value to obtain a target sample set corresponding to the clustering result, wherein the target sample set comprises at least one second image sample, and the second image sample is the first image sample after size adjustment.
7. The method according to claim 1, wherein the scaling the size of the first image sample in the clustering result according to the clustering center of the clustering result for each clustering result to obtain the target sample set corresponding to the clustering result comprises:
for each clustering result, if the clustering result meets a clustering grouping condition, taking the clustering center of the clustering result as a target size, wherein the target size comprises a target height value and a target width value;
generating a target size proportion according to the target size aiming at each clustering result;
generating K target sub-sizes according to the target size proportion aiming at each clustering result, wherein each target sub-size comprises a target sub-height value and a target sub-width value;
and aiming at each clustering result, obtaining a target sample set corresponding to the clustering result according to the K target sub-sizes, wherein the target sample set comprises K target sample sub-sets, and each second image sample in the same target sample sub-set has the same target sub-size.
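The sub-size refinement of claim 7 keeps the cluster center's size proportion but emits K target sub-sizes at different scales, so that every second image sample in one target sample subset shares a single sub-size. In the sketch below, K and the scale factors are assumptions.

    def target_sub_sizes(target_h, target_w, k=3, scales=(0.75, 1.0, 1.25)):
        ratio = target_h / target_w                # target size proportion
        subs = []
        for s in scales[:k]:
            sub_w = max(1, round(target_w * s))    # target sub-width value
            sub_h = max(1, round(sub_w * ratio))   # target sub-height value, ratio preserved
            subs.append((sub_h, sub_w))
        return subs

For example, with a cluster center of height 480 and width 640, target_sub_sizes(480, 640) returns [(360, 480), (480, 640), (600, 800)].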
8. The method of claim 7, further comprising:
and for each clustering result, if the number of the first image samples in the clustering result is greater than or equal to a number threshold, determining that the clustering result meets a clustering grouping condition.
9. The method of claim 7, further comprising:
and for each clustering result, if the size proportion of the first image samples in the clustering result is larger than or equal to a first size proportion, or the size proportion of the first image samples is smaller than or equal to a second size proportion, determining that the clustering result meets a clustering grouping condition.
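Claims 8 and 9 give two alternative triggers for the clustering grouping condition: cluster size, or extreme aspect ratios. A combined sketch; the thresholds are illustrative assumptions, not values from the patent.

    def meets_grouping_condition(cluster_sizes, count_threshold=1000,
                                 first_ratio=2.0, second_ratio=0.5):
        # cluster_sizes: (height, width) pairs of one clustering result
        ratios = [h / w for h, w in cluster_sizes]
        large_cluster = len(cluster_sizes) >= count_threshold              # claim 8
        extreme_ratio = max(ratios) >= first_ratio or min(ratios) <= second_ratio  # claim 9
        return large_cluster or extreme_ratio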
10. The method according to any one of claims 1 to 9, wherein the updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set comprises:
generating a training sample set according to the target sample set corresponding to each clustering result;
based on each second image sample in the training sample set, obtaining a class distribution vector corresponding to each second image sample in the training sample set through the image classification model;
and updating the model parameters of the image classification model by adopting a loss function according to the class distribution vector corresponding to each second image sample in the training sample set and the class label of the first image sample corresponding to each second image sample in the training sample set.
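Claim 10 is an ordinary supervised update. A minimal PyTorch sketch, assuming cross-entropy as the loss function; the optimizer choice is an assumption, and each batch would be drawn from a single target sample subset so that all images in it share one size.

    import torch.nn.functional as F

    def train_step(model, optimizer, batch_images, batch_labels):
        model.train()
        optimizer.zero_grad()
        logits = model(batch_images)                  # class distribution vectors
        loss = F.cross_entropy(logits, batch_labels)  # compare with the class labels
        loss.backward()
        optimizer.step()                              # update the model parameters
        return loss.item()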
11. The method according to claim 10, wherein the obtaining, by the image classification model, a class distribution vector corresponding to each second image sample in the training sample set based on each second image sample in the training sample set comprises:
for each second image sample in the training sample set, based on the second image sample, obtaining a first feature map through a convolution module included in the image classification model;
for each second image sample in the training sample set, based on a first feature map corresponding to the second image sample, obtaining a second feature map through T residual convolution modules included in the image classification model, where T is an integer greater than or equal to 1;
for each second image sample in the training sample set, obtaining a target feature vector through a pooling layer included in the image classification model based on a second feature map corresponding to the second image sample;
and for each second image sample in the training sample set, obtaining a class distribution vector through a fully connected layer included in the image classification model based on the target feature vector corresponding to the second image sample.
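The architecture of claim 11 maps naturally onto a small residual network. In this hedged sketch the channel count, kernel sizes, and T are assumptions; the adaptive pooling is what lets one set of weights accept the different per-cluster input sizes.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # one of the T residual convolution modules
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

        def forward(self, x):
            return torch.relu(x + self.body(x))

    class Classifier(nn.Module):
        def __init__(self, num_classes, ch=64, t=4):
            super().__init__()
            self.stem = nn.Conv2d(3, ch, 7, stride=2, padding=3)  # first feature map
            self.res = nn.Sequential(*[ResidualBlock(ch) for _ in range(t)])
            self.pool = nn.AdaptiveAvgPool2d(1)   # pooling layer -> target feature vector
            self.fc = nn.Linear(ch, num_classes)  # fully connected -> class distribution

        def forward(self, x):
            x = self.pool(self.res(self.stem(x))).flatten(1)
            return self.fc(x)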
12. The method according to claim 10, wherein after the updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set, the method further comprises:
generating a verification sample set according to the target sample set corresponding to each clustering result;
based on each second image sample in the verification sample set, obtaining a class distribution vector corresponding to each second image sample in the verification sample set through the image classification model;
determining a classification result corresponding to each second image sample in the verification sample set according to the class distribution vector corresponding to each second image sample in the verification sample set;
determining model accuracy according to the classification result corresponding to each second image sample in the verification sample set and the class label of the first image sample corresponding to each second image sample in the verification sample set;
and if the model accuracy is greater than or equal to the accuracy threshold, sending the model parameters of the image classification model to at least one terminal device.
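The validation gate of claim 12 reduces to an accuracy computation over the verification set. In the sketch below the accuracy threshold is an illustrative assumption, and the actual distribution of parameters to terminal devices is left to the caller.

    import torch

    def validation_accuracy(model, val_loader):
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        return correct / total

    # if validation_accuracy(model, val_loader) >= ACCURACY_THRESHOLD, the server
    # would send model.state_dict() to the terminal devices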
13. A method of image classification, comprising:
acquiring a target image;
obtaining a classification result of the target image through an image classification model based on the target image, wherein the image classification model is obtained by training through the method of any one of claims 1 to 12.
14. The method of claim 13, wherein the acquiring a target image comprises:
while a target vehicle is running, acquiring, through an image acquisition device, a target video frame captured at the current moment;
intercepting the target image from the target video frame through an image detection model based on the target video frame;
after the classification result of the target image is obtained through an image classification model based on the target image, the method further comprises the following steps:
and controlling the running state of the target vehicle according to a preset vehicle control strategy according to the classification result of the target image.
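Claim 14 chains detection, classification, and control. In the heavily hedged sketch below, detector, classifier, and apply_control_policy are hypothetical placeholders for the image detection model, the image classification model, and the preset vehicle control strategy; frame is assumed to expose a PIL-style crop method.

    def on_video_frame(frame, detector, classifier, apply_control_policy):
        box = detector(frame)              # locate the region of interest
        target_image = frame.crop(box)     # intercept the target image from the frame
        result = classifier(target_image)  # classification result of the target image
        apply_control_policy(result)       # adjust the running state of the vehicle
        return result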
15. The method according to claim 13 or 14, wherein after the obtaining of the classification result of the target image through the image classification model based on the target image, the method further comprises:
sending the classification result of the target image to a server, so that the server determines the model accuracy of the image classification model according to the classification result of the target image and the class label labeled for the target image, and if the model accuracy is smaller than an accuracy threshold, updating the model parameters to obtain target model parameters;
receiving the target model parameters sent by the server;
and updating the model parameters of the image classification model into the target model parameters.
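The terminal-server exchange of claim 15 can be sketched as an ordinary HTTP round trip. The URL, the JSON shape, and the list-to-tensor conversion below are all assumptions; the patent does not specify a transport.

    import requests
    import torch

    def report_and_maybe_update(model, classification_result,
                                url="https://example.invalid/report"):
        resp = requests.post(url, json={"result": classification_result}).json()
        if "target_model_parameters" in resp:  # server judged accuracy below threshold
            params = {k: torch.tensor(v)
                      for k, v in resp["target_model_parameters"].items()}
            model.load_state_dict(params)      # adopt the target model parameters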
16. A model training apparatus, comprising:
the image processing device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring the size characteristics of each first image sample in an original sample set, the original sample set comprises M first image samples, each first image sample is labeled with a corresponding class label, and M is an integer greater than 1;
a clustering module, configured to cluster the original sample set according to a size characteristic of each first image sample to obtain N clustering results and a clustering center of each clustering result, where each clustering result includes at least one first image sample, and N is an integer greater than 1 and smaller than M;
the processing module is used for carrying out scaling processing on the size of a first image sample in the clustering result according to the clustering center of the clustering result aiming at each clustering result to obtain a target sample set corresponding to the clustering result;
and the training module is used for updating the model parameters of the image classification model according to the target sample set corresponding to each clustering result and the class label of each first image sample in the original sample set.
17. An image classification apparatus, comprising:
the acquisition module is used for acquiring a target image;
a classification module, configured to obtain a classification result of the target image through an image classification model based on the target image, where the image classification model is obtained by training according to any one of the methods of claims 1 to 12.
18. A computer device, comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory and, according to instructions in the program, to perform the method of any one of claims 1 to 12 or the method of any one of claims 13 to 15;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
19. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12, or perform the method of any of claims 13 to 15.
20. A computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the method of any one of claims 1 to 12, or implement the method of any one of claims 13 to 15.
CN202210051986.XA 2022-01-18 2022-01-18 Model training method, image classification method, device and storage medium Active CN114092920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051986.XA CN114092920B (en) 2022-01-18 2022-01-18 Model training method, image classification method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051986.XA CN114092920B (en) 2022-01-18 2022-01-18 Model training method, image classification method, device and storage medium

Publications (2)

Publication Number Publication Date
CN114092920A 2022-02-25
CN114092920B CN114092920B (en) 2022-04-15

Family

ID=80308716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051986.XA Active CN114092920B (en) 2022-01-18 2022-01-18 Model training method, image classification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114092920B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020001196A1 * 2018-06-28 2020-01-02 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method, electronic device, and computer readable storage medium
WO2020191389A1 * 2019-03-21 2020-09-24 Illumina, Inc. Training data generation for artificial intelligence-based sequencing
CN111401371A * 2020-06-03 2020-07-10 China Post Consumer Finance Co., Ltd. Text detection and identification method and system and computer equipment
CN113591543A * 2021-06-08 2021-11-02 Guangxi Comprehensive Transportation Big Data Research Institute Traffic sign recognition method and device, electronic equipment and computer storage medium
CN113435324A * 2021-06-25 2021-09-24 Shenzhen University Vehicle target detection method and device and computer readable storage medium
CN113569968A * 2021-07-30 2021-10-29 Suzhou Automotive Research Institute (Wujiang), Tsinghua University Model training method, target detection method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TANG, KEYUAN et al.: "Traffic Sign Detection Algorithm Based on YOLO Lightweight Network", Journal of Sichuan University of Science & Engineering (Natural Science Edition) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898178A * 2022-05-10 2022-08-12 Alipay (Hangzhou) Information Technology Co., Ltd. Training method and system of image recognition neural network model
CN115081642A * 2022-07-19 2022-09-20 Zhejiang University Method and system for updating service prediction model in multi-party cooperation manner
CN116993963A * 2023-09-21 2023-11-03 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device, equipment and storage medium
CN116993963B * 2023-09-21 2024-01-05 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114092920B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN109784424B (en) Image classification model training method, image processing method and device
CN114092920B (en) Model training method, image classification method, device and storage medium
CN110232696A (en) A kind of method of image region segmentation, the method and device of model training
CN112101329B (en) Video-based text recognition method, model training method and model training device
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN112990390B (en) Training method of image recognition model, and image recognition method and device
CN113723378B (en) Model training method and device, computer equipment and storage medium
CN112329725B (en) Method, device and equipment for identifying elements of road scene and storage medium
CN110516113B (en) Video classification method, video classification model training method and device
CN113254684B (en) Content aging determination method, related device, equipment and storage medium
CN111709398A (en) Image recognition method, and training method and device of image recognition model
CN114936330A (en) Method and related device for pushing information in vehicle driving scene
CN112686197B (en) Data processing method and related device
CN113822427A (en) Model training method, image matching device and storage medium
CN110766081A (en) Interface image detection method, model training method and related device
CN113821720A (en) Behavior prediction method and device and related product
CN114722937A (en) Abnormal data detection method and device, electronic equipment and storage medium
CN113822460A (en) Traffic flow prediction method and device, electronic equipment and storage medium
CN113269279B (en) Multimedia content classification method and related device
CN114612531A (en) Image processing method and device, electronic equipment and storage medium
CN116935188B (en) Model training method, image recognition method, device, equipment and medium
CN112435333B (en) Road scene generation method and related device
CN112270238A (en) Video content identification method and related device
US20230306602A1 (en) Information generation method and apparatus, electronic device, and computer readable medium
CN113819913A (en) Path planning method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40064950
Country of ref document: HK