CN110889428A - Image recognition method and device, computer equipment and storage medium


Info

Publication number
CN110889428A
Authority
CN
China
Prior art keywords
training picture
training
convolutional neural
network model
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911000955.6A
Other languages
Chinese (zh)
Inventor
陈丽娟
杨晓飞
侯利杰
胡惜阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dasou Vehicle Software Technology Co Ltd
Original Assignee
Zhejiang Dasou Vehicle Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dasou Vehicle Software Technology Co Ltd
Priority to CN201911000955.6A
Publication of CN110889428A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image recognition method, an image recognition device, a computer device, and a storage medium. The method comprises the following steps: acquiring an image data set to be identified; and inputting the image data set to be identified into a classification model, which outputs the type label corresponding to the image data set. The classification model is obtained by training a preset network model with each training picture in a training picture set as input and the type label of the corresponding training picture as output. The network model includes a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture. By adopting the method, both the accuracy and the efficiency of image classification and recognition can be improved.

Description

Image recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image recognition method and apparatus, a computer device, and a storage medium.
Background
With the growing number of motor vehicles in use, the probability of vehicle damage caused by traffic accidents and the like increases year by year. After a vehicle is damaged, the damage generally needs to be identified as a basis for vehicle maintenance and insurance claim settlement. At present, vehicle damage is mostly identified by manual on-site survey, or by taking pictures and determining the damage type through manual remote assistance. However, manual identification is strongly influenced by subjective factors such as the surveyor's inspection skill, so its accuracy is often low.
In view of this, some prior art identifies the vehicle damage type by extracting picture features with an existing model. However, existing models divide damage types coarsely, and when applied to complex scenes they tend to suffer from low identification accuracy and low identification efficiency.
Disclosure of Invention
In view of the above, it is necessary to provide an image recognition method and apparatus, a computer device, and a storage medium that can improve recognition accuracy and recognition efficiency.
An image recognition method, the method comprising:
acquiring an image data set to be identified;
inputting the image data set to be identified into a classification model, and outputting a type label corresponding to the image data set to be identified by the classification model;
the classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
In one embodiment, the method further comprises:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
acquiring a type label corresponding to each training picture in a training picture set after data cleaning;
and acquiring the preset network model, and training to obtain the classification model by taking each training picture in the training picture set after data cleaning as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In one embodiment, the method further comprises:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
performing data augmentation on the training picture set after data cleaning;
acquiring a type label corresponding to each training picture in the training picture set after data augmentation;
and acquiring the preset network model, and training to obtain the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In one embodiment, the preset network model further includes a first loss function connected to the output end of the first convolutional neural network, a second loss function connected to the output end of the second convolutional neural network, and an adder configured to add a loss value output by the first loss function and a loss value output by the second loss function to obtain a loss value of the preset network model.
In one embodiment, the first loss function is a cross-entropy loss function; and/or the second loss function is a cross-entropy loss function.
In one embodiment, the first convolutional neural network comprises: n×k convolution kernels for convolving the basic features to obtain a first convolution result, a first pooling function for globally pooling the first convolution result to obtain an n×k-dimensional feature vector, and a second pooling function for dividing the n×k-dimensional feature vector into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture; wherein k is the number of categories of the type label and n is a positive integer.
In one embodiment, the second convolutional neural network comprises: k convolution kernels for convolving the basic features to obtain a second convolution result, and a third pooling function for globally pooling the second convolution result to obtain the global features of the training picture.
An image recognition device comprises an image acquisition module and an image recognition module.
The image acquisition module is used for acquiring an image data set to be identified;
the image identification module is used for inputting the image data set to be identified into a classification model and outputting a type label corresponding to the image data set to be identified by the classification model;
the classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image data set to be identified;
inputting the image data set to be identified into a classification model, and outputting a type label corresponding to the image data set to be identified by the classification model;
the classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an image data set to be identified;
inputting the image data set to be identified into a classification model, and outputting a type label corresponding to the image data set to be identified by the classification model;
the classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
According to the image recognition method and device, the computer device, and the storage medium, a classification model is obtained by training a preset network model with each training picture in a training picture set as input and the corresponding type label as output, enabling automatic identification of the vehicle damage type. The classification model comprises a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture. The trained classification model can therefore extract both the local features and the global features of the image data set to be recognized, which reduces the probability of classification errors caused by the loss of global features, improves recognition accuracy, achieves fine-grained classification, allows application to complex scenes, and improves recognition efficiency.
Drawings
FIG. 1 is a diagram illustrating an exemplary application of an image recognition method according to an embodiment;
FIG. 2 is a flowchart illustrating an image recognition method according to an embodiment;
fig. 3 is a schematic diagram of a model structure of a preset network model according to an embodiment;
FIG. 4 is a flow chart illustrating an image recognition method according to another embodiment;
FIG. 5 is a flow chart illustrating an image recognition method according to yet another embodiment;
FIG. 6 is a block diagram showing the structure of an image recognition apparatus according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The terms "first," "second," and the like in the description and in the claims of the embodiments of the application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
It should be understood that the term "and/or" in the specification and claims of the embodiments of the present application refers to any and all possible combinations including one or more of the listed items; for example, "A and/or B" covers three cases: A alone, B alone, and the combination of A and B.
The image recognition method provided by the application can be applied to the application environment shown in fig. 1 to solve the problem of vehicle damage type identification or other image recognition and classification problems. In this application environment, a user may send a live image data set of a damaged vehicle to the computing platform 104 over a network through a terminal 102 capable of collecting such a data set, such as a smart phone, a camera, a tablet computer, a laptop computer, or a sensor, and the computing platform 104 identifies the vehicle damage type of the damaged vehicle from the live image data set sent by the user.
In one embodiment, as shown in fig. 2, an image recognition method is provided. The execution subject of the method can be any system, device, apparatus, platform, or server with computing and processing capabilities, such as the computing platform shown in fig. 1, and the method can be applied to the computing platform in the form of a computer program. The method comprises the following steps:
step S202, an image data set to be identified is obtained.
In step S202, as a specific application example, the image data set to be recognized may be pictures or dynamic image data of a damaged vehicle collected at the site where the vehicle is located. The data can be acquired by an image acquisition device at that site; for example, pictures of the damaged vehicle can be taken with a smart phone or a camera. A picture can be a close-up or a distant shot of the damaged vehicle. Dynamic images of the damaged vehicle may likewise be acquired by an image acquisition device such as a smart phone or a camera, which is not limited in the present invention.
Step S204, inputting the image data set to be recognized into a classification model, and outputting, by the classification model, the type label corresponding to the image data set to be recognized.
The classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model.
The type label may be, but is not limited to being, used to indicate the type of vehicle damage. The types of vehicle damage may include scratches, paint peeling, wrinkles, perforations, and/or other types of vehicle damage, which is not limited in the present invention. Specifically, the type label is obtained by manually labeling the training pictures containing vehicle damage images in the training picture set.
As shown in fig. 3, the preset network model 2 includes: a convolutional neural network model 10 for extracting basic features of the training picture, a first convolutional neural network 12 for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network 14 for convolving the basic features to obtain global features of the training picture.
The convolutional neural network model is an existing convolutional neural network model, which may be from the ResNet (Residual Neural Network) series, the VGG (Visual Geometry Group) series, or another convolutional neural network family. The ResNet series includes, but is not limited to, ResNet_v1 and ResNet_v2. In this embodiment, the convolutional neural network model is ResNet_v2_50.
The basic features refer to high-level features obtained by performing feature extraction on the pictures through a convolutional neural network model, and are also called as image depth level features.
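For illustration, a minimal PyTorch sketch of such a backbone is given below. It uses torchvision's resnet50 as a stand-in, since ResNet_v2_50 is not bundled with torchvision; the class name and tensor shapes are assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class Backbone(nn.Module):
    """Extracts the 'basic features': the last convolutional feature map."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)  # stand-in for ResNet_v2_50
        # Drop the global pooling and fc head; keep only the convolutional trunk.
        self.trunk = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, x):        # x: (B, 3, H, W)
        return self.trunk(x)     # basic features: (B, 2048, H/32, W/32)

basic = Backbone()(torch.randn(1, 3, 224, 224))
print(basic.shape)               # torch.Size([1, 2048, 7, 7])
```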
A local feature is a local representation of a picture feature, including edges, corners, lines, curves, regions of particular attributes, and the like. Mainstream local feature extraction operators include SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features). Local features are abundant in a picture and weakly correlated with one another, and under occlusion the disappearance of some features does not affect the detection and matching of the others.
Global features refer to the overall properties of a picture, and common global features include color features, texture features, and shape features, such as intensity histograms, and the like.
In this embodiment, the classification model is obtained by training the preset network model with the type label as output, so that the vehicle damage type can be identified automatically. Because the preset network model comprises a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features, and a second convolutional neural network for convolving the basic features to obtain global features, the trained classification model can extract both the local and the global features of the image data set to be recognized. This reduces the probability of classification errors caused by the loss of global features, improves recognition accuracy, achieves fine-grained classification, allows application to complex scenes, and improves recognition efficiency.
Specifically, the first convolutional neural network includes: n×k convolution kernels for convolving the basic features to obtain a first convolution result, a first pooling function for globally pooling the first convolution result to obtain an n×k-dimensional feature vector, and a second pooling function for dividing the n×k-dimensional feature vector into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture.
Here k is the number of categories of the type label. The local feature of the training picture extracted by the first convolutional neural network is a k-dimensional feature vector, and each dimension corresponds to one type label. For example, if the model is designed to recognize 4 types of vehicle damage, e.g., scratch, paint peeling, wrinkle, and perforation, then the number of categories k of the type labels is 4. It should be noted that k = 4 is only an example; the value of k is not limited in the present invention.
n is a positive integer. It can be understood that convolving the basic features with n×k convolution kernels and then globally pooling the first convolution result yields an n×k-dimensional feature vector; that is, n local features are extracted for each of the k vehicle damage categories. For example, 10 local features may be extracted for the type label "scratch". The larger n is, the stronger the model's adaptability to complex scenes.
The size of the convolution kernel may be 3 × 3, 5 × 5, etc., and in the present embodiment, the size of the convolution kernel is 3 × 3.
In this embodiment, the first pooling function and the second pooling function are both global average pooling functions. In other embodiments, the first pooling function may be a global maximum pooling function; the second pooling function may be a global maximum pooling function.
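The first branch described above might be sketched as follows; the input channel count and the values of n and k are illustrative assumptions (k = 4 damage categories, n = 10 local features per category, as in the examples above):

```python
import torch
import torch.nn as nn

class LocalBranch(nn.Module):
    """First convolutional neural network: n*k kernels, then two-stage pooling."""
    def __init__(self, in_channels=2048, n=10, k=4):
        super().__init__()
        self.n, self.k = n, k
        # n*k convolution kernels of size 3x3 applied to the basic features.
        self.conv = nn.Conv2d(in_channels, n * k, kernel_size=3, padding=1)
        self.pool1 = nn.AdaptiveAvgPool2d(1)    # first pooling function (global average)

    def forward(self, basic):                   # basic: (B, C, H, W)
        x = self.conv(basic)                    # first convolution result: (B, n*k, H, W)
        x = self.pool1(x).flatten(1)            # n*k-dimensional feature vector
        x = x.view(-1, self.k, self.n)          # split into k n-dimensional vectors
        return x.mean(dim=2)                    # second pooling -> local features: (B, k)

local = LocalBranch()(torch.randn(1, 2048, 7, 7))
print(local.shape)                              # torch.Size([1, 4])
```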
Specifically, the second convolutional neural network includes: k convolution kernels for convolving the basic features to obtain a second convolution result, and a third pooling function for globally pooling the second convolution result to obtain the global features of the training picture.
The global feature of the training picture extracted by the second convolutional neural network is a k-dimensional feature vector, and each dimension of the feature vector corresponds to one type label.
In this embodiment, the third pooling function is a global average pooling function. In other embodiments, the third pooling function may be a global maximum pooling function.
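Under the same assumptions, a companion sketch of the second (global-feature) branch:

```python
import torch
import torch.nn as nn

class GlobalBranch(nn.Module):
    """Second convolutional neural network: k kernels, then one global pooling."""
    def __init__(self, in_channels=2048, k=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, k, kernel_size=3, padding=1)
        self.pool3 = nn.AdaptiveAvgPool2d(1)    # third pooling function (global average)

    def forward(self, basic):                   # basic: (B, C, H, W)
        x = self.conv(basic)                    # second convolution result: (B, k, H, W)
        return self.pool3(x).flatten(1)         # global features: (B, k)

glob = GlobalBranch()(torch.randn(1, 2048, 7, 7))
print(glob.shape)                               # torch.Size([1, 4])
```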
Specifically, the preset network model 2 further includes a first loss function 16 connected to the output end of the first convolutional neural network, a second loss function 18 connected to the output end of the second convolutional neural network, and an adder 20 for adding the loss value output by the first loss function and the loss value output by the second loss function to obtain a loss value of the preset network model.
The loss value of the preset network model is the basis for updating the model parameters in the network model. In the process of training the classification model, each weight in the network model is updated by back-propagating the loss value, so that the loss value of the network model is minimized, and the network model training is completed.
In this embodiment, the first loss function is a cross-entropy loss function, and the second loss function is a cross-entropy loss function. In other embodiments, the first loss function is a loss function other than the cross-entropy loss function, e.g., a squared error loss function, and the second loss function is a cross-entropy loss function; or the first loss function is a cross entropy loss function, and the second loss function is a loss function different from the cross entropy loss function; alternatively, the first and second loss functions may also be other loss functions, such as squared error loss functions.
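To make the loss wiring concrete, here is a minimal sketch assuming both loss functions are cross-entropy; PyTorch's nn.CrossEntropyLoss fuses the softmax and the cross-entropy computation, and the final addition plays the role of the adder:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def model_loss(local_feats, global_feats, labels):
    loss_local = criterion(local_feats, labels)    # first loss function
    loss_global = criterion(global_feats, labels)  # second loss function
    return loss_local + loss_global                # adder: loss of the whole model

labels = torch.tensor([2])                         # e.g. class index 2 out of k=4
local = torch.randn(1, 4, requires_grad=True)      # stand-in branch outputs
glob = torch.randn(1, 4, requires_grad=True)
loss = model_loss(local, glob, labels)
loss.backward()                                    # back-propagate to update the weights
```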
Specifically, the method further comprises:
before step S202, a training picture set is acquired;
performing data augmentation on the acquired training picture set;
acquiring a type label corresponding to each training picture in the training picture set after data augmentation;
and acquiring a preset network model, and training to obtain the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
Specifically, the performing data augmentation on the acquired training picture set includes:
preprocessing the training pictures in the acquired training picture set;
and determining the preprocessed training pictures as newly added training pictures.
The preprocessing of the training pictures in the acquired training picture set includes at least one of the following modes (a code sketch follows the list):
performing color adjustment on the training pictures in the acquired training picture set;
cropping the training pictures in the acquired training picture set;
and rotating the training pictures in the acquired training picture set.
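The sketch below illustrates these three preprocessing modes with torchvision transforms; the parameter values are assumptions and are not taken from the patent:

```python
from torchvision import transforms

# One randomized pipeline covering the three modes listed above.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # color adjustment
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),                   # cropping
    transforms.RandomRotation(degrees=15),                                 # rotation
])
# Each preprocessed picture becomes a newly added training picture and
# keeps the type label of the picture it was derived from.
```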
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As described above, as a specific application example, the present application may be applied to the field of vehicle damage type identification, where the image data set to be identified may be captured pictures of a damaged vehicle or captured dynamic image data of the damaged vehicle. If it consists of pictures, the pictures to be recognized can be input directly into the classification model, which outputs the corresponding type labels. If it consists of dynamic images, the dynamic images can be cut and converted into pictures with existing image processing techniques before being input into the classification model for identification and labeling. In another embodiment, as shown in fig. 4, there is provided an image recognition method including the following steps:
and S401, acquiring a training picture set.
The training pictures in the training picture set can be vehicle pictures which are acquired offline and accumulated in a large batch of histories, and can also be vehicle pictures acquired through a web crawler tool.
And S402, performing data cleaning on the acquired training picture set.
Specifically, step S402 includes:
deleting the blurry pictures and the local close-ups from the acquired training picture set.
A local close-up refers to a picture from which the type of the vehicle damage cannot be judged by the human eye.
Step S403, performing data augmentation on the cleaned training picture set.
Specifically, the performing data augmentation on the cleaned training picture set includes:
preprocessing the training pictures in the cleaned training picture set;
and determining the preprocessed training pictures as newly added training pictures.
The preprocessing of the training pictures in the cleaned training picture set includes at least one of the following modes:
performing color adjustment on the training pictures in the cleaned training picture set;
cropping the training pictures in the cleaned training picture set;
and rotating the training pictures in the cleaned training picture set.
Specifically, the method further comprises: before step S403, manually labeling the training pictures containing vehicle damage images in the cleaned training picture set, so as to obtain the vehicle damage category label corresponding to each training picture.
The performing data augmentation on the cleaned training picture set then includes:
preprocessing the training pictures in the cleaned training picture set;
and determining the preprocessed training pictures as newly added training pictures, and labeling each newly added training picture with the vehicle damage category label of the training picture from which it was derived.
Step S404, acquiring the type label corresponding to each training picture in the training picture set after data augmentation.
Step S405, acquiring a preset network model, and training to obtain the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
Step S406, in response to receiving the identification request instruction, acquires an image data set to be identified.
Step S407, inputting the image data set to be recognized into the classification model, and outputting the type label corresponding to the image data set to be recognized by the classification model.
In the present embodiment, steps S406 to S407 are deployed on the server in the form of an API. The terminal 102 sends an identification request instruction to the computing platform 104, where the identification request instruction requests the computing platform 104 to call the API interface. The computing platform 104 performs steps S406 to S407 upon receiving the request.
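A hypothetical sketch of such an API deployment is shown below; the web framework (Flask), the endpoint path, the request format, and the classify() helper are illustrative assumptions rather than the patent's actual interface:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def classify(files):
    # Placeholder: decode the uploaded pictures, run the trained
    # classification model, and return the corresponding type labels.
    raise NotImplementedError  # assumed helper, not defined in the patent

@app.route("/recognize", methods=["POST"])     # called via the identification request
def recognize():
    images = request.files.getlist("images")   # image data set to be identified
    return jsonify({"type_labels": classify(images)})
```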
The preset network model includes: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
In this embodiment, the classification model is obtained by training the preset network model with the type label as output, realizing automatic identification of the vehicle damage type. Because the classification model comprises a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features, and a second convolutional neural network for convolving the basic features to obtain global features, the trained classification model can extract both the local and the global features of the picture to be recognized. This reduces the probability of classification errors caused by the loss of global features, improves recognition accuracy, achieves fine-grained classification, allows application to scenes with many vehicle damage types, and improves recognition efficiency. Data cleaning improves the efficiency of model training and reduces the workload of manual labeling. Data augmentation compensates for an insufficient number of samples and improves the accuracy of model training.
It should be understood that, although the steps in the flowchart of fig. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 4 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In yet another embodiment, as shown in fig. 5, an image recognition method is provided, and the execution subject of the method can be any system, device, apparatus, platform or server with computing and processing capabilities, such as the computing platform shown in fig. 1. The method comprises the following steps:
step S601, before the image data set to be identified is obtained, a training picture set is obtained.
Step S602, performing data cleaning on the acquired training picture set.
Step S603, acquiring the type label corresponding to each training picture in the training picture set after data cleaning.
Specifically, the method further comprises: before step S603, manually labeling training pictures with vehicle damage images in the training picture set after data cleaning to obtain a vehicle damage category label corresponding to each training picture.
Specifically, the difference between the number of training pictures corresponding to each type label and a preset number is smaller than a preset threshold. Controlling the number of training pictures corresponding to the different labels in this way improves the accuracy of model identification.
In this embodiment, the preset number is 1000. In other embodiments, the preset number may be set according to the actual situation, which is not limited in the present invention.
Step S604, acquiring a preset network model, and training to obtain a classification model by taking each training picture in the training picture set after data cleaning as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
Specifically, the method further comprises: before step S604, a network model is constructed.
Wherein the network model includes: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
The first convolutional neural network comprises: n×k convolution kernels for convolving the basic features to obtain a first convolution result, a first pooling function for globally pooling the first convolution result to obtain an n×k-dimensional feature vector, and a second pooling function for dividing the n×k-dimensional feature vector into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture; wherein k is the number of categories of the type label and n is a positive integer.
The second convolutional neural network comprises: k convolution kernels for convolving the basic features to obtain a second convolution result, and a third pooling function for globally pooling the second convolution result to obtain the global features of the training picture.
In this embodiment, the first pooling function, the second pooling function and the third pooling function are all global average pooling functions.
Specifically, the network model further includes a first loss function connected to the output end of the first convolutional neural network, a second loss function connected to the output end of the second convolutional neural network, and an adder for adding a loss value output by the first loss function and a loss value output by the second loss function to obtain a loss value of the preset network model.
Specifically, the first convolutional neural network is connected with the first loss function through a first softmax function, and the second convolutional neural network is connected with the second loss function through a second softmax function.
The first softmax function is used to represent the local features output by the first convolutional neural network as a first probability vector; the second softmax function is used to represent the global features output by the second convolutional neural network as a second probability vector.
Specifically, when training the network model, the process of determining the loss of the network model is as follows:
obtaining a type label vector; the type label vector is obtained by encoding the type label, for example by one-hot encoding;
inputting the first probability vector and the type label vector into the first loss function, which outputs a loss value;
inputting the second probability vector and the type label vector into the second loss function, which outputs a loss value;
and taking the loss value output by the adder as the loss value of the preset network model.
In this embodiment, the first loss function is a cross-entropy loss function, and the second loss function is a cross-entropy loss function.
The cross entropy loss function is:
$$L = -\sum_{i=1}^{k} q_i \log \alpha(f_p)_i$$
where f_p denotes a feature vector, α(f_p) is the probability vector obtained by converting the feature vector f_p through a softmax function, k denotes the number of vehicle damage categories, and q denotes the type label vector. For the first loss function, f_p is the feature vector of the local features extracted by the first convolutional neural network; for the second loss function, f_p is the feature vector of the global features extracted by the second convolutional neural network.
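As a worked instance of this formula, assuming k = 4 and a one-hot label vector selecting the third category, the loss can be computed directly and checked against PyTorch's built-in cross-entropy:

```python
import torch
import torch.nn.functional as F

fp = torch.tensor([1.2, -0.3, 2.5, 0.1])   # feature vector from one branch (k = 4)
q = torch.tensor([0.0, 0.0, 1.0, 0.0])     # one-hot type label vector
alpha = F.softmax(fp, dim=0)                # probability vector alpha(fp)
loss = -(q * torch.log(alpha)).sum()        # L = -sum_i q_i * log(alpha(fp)_i)
print(loss)                                 # matches F.cross_entropy(fp[None], torch.tensor([2]))
```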
Step S605, an image dataset to be recognized is acquired.
Step S606, inputting the image data set to be recognized into the classification model, and outputting the type label corresponding to the image data set to be recognized by the classification model.
In this embodiment, the vehicle damage classification model is obtained by training the preset network model with the type label as output, realizing automatic identification of the vehicle damage type. Because the preset network model comprises a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features, and a second convolutional neural network for convolving the basic features to obtain global features, the trained classification model can extract both the local and the global features of the picture to be recognized. This reduces the probability of classification errors caused by the loss of global features, improves recognition accuracy, achieves fine-grained classification, allows the classification model to be applied to scenes with many vehicle damage types, and improves recognition efficiency. Data cleaning improves the efficiency of model training and reduces the workload of manual labeling.
It should be understood that, although the steps in the flowchart of fig. 5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 5 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, an image recognition apparatus 1 is provided, which includes an image acquisition module 101 and an image recognition module 102.
The image acquisition module 101 is configured to acquire an image dataset to be identified.
The image recognition module 102 is configured to input the image data set to be recognized to a classification model, and output a type tag corresponding to the image data set to be recognized by the classification model.
The classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
In one embodiment, the apparatus further comprises:
the first training picture acquisition module is used for acquiring a training picture set before acquiring an image data set to be identified;
the first data cleaning module is used for cleaning the data of the acquired training picture set;
the first label acquisition module is used for acquiring a type label corresponding to each training picture in the training picture set after data cleaning;
and the first model training module is used for acquiring a preset network model, and training to obtain a classification model by taking each training picture in the training picture set after data cleaning as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In one embodiment, the apparatus further comprises:
the second training picture acquisition module is used for acquiring a training picture set before acquiring the image data set to be identified;
the second data cleaning module is used for cleaning the data of the acquired training picture set;
the data augmentation module is used for performing data augmentation on the training picture set after data cleaning;
the second label acquisition module is used for acquiring the type label corresponding to each training picture in the training picture set after data augmentation;
and the second model training module is used for acquiring a preset network model, and training to obtain the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In an embodiment, the preset network model further includes a first loss function connected to the output terminal of the first convolutional neural network, a second loss function connected to the output terminal of the second convolutional neural network, and an adder for adding the loss value output by the first loss function and the loss value output by the second loss function to obtain a loss value of the preset network model.
In one embodiment, the first loss function is a cross-entropy loss function; and/or the second loss function is a cross-entropy loss function.
In one embodiment, the first convolutional neural network comprises: n×k convolution kernels for convolving the basic features to obtain a first convolution result, a first pooling function for globally pooling the first convolution result to obtain an n×k-dimensional feature vector, and a second pooling function for dividing the n×k-dimensional feature vector into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture; wherein k is the number of categories of the type label and n is a positive integer.
In one embodiment, the second convolutional neural network comprises: k convolution kernels for convolving the basic features to obtain a second convolution result, and a third pooling function for globally pooling the second convolution result to obtain the global features of the training picture.
For specific limitations of the image recognition device, reference may be made to the above limitations of the image recognition method, which are not described herein again. The modules in the image recognition device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing the data of the image data set to be identified and the training picture set. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image recognition method.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring an image data set to be identified;
inputting the image data set to be identified into a classification model, and outputting a type label corresponding to the image data set to be identified by the classification model;
the classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
acquiring a type label corresponding to each training picture in a training picture set after data cleaning;
and acquiring a preset network model, and training to obtain a classification model by taking each training picture in the training picture set after data cleaning as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
performing data augmentation on the training picture set after data cleaning;
acquiring a type label corresponding to each training picture in the training picture set after data augmentation;
and acquiring a preset network model, and training to obtain the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In an embodiment, the preset network model further includes a first loss function connected to the output terminal of the first convolutional neural network, a second loss function connected to the output terminal of the second convolutional neural network, and an adder for adding the loss value output by the first loss function and the loss value output by the second loss function to obtain a loss value of the preset network model.
In one embodiment, the first loss function is a cross-entropy loss function; and/or the second loss function is a cross-entropy loss function.
In one embodiment, the first convolutional neural network comprises: n×k convolution kernels for convolving the basic features to obtain a first convolution result, a first pooling function for globally pooling the first convolution result to obtain an n×k-dimensional feature vector, and a second pooling function for dividing the n×k-dimensional feature vector into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture; wherein k is the number of categories of the type label and n is a positive integer.
In one embodiment, the second convolutional neural network comprises: k convolution kernels for convolving the basic features to obtain a second convolution result, and a third pooling function for globally pooling the second convolution result to obtain the global features of the training picture.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an image data set to be identified;
inputting the image data set to be identified into a classification model, and outputting a type label corresponding to the image data set to be identified by the classification model;
the classification model is obtained by taking each training picture in the training picture set as input and taking the type label of the corresponding training picture as output to train a preset network model;
the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
In one embodiment, the computer program when executed by the processor further performs the steps of:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
acquiring a type label corresponding to each training picture in a training picture set after data cleaning;
and acquiring a preset network model, and training to obtain a classification model by taking each training picture in the training picture set after data cleaning as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
performing data augmentation on the training picture set after data cleaning;
acquiring a type label corresponding to each training picture in the training picture set after data augmentation;
and acquiring a preset network model, and training to obtain the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and taking the type label of the corresponding training picture as the output of the preset network model.
In one embodiment, the preset network model further includes a first loss function connected to the output terminal of the first convolutional neural network, a second loss function connected to the output terminal of the second convolutional neural network, and an adder for adding the loss value output by the first loss function and the loss value output by the second loss function to obtain the loss value of the preset network model.
In one embodiment, the first loss function is a cross-entropy loss function; and/or the second loss function is a cross-entropy loss function.
In one embodiment, the first convolutional neural network comprises: n×k convolution kernels for convolving the basic features to obtain a first convolution result, a first pooling function for globally pooling the first convolution result to obtain an n×k-dimensional feature vector, and a second pooling function for dividing the n×k-dimensional feature vector into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture; wherein k is the number of categories of the type label and n is a positive integer.
In one embodiment, the second convolutional neural network comprises: k convolution kernels for convolving the basic features to obtain a second convolution result, and a third pooling function for globally pooling the second convolution result to obtain the global features of the training picture.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An image recognition method, characterized in that the method comprises:
acquiring an image data set to be identified;
inputting the image data set to be identified into a classification model, the classification model outputting a type label corresponding to the image data set to be identified;
wherein the classification model is obtained by training a preset network model with each training picture in a training picture set as input and the type label of the corresponding training picture as output;
and the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
2. The method of claim 1, further comprising:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
acquiring a type label corresponding to each training picture in a training picture set after data cleaning;
and acquiring the preset network model, and training it into the classification model by taking each training picture in the cleaned training picture set as the input of the preset network model and the type label of the corresponding training picture as its output.
3. The method of claim 1, further comprising:
before acquiring an image data set to be identified, acquiring a training picture set;
carrying out data cleaning on the acquired training picture set;
performing data augmentation on the cleaned training picture set;
acquiring a type label corresponding to each training picture in the augmented training picture set;
and acquiring the preset network model, and training it into the classification model by taking each training picture in the augmented training picture set as the input of the preset network model and the type label of the corresponding training picture as its output.
4. The method according to claim 1, wherein the preset network model further includes a first loss function connected to an output terminal of the first convolutional neural network, a second loss function connected to an output terminal of the second convolutional neural network, and an adder for adding a loss value output by the first loss function and a loss value output by the second loss function to obtain a loss value of the preset network model.
5. The method of claim 4, wherein the first loss function is a cross-entropy loss function; and/or the second loss function is a cross-entropy loss function.
6. The method of claim 1, 2, 4 or 5, wherein the first convolutional neural network comprises: n × k convolution kernels used for convolving the basic features to obtain a first convolution result, a first pooling function used for globally pooling the first convolution result to obtain n × k feature vectors, and a second pooling function used for dividing the n × k feature vectors into k n-dimensional feature vectors and globally pooling each n-dimensional feature vector to obtain the local features of the training picture; wherein k is the number of categories of the type label, and n is a positive integer.
7. The method of claim 6, wherein the second convolutional neural network comprises: k convolution kernels used for convolving the basic features to obtain a second convolution result, and a third pooling function used for globally pooling the second convolution result to obtain the global features of the training picture.
8. An image recognition apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image data set to be identified;
the image identification module is used for inputting the image data set to be identified into a classification model, the classification model outputting a type label corresponding to the image data set to be identified;
wherein the classification model is obtained by training a preset network model with each training picture in a training picture set as input and the type label of the corresponding training picture as output;
and the preset network model comprises: a convolutional neural network model for extracting basic features of the training picture, a first convolutional neural network for convolving the basic features to obtain local features of the training picture, and a second convolutional neural network for convolving the basic features to obtain global features of the training picture.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201911000955.6A 2019-10-21 2019-10-21 Image recognition method and device, computer equipment and storage medium Pending CN110889428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911000955.6A CN110889428A (en) 2019-10-21 2019-10-21 Image recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911000955.6A CN110889428A (en) 2019-10-21 2019-10-21 Image recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110889428A true CN110889428A (en) 2020-03-17

Family

ID=69746320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911000955.6A Pending CN110889428A (en) 2019-10-21 2019-10-21 Image recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110889428A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631413A (en) * 2015-12-23 2016-06-01 中通服公众信息产业股份有限公司 Cross-scene pedestrian searching method based on depth learning
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN108171260A (en) * 2017-12-15 2018-06-15 百度在线网络技术(北京)有限公司 A kind of image identification method and system
CN108229379A (en) * 2017-12-29 2018-06-29 广东欧珀移动通信有限公司 Image-recognizing method, device, computer equipment and storage medium
CN109784166A (en) * 2018-12-13 2019-05-21 北京飞搜科技有限公司 The method and device that pedestrian identifies again
CN110097068A (en) * 2019-01-17 2019-08-06 北京航空航天大学 The recognition methods of similar vehicle and device

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449750A (en) * 2020-03-26 2021-09-28 顺丰科技有限公司 Model training method, using method, related device and storage medium
CN111340138B (en) * 2020-03-27 2023-12-29 北京邮电大学 Image classification method, device, electronic equipment and storage medium
CN111340138A (en) * 2020-03-27 2020-06-26 北京邮电大学 Image classification method and device, electronic equipment and storage medium
CN111461248A (en) * 2020-04-09 2020-07-28 上海城诗信息科技有限公司 Photographic composition line matching method, device, equipment and storage medium
CN113658050A (en) * 2020-05-12 2021-11-16 武汉Tcl集团工业研究院有限公司 Image denoising method, denoising device, mobile terminal and storage medium
CN111666848A (en) * 2020-05-27 2020-09-15 上海东普信息科技有限公司 Method, device and equipment for detecting arrival of transport vehicle and storage medium
CN111666848B (en) * 2020-05-27 2023-04-18 上海东普信息科技有限公司 Method, device and equipment for detecting arrival of transport vehicle and storage medium
WO2021135500A1 (en) * 2020-06-08 2021-07-08 平安科技(深圳)有限公司 Vehicle loss detection model training method and apparatus, vehicle loss detection method and apparatus, and device and medium
CN111783774A (en) * 2020-06-22 2020-10-16 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN111783774B (en) * 2020-06-22 2024-09-20 联想(北京)有限公司 Image processing method, device and storage medium
CN111767855A (en) * 2020-06-29 2020-10-13 四川劳吉克信息技术有限公司 Training method and device of convolutional neural network model
CN111652824A (en) * 2020-06-30 2020-09-11 创新奇智(南京)科技有限公司 Image processing method and device and network training method
CN112036455B (en) * 2020-08-19 2023-09-01 浙江大华技术股份有限公司 Image identification method, intelligent terminal and storage medium
CN112036455A (en) * 2020-08-19 2020-12-04 浙江大华技术股份有限公司 Image identification method, intelligent terminal and storage medium
CN112906730B (en) * 2020-08-27 2023-11-28 腾讯科技(深圳)有限公司 Information processing method, device and computer readable storage medium
CN112906730A (en) * 2020-08-27 2021-06-04 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN111914814A (en) * 2020-09-01 2020-11-10 平安国际智慧城市科技股份有限公司 Wheat rust detection method and device and computer equipment
CN112036387A (en) * 2020-11-06 2020-12-04 成都索贝数码科技股份有限公司 News picture shooting angle identification method based on gated convolutional neural network
CN112464003B (en) * 2020-11-06 2023-04-28 苏州浪潮智能科技有限公司 Image classification method and related device
CN112464003A (en) * 2020-11-06 2021-03-09 苏州浪潮智能科技有限公司 Image classification method and related device
CN112446311A (en) * 2020-11-19 2021-03-05 杭州趣链科技有限公司 Object re-recognition method, electronic device, storage medium and device
CN112396027A (en) * 2020-12-01 2021-02-23 北京交通大学 Vehicle weight recognition method based on graph convolution neural network
CN112396027B (en) * 2020-12-01 2023-09-19 北京交通大学 Vehicle re-identification method based on graph convolution neural network
CN112966543A (en) * 2020-12-24 2021-06-15 浙江吉利控股集团有限公司 Vehicle scratch recording method and device
CN113076889A (en) * 2021-04-09 2021-07-06 上海西井信息科技有限公司 Container lead seal identification method and device, electronic equipment and storage medium
CN113076889B (en) * 2021-04-09 2023-06-30 上海西井信息科技有限公司 Container lead seal identification method, device, electronic equipment and storage medium
CN113392887A (en) * 2021-05-31 2021-09-14 北京达佳互联信息技术有限公司 Picture identification method and device, electronic equipment and storage medium
CN113449814A (en) * 2021-07-20 2021-09-28 曲阜师范大学 Picture level classification method and system
CN113627416A (en) * 2021-10-12 2021-11-09 上海蜜度信息技术有限公司 Synchronous processing method, system, storage medium and terminal for picture classification and object detection
CN113920437B (en) * 2021-12-14 2022-04-12 成都考拉悠然科技有限公司 Conductive particle identification method, system, storage medium and computer equipment
CN113920437A (en) * 2021-12-14 2022-01-11 成都考拉悠然科技有限公司 Conductive particle identification method, system, storage medium and computer equipment
CN114463559A (en) * 2022-01-29 2022-05-10 新疆爱华盈通信息技术有限公司 Training method and device of image recognition model, network and image recognition method
CN114463559B (en) * 2022-01-29 2024-05-10 芯算一体(深圳)科技有限公司 Training method and device of image recognition model, network and image recognition method

Similar Documents

Publication Publication Date Title
CN110889428A (en) Image recognition method and device, computer equipment and storage medium
US11216690B2 (en) System and method for performing image processing based on a damage assessment image judgement model
CN107688772B (en) Policy information entry method and device, computer equipment and storage medium
CN110569341B (en) Method and device for configuring chat robot, computer equipment and storage medium
CN111860670A (en) Domain adaptive model training method, image detection method, device, equipment and medium
CN111080628A (en) Image tampering detection method and device, computer equipment and storage medium
CN111680746B (en) Vehicle damage detection model training, vehicle damage detection method, device, equipment and medium
CN110569721A (en) Recognition model training method, image recognition method, device, equipment and medium
CN108399052B (en) Picture compression method and device, computer equipment and storage medium
CN111950329A (en) Target detection and model training method and device, computer equipment and storage medium
CN111079632A (en) Training method and device of text detection model, computer equipment and storage medium
CN110807491A (en) License plate image definition model training method, definition detection method and device
CN110706261A (en) Vehicle violation detection method and device, computer equipment and storage medium
US20240037610A1 (en) Computer Vision Systems and Methods for Automatically Detecting, Classifying, and Pricing Objects Captured in Images or Videos
CN111191568A (en) Method, device, equipment and medium for identifying copied image
CN112668462B (en) Vehicle damage detection model training, vehicle damage detection method, device, equipment and medium
CN108764371A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN112633297A (en) Target object identification method and device, storage medium and electronic device
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
CN110807463B (en) Image segmentation method and device, computer equipment and storage medium
CN109711381B (en) Target identification method and device of remote sensing image and computer equipment
US20220366646A1 (en) Computer Vision Systems and Methods for Determining Structure Features from Point Cloud Data Using Neural Networks
CN110766077A (en) Method, device and equipment for screening sketch in evidence chain image
CN111310710A (en) Face detection method and system
CN112418033A (en) Landslide slope surface segmentation and identification method based on mask rcnn neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200317