CN113902001A - Model training method and device, electronic equipment and storage medium - Google Patents

Model training method and device, electronic equipment and storage medium

Info

Publication number
CN113902001A
CN113902001A (application CN202111158035.4A)
Authority
CN
China
Prior art keywords
model
image
images
probability value
training
Prior art date
Legal status
Pending
Application number
CN202111158035.4A
Other languages
Chinese (zh)
Inventor
王晓波
陈佳
Current Assignee
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202111158035.4A
Publication of CN113902001A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/22 — Matching criteria, e.g. proximity measures
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention is applicable to the field of computer technology and provides a model training method, a model training device, an electronic device and a storage medium. The model training method comprises: shielding partial image content of each first image in at least two first images to obtain at least two second images, wherein the at least two first images correspond to at least two image categories, and the similarity between first images of different categories among the at least two first images is greater than a set value; and training a first model based on the at least two first images and the at least two second images, the trained first model being used for identifying the image category of an input image.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a model training method and device, electronic equipment and a storage medium.
Background
At present, many office scenarios require an image recognition model to identify the category of an image. For sensitive office images that are easily confused, such as invoices and red-header stamped documents, the related art is prone to recognition errors by the image recognition model, and the recognition accuracy of the image recognition model is low.
Disclosure of Invention
In order to solve the above problem, embodiments of the present invention provide a model training method, a model training apparatus, an electronic device, and a storage medium, so as to at least solve the problem of low recognition accuracy of image recognition models in the related art.
The technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a model training method, where the method includes:
shielding partial image content of each first image in the at least two first images to obtain at least two second images; the at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is greater than a set value;
training a first model based on the at least two first images and the at least two second images; the trained first model is used for identifying the image category of the input image.
In the foregoing solution, the blocking a partial image area of each of the at least two first images includes:
selecting a set area in each first image of the at least two first images;
filling the set area with pixel blocks of a set color.
In the foregoing solution, the blocking a partial image area of each of the at least two first images includes:
and randomly selecting a partial image area in each of the at least two first images for shielding to obtain at least two second images corresponding to each of the at least two first images.
In the foregoing solution, the training a first model based on the at least two first images and the at least two second images includes:
inputting the at least two first images and the at least two second images into the first model to obtain a first probability value output by the first model and related to the first images and a second probability value of the corresponding second images; the first probability value characterizes an image class of the corresponding first image; the second probability value represents an image category of the corresponding second image;
determining a loss value of a set loss function corresponding to each of the at least two image categories based on the first probability value and the second probability value;
updating a weight parameter of the first model based on the loss value.
In the foregoing solution, the updating the weight parameter of the first model based on the loss value includes:
performing weighted calculation on the loss value of the set loss function corresponding to each image category of the at least two image categories to obtain a weighted value;
updating a weight parameter of the first model based on the weighted value.
In the above solution, before training the first model based on the at least two first images and the at least two second images, the method further includes:
and deleting the network layer number and/or the network channel number of the set neural network model to obtain the first model.
In the above scheme, after the first model is obtained by training based on the at least two first images and the at least two second images, the method further includes:
converting the first model into a second model; the second model is used for identifying the image category of at least one third image; the first model is a PyTorch model; the second model is a PaddlePaddle model.
In the foregoing solution, the converting the first model into the second model includes:
converting the first model to an open neural network interchange format;
converting the first model of the open neural network exchange format into the second model through a set model conversion tool.
In a second aspect, an embodiment of the present invention provides an image recognition method, where the method includes:
inputting a third image into the first model or the second model to obtain a third probability value output by the first model or the second model; the third probability value characterizes an image class of the third image;
determining a difference between the third probability value and a fourth probability value corresponding to each of at least two categories;
determining an image category of the third image based on the difference.
In a third aspect, an embodiment of the present invention provides a model training apparatus, including:
the shielding module is used for shielding partial image content of each first image in the at least two first images to obtain at least two second images; the at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is greater than a set value;
a training module for training a first model based on the at least two first images and the at least two second images; the trained first model is used for identifying the image category of the input image.
In a fourth aspect, an embodiment of the present invention provides an image recognition apparatus, including:
the input module is used for inputting a third image into the first model or the second model to obtain a third probability value output by the first model or the second model; the third probability value characterizes an image class of the third image;
a determining module, configured to determine a difference between the third probability value and a fourth probability value corresponding to each of at least two categories;
an identification module configured to determine an image category of the third image based on the difference.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the steps of the model training method provided in the first aspect or the steps of the image recognition method provided in the second aspect of the embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium, including: the computer-readable storage medium stores a computer program. The computer program, when executed by a processor, implements the steps of the model training method as provided in the first aspect or the steps of the image recognition method as provided in the second aspect of an embodiment of the invention.
According to the embodiment of the invention, at least two second images are obtained by shielding partial image content of each first image in at least two first images. And training a first model based on the at least two first images and the at least two second images, wherein the trained first model is used for identifying the image category of the input image. The at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is larger than a set value. According to the embodiment of the invention, the first model is trained by the at least two first images and the at least two second images, so that the number of training samples of the first model for each type of image can be increased, the recognition capability of the model for each type of image is enhanced, the distinction of the confusable type of image is realized, and the recognition accuracy of the image recognition model is improved.
Drawings
FIG. 1 is a schematic diagram of an implementation flow of a model training method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another implementation of a model training method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of another implementation of a model training method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another implementation of a model training method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating an implementation of a model transformation method according to an embodiment of the present invention;
fig. 6 is a schematic flow chart illustrating an implementation of an image recognition method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an image processing flow according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a model training apparatus according to an embodiment of the present invention;
fig. 9 is a schematic diagram of an image recognition apparatus according to an embodiment of the present invention;
fig. 10 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In office scenarios there is often a need for image classification. A small number of images can be classified manually, but when the number of images is large, classification must rely on an image recognition model. Sensitive office images such as invoices, stamped documents and red-header documents are easily confused: to a computer, images of these confusable categories appear similar in their features and are hard to distinguish. Therefore, the related art has low recognition accuracy when recognizing confusable sensitive office images.
Moreover, current image recognition models for sensitive office images require a specific algorithm structure designed for each specific application scenario, so the inference speed is low and the image recognition efficiency is low.
In view of the above disadvantages of the related art, embodiments of the present invention provide a model training method, which can at least improve the recognition accuracy of an image recognition model. In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart of an implementation process of a model training method according to an embodiment of the present invention, where an execution subject of the model training method is an electronic device, and the electronic device includes a desktop computer, a notebook computer, a server, and the like. Referring to fig. 1, the model training method includes:
s101, shielding partial image content of each first image in at least two first images to obtain at least two second images; the at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is greater than a set value.
Here, the at least two first images are training data of a first model, the first model is an image recognition model in which a network structure is set in advance, and the first model can be used for recognizing an image type of the input image after model training.
The at least two first images comprise at least two categories of first images, such as: at least two of 7 categories of code screenshot, invoice, seal file, design drawing, invoice number, red-headed file and red-headed seal file. And, in at least two first images, the similarity between the first images of different categories is greater than a set value.
A similarity between first images of different categories that is greater than the set value indicates that those first images are confusable: their features appear similar to a computer and are difficult to distinguish. The similarity may be a cosine similarity: each image is represented as a vector, and the similarity between two images is characterized by the cosine distance between the vectors. The similarity may also be a histogram similarity, which describes the global distribution of colors in an image: for images A and B, the histograms HistA and HistB of the two images are calculated, and the normalized correlation coefficient (or the Bhattacharyya distance, or the histogram intersection distance) of the two histograms is taken as the similarity. The similarity may also be calculated by the structural similarity metric (SSIM) or a perceptual hashing algorithm. A perceptual hashing algorithm generates a "fingerprint" string for each image and then compares the fingerprints of different images; the closer the fingerprints, the more similar the images. SSIM measures image similarity in terms of brightness, contrast and structure, with a value range of [0, 1], where a larger value indicates less image distortion. In practical application, an image may be divided into blocks with a sliding window, N blocks in total; considering the influence of the window shape on the blocks, the mean, variance and covariance of each window are calculated with Gaussian weighting, the structural similarity SSIM of each corresponding block is then computed, and finally the mean over all blocks is used as the structural similarity measurement of the two images, i.e., the mean SSIM.
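As a sketch of the similarity check described above, the following illustrates a cosine similarity and a histogram correlation between two images represented as NumPy arrays; the function names, image sizes and the 0.9 threshold are illustrative assumptions, not part of the patent.

```python
import numpy as np

def cosine_similarity(img_a, img_b):
    """Represent each image as a vector and compare by cosine similarity."""
    a = img_a.astype(np.float64).ravel()
    b = img_b.astype(np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def histogram_similarity(img_a, img_b, bins=32):
    """Normalized correlation coefficient between the grayscale histograms."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 256), density=True)
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 256), density=True)
    ha, hb = ha - ha.mean(), hb - hb.mean()
    denom = np.sqrt((ha ** 2).sum() * (hb ** 2).sum())
    return float((ha * hb).sum() / denom) if denom > 0 else 1.0

# Two nearly identical "images": their similarity exceeds a set value of
# 0.9, which would mark their categories as confusable.
rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, size=(64, 64))
img2 = np.clip(img1 + rng.integers(-5, 6, size=(64, 64)), 0, 255)
```

Either measure can stand in for the "set value" comparison; SSIM or perceptual hashing would slot into the same interface.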
For example, for the image a, a rectangular frame of a solid background is inserted into the image a, and then a part of the image content of the image a is blocked by the rectangular frame. It should be understood that the area of the rectangular box should be smaller than the area of image a.
Referring to fig. 2, in an embodiment, the randomly blocking a partial image area of each of the at least two first images includes:
s201, selecting a setting area in each first image of the at least two first images.
Here, the set area is an area having a set shape and size, for example, the set area is a rectangular frame having a length and a width of 4mm, and the rectangular frame is randomly inserted into an arbitrary position of the first image.
In practical applications, the set area cannot completely block the first image, so the area of the set area is smaller than that of the first image.
S202, filling the setting area with pixel blocks of the setting color.
Here, the setting region may be filled with pixel blocks of an arbitrary color, for example, a white pixel block, and the blocking of a partial image region of the first image may be realized.
In practical applications, multiple occlusion regions can be generated on one first image at the same time, for example, 3 setting regions are selected on one first image, and the shapes of the 3 setting regions can be different, so that the first image will have 3 parts of image contents occluded.
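The occlusion step above (select a set area, fill it with pixel blocks of a set color) can be sketched as follows, assuming images are NumPy arrays; the function name, box size and white fill value are illustrative assumptions.

```python
import numpy as np

def occlude(image, box_h, box_w, fill=255, rng=None):
    """Return a copy of `image` with one randomly placed box_h x box_w
    set area filled with pixel blocks of a single set color (white here)."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    # The set area must not cover the whole first image.
    assert box_h < h and box_w < w, "occluded area must be smaller than the image"
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    out = image.copy()
    out[top:top + box_h, left:left + box_w] = fill
    return out

# One first image yields several different second images: each call picks
# a different position, so different image content is occluded each time.
rng = np.random.default_rng(1)
first = np.zeros((32, 32), dtype=np.uint8)
seconds = [occlude(first, 8, 8, rng=rng) for _ in range(3)]
```

Calling `occlude` several times on the same first image produces the multiple second images used to enlarge the training set; multiple set areas per image would just be repeated calls on the same copy.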
In an embodiment, the blocking a partial image area of each of the at least two first images includes:
and randomly selecting a partial image area in each of the at least two first images for shielding to obtain at least two second images corresponding to each of the at least two first images.
Here, each of the at least two first images can generate a plurality of second images. For example, each time one setting region is selected in a first image, the position of the selected region is different, that is, the occluded image content is different each time, so the resulting second images also differ from one another.
The blocking of the partial image content of the first image does not change the category of the first image, for example, the image content of the first image is a cat, the ear of the cat is blocked, and the obtained image content of the second image is also a cat.
The more second images are generated from the first images of each category, the stronger the recognition capability of the model for images of that category becomes during model training, which improves the robustness of the model and realizes the distinction of confusable category images.
S102, training a first model based on the at least two first images and the at least two second images; the trained first model is used for identifying the image category of the input image.
Referring to fig. 3, in an embodiment, the training the first model based on the at least two first images and the at least two second images includes:
s301, inputting the at least two first images and the at least two second images into the first model to obtain a first probability value of the first image and a second probability value of the corresponding second image, which are output by the first model; the first probability value characterizes an image class of the corresponding first image; the second probability value represents an image category of the corresponding second image.
In practical application, before the training of the first model is started by using the training images, the training images need to be labeled, and the labels represent the categories of the corresponding images.
Training a first model based on the at least two first images and the at least two second images with the labels, and enhancing the capability of the model to distinguish the images of the confusable categories.
Here, in the model training process, the first model outputs a probability value representing the image type of the input image predicted by the model, the first probability value representing the corresponding image type of the first image, and the second probability value representing the corresponding image type of the second image.
Assume a total of 5 categories, each of which corresponds to a set probability value. For example, the probability value corresponding to the invoice category is 0.2; the probability value corresponding to the seal type is 0.4; the probability value corresponding to the red header file category is 0.6; the probability value corresponding to the waybill category is 0.8; the probability value corresponding to the design drawing type is 1.0.
The category whose set probability value is closest to the probability value output by the first model is determined by the first model as the category of the input first image. For example, if the first model outputs a first probability value of 0.22, which is closest to the probability value corresponding to the invoice category, the category of the input first image is identified as the invoice category.
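The nearest-set-probability-value rule in the example above can be sketched as follows; the category names and set values mirror the example, and the function name is illustrative.

```python
# Set probability values per category, mirroring the example above.
SET_PROBS = {
    "invoice": 0.2,
    "seal": 0.4,
    "red-header file": 0.6,
    "waybill": 0.8,
    "design drawing": 1.0,
}

def classify(output_prob):
    """Return the category whose set probability value is closest to the
    probability value output by the model."""
    return min(SET_PROBS, key=lambda c: abs(SET_PROBS[c] - output_prob))

print(classify(0.22))  # → invoice
```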
S302, determining a loss value of a set loss function corresponding to each image category in the at least two image categories based on the first probability value and the second probability value.
Here, a loss function is set for each of the at least two image classes, and the first model is trained only when all the loss functions converge. The set loss function is a matching loss function, and the effect of the set loss function is to make the first probability value and the second probability value as identical as possible.
For the set loss function corresponding to each image category, the first probability value and the second probability value are substituted into the set loss function to solve for the loss value. In the embodiment of the invention, the smaller the difference between the first probability value and the second probability value, the smaller the loss value of the set loss function; the larger the difference, the larger the loss value. A set loss function may be considered converged when its loss value reaches a minimum. In practical applications, the set loss function may be an L2 loss function, also called least squares error, which may be minimized by gradient descent.
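A minimal numeric sketch of the matching loss, assuming the L2 form the paragraph above suggests; the probability values below are made up for illustration.

```python
import numpy as np

def matching_loss(p_first, p_second):
    """L2 (least-squares) matching loss between the first-image probability
    value(s) and the corresponding second-image probability value(s):
    the closer they agree, the smaller the loss."""
    p_first = np.asarray(p_first, dtype=float)
    p_second = np.asarray(p_second, dtype=float)
    return float(np.sum((p_first - p_second) ** 2))

# An occluded second image should keep its first image's category, so its
# probability value should match: a small gap gives a small loss value.
small_gap = matching_loss(0.22, 0.24)
large_gap = matching_loss(0.22, 0.80)
```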
S303, updating the weight parameter of the first model based on the loss value.
The model training is an iterative training process, the first model is continuously trained by using training images, and model parameters are updated until the model converges.
Here, the weight parameters are configuration variables inside the first model. Whether the first model is applicable, and how well it performs, depends to a large extent on the setting of the weight parameters; the first model can be optimized by adjusting the weight parameters, thereby improving its performance. In practical applications, updating a weight parameter includes increasing or decreasing its value.
Referring to fig. 4, in an embodiment, the updating the weight parameter of the first model based on the loss value includes:
s401, performing weighted calculation on the loss value of the set loss function corresponding to each image category of the at least two image categories to obtain a weighted value.
S402, updating the weight parameter of the first model based on the weighted value.
Because the set loss function corresponding to each image category has its own loss value, adjusting the weight parameters of the first model separately for every loss value would require many adjustments, and balancing the weight parameters would be difficult. In the embodiment of the invention, the loss values of the set loss functions corresponding to the image categories are weighted to obtain a single weighted value, and the weight parameters of the first model are adjusted based on this weighted value, so that the weight parameters need to be adjusted only once, reducing the complexity of updating them. In the weighted calculation, the weights of the loss values corresponding to the image categories may be the same or different.
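The weighted combination of per-category loss values described above can be sketched as follows; the loss numbers and the choice of equal weights are illustrative assumptions.

```python
# Hypothetical per-category loss values for three confusable categories.
losses = {"invoice": 0.12, "seal": 0.05, "red-header file": 0.30}

# Equal weights here; the weights may also differ per category.
weights = {c: 1.0 / len(losses) for c in losses}

# One weighted value drives a single update of the weight parameters
# instead of one update per category loss.
weighted_value = sum(weights[c] * losses[c] for c in losses)
```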
In practical application, a corresponding relationship between the loss value and the weight parameter to be adjusted may be set, and after the loss value is obtained, the weight parameter of the first model is adjusted to the weight parameter corresponding to the loss value.
And after updating the weight parameters of the first model, continuing to train the first model, and repeating iterative training until all loss functions of the first model are converged, so that the training of the first model is finished.
By training the first model by using the at least two first images and the at least two second images, the number of training samples of the first model for each class of images can be enlarged, and the identification capability of the first model for each class of images can be enhanced. Because the loss functions corresponding to different image categories are converged simultaneously, in the actual image classification process, if a plurality of images of the confusable categories are input simultaneously, the first model can realize the distinction of the images of the confusable categories.
According to the embodiment of the invention, at least two second images are obtained by shielding partial image content of each first image in at least two first images. And training a first model based on the at least two first images and the at least two second images, wherein the trained first model is used for identifying the image category of the input image. The at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is larger than a set value. According to the embodiment of the invention, the first model is trained by the at least two first images and the at least two second images, so that the number of training samples of the first model for each type of image can be increased, the recognition capability of the model for each type of image is enhanced, the robustness of the model is improved, the distinction of the confusable type of image is realized, and the recognition accuracy of the image recognition model is improved.
In an embodiment, before training the first model based on the at least two first images and the at least two second images, the method further comprises:
and deleting the network layer number and/or the network channel number of the set neural network model to obtain the first model.
For example, the set neural network model is the original MobileNet_V3 model, and the network structure of the original MobileNet_V3 is shown in Table 1:
[Table 1 is rendered as images in the original publication and is not reproduced here.]
TABLE 1
The original MobileNet_V3 model is simplified by reducing its number of network layers and/or network channels; the network structure of the simplified MobileNet_V3 is shown in Table 2:
Input       | Operator        | exp size | #out | SE | NL | s
224x224x3   | conv2d          | -        | 16   | -  | HS | 2
112x112x16  | bneck,3x3       | 16       | 16   | √  | RE | 2
56x56x16    | bneck,3x3       | 72       | 24   | -  | RE | 2
28x28x24    | bneck,3x3       | 88       | 24   | -  | RE | 1
28x28x24    | bneck,5x5       | 96       | 40   | √  | HS | 2
14x14x40    | bneck,5x5       | 240      | 40   | √  | HS | 1
14x14x40    | bneck,5x5       | 120      | 48   | √  | HS | 1
14x14x48    | bneck,5x5       | 288      | 96   | √  | HS | 2
7x7x96      | bneck,5x5       | 576      | 96   | √  | HS | 1
7x7x96      | bneck,5x5       | 576      | 96   | √  | HS | 1
7x7x96      | conv2d,1x1      | -        | 480  | -  | HS | 1
7x7x480     | pool,7x7        | -        | -    | -  | -  | 1
1x1x480     | conv2d,1x1,NBN  | -        | 512  | -  | HS | 1
1x1x512     | conv2d,1x1,NBN  | -        | k    | -  | -  | 1
TABLE 2
Wherein Input is the input size; Operator denotes the block structure; exp size and #out denote the numbers of channels; a tick (√) in the SE column indicates that the squeeze-and-excite attention mechanism is used; NL denotes the activation function, where RE denotes ReLU and HS denotes h-swish; s denotes the convolution stride.
Reducing the number of network layers and the number of network channels reduces the network parameters. Meanwhile, using a large stride reduces the amount of computation, increases the processing speed of the first model, and improves the recognition efficiency of the first model.
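The effect of channel trimming on the parameter count can be sanity-checked with simple arithmetic. The sketch below compares the last two 1x1 convolutions of the simplified network in Table 2 (96 → 480 → 512 channels) with the head of the published MobileNet_V3-small (96 → 576 → 1024 channels); the latter figures come from the MobileNetV3 paper, not this patent, so treat the comparison as illustrative.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameter count of a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Head of the simplified network in Table 2: 1x1 convs 96 -> 480 -> 512.
simplified = conv2d_params(1, 96, 480) + conv2d_params(1, 480, 512)

# Head of the original MobileNet_V3-small (per the paper): 96 -> 576 -> 1024.
original = conv2d_params(1, 96, 576) + conv2d_params(1, 576, 1024)
```

Because a convolution's parameter count is the product of its input and output channel widths, trimming both widths shrinks the head by more than 2x here.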
In an embodiment, after training the first model based on the at least two first images and the at least two second images, the method further comprises:
converting the first model to a second model; the second model is used for identifying the image category of the at least one third image; the first model characterizes a PyTorch model; the second model characterizes the paddle PaddlePaddle model.
For example, the first model is a lightweight MobileNet_V3 network model under the PyTorch framework, and the second model is a Paddle model.
The first model is converted into the second model because the Paddle model is built on the C++ language while the PyTorch model is built on the Python language, and C++ runs faster than Python. Moreover, the Paddle model is designed specifically for inference, with better low-level optimization, so it runs faster for image recognition.
Referring to fig. 5, in the above embodiment, the converting the first model into the second model includes:
S501, converting the first model into an open neural network exchange format.
Here, the open neural network exchange format is the ONNX format. ONNX is an open file format designed for machine learning and used to store trained models. Mainstream frameworks such as Caffe2, PyTorch, Microsoft Cognitive Toolkit, and Apache MXNet all support ONNX to varying degrees, which facilitates migrating models between different frameworks.
S502, converting the first model in the open neural network exchange format into the second model through a set model conversion tool.
Here, the set model conversion tool is X2Paddle. X2Paddle is a model conversion tool that can convert TensorFlow and Caffe models into a format loadable by the PaddlePaddle core framework. X2Paddle also supports model conversion from the ONNX format, which effectively supports any framework that can export to ONNX, such as PyTorch, MXNet, CNTK, and the like.
The first model in ONNX format can be converted into the second model with the Paddle structure by X2Paddle. Performing image recognition with the converted Paddle model increases the model's image processing speed and reduces memory consumption; in practical application, a single-core CPU can process 20 images per second, with a recognition accuracy of 95% across the different image categories.
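The two-step conversion of S501 and S502 can be sketched as the following commands. The file and directory names are placeholders, and the flags follow the X2Paddle documentation; step S501 itself happens inside Python beforehand, e.g. with PyTorch's standard `torch.onnx.export(model, dummy_input, "model.onnx")`:

```shell
# Install the set model conversion tool (X2Paddle).
pip install x2paddle

# S502: convert the ONNX-format first model into a PaddlePaddle model
# directory loadable by the Paddle core framework.
x2paddle --framework=onnx --model=model.onnx --save_dir=paddle_model
```

The `paddle_model` directory then contains the inference model used by the second-model recognition path described below.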
Referring to fig. 6, fig. 6 is a schematic flow chart illustrating an implementation of an image recognition method according to an embodiment of the present invention, where the image recognition method performs image recognition by using the first model or the second model obtained by training according to the above embodiment, and an execution subject of the image recognition method is an electronic device, where the electronic device includes a desktop computer, a notebook computer, a server, and the like. Referring to fig. 6, the image recognition method includes:
S601, inputting a third image into the first model or the second model to obtain a third probability value output by the first model or the second model; the third probability value characterizes an image class of the third image.
Here, both the first model and the second model can be used for image recognition; using the second model provides more efficient image recognition.
For an input third image, the first model or the second model outputs a third probability value, and the third probability value represents an image category of the third image.
S602, determining a difference value between the third probability value and a fourth probability value corresponding to each of at least two categories.
Here, a fourth probability value is set for each category, and a total of 5 categories are assumed, for example, the fourth probability value corresponding to the invoice category is 0.2; the fourth probability value corresponding to the seal type is 0.4; the fourth probability value corresponding to the red header file type is 0.6; the fourth probability value corresponding to the waybill category is 0.8; and the fourth probability value corresponding to the design drawing type is 1.0.
A difference is calculated between the third probability value and a fourth probability value corresponding to each of the at least two categories.
S603, determining the image category of the third image based on the difference value.
The image category whose corresponding fourth probability value is closest to the third probability value output by the first model or the second model is determined as the image category of the input third image. For example, the first model or the second model outputs a third probability value of 0.22, which has the smallest difference from the fourth probability value corresponding to the invoice category, so the input third image is identified as the invoice category.
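Steps S602 and S603 can be sketched in plain Python with the hypothetical anchor values given above (0.2, 0.4, 0.6, 0.8, 1.0); the category names and values are the example values from this description, not fixed by the patent:

```python
# Hypothetical fourth probability values, one per image category.
ANCHORS = {
    "invoice": 0.2,
    "seal": 0.4,
    "red_header_file": 0.6,
    "waybill": 0.8,
    "design_drawing": 1.0,
}

def classify(third_probability):
    """Return the category whose fourth probability value has the smallest
    absolute difference from the model's output (S602 + S603)."""
    return min(ANCHORS, key=lambda c: abs(ANCHORS[c] - third_probability))

print(classify(0.22))  # invoice
```

With an output of 0.22 the smallest difference (0.02) belongs to the invoice category, matching the worked example in the text.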
According to the embodiment of the invention, the first model or the second model obtained by training is used for image recognition, so that the distinction of the confusable class images can be realized, and the accuracy of image recognition is enhanced.
Referring to fig. 7, fig. 7 is a schematic diagram of an image processing flow according to an embodiment of the present invention. The image processing flow comprises the following steps:
First, office sensitive images are obtained from information sources such as the interior of an enterprise and the Internet, and part of the image content of each sensitive image is occluded to obtain occluded images. The occluded images and the original un-occluded sensitive images are labeled and used as training data. The MobileNet_V3 model is then trained under the PyTorch framework using the training data. Adding the occluded images as training data enhances the model's ability to recognize images of each category and enables distinguishing between easily confused categories.
The trained MobileNet_V3 model is converted into the ONNX format, and the ONNX-format MobileNet_V3 model is then converted into a Paddle model by X2Paddle. This increases the model's inference speed on images and improves its recognition efficiency.
Finally, images are classified with the converted Paddle model, achieving high-accuracy and high-efficiency image recognition.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The technical means described in the embodiments of the present invention may be arbitrarily combined without conflict.
In addition, in the embodiments of the present invention, "first", "second", and the like are used for distinguishing similar objects, and are not necessarily used for describing a specific order or a sequential order.
Referring to fig. 8, fig. 8 is a schematic diagram of a model training apparatus according to an embodiment of the present invention, as shown in fig. 8, the apparatus includes: the device comprises a shielding module and a training module.
The shielding module is used for shielding partial image content of each first image in the at least two first images to obtain at least two second images; the at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is greater than a set value;
a training module for training a first model based on the at least two first images and the at least two second images; the trained first model is used for identifying the image category of the input image.
In an embodiment, the occlusion module, when occluding the partial image area of each of the at least two first images, is configured to:
selecting a set area in each first image of the at least two first images;
filling the set area with pixel blocks of a set color.
In an embodiment, the occlusion module, when occluding the partial image area of each of the at least two first images, is configured to:
and randomly selecting a partial image area in each of the at least two first images for shielding to obtain at least two second images corresponding to each of the at least two first images.
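As a rough sketch of this random-occlusion step (illustrative only: the patent does not fix the region size, shape, or fill colour, so the `block` and `fill` parameters here are assumptions), treating an image as a nested list of pixel values:

```python
import random

def occlude_random_region(image, block=2, fill=0, rng=None):
    """Return a copy of a 2-D image (nested lists of pixel values) with a
    randomly placed block x block region overwritten by a fixed fill value."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    top = rng.randrange(h - block + 1)   # random vertical position
    left = rng.randrange(w - block + 1)  # random horizontal position
    out = [row[:] for row in image]      # keep the first image unchanged
    for r in range(top, top + block):
        for c in range(left, left + block):
            out[r][c] = fill
    return out

# A 4x4 all-ones "image" yields a second image with a 2x2 occluded patch.
img = [[1] * 4 for _ in range(4)]
occluded = occlude_random_region(img, block=2, fill=0, rng=random.Random(0))
```

Calling this repeatedly on the same first image, as described above, yields multiple distinct second images per first image.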
In an embodiment, the training module, when training the first model based on the at least two first images and the at least two second images, is configured to:
inputting the at least two first images and the at least two second images into the first model to obtain a first probability value output by the first model and related to the first images and a second probability value of the corresponding second images; the first probability value characterizes an image class of the corresponding first image; the second probability value represents an image category of the corresponding second image;
determining a loss value of a set loss function corresponding to each of the at least two image categories based on the first probability value and the second probability value;
updating a weight parameter of the first model based on the loss value.
In one embodiment, the training module updates the weight parameter of the first model based on the loss value for:
performing weighted calculation on the loss value of the set loss function corresponding to each image category of the at least two image categories to obtain a weighted value;
updating a weight parameter of the first model based on the weighted value.
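A minimal numeric sketch of this weighted combination, using hypothetical per-category losses and weights (the patent does not specify the weight values; the resulting scalar is what the weight-parameter update would then be driven by):

```python
def weighted_loss(per_class_losses, class_weights):
    """Weighted sum of the set-loss-function values, one per image category."""
    assert per_class_losses.keys() == class_weights.keys()
    return sum(per_class_losses[c] * class_weights[c] for c in per_class_losses)

# Illustrative values for three of the categories.
losses = {"invoice": 0.8, "seal": 0.5, "waybill": 0.2}
weights = {"invoice": 0.5, "seal": 0.3, "waybill": 0.2}
total = weighted_loss(losses, weights)  # 0.8*0.5 + 0.5*0.3 + 0.2*0.2 = 0.59
```

Raising the weight of an easily confused category makes its loss dominate the scalar, pushing the update to improve that category first.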
In one embodiment, the apparatus further comprises:
and the deleting module is used for deleting the network layer number and/or the network channel number of the set neural network model to obtain the first model.
In one embodiment, the apparatus further comprises:
a conversion module for converting the first model into a second model; the second model is used for identifying the image category of at least one third image; the first model is a PyTorch model; the second model is a PaddlePaddle (Paddle) model.
In an embodiment, the conversion module, when converting the first model into the second model, is configured to:
converting the first model to an open neural network interchange format;
converting the first model of the open neural network exchange format into the second model through a set model conversion tool.
Referring to fig. 9, fig. 9 is a schematic diagram of a model training apparatus according to an embodiment of the present invention, as shown in fig. 9, the apparatus includes: the device comprises an input module, a determination module and an identification module.
The input module is used for inputting a third image into the first model or the second model to obtain a third probability value output by the first model or the second model; the third probability value characterizes an image class of the third image;
a determining module, configured to determine a difference between the third probability value and a fourth probability value corresponding to each of at least two categories;
an identification module to determine an image category of the third image based on the difference.
In practical applications, the input module, the determination module and the identification module may be implemented by a processor in an electronic device, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field-Programmable Gate Array (FPGA).
It should be noted that: in the model training device provided in the above embodiment, only the division of the above modules is used for illustration when performing model training, and in practical applications, the above processing may be distributed to different modules as needed, that is, the internal structure of the device may be divided into different modules to complete all or part of the above described processing. In addition, the model training device and the model training method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
Based on the hardware implementation of the program module, in order to implement the method of the embodiment of the present application, an embodiment of the present application further provides an electronic device. Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application, and as shown in fig. 10, the electronic device includes:
the communication interface can carry out information interaction with other equipment such as network equipment and the like;
and the processor is connected with the communication interface to realize information interaction with other equipment, and is used for executing the method provided by one or more technical schemes on the electronic equipment side when running a computer program. And the computer program is stored on the memory.
Of course, in practice, the various components in an electronic device are coupled together by a bus system. It will be appreciated that a bus system is used to enable communications among the components. The bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for the sake of clarity the various buses are labeled as a bus system in figure 10.
The memory in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the embodiments of the present application may be applied to a processor, or may be implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The processor described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in a memory where a processor reads the programs in the memory and in combination with its hardware performs the steps of the method as previously described.
Optionally, when the processor executes the program, the corresponding process implemented by the electronic device in each method of the embodiment of the present application is implemented, and for brevity, no further description is given here.
In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, for example, a memory storing a computer program, where the computer program is executable by a processor of an electronic device to perform the steps of the foregoing method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
In addition, in the examples of the present application, "first", "second", and the like are used for distinguishing similar objects, and are not necessarily used for describing a specific order or a sequential order.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of model training, the method comprising:
shielding partial image content of each first image in the at least two first images to obtain at least two second images; the at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is greater than a set value;
training a first model based on the at least two first images and the at least two second images; the trained first model is used for identifying the image category of the input image.
2. The method according to claim 1, wherein the blocking a partial image area of each of the at least two first images comprises:
selecting a set area in each first image of the at least two first images;
filling the set area with pixel blocks of a set color.
3. The method according to claim 1, wherein the blocking a partial image area of each of the at least two first images comprises:
and randomly selecting a partial image area in each of the at least two first images for shielding to obtain at least two second images corresponding to each of the at least two first images.
4. The method of claim 1, wherein training the first model based on the at least two first images and the at least two second images comprises:
inputting the at least two first images and the at least two second images into the first model to obtain a first probability value output by the first model and related to the first images and a second probability value of the corresponding second images; the first probability value characterizes an image class of the corresponding first image; the second probability value represents an image category of the corresponding second image;
determining a loss value of a set loss function corresponding to each of the at least two image categories based on the first probability value and the second probability value;
updating a weight parameter of the first model based on the loss value.
5. The method of claim 4, wherein updating the weight parameter of the first model based on the loss value comprises:
performing weighted calculation on the loss value of the set loss function corresponding to each image category of the at least two image categories to obtain a weighted value;
updating a weight parameter of the first model based on the weighted value.
6. The method according to claim 1, wherein before training the first model based on the at least two first images and the at least two second images, the method further comprises:
and deleting the network layer number and/or the network channel number of the set neural network model to obtain the first model.
7. The method according to claim 1, wherein after training the first model based on the at least two first images and the at least two second images, the method further comprises:
converting the first model to a second model; the second model is used for identifying the image category of at least one third image; the first model is a PyTorch model; the second model is a PaddlePaddle (Paddle) model.
8. The method of claim 7, wherein converting the first model to a second model comprises:
converting the first model to an open neural network interchange format;
converting the first model of the open neural network exchange format into the second model through a set model conversion tool.
9. An image recognition method, wherein image recognition is performed based on the first model or the second model according to any one of claims 1 to 8, the method comprising:
inputting a third image into the first model or the second model to obtain a third probability value output by the first model or the second model; the third probability value characterizes an image class of the third image;
determining a difference between the third probability value and a fourth probability value corresponding to each of at least two categories;
determining an image class of the third image based on the difference.
10. A model training apparatus, comprising:
the shielding module is used for shielding partial image content of each first image in the at least two first images to obtain at least two second images; the at least two first images correspond to at least two image categories, and the similarity between the first images of different categories in the at least two first images is greater than a set value;
a training module for training a first model based on the at least two first images and the at least two second images; the trained first model is used for identifying the image category of the input image.
11. An image recognition apparatus, comprising:
the input module is used for inputting a third image into the first model or the second model to obtain a third probability value output by the first model or the second model; the third probability value characterizes an image class of the third image;
a determining module, configured to determine a difference between the third probability value and a fourth probability value corresponding to each of at least two categories;
an identification module to determine an image category of the third image based on the difference.
12. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the model training method of any one of claims 1 to 8 or the image recognition method of claim 9 when executing the computer program.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the model training method of any one of claims 1 to 8 or the image recognition method of claim 9.
CN202111158035.4A 2021-09-28 2021-09-28 Model training method and device, electronic equipment and storage medium Pending CN113902001A (en)

Publications (1)

Publication Number Publication Date
CN113902001A true CN113902001A (en) 2022-01-07

Family

ID=79189472



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination