CN114743081B - Model training method, related device and storage medium - Google Patents

Model training method, related device and storage medium

Info

Publication number
CN114743081B
CN114743081B (application CN202210502310.8A)
Authority
CN
China
Prior art keywords
image
target
weight distribution
target image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210502310.8A
Other languages
Chinese (zh)
Other versions
CN114743081A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd filed Critical Beijing Real AI Technology Co Ltd
Priority claimed from application CN202210502310.8A
Publication of CN114743081A
Application granted
Publication of CN114743081B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application relate to the field of computer vision and provide a model training method, a related apparatus, and a storage medium. The method comprises the following steps: acquiring a target image; acquiring a target weight distribution and extracting target image features from the target image according to the target weight distribution, where the target weight distribution is updated from a historical weight distribution; recognizing the target image features to obtain the identification probability distribution of the target image; and, if the similarity between the target category and the label of the target image is not smaller than a first preset threshold, taking the target weight distribution as the final weight distribution of a preset model. The target category is the category whose probability value in the identification probability distribution is larger than a preset probability value. By replacing the fixed weights of a neural network model with a weight distribution that has a more flexible and adaptable expressive capacity, the model can extract broader image features, which improves the accuracy of image recognition.

Description

Model training method, related device and storage medium
Technical Field
The embodiment of the application relates to the field of computer vision, in particular to a model training method, a related device and a storage medium.
Background
A neural network model usually needs to be trained on the feature information of sample data, or performs recognition based on the feature information of data to be recognized; whether this feature information is accurate has a great influence on the model's training effect or recognition result.
Implicit neural network models are favored because they consume less storage than explicit neural network models such as convolutional neural networks. The deep equilibrium model (Deep Equilibrium Model, DEQ) is a typical implicit neural network model that can be used for feature extraction. However, the DEQ model shares a single set of weights throughout its iterative feature extraction, so it cannot capture the flexible, variable features in the input data. The features a DEQ model extracts from input data are therefore not accurate enough and have weak expressive power, which degrades the training effect of the recognition model and makes the trained recognition model's results less accurate.
Disclosure of Invention
The embodiments of the application provide a model training method, a related apparatus, and a storage medium. In the iterative process of acquiring target image features from a target image, each iteration resamples the weights used for feature extraction from a target weight distribution, rather than sharing one set of weights across all iterations. Because each resampling from the target weight distribution may yield different weights, the generality of image feature acquisition is improved; that is, the target weight distribution gives the model a more flexible and adaptable feature expression capability, so that broader image features can be extracted, ultimately improving the training effect and recognition performance of the image recognition model.
In a first aspect, an embodiment of the present application provides a model training method, including:
acquiring a target image;
acquiring target weight distribution, and acquiring target image characteristics from the target image according to the target weight distribution, wherein the target weight distribution is updated according to historical weight distribution;
identifying the target image characteristics to obtain the identification probability distribution of the target image;
if the similarity between the target category and the label of the target image is not smaller than a first preset threshold, the target weight distribution is used as the final weight distribution of a preset model;
the target category is a category with a probability value larger than a preset probability value in the identification probability distribution.
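For illustration only, the four steps of the first aspect can be sketched as a toy training check in NumPy. The softmax readout, the Gaussian weight sampling, and the use of the labelled class's probability as the "similarity" between the target category and the label are assumptions of this sketch, not the claimed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image, weights):
    # Toy feature extraction: a weighted projection of the (flattened) image.
    return np.tanh(weights @ image)

def recognize(features):
    # Toy classifier: softmax over the features themselves.
    e = np.exp(features - features.max())
    return e / e.sum()

def train_step(image, label_onehot, mean, std, first_threshold=0.9):
    # Sample a fresh set of weights from the current (Gaussian) weight distribution.
    weights = rng.normal(mean, std, size=(label_onehot.size, image.size))
    probs = recognize(extract_features(image, weights))
    target_category = int(probs.argmax())
    # Assumption: treat the probability mass on the labelled class as the
    # "similarity" between the target category and the label.
    similarity = probs[int(label_onehot.argmax())]
    is_final = bool(similarity >= first_threshold
                    and target_category == int(label_onehot.argmax()))
    return is_final, probs
```

If `is_final` is true, the sampled-from distribution would be kept as the final weight distribution of the preset model; otherwise the distribution's parameters would be updated and the step repeated.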
In a second aspect, an embodiment of the present application provides a model training apparatus having a function of implementing a model training method corresponding to the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware.
In one embodiment, the model training apparatus includes:
An input-output module configured to acquire a target image;
the processing module is configured to acquire a target weight distribution, acquire target image characteristics from the target image according to the target weight distribution, wherein the target weight distribution is updated according to a historical weight distribution;
the processing module is further configured to identify the characteristics of the target image to obtain an identification probability distribution of the target image;
the processing module is further configured to take the target weight distribution as a final weight distribution of a preset model if the similarity between the target category and the label of the target image is not smaller than a first preset threshold;
the target category is a category with a probability value larger than a preset probability value in the identification probability distribution.
In a third aspect, embodiments of the present application provide a computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the model training method as described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computing device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the model training method of the first aspect when executing the computer program.
Compared with the prior art, in the embodiments of the application a target image is acquired, a target weight distribution is obtained from a historical weight distribution, and target image features carrying broad feature information are extracted from the target image according to the target weight distribution. The target image features are then recognized to determine whether they are accurate, and in turn whether the target weight distribution can serve as the final weight distribution of the preset model, completing its training. The embodiments replace the fixed weights of existing models with a weight distribution and treat that distribution as the item of the preset model to be optimized during training. Because the weight distribution may contain many different weights, those weights can help the preset model obtain different image features from an image; however, those different features may not all provide information useful for image recognition, so the embodiments further update the weight distribution iteratively to obtain a final weight distribution. Like the historical weight distribution, the final weight distribution also contains many different weights that let the preset model obtain different image features from an image, but these features do provide information effective for image identification. In other words, the trained preset model can obtain broad and accurate image features from an image and has a flexible, adaptable feature expression capability.
Compared with existing models, which can only extract fixed image features from an image using fixed weights, the preset model of the embodiments can extract broad and accurate target image features according to the final weight distribution. Such features carry broader, higher-dimensional image information than a single fixed feature, and therefore improve the recognition accuracy of the image recognition model.
Drawings
The objects, features and advantages of the embodiments of the present application will become readily apparent from the detailed description of the embodiments of the present application read with reference to the accompanying drawings. Wherein:
FIG. 1 is a schematic diagram of a communication system of a model training method in an embodiment of the present application;
FIG. 2 is a flow chart of a model training method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a step of acquiring the target image feature in FIG. 2;
fig. 4 is a schematic flow chart of iterative acquisition of target image features by a preset model according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a preset model update weight distribution according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a comparison between a conventional model and a Bayesian model in accordance with an embodiment of the present application;
FIG. 7 is a schematic diagram showing a comparison of features learned by a conventional DEQ model and a Bayesian-DEQ model in a two-dimensional European space according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a model training device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a computing device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a mobile phone according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a server in an embodiment of the present application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The terms "first", "second", and the like in the description, claims, and drawings of the embodiments are used to distinguish similar objects (e.g., a first weight and a second weight each denote a different weight) and do not necessarily describe a particular order or sequence. It should be understood that data so described may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises" and "comprising", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those expressly listed, but may include other steps or modules not expressly listed or inherent to it. The partitioning of modules in the embodiments is only one logical partitioning; in an actual implementation, multiple modules may be combined or integrated into another system, some features may be omitted or not implemented, and the couplings, direct couplings, or communication connections between modules that are shown or discussed may be indirect couplings or communication connections through interfaces, which may be electrical or take other forms; the embodiments are not limited in this respect. The modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, or may be distributed over a plurality of circuit modules; some or all of them may be selected according to actual needs to achieve the purposes of the embodiments of the present application.
The embodiment of the application provides a model training method, a related device and a storage medium, which can be applied to an image processing system. The model training device is used for training a preset model at least used for extracting image features and outputting the target image features extracted from the target image through the preset model. The image processing device is used for processing the target image features extracted by the preset model to obtain an image processing result. At least one image processing result (e.g. the recognition probability distribution) obtained by the image processing means may be used by the model training means to iteratively update parameters of the predetermined model, such as parameters of the weight distribution. The model training device can be an application program for training the preset model and outputting the image characteristics extracted by the preset model, or a server provided with the application program for training the preset model and outputting the image characteristics extracted by the preset model; the image processing apparatus may be an image recognition program that processes the target image feature to obtain a processing result, for example, an image processing model, and may be a terminal device in which the image processing model is deployed.
It should be noted that the preset model trained in the embodiments of the present application is particularly suited to uncertain image recognition scenarios, that is, scenarios in which the image to be identified contains blurred regions. Take image recognition in the autonomous driving field as an example: the image data fed to the image recognition model while a motor vehicle travels at high speed is often more or less blurred. The trained preset model suits this application environment better than an ordinary deep neural network model because it has learned a weight distribution: when the target image contains a blurred region, it can produce a distribution over image features and give an uncertainty-aware prediction based on that distribution, instead of a single prediction result.
In addition, although the embodiment of the present application describes how to train a model and how to extract image features by taking an image recognition scene as an example, those skilled in the art may implement the model training method of the embodiment of the present application in the scenes of finance, weather, behavior prediction, and the like according to the disclosure of the embodiment of the present application, so as to perform corresponding feature extraction. The inventive principles disclosed in the embodiments of the present application include at least replacing a set of weights shared in a conventional implicit neural network model with a weight distribution, so that the model learns the feature distribution of sample data instead of the fixed features when performing model training, improving the expressive power of the model, and finally improving the recognition/prediction accuracy.
The solution provided in the embodiments of the present application relates to techniques such as artificial intelligence (AI), natural language processing (NLP), and machine learning (ML), and is specifically described by the following embodiments:
the AI is a theory, a method, a technology and an application system which simulate, extend and extend human intelligence by using a digital computer or a machine controlled by the digital computer, sense environment, acquire knowledge and acquire an optimal result by using the knowledge. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
AI technology is a comprehensive discipline, and relates to a wide range of technologies, both hardware and software. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
NLP is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, the language people use daily, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like.
In some embodiments, the model training apparatus and the image processing apparatus are separately deployed, and referring to fig. 1, the model training method provided in the embodiments of the present application may be implemented based on a communication system shown in fig. 1. The communication system may comprise a server 01 and a terminal device 02.
The server 01 may be a model training device in which a preset model to be trained and a training program may be deployed.
The terminal device 02 may be an image processing apparatus in which an image processing model, such as an image classification model, an image recognition model, or an image detection model, or an AI model trained by a machine learning-based method may be deployed. The image recognition model can be a face recognition model, a license plate recognition model or a road sign recognition model and the like. The image detection model may be an object detection model or the like.
The server 01 extracts a target image feature based on a target image stored locally or inputted externally using a preset model, and transmits the target image feature to the terminal device 02. The terminal device 02 may process the target image features of the target image using the image processing model to obtain an identification probability distribution, and then feed back the identification probability distribution to the server 01. The server 01 may update the preset model to be trained based on the recognition probability distribution, for example, update parameters of the weight distribution.
It should be noted that, the server according to the embodiments of the present application may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, a cloud database, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and an artificial intelligence platform.
The terminal device according to the embodiments of the present application may be a device that provides voice and/or data connectivity to a user, a handheld device with wireless connection functionality, or another processing device connected to a wireless modem, such as a mobile telephone (or "cellular" telephone) or a computer with a mobile terminal, for example a portable, pocket, hand-held, computer-built-in, or vehicle-mounted mobile device that exchanges voice and/or data with a radio access network. Examples include personal communication service (PCS) telephones, cordless telephones, Session Initiation Protocol (SIP) phones, wireless local loop (WLL) stations, personal digital assistants (PDAs), and the like.
Referring to fig. 2, fig. 2 is a flow chart of a model training method according to an embodiment of the present application. The method can be executed by a model training device and an image processing device, and updates the weight distribution to obtain a target weight distribution as a final weight distribution of a preset model, and the model training method comprises the following steps:
step S110, a target image is acquired.
In the embodiments of the present application, the target image may be obtained from different types of images at different processing stages of the preset model; the different types may include training images and images to be identified. In the training stage of the preset model, the target image may be obtained from a training image, for example part or all of an open-source dataset, or images collected and labeled by the user. In the inference (application) stage of the preset model, the target image may be obtained from an image to be identified, which may be a face collected by a face recognition device or an environmental image collected while an automatic driving device is running; the embodiments of the present application are not limited in this respect.
In this embodiment of the present application, the target image may be a training image or an image to be identified after being processed, or may be a training image or an image to be identified directly, without any processing, that is, an image of the preset model is directly input. Those skilled in the art may choose whether to perform filtering, binarization and other image preprocessing operations on the training image or the image to be identified according to actual needs, which is not limited in this embodiment.
To further broaden the range of image features learned by the trained preset model, the preset model learns a wider distribution of image features. In some embodiments, the target image includes noise, so that the preset model can extract accurate image features even from a noisy target image. The target image may be obtained by applying an image processing operation (for example, adding noise) to the training image before it is input into the preset model. Alternatively, noise may be added to the input training image during the preset model's acquisition of image features; for example, the training image input to the preset model may be processed by the target weight distribution of the following embodiments to directly obtain the target image features (i.e., the noise addition and the target image features both arise from the weight distribution).
When the target image includes noise, the target image may be obtained by performing a blurring process on an image input into a preset model, for example, a blurring process may be performed on a clear environmental image acquired when the automatic driving device is stationary, so as to obtain a blurred image (i.e., a target image including noise). The blurred image may be regarded as an environmental image including a blurred region acquired during the running process of the simulated automatic driving device, and if the preset model in the embodiment of the present application has the capability of acquiring accurate target image features from the simulated blurred image, it is indicated that the preset model may also acquire accurate target image features from a real blurred image. The preset model of the embodiment of the application can acquire wider image features, has strong generalization capability and application range, and is not only suitable for image feature acquisition scenes of clear images.
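As one hypothetical way to produce a noisy target image from a clear training image, additive Gaussian noise can stand in for the blur described above; the noise level `sigma`, the seed, and the `[0, 1]` clipping range are illustrative assumptions of this sketch, not parameters specified by the embodiments.

```python
import numpy as np

def add_noise(image, sigma=0.1, seed=0):
    """Simulate a blurred/uncertain target image by adding Gaussian noise.

    Assumes pixel intensities are normalized to [0, 1].
    """
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    # Clip so the noisy image remains a valid intensity image.
    return np.clip(noisy, 0.0, 1.0)
```

A clear environmental image captured while the device is stationary could be passed through `add_noise` to approximate the blurred images captured while it is moving.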
Step S120, acquiring a target weight distribution, and acquiring a target image feature from the target image according to the target weight distribution.
In this embodiment of the present application, the target weight distribution may be a weight distribution that meets a first preset condition.
In some embodiments, the first preset condition may be: and in a training stage of a preset model, after the weights are obtained by sampling from the target weight distribution, the preset model can acquire sufficiently accurate target image features from the target image based on the weights. Sufficiently accurate target image features refer to: when the image recognition is performed based on the target image features, a recognition result corresponding to the label of the target image can be obtained. For example, if the target image is an image 1 with a label of "cat", the recognition result obtained after the recognition based on the target image features of the image 1 is also "cat"; or, the recognition result is a recognition probability distribution, and the target category with the largest probability value in the recognition probability distribution is a cat.
It will be appreciated that the target weight distribution may be difficult to obtain directly, requiring stepwise iteration. In some embodiments, the target weight distribution may be updated stepwise based on the initialized weight distribution, i.e. the initial weight distribution will be updated iteratively a plurality of times before the target weight distribution is obtained. For example, a first weight distribution may be obtained by random initialization, then the first weight distribution is updated to obtain a second weight distribution, then the second weight distribution is updated to obtain a third weight distribution, the above weight distribution updating process is continuously performed for multiple times, and each time the weight distribution updating is based on the historical weight distribution until a target weight distribution meeting a first preset condition is obtained.
Note that a data distribution is often determined by its parameters; for example, a Gaussian distribution is typically determined by its mean and variance. Therefore, in some embodiments, if the first weight distribution is Gaussian, obtaining the first weight distribution amounts to obtaining a mean and a variance, which may be preset (e.g., randomly initialized) or obtained from outside the model training apparatus; the embodiments of the present application are not limited in this respect.
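A minimal sketch of a Gaussian weight distribution determined by a mean and a variance, with the mean randomly initialized as described above. The parameter shapes, the log-variance parameterization, and the initial values are assumptions of this sketch.

```python
import numpy as np

class WeightDistribution:
    """Gaussian weight distribution determined by a mean and a (log) variance."""

    def __init__(self, shape, seed=0):
        rng = np.random.default_rng(seed)
        self.mean = rng.normal(0.0, 0.1, size=shape)  # random initialization
        self.log_var = np.full(shape, -2.0)           # a modest starting variance
        self._rng = rng

    def sample(self):
        # Reparameterized draw: mean + std * eps, so every call may differ.
        eps = self._rng.standard_normal(self.mean.shape)
        return self.mean + np.exp(0.5 * self.log_var) * eps
```

Updating the distribution then means updating `mean` and `log_var` rather than a single fixed weight tensor.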
Similar to the target weight distribution being a weight distribution that meets the first preset condition, the target image feature may be an image feature that meets a second preset condition. Specifically, an image feature meeting the second preset condition is one of a group of image features that meet that condition: the group comprises at least two temporally adjacent image features whose mutual similarity is not smaller than a second preset threshold, and the group is acquired from the target image by the preset model according to the target weight distribution. For example, suppose the preset model iteratively acquires image feature a, image feature b, and image feature c from target image A according to the target weight distribution; if image feature c differs little from image feature b, i.e., their similarity is not smaller than the second preset threshold, then image feature b may be set as the target image feature.
To acquire the target image feature quickly, in some embodiments the model determines, after each image feature is acquired, whether that feature is the target image feature. As shown in fig. 3, acquiring a target image feature from the target image according to the target weight distribution includes:
step S121, acquiring a first weight and a second weight from the target weight distribution.
The first weight does not specifically refer to the weight obtained by the first sampling; in the actual feature-acquisition process of the preset model it may be the weight obtained at the third, fourth, or a later sampling. Rather, the first weight corresponds to the first image feature: the weight with which the first image feature is acquired based on the historical image features (including the target image) is the first weight. Similarly, the second weight does not specifically refer to the weight obtained by the second sampling from the target weight distribution.
The terms first weight and second weight merely denote different weights that differ in acquisition order. The second weight is obtained independently of the first weight; the two are simply drawn from the same target weight distribution and have no other correspondence. It will be appreciated that although the first weight and the second weight are different draws from the same distribution, in some embodiments their values may coincide, i.e., in some cases the first and second weights may happen to have the same value; this is not limited in this embodiment.
Step S122, acquiring a first image feature based on the historical image feature according to the first weight, and acquiring a second image feature based on the first image feature according to a second weight.
In this embodiment of the present application, the first image feature and the second image feature are a group of image features that meet a second preset condition, and the two image features are adjacent in time sequence, that is, after a time step of obtaining the first image feature, a next time step obtains the second image feature based on the first image feature according to a second weight.
Similar to the first weight, the first image feature is not necessarily the feature obtained by the first feature acquisition; it may be any image feature whose similarity with the second image feature is not smaller than the second preset threshold.
Likewise, the second image feature is not necessarily the feature obtained by the second feature acquisition; it may be any image feature whose similarity with the first image feature is not smaller than the second preset threshold, i.e., the target image feature.
In the embodiment of the application, obtaining the target image feature may require multiple rounds of weight sampling and image feature acquisition. As shown in fig. 4, image features may already have been acquired several times based on the target weight distribution and the target image before the first image feature is extracted from the historical image features based on the first weight. For example, one image feature may already have been acquired before the first image feature; the specific procedure may be: acquire weight 1 from the target weight distribution, acquire image feature 1 from the target image according to weight 1, then acquire the first weight from the target weight distribution, and acquire the first image feature from image feature 1 according to the first weight.
It may be appreciated that before the first image feature is extracted from the historical image features based on the first weight, the image features may not be acquired based on the target weight distribution and the target image, that is, the first image feature is directly acquired based on the target image, and the second image feature is the image feature acquired by performing the image feature acquisition for the second time.
Step S123, if the similarity between the first image feature and the second image feature is not less than a second preset threshold, taking the second image feature as the target image feature.
In this embodiment of the present application, the similarity between the first image feature and the second image feature may be computed using an existing vector distance/similarity measure such as cosine similarity, Euclidean distance, or Chebyshev distance; those skilled in the art may select an appropriate measure according to actual needs, which is not limited in this embodiment of the present application.
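For instance, the cosine-similarity option could be implemented as a small helper (a sketch; the function name is illustrative):

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two flattened feature vectors, in [-1, 1]."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

A value close to 1 indicates near-identical feature directions, which is the regime the second preset threshold is meant to detect.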
In the embodiment of the present application, to determine whether the target weight distribution can serve as the final weight distribution of the preset model, it must be determined whether the target weight distribution meets the first preset condition, i.e., whether the preset model can acquire sufficiently accurate target image features from the target image based on the target weight distribution, so as to recognize a target category consistent with the label of the target image. Since the target weight distribution may contain many different weights, different image features may be acquired from the target image under different weights, and different image features may yield different recognition results; it is therefore necessary to acquire sufficiently stable image features from the target image according to the target weight distribution before judging whether the first preset condition is met. As a convenient criterion for such stability, if a group of temporally adjacent image features iteratively acquired based on the target weight distribution has similarity not smaller than the second preset threshold, the image features are considered sufficiently stable, and the later-in-time feature of the group (i.e., the second image feature, whose similarity with the first image feature meets the threshold) can be taken as the target image feature.
It will be appreciated that, since the similarity of the first image feature to the second image feature is sufficiently high, in some embodiments, the first image feature may also be considered as the target image feature.
It should be noted that the similarity between the first-acquired and second-acquired image features will, with high probability, be smaller than the second preset threshold; that is, the image feature acquisition step will typically run for multiple time steps before the target image feature is obtained.
In some embodiments, when an image feature is acquired at each time step, the image feature acquired at the previous time step serves as the input to the preset model for the current time step. For example: first acquire weight 1 from the target weight distribution and acquire image feature 1 from the target image based on weight 1; next acquire weight 2 from the target weight distribution and acquire image feature 2 from image feature 1 based on weight 2; then calculate the similarity between image feature 1 and image feature 2. If the similarity is not smaller than the second preset threshold, image feature 2 is determined to be the target image feature; otherwise, acquire weight 3 afresh from the target weight distribution, acquire image feature 3 from image feature 2 according to weight 3, and then compare the similarity of image feature 3 and image feature 2; repeat this process until the target image feature is obtained.
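The iteration just described can be sketched as follows; `sample_weight` and `step_fn` are hypothetical stand-ins for the preset model's weight sampling and single-step feature extraction, and the cosine-similarity stopping rule is one of the admissible similarity measures:

```python
import numpy as np

def extract_target_feature(x, sample_weight, step_fn, threshold=0.999, max_steps=100):
    """Iterate feature extraction with a freshly sampled weight at each step,
    stopping once two consecutive features are similar enough.

    x             -- the target image (or its initial representation)
    sample_weight -- callable drawing one weight from the target distribution
    step_fn       -- callable mapping (previous feature, weight) -> next feature
    """
    def cos(a, b):
        a, b = np.ravel(a), np.ravel(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    prev = step_fn(x, sample_weight())        # image feature 1, from the target image
    for _ in range(max_steps):
        cur = step_fn(prev, sample_weight())  # next feature, from the previous one
        if cos(prev, cur) >= threshold:       # second preset threshold reached
            return cur                        # cur is taken as the target image feature
        prev = cur
    return prev                               # fallback: no steady state within budget
```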
In order to obtain a more stable target image feature from the target image, in one embodiment the weight sampling and image feature iteration may continue for several redundant rounds, rather than stopping as soon as a pair of image features whose similarity is not smaller than the second preset threshold (i.e., the first image feature and the second image feature) first appears. After the second image feature is obtained, weight sampling and image feature acquisition continue, so as to ensure that the similarity between subsequently iterated image features also meets the requirement.
Specifically, in some embodiments, steps S121-S123 may also be replaced by the following steps a and b:
a. sampling a first preset number of weights from the target weight distribution.
In this step, the first preset number may be any integer greater than 2, such as 3, 5, or 10, and those skilled in the art may set it according to the actual application scenario.
b. And iteratively extracting image features from the target image based on the first preset number of weights until the iteratively extracted image features meet a second preset condition.
For example, among the image features iteratively extracted based on the first preset number of sampled weights, a run of a second preset number of consecutive image features appears in which the similarity between each adjacent pair is not smaller than the second preset threshold.
Specifically, suppose the first preset number is 10 and the second preset number is 5. If, based on the 2nd and 3rd sampled weights, the similarity between the iteratively extracted 2nd and 3rd image features is not smaller than the second preset threshold, iteration continues with the 4th to 6th sampled weights; if the resulting 4th to 6th image features, together with the 2nd and 3rd image features, all have pairwise adjacent similarities not smaller than the second preset threshold, a steady-state solution can be considered obtained, and weight sampling and iteration can stop.
As can be seen, this embodiment redundantly performs several additional rounds of weight sampling and image feature iteration after a pair of image features whose similarity is not smaller than the second preset threshold (i.e., the first image feature and the second image feature) first appears. Because weight sampling and feature acquisition continue after the second image feature is obtained, the similarity between subsequently iterated image features is also guaranteed to meet the requirement, making the finally obtained target image feature more stable and reliable.
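A minimal sketch of steps a and b with this redundant stability check might look like the following; as before, `sample_weight` and `step_fn` are hypothetical placeholders for the preset model's operations:

```python
import numpy as np

def find_stable_feature(x, sample_weight, step_fn,
                        n_weights=10, n_stable=5, threshold=0.999):
    """Sample `n_weights` weights up front (step a), iterate feature
    extraction (step b), and stop only once `n_stable` consecutive features
    are adjacent-pairwise similar (similarity >= threshold)."""
    def cos(a, b):
        a, b = np.ravel(a), np.ravel(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    weights = [sample_weight() for _ in range(n_weights)]  # first preset number
    feat, run = step_fn(x, weights[0]), 1                  # run = length of similar streak
    for w in weights[1:]:
        nxt = step_fn(feat, w)
        run = run + 1 if cos(feat, nxt) >= threshold else 1
        feat = nxt
        if run >= n_stable:          # second preset number reached: steady state confirmed
            return feat
    return feat                      # best effort if never confirmed within the budget
```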
And step S130, identifying the characteristics of the target image to obtain the identification probability distribution of the target image.
As described in the embodiment of step S120, the target weight distribution may not be obtainable directly; it may be obtained by gradually updating a randomly initialized first weight distribution. Thus, after the preset model obtains the target image features from the target image, referring to fig. 5, the target image features are input into the image recognition model to obtain a recognition probability distribution based on those features.
the identification probability distribution may include a plurality of identification categories and probability values corresponding to the plurality of identification categories.
In this embodiment of the present application, a class in the probability distribution for identification, where the probability value is greater than the preset probability value, may be determined as a target class, so as to compare with a label of the target image (such as similarity calculation), and obtain a conclusion about whether the feature of the target image is accurate.
It may be understood that the number of categories whose probability value exceeds the preset probability value in the recognition probability distribution may be one or more; for example, the preset probability value may be set to 80%, and among the plurality of categories and corresponding probability values in the recognition probability distribution, one or more may exceed that value.
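For illustration, selecting the target categories from a recognition probability distribution could look like this (function names and the dictionary representation are illustrative assumptions):

```python
def target_categories(prob_dist, preset_prob=0.8):
    """Categories whose probability value exceeds the preset probability value."""
    return [c for c, p in prob_dist.items() if p > preset_prob]

def top_category(prob_dist):
    """Category with the largest probability value (the argmax alternative)."""
    return max(prob_dist, key=prob_dist.get)
```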
Step S140, if the similarity between the target category and the label of the target image is not less than a first preset threshold, the target weight distribution is used as a final weight distribution of a preset model.
In the embodiment of the present application, the preset model may be a feature extraction model constructed using an implicit neural network model, such as one based on ordinary differential equations (Ordinary Differential Equation, ODE), neural ordinary differential equations (NODEs), or deep equilibrium models (DEQ). These implicit neural network models replace explicit, depth-stacked layers (e.g., the convolutional, pooling, and fully connected layers typically found in convolutional neural network models) with analytical conditions that the model must satisfy, and can express models of "infinite" depth within a constant memory footprint. Thus, implicit neural network models have a strong advantage in memory consumption over explicit neural network models.
Conventional implicit neural network models, such as a conventional DEQ model, typically employ a set of shared weights to iteratively acquire image features from a target image; that is, a fixed weight is used in every iteration round while the target image features (the steady-state solution) are being iteratively acquired.
When an implicit neural network model processes input image data, it often does so iteratively. Taking the DEQ model as an example, the DEQ model iteratively processes the target image until a steady-state solution is obtained, i.e., until the image features output after each iteration round meet the second preset condition. For example, if the input of the DEQ model is the original image X and the output of each iteration is the image feature Z, each iteration step toward the steady-state solution can be expressed as:
Z[i+1] = f_θ(Z[i]; X), for i = 0, 1, 2, …
where θ denotes the weight used for acquiring image features from the target image, and a data processing function f_θ is constructed based on θ when image features are acquired. The steady-state solution of the DEQ model, obtained by iteratively processing the target image with f_θ, satisfies Z[i+1] = Z[i].
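A toy illustration of this fixed-point iteration follows; here f_θ(Z; X) = tanh(θ·Z + X) is an assumed contractive map standing in for the trained DEQ transformation, chosen purely so the sketch converges:

```python
import numpy as np

def deq_steady_state(x, theta, max_iters=500, tol=1e-8):
    """Iterate Z[i+1] = f_theta(Z[i]; X) until the steady state Z[i+1] == Z[i].

    f_theta(Z; X) = tanh(theta @ Z + X) is a toy stand-in for a DEQ layer.
    """
    z = np.zeros_like(x)
    for _ in range(max_iters):
        z_next = np.tanh(theta @ z + x)
        if np.linalg.norm(z_next - z) < tol:   # converged: Z[i+1] == Z[i]
            return z_next
        z = z_next
    return z
```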
Unlike a conventional implicit neural network model, in the embodiment of the present application the preset model may be constructed based on a modified DEQ model in which a target weight distribution is preset for sampling the weights used in image feature extraction, so that weight sampling is re-performed at every iteration round toward the steady-state solution; that is, each image feature acquired from the same target image is based on a different weight.
It should be noted that although the embodiment of the present application uses a modified DEQ model as an example to describe how the implicit neural network model is trained and how it extracts image features, those skilled in the art may apply the same or similar modification disclosed herein for the conventional DEQ model (presetting a weight distribution, or sampling weights via the Bayesian neural network model described in the following embodiments) to other implicit neural network models such as ODE or NODEs, so that each iteration round toward the steady-state solution (the target image feature) can use a different weight.
In one possible design, the modified DEQ model includes a Bayesian neural network model (Bayesian Neural Network, BNN) that samples the weights used for image feature extraction from a weight distribution at each iterative extraction step. A comparison of the BNN model and a conventional model is shown in fig. 6, where x denotes the input layer, h the hidden layer, and y the output layer. In the BNN model, each weight is treated as following a Gaussian distribution with mean μ and variance δ, and each weight follows its own distinct Gaussian distribution. The conventional model shown in fig. 6 optimizes fixed weights during training, whereas the BNN model optimizes the mean and variance of the weight distribution; therefore, the conventional model learns fixed image features from the training images, while the preset model of the embodiment of the present application learns the distribution of image features from the training images.
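A single Bayesian fully connected layer of the kind described can be sketched via the reparameterisation w = μ + σ·ε (the function name and the log-variance parameterisation are illustrative assumptions):

```python
import numpy as np

def bnn_linear(x, mu, log_var, seed=None):
    """One Bayesian fully connected layer: each forward pass samples a fresh
    weight matrix W ~ N(mu, exp(log_var)) elementwise, so repeated calls on
    the same input can produce different outputs."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=mu.shape)
    w = mu + np.exp(0.5 * log_var) * eps   # sampled weights, not fixed ones
    return w @ x
```

During training, gradients flow to `mu` and `log_var` rather than to a fixed `w`, which is exactly the mean/variance optimisation contrasted with the conventional model above.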
In one possible design, the fully connected layer in the conventional DEQ model may be replaced with the BNN model; in the embodiments of the present application this model may be referred to as the Bayesian-DEQ model. The Bayesian-DEQ model samples weights from the target weight distribution through the BNN model and then acquires image features from the target image according to those weights, rather than acquiring image features based on a set of fixed weights; that is, different image features are acquired according to different weights, which greatly widens the range of acquirable image features. What the Bayesian-DEQ model learns in training is not a single feature but a feature distribution, which is equivalent to adding noise to the training of a non-Bayesian DEQ model and thus greatly expands the feature distribution the model learns.
As shown in FIG. 7, if the training process is mapped into two-dimensional Euclidean space, the Bayesian-DEQ model learns a tree-like distribution, rather than merely the straight line learned by the non-Bayesian DEQ model (the conventional model). Because the range of features learned by the Bayesian-DEQ model is greatly enlarged, more information is available for recognition, allowing a more accurate recognition result.
For example, in the field of autopilot, an autopilot device often needs to recognize captured environmental images in order to make driving action decisions. The autopilot device is often in a high-speed state during driving, i.e., images are acquired while in high-speed motion, which may leave blurred regions in the acquired environmental images. A conventional model cannot learn accurate image features from such noisy images and therefore cannot accurately recognize non-training images.
Specifically, since a fixed weight is set in the conventional model, a fixed image feature is learned from the training image according to the fixed weight; if the training image is a sharp image, not including blurred regions, then the conventional model learns fixed sharp image features. When the conventional model is deployed to the automatic driving apparatus, since the recognition image features matching the learned image features cannot be extracted based on these environmental images including the blurred region, the conventional model will be caused to fail to recognize the environmental images including the blurred region well.
For example, if the training image is a clear, well-defined railing image while the railing image captured by the autopilot device during driving is a continuous smeared color strip, the image features of the training image and the environmental image differ markedly. The image features the conventional model learned from clear training images therefore cannot be matched against the features it extracts from environmental images containing blurred regions, so the railings on both sides of the road cannot be accurately recognized during driving; incorrect or untimely recognition is then very likely to cause improper driving actions and traffic accidents.
Similarly, even if training is performed based on a training image including a blurred region in a conventional model, the distribution of image features cannot be learned, but only fixed blurred features. If the conventional model learns fixed blurred image features, the conventional model cannot give an accurate recognition result based on a clear image, and may not obtain accurate image features based on different blurred forms.
Unlike the conventional model, in the blurred image acquired by the autopilot device, the preset model of the embodiment of the present application may extract the image feature distribution in the blurred image, instead of the fixed feature, that is, may obtain the feature distribution of the continuous railing color strip. Because the preset model is also the characteristic distribution learned based on the training image added with noise in the training stage, the image characteristic distribution learned by the preset model based on the training image is consistent with the image characteristic distribution acquired based on the environment image, and the image characteristic distribution of the railing is obtained.
It will be appreciated that the blurred regions in the environmental images acquired by the autopilot device are not caused solely by its own motion. For example, while the autopilot device is traveling, other moving vehicles may also be present around it; that is, the objects being imaged are themselves in motion, which further increases the blurring of the environmental image.
In summary, the conventional model cannot accurately extract image features of the blurred image acquired by the autopilot device, and thus cannot help to accurately identify the image. The preset model of the embodiment of the application can acquire the image characteristic distribution of the corresponding object from the blurred railing image, the road image, the vehicle image and the traffic light image based on the weight distribution instead of the fixed image characteristic, so that the accurate object can be identified based on the blurred image, and a reasonable automatic driving action decision can be made.
After describing the specific structure of the preset model and the image feature extraction process in the embodiment of the present application, how to update the weight distribution is continuously described, so as to gradually obtain the target weight distribution.
In this embodiment of the present application, if the first weight distribution is obtained by random initialization, then even if the preset model reaches a steady-state solution and obtains a target image feature, the recognition probability distribution obtained by recognizing that target image feature may not match the label of the target image.
Therefore, the first weight distribution is required to be updated to obtain the target weight distribution, so that the image features extracted by the preset model are more accurate, namely, the preset model learns more accurate image feature distribution conditions in the target image.
In the embodiment of the application, whether the first weight distribution needs to be updated is determined through the identification probability distribution of the image features acquired according to the first weight distribution. For example, after iteratively extracting the target image based on the first weight distribution to obtain image features meeting the second preset condition, identifying the image features meeting the second preset condition to obtain an identification probability distribution; then comparing the target category in the identification probability distribution with the label of the target image, and calculating the similarity; if the similarity of the two is smaller than a first preset threshold value, the image features which are obtained based on the first weight distribution and meet the second preset condition are not good enough, and the first weight distribution needs to be updated.
As described in the embodiment of step S120, if the first weight distribution in the embodiment of the present application is gaussian, when updating the first weight distribution, the updating of the first weight distribution may be implemented by updating the mean and variance of the first weight distribution.
In some embodiments, the specific updating step of the first weight distribution may include steps (1) - (3):
(1) A loss value is calculated based on the similarity or the recognition probability distribution.
In some embodiments, calculating the loss value may specifically include:
① Calculating a loss value based on the similarity.
For example, a classification error rate or an identification error rate may be determined based on the similarity, and then a loss value may be calculated based on the classification error rate or the identification error rate.
② Calculating a loss value based on the recognition probability distribution.
For example, a cross entropy loss function may be employed to calculate a loss value based on the identified probability distribution.
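For example, when the label is one-hot, the cross-entropy loss reduces to the negative log-probability assigned to the labelled class (a sketch; the function name is illustrative):

```python
import numpy as np

def cross_entropy_loss(prob_dist, label_index, eps=1e-12):
    """Cross-entropy between the recognition probability distribution and a
    one-hot label: -log p(label). `eps` guards against log(0)."""
    return float(-np.log(prob_dist[label_index] + eps))
```

The loss shrinks toward 0 as the probability assigned to the labelled class approaches 1, so minimising it pushes the weight distribution toward label-consistent features.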
Although several different loss value calculation manners are described in the embodiment of the present application, those skilled in the art may select other loss functions to calculate the loss value according to actual needs, which is not limited in the embodiment of the present application.
(2) And calculating the gradient of the parameter of the first weight distribution according to the loss value.
Once the calculation of the loss value is defined, the partial derivative of the loss value with respect to each parameter can be computed, and this partial derivative serves as the gradient of that parameter.
It will be appreciated that the parameter is related to the type of the first weight distribution, and if the first weight distribution is a gaussian distribution, the parameter is a mean and a variance. The first weight distribution may also be bernoulli distribution, beta distribution, poisson distribution, etc., and the corresponding parameter to be updated needs to be determined according to the type of the corresponding distribution. For the type of the first weight distribution, those skilled in the art may set the first weight distribution according to the actual application scenario as required, which is not limited in this embodiment.
It should be noted that the parameter corresponding to the data distribution type of the first weight distribution may be bounded; for example, the parameter of the Bernoulli distribution has the value range [0, 1]. To update bounded parameters conveniently by gradient optimization, they need to be converted into unbounded parameters. For example, the parameter of the Bernoulli distribution can be mapped into the range (−∞, +∞) through a suitable mapping function; the mapped parameter is then the unbounded parameter of the Bernoulli distribution, and when the parameters of the weight distribution are updated, it is the unbounded parameter that is updated.
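One common choice for such a mapping is the logit function with the sigmoid as its inverse; this specific choice is an assumption of the sketch, since the embodiment only requires some mapping of [0, 1] onto (−∞, +∞):

```python
import numpy as np

def to_unbounded(p, eps=1e-12):
    """Map a Bernoulli parameter p in [0, 1] to (-inf, +inf) via the logit."""
    return float(np.log(p + eps) - np.log(1.0 - p + eps))

def to_bounded(t):
    """Sigmoid: the inverse mapping back into (0, 1), applied after updates."""
    return float(1.0 / (1.0 + np.exp(-t)))
```

Gradient steps are taken on the unbounded value, and the bounded parameter is recovered with `to_bounded` whenever the distribution itself is needed.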
(3) And updating parameters of the first weight distribution according to the gradient.
If the gradient is positive, reducing the parameters of the first weight distribution according to a preset step length; if the gradient is negative, increasing the parameter of the first weight distribution according to a preset step length.
It can be appreciated that if the first weight distribution includes a plurality of parameters, gradients of the respective parameters need to be calculated respectively, and then the corresponding parameters are updated according to the respective gradients.
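The update rule above, applied to every parameter of the distribution, is a plain gradient-descent step and can be sketched as (names illustrative):

```python
def update_params(params, grads, lr=0.01):
    """One gradient-descent step on the weight-distribution parameters:
    theta <- theta - lr * grad, so a positive gradient decreases the
    parameter and a negative gradient increases it, each by lr * |grad|."""
    return [p - lr * g for p, g in zip(params, grads)]
```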
Although in the embodiment of the present application, the gradient optimization method is taken as an example to describe how to update the parameters of the first weight distribution, those skilled in the art may also update the parameters by using one of the least squares method, the newton method and the quasi-newton method according to actual needs until the loss value converges or other predetermined end conditions are met (for example, the weight distribution is updated by a preset number of times).
In the above embodiment, only the first weight distribution is taken as an example to describe how to update the weight distribution, and when in practical application, if the second weight distribution obtained by updating the first weight distribution does not meet the first preset condition, that is, is not the target weight distribution, then a person skilled in the art may update the second weight distribution, the third weight distribution or the nth weight distribution by referring to the updating manner of the first weight distribution until obtaining the target weight distribution meeting the first preset condition, as the final weight distribution of the preset model.
According to the model training method described above, a target image is obtained, a target weight distribution is obtained from the historical weight distributions, target image features carrying broad feature information are acquired from the target image according to the target weight distribution, and the target image features are recognized to determine whether they are accurate, and hence whether the target weight distribution can serve as the final weight distribution of the preset model, thereby completing the training of the preset model.

The embodiment of the present application replaces the fixed weights of existing models with a weight distribution and trains the preset model with that distribution as the item to be optimized. Because a weight distribution may contain many different weights, those weights help the preset model acquire different image features from an image; however, those different features do not necessarily provide information effective for image recognition, so the embodiment further updates the weight distribution iteratively to obtain a final weight distribution.

Like the historical weight distributions, the final weight distribution also contains many different weights that let the preset model acquire different image features from an image, but these features now provide information effective for image recognition. In other words, the trained preset model can acquire broad and accurate image features from an image and possesses a flexible, versatile feature expression capability.
Compared with an existing model that can only extract fixed image features from an image using fixed weights, the preset model of the embodiment of the present application can extract broad and accurate target image features from an image according to the final weight distribution. Such target image features carry image information that is broader and of more dimensions than a single fixed image feature, and can therefore improve the recognition accuracy of the image recognition model.
The above embodiments describe how the preset model is updated in the training stage. After training is completed, the trained preset model may be deployed in a device that needs to perform image recognition, for example an autonomous driving device, to improve that device's ability to recognize blurred images.
To enable the preset model to extract more accurate target image features from the target image, in this embodiment the average of a plurality of weights sampled from the target weight distribution is used as the image feature extraction weight. Specifically, the preset model extracting target image features from the target image according to the target weight distribution includes the following steps:
first, a plurality of weights are sampled from the target weight distribution.
In this embodiment, to make the finally extracted target image feature more accurate and stable, weights are sampled from the target weight distribution multiple times, and the average of these weights is used as the weight for extracting the target image feature.
It will be appreciated that the specific number of samplings may be set by those skilled in the art according to the actual situation, for example, 3 times, 5 times, 10 times, etc., and this embodiment is not particularly limited.
Then, an average value of the plurality of weights is calculated to obtain an image feature extraction weight.
And finally, acquiring target image features from the target image according to the image feature extraction weights.
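The three steps above (sample several weights, average them, then extract features with the averaged weight) can be sketched as follows, assuming for illustration that the target weight distribution is a one-dimensional Gaussian; the function name and the choice of `random.gauss` are assumptions, not taken from the patent.

```python
import random

def extract_feature_weight(mean, std, num_samples=5, seed=0):
    """Sample several weights from the (assumed Gaussian) target weight
    distribution and average them into one image feature extraction weight."""
    rng = random.Random(seed)
    samples = [rng.gauss(mean, std) for _ in range(num_samples)]
    return sum(samples) / len(samples)
```

Averaging reduces the variance of any single sampled weight, which is the stability argument made above: with more samples, the averaged weight concentrates around the mean of the distribution.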
Because the preset model of the embodiment of the present application occupies little storage and extracts image features accurately, deploying it in an autonomous driving device allows the device to extract image features from environment images containing blurred regions for more accurate environment recognition, while also saving storage resources so that the device can install more programs and extend other functions, significantly improving device performance.
To verify the improvement in image recognition brought by the preset model of the embodiment of the present application, the inventor ran experiments on the CIFAR-10 image recognition dataset with a Bayesian neural network model and a Bayesian-DEQ neural network model of the same scale (a feature extraction model obtained with the model training method of the embodiment of the present application). With identical hyperparameters and identical numbers of model parameters, the test accuracy of the Bayesian neural network model was very unstable, ranging from a high of 61.58 down to only 40.92, whereas the accuracy of the Bayesian-DEQ neural network model stayed consistently above 60 and peaked at 62.39. The experimental results therefore indicate that the Bayesian-DEQ neural network model is more stable than a Bayesian neural network model of the same magnitude and that its extracted image features are more effective at improving recognition accuracy, that is, image features are extracted more accurately.
Having described the method of the embodiments of the present application, the model training apparatus of the embodiments of the present application is described next with reference to fig. 8. The apparatus is equally applicable to the server 01 shown in fig. 1, and the apparatus 60 includes:
an input-output module 601 configured to acquire a target image;
a processing module 602 configured to obtain a target weight distribution, obtain a target image feature from the target image according to the target weight distribution, wherein the target weight distribution is updated according to a historical weight distribution;
the processing module 602 is further configured to identify the target image feature, so as to obtain an identification probability distribution of the target image;
the processing module 602 is further configured to use the target weight distribution as a final weight distribution of a preset model if the similarity between the target category and the label of the target image is not less than a first preset threshold; wherein the target category is a category with a probability value larger than a preset probability value in the identification probability distribution;
the input-output module 601 is further configured to output the trained preset model or the final weight distribution for deployment into other production environments.
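The acceptance test carried out by the processing module 602 can be sketched as follows. This is a deliberately simplified toy version: the recognition probability distribution is a plain dictionary and label similarity is reduced to exact category match, whereas the patent leaves both the similarity measure and the threshold values open.

```python
def accept_distribution(prob_dist, label, preset_prob=0.5, similarity_threshold=1.0):
    """Keep the current weight distribution as final only if the target
    category (probability above the preset value) matches the image label
    closely enough. Similarity is reduced to exact match for illustration."""
    candidates = [c for c, p in prob_dist.items() if p > preset_prob]
    if not candidates:
        return False                 # no category exceeds the preset probability
    target = max(candidates, key=lambda c: prob_dist[c])
    similarity = 1.0 if target == label else 0.0
    return similarity >= similarity_threshold
```

In a full system, `similarity` would be a graded measure against a rich label rather than a 0/1 match, but the control flow (threshold comparison deciding whether the target weight distribution becomes final) is the same.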
In some embodiments, the processing module 602 is further configured to obtain a first weight and a second weight from the target weight distribution; and
acquiring a first image feature based on the historical image feature according to the first weight, and acquiring a second image feature based on the first image feature according to a second weight; and
if the similarity between the first image feature and the second image feature is not smaller than a second preset threshold, the second image feature is used as the target image feature;
wherein the historical image features are derived based on the target image.
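The first-weight/second-weight check above resembles a fixed-point (equilibrium) iteration, in the spirit of the Bayesian-DEQ model mentioned later: a feature is repeatedly re-transformed until consecutive features are similar enough. Below is a minimal scalar sketch, with `tanh` as a hypothetical stand-in transformation and `tol` playing the role of the second preset threshold; none of these concrete choices come from the patent.

```python
import math

def fixed_point_features(x, w, tol=1e-6, max_steps=100):
    """DEQ-style iteration: the feature of the current step is computed from
    the feature of the previous step plus the input, and iteration stops once
    consecutive features are sufficiently similar."""
    z = 0.0                              # initial (historical) image feature
    for _ in range(max_steps):
        z_next = math.tanh(w * z + x)    # one transformation step
        if abs(z_next - z) < tol:        # consecutive features similar enough
            return z_next                # take this as the target image feature
        z = z_next
    return z
```

Because the transformation here is a contraction (its slope is bounded by `w < 1`), the iteration converges to an equilibrium feature regardless of the starting point, which mirrors the stopping criterion of comparing the first and second image features.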
In some embodiments, when the target image includes noise, the mapping of the target image features in two-dimensional Euclidean space is a tree structure.
In some embodiments, acquiring target image features from the target image according to the target weight distribution includes image feature acquisition steps over at least two time steps;
the image features acquired at the previous time step are used as input of the preset model when acquiring the image features of the current time step.
In some embodiments, the pre-set model comprises an implicit neural network model for at least acquiring target image features from the target image;
the implicit neural network model includes a Bayesian neural network model, which is used at least for sampling, from the target weight distribution, the weights used for target image feature acquisition.
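The role of the Bayesian neural network model, which holds a weight distribution rather than a fixed weight and draws a sample from it for each feature extraction, can be illustrated with a minimal scalar class. The reparameterized Gaussian form and all names here are illustrative assumptions, not the patent's formulation.

```python
import random

class BayesianWeight:
    """Minimal sketch of a Bayesian weight: the model stores a distribution
    (mean, std), i.e. the item to be optimized, and samples a concrete weight
    each time a feature is extracted."""
    def __init__(self, mean, std, seed=0):
        self.mean, self.std = mean, std
        self._rng = random.Random(seed)

    def sample(self):
        # Reparameterization: w = mean + std * eps, with eps ~ N(0, 1)
        return self.mean + self.std * self._rng.gauss(0.0, 1.0)
```

With `std = 0` this degenerates to the fixed weight of a conventional model; a nonzero `std` is what gives the preset model its varied feature expression.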
In some embodiments, the processing module 602 is further configured to obtain a plurality of weights from the target weight distribution; and
calculating the average value of the weights to obtain an image feature extraction weight; and
and acquiring target image features from the target image according to the image feature extraction weights.
In the embodiment of the present application, image features are extracted using weights sampled from the target weight distribution. To make the finally extracted target image feature more accurate and stable, weights are sampled from the target weight distribution multiple times, and the average of these weights is used as the weight for extracting the target image feature.
It will be appreciated that the specific number of samplings may be set by those skilled in the art according to the actual situation, for example, 3 times, 5 times, 10 times, etc., and this embodiment is not particularly limited.
In some embodiments, the apparatus 60 is deployed into an autopilot device, and the target image is an image to be identified acquired by the autopilot device;
the image to be identified comprises a blurred region;
the target image features acquired from the image to be identified include image features acquired based on the blurred region.
For example, from a blurred image acquired by an autonomous driving device, the preset model of the embodiment of the present application may extract an image feature distribution rather than a fixed feature, for instance a feature distribution of continuous railing color stripes. Because the preset model also learned feature distributions from noise-augmented training images during the training stage, the image feature distribution it learned from training images is consistent with the image feature distribution extracted from the environment image, yielding the image feature distribution of the railing.
It will be appreciated that an environment image acquired by an autonomous driving device includes blurred regions not only because the device itself is moving. For example, while the device is traveling, other moving vehicles may also be present around it; that is, the objects being imaged are themselves in motion, which further increases the blurring of the environment image. The preset model of the embodiment of the present application can, based on the weight distribution, extract the image feature distribution of the corresponding object from a blurred railing image, road image, vehicle image, or traffic light image, rather than a fixed image feature, so that the correct object can be recognized from the blurred image and a reasonable autonomous driving decision can be made.
Because environment images collected by an autonomous driving device while traveling often contain blurred regions, the preset model of the embodiment of the present application can extract image features more broadly and accurately from such images, making the image recognition result more accurate and improving the driving accuracy and safety of the device. In addition, compared with a convolutional neural network model of the same magnitude, the implicit neural network model consumes only half of the GPU memory during training, or even less. In mobile terminal devices, typified by autonomous driving devices, hardware resources are particularly valuable, and lower resource consumption means the device can allocate more resources to other functions. That is, an autonomous driving device that implements image recognition by deploying the implicit neural network model can provide more extended functions.
According to the model training apparatus described above, a target image is acquired, a target weight distribution is obtained from a historical weight distribution, target image features carrying broad feature information are extracted from the target image according to the target weight distribution, and the target image features are recognized to determine whether they are accurate, and thus whether the target weight distribution can serve as the final weight distribution of the preset model, completing the training of the preset model. In the embodiment of the present application, the fixed weights of an existing model are replaced with a weight distribution, and the weight distribution is trained as the item to be optimized of the preset model. Because a weight distribution may contain a plurality of different weights, those weights help the preset model extract different image features from an image; however, those features may not all provide information that is effective for image recognition. The embodiment of the present application therefore iteratively updates the weight distribution to obtain a final weight distribution. Like the historical weight distribution, the final weight distribution also contains a plurality of different weights that allow the preset model to extract different image features from an image, but these features do provide feature information effective for image recognition. In other words, the trained preset model can extract broad and accurate image features from an image and has a flexible, adaptable feature expression capability.
Compared with an existing model that can only extract fixed image features from an image using fixed weights, the preset model of the embodiment of the present application can extract broad and accurate target image features from an image according to the final weight distribution. Such target image features carry image information that is broader and of more dimensions than a single fixed image feature, and can therefore improve the recognition accuracy of the image recognition model.
Having described the methods and apparatus of the embodiments of the present application, the computer-readable storage medium of the embodiments of the present application is described next. The computer-readable storage medium may be an optical disc storing a computer program (i.e., a program product) which, when executed by a processor, performs the steps described in the above method embodiments, for example: acquiring a target image; acquiring a target weight distribution and acquiring target image features from the target image according to the target weight distribution, wherein the target weight distribution is updated according to a historical weight distribution; identifying the target image features to obtain an identification probability distribution of the target image; and if the similarity between the target category and the label of the target image is not smaller than a first preset threshold, taking the target weight distribution as the final weight distribution of a preset model, wherein the target category is a category with a probability value larger than a preset probability value in the identification probability distribution. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
The model training apparatus 60 in the embodiment of the present application is described above from the viewpoint of a modularized functional entity, and the server and the terminal device for executing the model training method in the embodiment of the present application are described below from the viewpoint of hardware processing, respectively.
It should be noted that, in the embodiment of the model training apparatus of the present application, the entity device corresponding to the input/output module 601 shown in fig. 8 may be an input/output unit, a transceiver, a radio frequency circuit, a communication module, an input/output (I/O) interface, etc., and the entity device corresponding to the processing module 602 may be a processor. The model training apparatus 60 shown in fig. 8 may have a structure as shown in fig. 9, and when the model training apparatus 60 shown in fig. 8 has a structure as shown in fig. 9, the processor and the transceiver in fig. 9 can implement the same or similar functions as the processing module 602 and the input-output module 601 provided in the foregoing apparatus embodiment corresponding to the apparatus, and the memory in fig. 9 stores a computer program to be invoked when the processor performs the above model training method.
The embodiment of the present application further provides a terminal device, as shown in fig. 10, for convenience of explanation, only the portion relevant to the embodiment of the present application is shown, and specific technical details are not disclosed, please refer to the method portion of the embodiment of the present application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), a Point of Sales (POS), a vehicle-mounted computer, and the like, taking the terminal device as an example of the mobile phone:
Fig. 10 is a block diagram showing a part of the structure of a mobile phone related to a terminal device provided in an embodiment of the present application. Referring to fig. 10, the mobile phone includes: radio Frequency (RF) circuitry 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuitry 1060, wireless fidelity (wireless fidelity, wiFi) module 1070, processor 1080, and power source 1090. It will be appreciated by those skilled in the art that the handset construction shown in fig. 10 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 10:
the RF circuit 1010 may be used for receiving and transmitting signals during a message or a call; in particular, downlink information received from a base station is passed to the processor 1080 for processing, and uplink data is sent to the base station. Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1010 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 1020 may be used to store software programs and modules that the processor 1080 performs various functional applications and data processing of the handset by executing the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state memory device.
The input unit 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 1031 or thereabout using any suitable object or accessory such as a finger, stylus, etc.), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch panel 1031 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 1080 and can receive commands from the processor 1080 and execute them. Further, the touch panel 1031 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, etc.
The display unit 1040 may be used to display information input by or provided to the user and the various menus of the mobile phone. The display unit 1040 may include a display panel 1041, which may optionally be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch panel 1031 may overlay the display panel 1041; when the touch panel 1031 detects a touch operation on or near it, the operation is passed to the processor 1080 to determine the type of touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to that type. Although in fig. 10 the touch panel 1031 and the display panel 1041 are two independent components implementing the input and output functions of the mobile phone, in some embodiments they may be integrated to implement both functions.
The handset may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for applications of recognizing the gesture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the handset are not described in detail herein.
The audio circuit 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between the user and the mobile phone. The audio circuit 1060 may transmit the electrical signal converted from received audio data to the speaker 1061, which converts it into a sound signal for output; conversely, the microphone 1062 converts collected sound signals into electrical signals, which the audio circuit 1060 receives and converts into audio data. The audio data is then processed by the processor 1080 and sent, for example, to another mobile phone via the RF circuit 1010, or output to the memory 1020 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and the like, providing wireless broadband Internet access. Although fig. 10 shows the WiFi module 1070, it is understood that it is not an essential part of the mobile phone and may be omitted as needed without changing the essence of the invention.
Processor 1080 is the control center of the handset, connects the various parts of the entire handset using various interfaces and lines, and performs various functions and processes of the handset by running or executing software programs and/or modules stored in memory 1020, and invoking data stored in memory 1020, thereby performing overall monitoring of the handset. Optionally, processor 1080 may include one or more processing units; alternatively, processor 1080 may integrate an application processor primarily handling operating systems, user interfaces, applications, etc., with a modem processor primarily handling wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1080.
The handset further includes a power source 1090 (e.g., a battery) for powering the various components, optionally in logical communication with the processor 1080 via a power management system, such as for managing charge, discharge, and power consumption by the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 1080 included in the mobile phone further has a control unit for executing the above method flow for training the preset model by the image processing device.
Referring to fig. 11, fig. 11 is a schematic diagram of a server structure according to an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transitory or persistent. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 1122 may communicate with the storage medium 1130 and execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.
The steps performed by the server in the above embodiments may be based on the structure of the server 1100 shown in fig. 11. For example, the steps performed by the model training apparatus 60 shown in fig. 8 in the above-described embodiments may be based on the server structure shown in fig. 11. For example, the CPU 1122 may perform the following operations by calling instructions in the memory 1132:
acquiring a target image through the input-output interface 1158;
the CPU 1122 updates the historical weight distribution to obtain a target weight distribution, and obtains target image features from the target image according to the target weight distribution;
transmitting the target image features to an image processing device through an input-output interface 1158 to identify the target image features, thereby obtaining an identification probability distribution of the target image;
if the similarity between the target category and the label of the target image is not less than the first preset threshold, the central processor 1122 takes the target weight distribution as the final weight distribution of the preset model; the target category is a category with a probability value larger than a preset probability value in the identification probability distribution.
After training is completed, the final weight distribution or the preset model may also be output through the input-output interface 1158, so as to deploy the preset model into other application devices (e.g., an autopilot device) to help the other application devices obtain accurate and extensive image features from the image to be processed.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the embodiments of the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program is loaded and executed on a computer, the flows or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
The foregoing describes in detail the technical solutions provided by the embodiments of the present application, applying specific examples to illustrate their principles and implementations; the description of the embodiments above is intended only to help understand the methods and core ideas of the embodiments. Meanwhile, since those skilled in the art may vary the specific embodiments and the scope of application according to these ideas, the above should not be construed as limiting the embodiments of the present application.

Claims (9)

1. A method of model training, the method comprising:
acquiring a target image;
acquiring a target weight distribution, and acquiring target image features from the target image according to the target weight distribution, wherein the target weight distribution is updated according to a historical weight distribution, and the target weight distribution is used for sampling weights for acquiring the target image features;
identifying the target image features to obtain an identification probability distribution of the target image;
if the similarity between a target category and the label of the target image is not smaller than a first preset threshold, taking the target weight distribution as the final weight distribution of a preset model;
wherein the target category is a category whose probability value in the identification probability distribution is larger than a preset probability value;
wherein the step of acquiring the target image features from the target image according to the target weight distribution comprises at least two time steps;
the image features acquired at the time step preceding the current time step serve as input of the preset model at the current time step;
and the similarity between the target image features and the image features of the time step immediately preceding the target image features is not smaller than a second preset threshold.
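The procedure of claim 1 can be sketched in code. This is an illustrative reconstruction only, not the patented implementation: the Gaussian form of the weight distribution, the `tanh` extractor, the cosine similarity measure, and all threshold values are assumptions introduced for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def training_round(image, label, mu, sigma, w_cls,
                   prob_floor=0.5, sim_thresh=0.99, max_steps=8):
    """One round of the claimed procedure: sample extraction weights
    from the target weight distribution N(mu, sigma), refine features
    over successive time steps (each step consumes the previous step's
    features), stop once consecutive features are sufficiently similar,
    then compare the recognized category with the label to decide
    whether the distribution is final."""
    feats, prev = image, None
    for _ in range(max_steps):                 # at least two time steps
        w = rng.normal(mu, sigma)              # weight sampled from the distribution
        feats = np.tanh(feats @ w)             # hypothetical feature extractor
        if prev is not None and cosine(prev, feats) >= sim_thresh:
            break                              # consecutive features agree
        prev = feats
    probs = softmax(feats @ w_cls)             # identification probability distribution
    target_cat = int(np.argmax(probs))
    accepted = bool(probs[target_cat] > prob_floor and target_cat == label)
    return accepted, feats

d, n_cls = 4, 3
image = rng.normal(size=d)
mu, sigma = np.zeros((d, d)), 0.1
w_cls = rng.normal(size=(d, n_cls))
ok, feats = training_round(image, 1, mu, sigma, w_cls)
print(ok, feats.shape)
```

In a real training loop the distribution parameters would be updated (e.g., by gradient descent on a variational objective) whenever `accepted` is false, which the claim summarizes as updating according to the historical weight distribution.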
2. The method of claim 1, wherein the acquiring target image features from the target image according to the target weight distribution comprises:
acquiring a first weight and a second weight from the target weight distribution;
acquiring a first image feature based on a historical image feature according to the first weight, and acquiring a second image feature based on the first image feature according to the second weight;
if the similarity between the first image feature and the second image feature is not smaller than a second preset threshold, the second image feature is used as the target image feature;
wherein the historical image features are derived based on the target image.
3. The method of claim 1, wherein the target image comprises noise, and the mapping of the target image features in a two-dimensional Euclidean space is a tree structure.
4. A method according to any one of claims 1-3, wherein the preset model comprises an implicit neural network model;
the implicit neural network model is used at least for acquiring target image features from the target image;
the implicit neural network model comprises a Bayesian neural network model, and the Bayesian neural network model is used at least for sampling, from the target weight distribution, weights for target image feature acquisition.
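A Bayesian neural network of the kind referenced in claim 4 commonly represents each weight by a learned distribution and draws samples via the reparameterization trick. The sketch below is a generic illustration under that assumption; `sample_bayesian_weights` and the softplus parameterization are conventional choices, not details taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_bayesian_weights(mu, rho):
    """Reparameterization-style sampling as commonly used in Bayesian
    neural networks: each weight has a learned mean mu and a learned
    rho, with std = softplus(rho); a sample is mu + std * eps."""
    std = np.log1p(np.exp(rho))            # softplus keeps the std positive
    eps = rng.standard_normal(mu.shape)    # per-weight standard normal noise
    return mu + std * eps

mu = np.zeros((3, 3))
rho = np.full((3, 3), -3.0)               # softplus(-3) gives a small initial std
w = sample_bayesian_weights(mu, rho)
print(w.shape)
```

Because the sampling is differentiable in `mu` and `rho`, the weight distribution itself can be trained by gradient descent, which is what makes updating a "target weight distribution" from a "historical weight distribution" tractable.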
5. The method of claim 1, wherein acquiring target image features from the target image according to the target weight distribution comprises:
acquiring a plurality of weights from the target weight distribution;
calculating the average value of the weights to obtain an image feature extraction weight;
and acquiring target image features from the target image according to the image feature extraction weights.
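Claim 5 describes a Monte Carlo average over sampled weights. A minimal sketch, assuming a Gaussian target weight distribution (the function name and sample count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)

def extraction_weight(mu, sigma, n_samples=16):
    """Draw several weights from the target weight distribution and
    average them to obtain a single image-feature extraction weight
    (a simple Monte Carlo mean)."""
    samples = rng.normal(mu, sigma, size=(n_samples,) + np.shape(mu))
    return samples.mean(axis=0)

mu = np.ones((2, 2))
w = extraction_weight(mu, 0.05)
print(w.shape)
```

Averaging reduces the variance of any single draw by a factor of `n_samples`, so the extraction weight is a more stable estimate of the distribution's mean than one sample alone.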
6. The method of claim 1, wherein the target image is acquired based on an image to be identified captured by an autonomous driving device;
the image to be identified comprises a blurred region;
and the target image features acquired from the target image include image features acquired based on the blurred region.
7. A model training apparatus comprising:
an input-output module configured to acquire a target image;
a processing module configured to acquire a target weight distribution and acquire target image features from the target image according to the target weight distribution, wherein the target weight distribution is updated according to a historical weight distribution, and the target weight distribution is used for sampling weights for acquiring the target image features;
the processing module is further configured to identify the target image features to obtain an identification probability distribution of the target image;
the processing module is further configured to take the target weight distribution as the final weight distribution of a preset model if the similarity between a target category and the label of the target image is not smaller than a first preset threshold;
wherein the target category is a category whose probability value in the identification probability distribution is larger than a preset probability value;
wherein the step of acquiring the target image features from the target image according to the target weight distribution comprises at least two time steps;
the image features acquired at the time step preceding the current time step serve as input of the preset model at the current time step;
and the similarity between the target image features and the image features of the time step immediately preceding the target image features is not smaller than a second preset threshold.
8. A computing device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-6 when executing the computer program.
9. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-6.
CN202210502310.8A 2022-05-10 2022-05-10 Model training method, related device and storage medium Active CN114743081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210502310.8A CN114743081B (en) 2022-05-10 2022-05-10 Model training method, related device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210502310.8A CN114743081B (en) 2022-05-10 2022-05-10 Model training method, related device and storage medium

Publications (2)

Publication Number Publication Date
CN114743081A CN114743081A (en) 2022-07-12
CN114743081B true CN114743081B (en) 2023-06-20

Family

ID=82284823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210502310.8A Active CN114743081B (en) 2022-05-10 2022-05-10 Model training method, related device and storage medium

Country Status (1)

Country Link
CN (1) CN114743081B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569895A (en) * 2021-02-20 2021-10-29 腾讯科技(北京)有限公司 Image processing model training method, processing method, device, equipment and medium
CN114357166A (en) * 2021-12-31 2022-04-15 北京工业大学 Text classification method based on deep learning
CN114444579A (en) * 2021-12-31 2022-05-06 北京瑞莱智慧科技有限公司 General disturbance acquisition method and device, storage medium and computer equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548210B (en) * 2016-10-31 2021-02-05 腾讯科技(深圳)有限公司 Credit user classification method and device based on machine learning model training
CN110348428B (en) * 2017-11-01 2023-03-24 腾讯科技(深圳)有限公司 Fundus image classification method and device and computer-readable storage medium
CN110163234B (en) * 2018-10-10 2023-04-18 腾讯科技(深圳)有限公司 Model training method and device and storage medium
EP3722894B1 (en) * 2019-04-09 2022-08-10 Robert Bosch GmbH Control and monitoring of physical system based on trained bayesian neural network
CN111783551B (en) * 2020-06-04 2023-07-25 中国人民解放军军事科学院国防科技创新研究院 Countermeasure sample defense method based on Bayesian convolutional neural network
CN112733729B (en) * 2021-01-12 2024-01-09 北京爱笔科技有限公司 Model training and regression analysis method, device, storage medium and equipment
CN112598091B (en) * 2021-03-08 2021-09-07 北京三快在线科技有限公司 Training model and small sample classification method and device
CN112926789B (en) * 2021-03-17 2024-05-14 阳光慧碳科技有限公司 Satellite cloud image prediction method, prediction device and readable storage medium
CN113449188A (en) * 2021-06-30 2021-09-28 东莞市小精灵教育软件有限公司 Application recommendation method and device, electronic equipment and readable storage medium
CN113887325A (en) * 2021-09-10 2022-01-04 北京三快在线科技有限公司 Model training method, expression recognition method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569895A (en) * 2021-02-20 2021-10-29 腾讯科技(北京)有限公司 Image processing model training method, processing method, device, equipment and medium
CN114357166A (en) * 2021-12-31 2022-04-15 北京工业大学 Text classification method based on deep learning
CN114444579A (en) * 2021-12-31 2022-05-06 北京瑞莱智慧科技有限公司 General disturbance acquisition method and device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN114743081A (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN110364144B (en) Speech recognition model training method and device
US10943091B2 (en) Facial feature point tracking method, apparatus, storage medium, and device
CN114297730B (en) Countermeasure image generation method, device and storage medium
CN114612531A (en) Image processing method and device, electronic equipment and storage medium
CN117332844A (en) Challenge sample generation method, related device and storage medium
CN115239941B (en) Countermeasure image generation method, related device and storage medium
CN114743081B (en) Model training method, related device and storage medium
CN115392405A (en) Model training method, related device and storage medium
CN114625657A (en) Model interpretation method and device, electronic equipment and storage medium
CN116259083A (en) Image quality recognition model determining method and related device
CN113569043A (en) Text category determination method and related device
CN114943639B (en) Image acquisition method, related device and storage medium
CN116386647B (en) Audio verification method, related device, storage medium and program product
CN117079356A (en) Object fake identification model construction method, false object detection method and false object detection device
CN117011649B (en) Model training method and related device
CN116580268B (en) Training method of image target positioning model, image processing method and related products
CN109918684B (en) Model training method, translation method, related device, equipment and storage medium
CN117009845A (en) Training method, device and storage medium of class increment model
CN117115590A (en) Content auditing model training method, device and medium based on self-supervision learning
CN113568984A (en) Data processing method and related device
CN116958581A (en) Image processing method, device and storage medium
CN116992125A (en) Object recommendation method and device and storage medium
CN117132851A (en) Anti-patch processing method, related device and storage medium
CN117218506A (en) Model training method for image recognition, image recognition method and related device
CN115905416A (en) Data processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant