WO2024021350A1 - Image recognition model training method, apparatus, computer device and storage medium - Google Patents

Image recognition model training method, apparatus, computer device and storage medium

Info

Publication number
WO2024021350A1
WO2024021350A1 · PCT/CN2022/128994 · CN2022128994W
Authority
WO
WIPO (PCT)
Prior art keywords
target
recognition
image
training sample
objects
Prior art date
Application number
PCT/CN2022/128994
Other languages
English (en)
French (fr)
Inventor
戴晶帼
陈�光
苏新铎
Original Assignee
广州广电运通金融电子股份有限公司
Priority date
Filing date
Publication date
Application filed by 广州广电运通金融电子股份有限公司
Publication of WO2024021350A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/764 — Image or video recognition or understanding using classification, e.g. of video objects
    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Neural networks; learning methods

Definitions

  • the present disclosure relates to the field of image recognition technology, and in particular to an image recognition model training method, device, computer equipment and storage medium.
  • deep neural networks are used in various industries.
  • deep neural network models can be trained on pre-labeled image samples, so that the trained model can recognize various targets appearing in an image to be recognized, such as people, cars, cats and dogs.
  • training samples are usually collected first, then labeled, and the model is trained using the labeled training samples.
  • however, the number of each type of target in the training samples on different local devices (i.e., edge terminals) may differ.
  • existing methods nonetheless assume that the target-number distributions of the samples collected on each device are mutually independent, do not influence each other, and satisfy the same probability distribution, i.e., that the data are independent and identically distributed.
  • the technical problem addressed by this disclosure is that the recognition accuracy and adaptability of models obtained with existing training methods are not high enough.
  • An image recognition model training method includes:
  • the full training sample image set includes a variety of recognition objects
  • the image recognition model is trained using the target training sample image set to obtain a trained image recognition model.
  • the target recognition objects include main-class recognition objects and sub-class recognition objects; calculating the target number of each target recognition object based on the preset data construction rules, the global non-independent and identically distributed degree value and the initial number of each recognition object includes:
  • calculating the target number of the main-class recognition objects and the target number of the sub-class recognition objects respectively; wherein the main-class recognition objects account for the largest proportion of the local object categories.
  • determining the local main category identification object and the sub-category identification object includes:
  • calculating the target number of the main-class recognition objects and the target number of the sub-class recognition objects respectively, based on the global non-independent and identically distributed degree value and according to the preset data construction rules, includes:
  • the number of targets for each target recognition object is calculated as:
  • i represents the i-th target recognition object
  • total_num_label(i) represents the number of targets of the i-th target recognition object.
  • when the i-th target recognition object is a main-class recognition object,
  • its target number consists of two parts: num_basic_part(i) and num_label(i), where num_basic_part(i) represents the basic target number of the i-th target recognition object and num_label(i) represents the additional target number of the i-th target recognition object;
  • when the i-th target recognition object is a sub-class recognition object, its target number consists only of the additional target number num_label(i).
  • the value range of the global non-IID degree value is R ⁇ [0, 1], where R is the global non-IID degree value.
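The decomposition above can be sketched in Python. This is an illustrative sketch only: the function name and the example counts are assumptions, not values from the disclosure; only the rule total_num_label(i) = num_basic_part(i) + num_label(i) for main-class objects (and num_label(i) alone for sub-class objects) is taken from the text.

```python
def total_num_label(i, main_classes, num_basic_part, num_label):
    """Target count for recognition object i, per the disclosure's rule:
    main-class objects get a basic part plus an additional part;
    sub-class objects get only the additional part."""
    if i in main_classes:
        return num_basic_part[i] + num_label[i]
    return num_label[i]

# Illustrative (invented) counts: object 0 is a main class, object 1 a sub-class.
basic = {0: 80, 1: 0}
extra = {0: 20, 1: 15}
print(total_num_label(0, {0}, basic, extra))  # 100
print(total_num_label(1, {0}, basic, extra))  # 15
```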
  • An image recognition method includes:
  • the image to be recognized contains at least one of the target recognition objects
  • the image to be recognized is input to the trained image recognition model to identify the category of the target recognition object.
  • An image recognition model training device includes:
  • the training sample acquisition module is used to acquire the full training sample image set and the global non-IID degree value from the master node; the full training sample image set includes a variety of recognition objects;
  • a target recognition object determination module used to determine local target recognition objects and calculate the initial number of each type of recognition object in the full training sample image set
  • a target number calculation module configured to calculate the number of targets for each type of target recognition object based on the preset data construction rules and the global non-IID degree value and the initial number of each type of recognition object;
  • a target training sample construction module configured to construct a target training sample image set according to the target number of each target recognition object
  • the image recognition model training module is used to train the image recognition model using the target training sample image set to obtain a trained image recognition model.
  • An image recognition device includes:
  • the image acquisition module to be recognized is used to acquire the image to be recognized; the image to be recognized contains at least one of the target recognition objects;
  • the model training module is used to train an image recognition model using any of the above image recognition model training methods, obtaining a trained image recognition model;
  • a category recognition module configured to input the image to be recognized into the trained image recognition model to identify the category of the target recognition object.
  • a computer device including a memory and a processor.
  • the memory stores a computer program.
  • when the processor executes the computer program, it implements the steps in the above image recognition model training method embodiment and the steps in the above image recognition method embodiment.
  • a computer-readable storage medium has a computer program stored thereon.
  • when the computer program is executed by a processor, the steps in the above image recognition model training method embodiment and the steps in the above image recognition method embodiment are implemented.
  • a computer program product includes a computer program that, when executed by a processor, implements each step in the above image recognition model training method embodiment and each step in the above image recognition method embodiment.
  • the above image recognition model training method, device, computer equipment and storage medium obtain the full training sample image set and the global non-independent and identically distributed degree value from the master node, the full training sample image set including a variety of recognition objects; determine the local target recognition objects and calculate the initial number of each type of recognition object in the full training sample image set; and, based on the preset data construction rules, calculate the target number of each target recognition object according to the global non-independent and identically distributed degree value and the initial number of each type of recognition object.
  • according to the target number of each target recognition object, a target training sample image set is constructed; the image recognition model is trained using the target training sample image set to obtain a trained image recognition model.
  • the present disclosure can transform any overall data set used for classification tasks (i.e., the data set in the above-mentioned master node) into a distributed target training sample set that satisfies a given global non-independent and identically distributed degree value.
  • This distributed sample collection can simulate the heterogeneous data sets collected by edge terminals or terminal devices in different environments in real distributed machine learning scenarios. Based on this concept, the present disclosure can accommodate the privacy-protection and data-security requirements of actual scenarios: it does not need to collect historical actual data in each scenario, but can instead simulate training samples that satisfy the corresponding global non-IID degree value and conduct training on them, improving training accuracy and model adaptability while taking privacy and security into account.
  • current distributed data set construction methods are usually limited to dividing the entire data set by category, so that different devices hold different target categories.
  • the distributed data sets generated by such methods do not cover other forms of non-independent and identical distribution found in actual scenarios, for example, the case where the target categories of the data collected by each device are the same but the number of samples in each category differs.
  • the data set construction method proposed in this disclosure can cover various non-IID situations more comprehensively. Therefore, training the image recognition model on a data set constructed by this method better simulates the effect of training the model with non-independent and identically distributed data sets in actual scenarios.
  • the advantage of this is twofold: on the one hand, there is no need for enterprises or organizations to provide distributed data sets from real scenarios, reducing the risk of privacy leaks; on the other hand, the data set construction method provided by this disclosure can generate distributed data sets with various non-independent and identical distributions, and training the target recognition algorithm model on this basis gives a more comprehensive understanding of how data sets with different degrees of skew affect the model obtained by distributed training, thus playing a guiding role. Based on such model prediction results, parameters can be adjusted or training plans selected more accurately for models in actual scenarios, improving model training efficiency and reducing trial-and-error costs.
  • Figure 1 is an application environment diagram of the image recognition model training method in one embodiment
  • Figure 2 is a schematic flowchart of an image recognition model training method in one embodiment
  • Figure 3 is a schematic flowchart of an image recognition method in one embodiment
  • Figure 4 is a structural block diagram of an image recognition model training device in one embodiment
  • Figure 5 is a structural block diagram of an image recognition device in one embodiment
  • Figure 6 is an internal structure diagram of a computer device in one embodiment.
  • the image recognition model training method provided by the present disclosure can be applied in the application environment as shown in Figure 1.
  • the slave node 101 communicates with the master node 102 through the network.
  • the slave node 101 is used to obtain the image to be recognized and identify the objects in the image.
  • the slave node can be a variety of camera devices or other sensing devices located in different scenes.
  • the master node 102 can be implemented by an independent server or by a server cluster composed of multiple servers.
  • an image recognition model training method is provided.
  • the method is applied to the slave node 101 in Figure 1 as an example to illustrate, including the following steps:
  • Step S201 Obtain the full training sample image set and the global non-IID degree value from the master node; the full training sample image set includes a variety of recognition objects.
  • the full training sample image set refers to a set that contains all data samples.
  • each recognition object in the full training sample image set is labeled; for example, bicycles, pedestrians and cars are labeled with corresponding tags.
  • the non-independent and identically distributed degree value refers to the degree to which the data satisfy a non-independent and identical distribution.
  • non-independent and identically distributed means that the data distribution attributes of the samples are independent of each other but the distributions differ; the differences are reflected in the amounts of data for samples of the same category on different nodes of the same system. For example, in a road camera recognition system, the sample images collected by a highway camera recognition node and a community camera recognition node contain a total of three categories: pedestrians, non-motor vehicles and motor vehicles.
  • the number of motor vehicle images collected by the highway camera recognition node is much higher than the number of samples in the other two categories, while the numbers of pedestrian and non-motor vehicle sample images obtained by the community camera recognition node are much higher than the number of motor vehicle samples.
  • the data samples obtained in this way are highly skewed.
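The road-camera example can be made concrete with a small sketch; the per-category counts below are invented for illustration, and only the skew pattern (motor vehicles dominate on the highway node, pedestrians and non-motor vehicles dominate on the community node) follows the text.

```python
# Invented counts illustrating label-distribution skew across two nodes
# that share the same three categories.
highway_node = {"pedestrian": 50, "non_motor": 80, "motor": 900}
community_node = {"pedestrian": 700, "non_motor": 500, "motor": 60}

def category_shares(counts):
    """Fraction of the node's samples falling in each category."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

print(category_shares(highway_node))    # motor vehicles dominate
print(category_shares(community_node))  # pedestrians/non-motor dominate
```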
  • the master node determines the global non-IID degree value R.
  • This value R is flexibly set by the user according to actual needs, R ⁇ [0, 1].
  • the entire system contains one master node and multiple slave nodes, and there is only one global non-independent and identically distributed degree value R in the entire system, meaning that the data distribution across all nodes in the system satisfies the global non-independent and identically distributed degree value R.
  • Step S202 Determine local target recognition objects, and calculate the initial number of each recognition object in the above-mentioned full training sample image set.
  • the slave node 101 determines the local target recognition object.
  • the target recognition objects are also flexibly set by the user according to actual needs; for example, a camera device on a highway may set the target recognition objects to motor vehicles and pedestrians.
  • the slave node 101 calculates the initial number n_i of each target recognition object based on the labels of each recognition object in the full training sample image set (where i denotes the i-th category).
  • Step S203 Based on the preset data construction rules, the target number of each target recognition object is calculated according to the global non-IID degree value and the initial number of each recognition object.
  • the preset data construction rules refer to the rules on how to construct a data set with a global non-independent and identically distributed degree value of R.
  • the rules are as follows:
  • suppose the slave node 101 determines its local main-class target recognition objects to be categories j1 and j2, and obtains the global non-independent and identically distributed degree value R from the master node. The initial numbers of category j1 and category j2 in the full training sample image set are n_j1 and n_j2; data with category labels j1 and j2, in the amounts R × n_j1 and R × n_j2 respectively, is assigned to the image recognition node. Here R × n_j1 and R × n_j2 are the basic target numbers num_basic_part(i).
  • the number of each main-class recognition object is the sum of two parts:
  • the first part is the basic target number num_basic_part;
  • the second part is the additional target number num_label, the second component of the target number total_num_label of the j1-th or j2-th target recognition object in the aforementioned formula, i.e., the remaining target number allocated to the node;
  • the number of each sub-class recognition object consists of the additional target number alone.
  • the total amount of data on the image recognition node is the sum of these per-category target numbers; formulas (A.4) and (A.10) in the original (rendered as images) relate this total, the main classes j1 and j2, and Q_k, the probability distribution of the data on the k-th node.
  • Step S204 Construct a target training sample image set according to the number of targets for each target recognition object.
  • images of corresponding categories are selected from the image database to form a target training sample image set and assigned to local nodes;
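Step S204 can be sketched as sampling, per category, the computed number of targets from the full set. The helper name and the use of uniform sampling without replacement are assumptions; the disclosure only says that images of the corresponding categories are selected.

```python
import random

def build_target_set(full_set, target_counts, seed=0):
    """full_set: category -> list of image ids from the full training sample set;
    target_counts: category -> target number computed in step S203.
    Draws each category's quota without replacement (an assumed policy)."""
    rng = random.Random(seed)
    return {cat: rng.sample(full_set[cat], min(k, len(full_set[cat])))
            for cat, k in target_counts.items()}

full = {"car": [f"car_{i}" for i in range(10)],
        "person": [f"person_{i}" for i in range(10)]}
subset = build_target_set(full, {"car": 4, "person": 2})
print({c: len(v) for c, v in subset.items()})  # {'car': 4, 'person': 2}
```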
  • Step S205 Use the target training sample image set to train the image recognition model to obtain a trained image recognition model.
  • the image recognition model can be a deep neural network model such as VGG (an abbreviation of Visual Geometry Group, the Oxford group of Simonyan and Zisserman, who introduced the network in "Very Deep Convolutional Networks for Large-Scale Image Recognition" for image classification and localization), ResNet or MobileNet.
  • in this embodiment, the full training sample image set and the global non-IID degree value are obtained from the master node, the full training sample image set including a variety of recognition objects; the local target recognition objects are determined, and the initial number of each type of recognition object in the full training sample image set is calculated;
  • based on the preset data construction rules, the target number of each target recognition object is calculated according to the global non-independent and identically distributed degree value and the initial number of each type of recognition object; a target training sample image set is constructed according to the target number of each target recognition object; and the image recognition model is trained using the target training sample image set to obtain a trained image recognition model.
  • This embodiment can transform any overall data set used for classification tasks (that is, the data set in the above-mentioned master node) into a distributed target training sample set that satisfies a certain non-independent and identically distributed degree.
  • This distributed sample collection can simulate the heterogeneous data sets collected by edge terminals or terminal devices in different environments in real distributed machine learning scenarios. Based on this concept, the embodiment can accommodate the privacy-protection and data-security requirements of actual scenarios: it does not need to collect historical actual data in each scenario, but can simulate training samples that satisfy the corresponding global non-IID degree value and use them for training, improving training accuracy and model adaptability while taking privacy and security into account.
  • the local target recognition objects include main-class recognition objects and sub-class recognition objects; the above step S203 includes: determining the local main-class recognition objects and sub-class recognition objects; and, according to the preset data construction rules and based on the global non-IID degree value, calculating the target number of the main-class recognition objects and the target number of the sub-class recognition objects respectively; wherein the main-class recognition objects account for the largest proportion of the local object categories.
  • num_major(k) is the number of types of main class recognition objects owned by the k-th slave node; d is the number of types of recognition objects in the full training sample image set; num_party is the number of nodes participating in the image recognition task.
  • any slave node k calculates the basic target number num_basic_part(i) of the i-th main-class recognition object from the global non-independent and identically distributed degree value R and the initial number n_i of each recognition object in the full training sample image set:
  • num_basic_part(i) = R × n_i
  • where R is the global non-independent and identically distributed degree value
  • and n_i is the initial number of the i-th recognition object in the full training sample image set.
  • q_i represents the probability distribution of the i-th recognition object in the full training sample image set; num_major(k) is the number of main-class recognition objects owned by the k-th slave node; label_major(k, j) denotes the j-th main-class recognition object of the k-th slave node; num_label_remain(label_major(k, j)) denotes the remaining number of samples of the j-th main-class recognition object of the k-th slave node; k ∈ [1, K], where k denotes the k-th slave node and K is the total number of nodes.
  • the number of targets of the i-th category on the k-th image recognition node is:
  • the preset data construction rules are used to allocate the number of targets corresponding to different target recognition objects from node k, thereby paving the way for the subsequent construction of a new target training sample image set.
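The rule above can be sketched as follows. Only num_basic_part(i) = R × n_i is given explicitly in the text; the policy used here for the additional part (splitting the remaining (1 − R) × n_i samples equally across the num_party participating nodes) is a simplifying assumption standing in for the remainder-allocation formulas that were not reproduced.

```python
def target_counts(R, n, main_classes, num_party):
    """Per-category target counts on one slave node.
    R: global non-IID degree value in [0, 1];
    n: category -> initial count n_i in the full training sample image set;
    main_classes: this node's main-class categories."""
    counts = {}
    for i, n_i in n.items():
        basic = round(R * n_i) if i in main_classes else 0  # num_basic_part(i)
        extra = (n_i - basic) // num_party                  # assumed equal split
        counts[i] = basic + extra                           # total_num_label(i)
    return counts

n = {"motor": 1000, "pedestrian": 400}
print(target_counts(0.8, n, {"motor"}, num_party=4))  # {'motor': 850, 'pedestrian': 100}
```

Note how a higher R concentrates more of a main class's samples on the node that owns it, matching R's role as a skew knob.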
  • the above-mentioned determination of local main class identification objects and sub-class identification objects includes:
  • Receive user preset commands and generate main category identification objects and sub-category identification objects according to user preset commands.
  • the user can specify the main class identification object and the sub-class identification object for each slave node.
  • the above-mentioned main class identification objects and sub-class identification objects can also be randomly generated by the local node.
  • the main category recognition object and the sub-category recognition object are flexibly set through user instructions, which can improve the adaptability of the system.
  • calculating the target number of identified objects of each main category and the target number of identified objects of each subcategory according to the preset data construction rules includes: calculating each target number according to the preset data construction rules.
  • the number of targets for each type of target recognition object is:
  • i represents the i-th target recognition object
  • total_num_label(i) represents the number of targets of the i-th target recognition object.
  • when the i-th target recognition object is a main-class recognition object,
  • its target number consists of two parts: num_basic_part(i) and num_label(i), where the former represents the basic target number of the i-th target recognition object and the latter represents the additional target number of the i-th target recognition object.
  • the preset data construction rules are used to allocate target numbers to different sample categories on the image recognition node k, thereby paving the way for the subsequent construction of a new target training sample image set.
  • the present disclosure also provides an image recognition method, which method includes:
  • Step S301 Obtain an image to be recognized; the image to be recognized contains at least one of the target recognition objects.
  • the slave node 101 obtains an image to be recognized; the image to be recognized contains a target object, for example, a pedestrian as the target recognition object.
  • Step S302 Use the above image recognition model training method to train to obtain a trained image recognition model.
  • the above image recognition model training method is used to train to obtain a trained image recognition model.
  • Step S303 Input the above-mentioned image to be recognized to the trained image recognition model to identify the category of the target recognition object.
  • the above-mentioned image to be recognized is input to the trained image recognition model to identify the category of the target recognition object; for example, if the image to be recognized is recognized to contain a pedestrian, the object is labeled as a pedestrian.
  • the trained image recognition model is trained through the above image recognition model training method for target object recognition, which can improve the accuracy of target object recognition.
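The recognition flow (steps S301–S303) reduces to: acquire an image, run the trained model, take the top-scoring category. The sketch below uses a stand-in callable in place of a real trained network so the flow is runnable without any framework; the label list and scores are invented.

```python
def recognize(image, model, labels):
    """Run the (trained) model on the image and return the top-scoring label."""
    scores = model(image)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]

labels = ["pedestrian", "non_motor", "motor"]
# Stand-in for a trained image recognition model: any callable mapping an
# image to per-category scores fits this interface.
stub_model = lambda img: [0.7, 0.2, 0.1]
print(recognize("frame_001.jpg", stub_model, labels))  # pedestrian
```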
  • although the various steps in the flowcharts of Figures 2-3 are shown in the sequence indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders.
  • at least some of the steps in Figures 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and their execution sequence is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • an image recognition model training device 400 including: a training sample acquisition module 401, a target recognition object determination module 402, a target number calculation module 403, and a target training sample construction module 404 and image recognition model training module 405, wherein:
  • the training sample acquisition module 401 is used to acquire the full training sample image set and the global non-IID degree value from the master node; the full training sample image set includes a variety of recognition objects;
  • the target recognition object determination module 402 is used to determine local target recognition objects and calculate the initial number of each type of recognition object in the full training sample image set;
  • the target number calculation module 403 is used to calculate the target number of each target recognition object based on the preset data construction rules and according to the global non-IID degree value and the initial number of each type of recognition object;
  • the target training sample construction module 404 is used to construct a target training sample image set according to the target number of each target recognition object
  • the image recognition model training module 405 is used to train the image recognition model using the target training sample image set to obtain a trained image recognition model.
  • the target recognition objects include main-class recognition objects and sub-class recognition objects; the above-mentioned target number calculation module 403 is further used to determine the local main-class recognition objects and sub-class recognition objects, and, according to the preset data construction rules and based on the global non-IID degree value, calculate the target number of the main-class recognition objects and the target number of the sub-class recognition objects respectively; wherein the main-class recognition objects account for the largest proportion of the local object categories.
  • the target number calculation module 403 is further configured to: receive a user preset command, and generate the main category recognition object and the sub-category recognition object according to the user preset command.
  • the target number calculation module 403 is further configured to calculate the target number of each target recognition object according to the preset data construction rules as:
  • i represents the i-th target recognition object
  • total_num_label(i) represents the target number of the i-th target recognition object
  • the target number consists of two parts: num_basic_part(i) and num_label(i), where num_basic_part(i) represents the basic target number of the i-th target recognition object and num_label(i) represents the additional target number of the i-th target recognition object;
  • num_basic_part(i) represents the number of basic targets of the i-th target recognition object
  • num_label(i) represents the number of additional targets of the i-th target recognition object
  • when the i-th target recognition object is a sub-class recognition object, its target number consists only of the additional target number num_label(i).
  • the value range of the global non-IID degree value is R ⁇ [0, 1], where R is the global non-IID degree value.
  • an image recognition device 500 including: an image acquisition module 501 to be recognized, a model training module 502 and a category recognition module 503, wherein:
  • the image to be recognized acquisition module 501 is used to acquire the image to be recognized; the image to be recognized contains at least one of the target recognition objects;
  • the model training module 502 uses the method in the above image recognition model training method embodiment to obtain a trained image recognition model;
  • the category recognition module 503 is used to input the image to be recognized into the trained image recognition model to detect the category of the target recognition object.
  • Each module in the above-mentioned image recognition model training device and image recognition device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in Figure 6.
  • the computer device includes a processor, a memory and a network interface connected through a system bus, wherein the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores an operating system, a computer program and a database; the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer device is used to store training sample data and recognition results.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer program implements an image recognition model training method and an image recognition method when executed by a processor.
  • FIG. 6 is only a block diagram of a partial structure related to the disclosed solution, and does not constitute a limitation on the computer device to which the disclosed solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • a computer device is provided, including a memory and a processor.
  • a computer program is stored in the memory.
  • when the processor executes the computer program, each step of the above image recognition model training method embodiment and image recognition method embodiment is implemented.
  • a computer-readable storage medium is provided, with a computer program stored thereon.
  • when the computer program is executed by a processor, the steps in the above image recognition model training method embodiment and the image recognition method embodiment are implemented.
  • a computer program product including a computer program that implements the steps in each of the above method embodiments when executed by a processor.
  • Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory or optical memory, etc.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM can be in many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM).
  • the image recognition model training method provided by the present disclosure can transform any overall data set used for classification tasks into a distributed target training sample set that satisfies a certain degree of global non-independent and identical distribution, making it possible to simulate realistic distributed machine learning scenarios.
  • training samples that satisfy the corresponding global non-IID degree value can be simulated according to the actual situation for training; this improves training accuracy and model adaptability while taking privacy and security into account, and therefore has strong industrial practicability.
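The data-allocation rule summarized above can be sketched as follows. This is a minimal illustration under assumed names (`build_node_counts`, toy class counts), not the reference implementation of the disclosure: a node's major classes receive a basic part R·n_i, and every class receives an additional part proportional to (1−R)·n_i and the majors' share of the full set.

```python
def build_node_counts(n, R, majors):
    """Per-class sample counts for one node: major classes get a basic
    part R*n_i plus an additional part; every class i gets an additional
    (1-R)*n_i*(q_j1 + ... + q_jm) samples, where q_j are the full-set
    proportions of this node's major classes."""
    total = sum(n.values())
    q = {i: c / total for i, c in n.items()}
    share = sum(q[j] for j in majors)  # full-set share of this node's majors
    counts = {}
    for i, n_i in n.items():
        basic = R * n_i if i in majors else 0.0
        extra = (1.0 - R) * n_i * share
        counts[i] = basic + extra
    return counts

# toy full set with 3 classes (hypothetical numbers)
n = {"pedestrian": 600, "bicycle": 300, "car": 100}
node = build_node_counts(n, R=0.5, majors=("pedestrian", "bicycle"))
# the node's total equals the total initial count of its major classes
assert abs(sum(node.values()) - 900) < 1e-9
```

A useful property visible here: the node's total sample count equals the combined initial count of its major classes, matching equation (A.3) in the proof below.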


Abstract

The present disclosure relates to an image recognition model training method and apparatus, a computer device and a storage medium. The present disclosure can simulate, according to the actual situation, training samples that satisfy a corresponding global non-IID (non-independent and identically distributed) degree value for training, improving training accuracy while taking privacy and security into account. The method includes: acquiring a full training sample image set and a global non-IID degree value from a master node, the full training sample image set containing multiple kinds of recognition objects; determining local target recognition objects, and calculating the initial number of each kind of recognition object in the full training sample image set; calculating, based on a preset data construction rule and according to the global non-IID degree value, the target number of each kind of target recognition object; constructing a target training sample image set according to the target number of each kind of target recognition object; and training an image recognition model with the target training sample image set to obtain a trained image recognition model.

Description

Image recognition model training method and apparatus, computer device and storage medium
The present disclosure claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on July 28, 2022, with application number 202210896895.6 and invention title "Image recognition model training method and apparatus, computer device and storage medium", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the technical field of image recognition, and in particular to an image recognition model training method and apparatus, a computer device and a storage medium.
Background
With the development of artificial intelligence technology, deep neural networks are used in all walks of life. In the field of image recognition, a deep neural network model can be trained with pre-labeled image samples, so that the trained model can recognize various targets appearing in an image to be recognized, for example people, cars, cats and dogs in the image.
During model training, training samples are usually collected first and annotated, and the annotated samples are then used to train the model. Existing distributed training methods usually assume that the numbers of the various targets in the training samples on different local devices (i.e. edge ends) are independent and identically distributed (IID), i.e. the target-number distributions of the samples collected on the devices are mutually independent, do not influence one another, and satisfy the same probability distribution.
However, in real applications, differing actual environments cause the targets to be recognized in the training samples collected on different devices (i.e. edge ends) to have different, even highly skewed, distribution properties. For example, pedestrians and bicycles account for a large proportion of the images captured by cameras on ordinary residential streets, whereas images captured by highway cameras contain few pedestrians and bicycles but many more vehicles of various types. Likewise, certain distinctive clothing appears only in ethnic-minority regions or particular countries, and certain animals appear only in zoo surveillance cameras of particular regions. If the ideal IID samples described above are used for distributed training of an image recognition model, the recognition accuracy of the resulting model is not high enough. On the other hand, data collection is limited in reality: considering the practical issues of privacy protection and data security, the historical actual data produced on the edge ends cannot be collected, so a model trained on an edge end with historical data cannot be applied to the current scenario, and its adaptability is not high enough.
Summary
(1) Technical problem to be solved
The technical problem solved by the present disclosure is that the recognition accuracy and adaptability of models obtained with existing model training methods are not high enough.
(2) Technical solution
In view of this, it is necessary to provide an image recognition model training method and apparatus, a computer device and a storage medium to address the above technical problem.
An image recognition model training method, the method including:
acquiring a full training sample image set and a global non-IID degree value from a master node; the full training sample image set contains multiple kinds of recognition objects;
determining local target recognition objects, and calculating the initial number of each kind of the recognition objects in the full training sample image set;
calculating, based on a preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object;
constructing a target training sample image set according to the target number of each kind of target recognition object;
training an image recognition model with the target training sample image set to obtain a trained image recognition model.
In one embodiment, the target recognition objects include major-class recognition objects and minor-class recognition objects; calculating, based on the preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object includes:
determining the local major-class recognition objects and the minor-class recognition objects;
calculating, according to the preset data construction rule and based on the global non-IID degree value, the target number of the major-class recognition objects and the target number of the minor-class recognition objects respectively; wherein the major-class recognition objects account for the largest proportion among the local object classes.
In one embodiment, determining the local major-class recognition objects and the minor-class recognition objects includes:
receiving a user preset command, and generating the major-class recognition objects and the minor-class recognition objects according to the user preset command.
In one embodiment, calculating, according to the preset data construction rule and based on the global non-IID degree value, the target number of the major-class recognition objects and the target number of the minor-class recognition objects respectively includes:
calculating, according to the preset data construction rule, the target number of each kind of target recognition object as:
total_num_label(i) = num_basic_part(i) + num_label(i), if the i-th target recognition object is a major-class recognition object; total_num_label(i) = num_label(i), if the i-th target recognition object is a minor-class recognition object;
where i denotes the i-th kind of target recognition object, and total_num_label(i) denotes the target number of the i-th kind of target recognition object. When the i-th target recognition object is a major-class recognition object, its target number consists of two parts, num_basic_part(i) and num_label(i), where num_basic_part(i) denotes the number of basic targets and num_label(i) denotes the number of additional targets of the i-th target recognition object; when the i-th target recognition object is a minor-class recognition object, its target number consists only of the number of additional targets num_label(i).
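A minimal sketch of the piecewise definition above, with num_basic_part and num_label passed in as precomputed values; the helper name and the toy numbers are assumptions, not from the disclosure:

```python
def total_num_label(i, majors, num_basic_part, num_label):
    """Target number of class i: basic part plus additional part for a
    major-class object, additional part only for a minor-class object."""
    if i in majors:
        return num_basic_part[i] + num_label[i]
    return num_label[i]

# toy values: "pedestrian" is the node's major class
majors = {"pedestrian"}
basic = {"pedestrian": 300}
extra = {"pedestrian": 270, "car": 45}
assert total_num_label("pedestrian", majors, basic, extra) == 570
assert total_num_label("car", majors, basic, extra) == 45
```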
In one embodiment, the value range of the global non-IID degree value is R ∈ [0, 1], where R is the global non-IID degree value.
An image recognition method, the method including:
acquiring an image to be recognized; the image to be recognized contains at least one of the target recognition objects;
obtaining a trained image recognition model by training with any one of the above image recognition model training methods;
inputting the image to be recognized into the trained image recognition model to recognize the class of the target recognition object.
An image recognition model training apparatus, the apparatus including:
a training sample acquisition module, used to acquire a full training sample image set and a global non-IID degree value from a master node; the full training sample image set contains multiple kinds of recognition objects;
a target recognition object determination module, used to determine local target recognition objects and calculate the initial number of each kind of the recognition objects in the full training sample image set;
a target number calculation module, used to calculate, based on a preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object;
a target training sample construction module, used to construct a target training sample image set according to the target number of each kind of target recognition object;
an image recognition model training module, used to train an image recognition model with the target training sample image set to obtain a trained image recognition model.
An image recognition apparatus, the apparatus including:
an image-to-be-recognized acquisition module, used to acquire an image to be recognized; the image to be recognized contains at least one of the target recognition objects;
a model training module, used to obtain a trained image recognition model by training with any one of the above image recognition model training methods;
a class recognition module, used to input the image to be recognized into the trained image recognition model to recognize the class of the target recognition object.
A computer device, including a memory and a processor, the memory storing a computer program, wherein when the processor executes the computer program, the steps in the above image recognition model training method embodiment and the above image recognition method embodiment are implemented.
A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, the steps in the above image recognition model training method embodiment and the above image recognition method embodiment are implemented.
A computer program product, including a computer program, wherein when the computer program is executed by a processor, the steps in the above image recognition model training method embodiment and the above image recognition method embodiment are implemented.
(3) Beneficial effects
The above image recognition model training method and apparatus, computer device and storage medium include: acquiring a full training sample image set and a global non-IID degree value from a master node, the full training sample image set containing multiple kinds of recognition objects; determining local target recognition objects and calculating the initial number of each kind of recognition object in the full training sample image set; calculating, based on a preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object; constructing a target training sample image set according to the target number of each kind of target recognition object; and training an image recognition model with the target training sample image set to obtain a trained image recognition model. The present disclosure can transform any overall dataset used for a classification task (i.e. the dataset on the master node above) into a set of distributed target training sample sets satisfying a certain global non-IID degree. This set of distributed samples can simulate the heterogeneous datasets collected by edge ends or terminal devices in different environments in realistic distributed machine learning scenarios. Based on this idea, the present disclosure takes the requirements of privacy protection and data security in real scenarios into account: the historical actual data of each scenario do not need to be collected, and training samples satisfying the corresponding global non-IID degree value can be simulated according to the actual situation for training, improving training accuracy and model adaptability while preserving privacy and security.
Furthermore, current methods for constructing distributed datasets are usually limited to splitting the overall dataset by class, i.e. different devices own different target classes. The distributed datasets generated by such methods do not cover other non-IID situations found in real scenarios, for example the case where the data collected by every device have the same target classes but different numbers per class. Besides covering the case, produced by the traditional methods above, in which different devices contain data of different classes, the dataset construction method proposed in the present disclosure also covers the various non-IID situations more comprehensively. Therefore, training an image recognition model on datasets constructed with this method can better simulate the effect of training such models on non-IID datasets in real scenarios. The benefits are twofold: on the one hand, enterprises or organizations are not required to provide distributed datasets from real scenarios, reducing the risk of privacy leakage; on the other hand, the dataset construction method provided by the present disclosure can generate distributed datasets satisfying various non-IID situations. Training target recognition algorithm models on this basis allows a more comprehensive understanding of how datasets with different degrees of skew affect the models obtained by distributed training, thereby serving as guidance: based on the above model prediction results, parameters can be tuned or training schemes selected for models in real scenarios more accurately, improving model training efficiency and reducing trial-and-error costs.
Brief Description of the Drawings
FIG. 1 is an application environment diagram of the image recognition model training method in one embodiment;
FIG. 2 is a schematic flowchart of the image recognition model training method in one embodiment;
FIG. 3 is a schematic flowchart of the image recognition method in one embodiment;
FIG. 4 is a structural block diagram of the image recognition model training apparatus in one embodiment;
FIG. 5 is a structural block diagram of the image recognition apparatus in one embodiment;
FIG. 6 is an internal structure diagram of the computer device in one embodiment.
Detailed Description
To make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present disclosure and are not intended to limit it.
The image recognition model training method provided by the present disclosure can be applied to the application environment shown in FIG. 1, in which a slave node 101 communicates with a master node 102 through a network. The slave node 101 is used to acquire images to be recognized and recognize the objects in the images; the slave nodes may be various camera apparatuses or other sensing apparatuses located in different scenarios, and the master node 102 may be implemented by an independent server or a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, an image recognition model training method is provided. The method is described by taking its application to the slave node 101 in FIG. 1 as an example, and includes the following steps:
Step S201: acquire a full training sample image set and a global non-IID degree value from the master node; the full training sample image set contains multiple kinds of recognition objects.
Specifically, the full training sample image set refers to the set containing all data samples, in which every recognition object is annotated with a label; for example, the bicycles, pedestrians and cars therein are all annotated with corresponding labels.
The non-IID degree value refers to the degree to which the data are non-independent and identically distributed. In the present disclosure, non-IID means that the data distribution properties of the samples are mutually independent but the distributions differ; the difference shows up as differing amounts of same-class samples on different nodes of the same system. For example, in a road-camera recognition system, the sample images collected by a highway camera node and a community camera node contain 3 classes in total: pedestrian, non-motor vehicle and motor vehicle. Obviously, the number of motor-vehicle images collected by the highway node is far higher than the sample numbers of the other two classes, while the community node obtains far more pedestrian and non-motor-vehicle sample images than motor-vehicle ones. On detection nodes in scenarios like these, the data samples obtained are highly skewed.
Specifically, the master node determines the global non-IID degree value R, which is flexibly set by the user according to actual needs, R ∈ [0, 1]. The whole system contains one master node and multiple slave nodes, and there is only one global non-IID degree value R in the whole system, indicating that the data distributions on all nodes of the system jointly satisfy the global non-IID degree value R.
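As an informal illustration (not part of the disclosure), the degree to which a set of per-node class distributions is non-IID can be quantified as the average, over node pairs, of half the L1 distance between the distributions — which is consistent with the R-value formula used later in the proof. The function name `non_iid_degree` is assumed for this sketch:

```python
from itertools import combinations

def non_iid_degree(dists):
    """Average over node pairs of half the L1 distance between their
    class-probability distributions; identical distributions give 0,
    fully disjoint class supports give 1."""
    pairs = list(combinations(dists, 2))
    return sum(
        0.5 * sum(abs(p - q) for p, q in zip(P, Q)) for P, Q in pairs
    ) / len(pairs)

# identical distributions on both nodes -> degree 0
assert non_iid_degree([[0.5, 0.5], [0.5, 0.5]]) == 0.0
# completely disjoint classes -> degree 1
assert non_iid_degree([[1.0, 0.0], [0.0, 1.0]]) == 1.0
```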
Step S202: determine the local target recognition objects, and calculate the initial number of each kind of recognition object in the full training sample image set.
Specifically, the slave node 101 determines the local target recognition objects, which are also flexibly set by the user according to actual needs; for example, a camera apparatus on a highway needs its target recognition objects set to motor vehicles and pedestrians. The slave node 101 calculates the initial number n_i of each kind of target recognition object (where i denotes the i-th class) according to the labels of each kind of recognition object in the full training sample image set.
Step S203: calculate, based on the preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object.
The preset data construction rule specifies how to construct a set of datasets whose global non-IID degree value is R. The rule is as follows:
Take as an example an image recognition node P^(j1,j2) whose major-class recognition objects are j1 and j2, and allocate data to it (data are allocated to the other image recognition nodes in the same way). The specific process is as follows:
(1) The slave node 101 determines that the major classes of its local target recognition objects are j1 and j2, and obtains the global non-IID degree value R from the master node. The initial numbers of class j1 and class j2 in the full training sample image set are calculated as n_j1 and n_j2. Then R·n_j1 data with class label j1 and R·n_j2 data with class label j2 are allocated to the image recognition node P^(j1,j2), where n_j1 is the initial number of class j1, n_j2 is the initial number of class j2, and R is the global non-IID degree value; here, R·n_j1 and R·n_j2 are the basic target numbers num_basic_part(i).
(2) On this basis, for every label class i = 1, 2, …, d (d is the total number of classes of target recognition objects on the node; for example, with the 3 target recognition objects pedestrian, non-motor vehicle and motor vehicle, d = 3), (1−R)·n_i·(q_j1 + q_j2) data are allocated to P^(j1,j2) (where n_i is the initial number of recognition objects of the i-th class, and q_j1 and q_j2 are the proportions of the major-class recognition objects j1 and j2 in the full training sample image set). The measured global non-IID degree of the set of datasets generated with this data allocation rule is then R. The above preset data construction rule is expressed by formula A as follows:
total_num_label(i) = R·n_i·1[i ∈ {j1, j2}] + (1−R)·n_i·(q_j1 + q_j2)    (A)
where, when i is a major-class recognition object (for example major class j1 or j2), the number of each major-class recognition object is the sum of two parts: the first part is the basic target number R·n_i, and the second part is the additional target number (1−R)·n_i·(q_j1 + q_j2). This second part is the second component of the target number of the j1-th or j2-th target recognition object in the preceding formula total_num_label, i.e. the value of the remaining target number num_label. When i is a minor-class recognition object, the number of each minor-class recognition object is the additional target number (1−R)·n_i·(q_j1 + q_j2) alone.
The proof of the above rule is as follows:
According to the above data construction rule, for any slave node, i.e. the image recognition node P^(j1,j2), the amount of data corresponding to each class of target recognition object it owns is:
total_num_label(i) = R·n_i·1[i ∈ {j1, j2}] + (1−R)·n_i·(q_j1 + q_j2)    (A.1)
According to (A.1), the total amount of data on the image recognition node P^(j1,j2) is:
N^(j1,j2) = Σ_{i=1}^{d} total_num_label(i) = R·(n_j1 + n_j2) + (1−R)·(q_j1 + q_j2)·Σ_{i=1}^{d} n_i    (A.2)
Because q_j1 = n_j1 / Σ_{i=1}^{d} n_i and q_j2 = n_j2 / Σ_{i=1}^{d} n_i, (A.2) becomes:
N^(j1,j2) = R·(n_j1 + n_j2) + (1−R)·(n_j1 + n_j2) = n_j1 + n_j2    (A.3)
From (A.1) and (A.3), the proportion p^(j1,j2)_j1 of major class j1 among the total samples on the image recognition node P^(j1,j2) whose major classes are j1 and j2 is:
p^(j1,j2)_j1 = [R·n_j1 + (1−R)·n_j1·(q_j1 + q_j2)] / (n_j1 + n_j2)    (A.4)
Because
n_j1 / (n_j1 + n_j2) = q_j1 / (q_j1 + q_j2)    (A.5)
equation (A.4) becomes:
p^(j1,j2)_j1 = R·q_j1 / (q_j1 + q_j2) + (1−R)·q_j1    (A.6)
Similarly, for major class j2:
p^(j1,j2)_j2 = R·q_j2 / (q_j1 + q_j2) + (1−R)·q_j2    (A.7)
For label classes i ∈ {1, 2, …, d} \ {j1, j2}:
p^(j1,j2)_i = (1−R)·q_i    (A.8)
Therefore, combining (A.6), (A.7) and (A.8):
p^(j1,j2)_i = (1−R)·q_i + R·(q_i / (q_j1 + q_j2))·1[i ∈ {j1, j2}]    (A.9)
When the two major classes owned by two image recognition nodes are entirely different, suppose the major classes of image recognition node P^(1) are j11 and j12, and the major classes of image recognition node P^(2) are j21 and j22. Since the difference between the data distributions on the two nodes needs to be computed, let the probability distributions of the d classes on node P^(1) and node P^(2) be P^(1) = (p^(1)_1, …, p^(1)_d) and P^(2) = (p^(2)_1, …, p^(2)_d), where Σ_{i=1}^{d} p^(1)_i = Σ_{i=1}^{d} p^(2)_i = 1. According to (A.9):
‖P^(1) − P^(2)‖_1 = R·[ (q_j11 + q_j12) / (q_j11 + q_j12) + (q_j21 + q_j22) / (q_j21 + q_j22) ] = 2R    (A.10)
In (A.10), ‖·‖_1 denotes the L1 norm. Therefore, for any two nodes owning two entirely different major classes, the L1 norm of the difference between their data probability distributions is 2R. The R value is computed as the average, over node pairs, of half the L1 distance between their data distributions:
R = (1 / (K(K−1))) · Σ_{k=1}^{K} Σ_{l≠k} (1/2)·‖Q_k − Q_l‖_1
where K is the number of worker nodes (devices) in the system and Q_k is the probability distribution of the data owned by the k-th node. Based on (A.10):
(1 / (K(K−1))) · Σ_{k=1}^{K} Σ_{l≠k} (1/2)·(2R) = R    (A.11)
Therefore, the non-IID degree of the datasets generated according to the above steps is R. The theorem is proved.
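The conclusion of the proof can be checked numerically. The sketch below builds the class distributions of two nodes with disjoint major classes according to (A.9) and verifies that the L1 distance between them equals 2R; the helper name `node_distribution` and the toy proportions are assumed for illustration:

```python
def node_distribution(q, R, majors):
    """Class-probability distribution on a node with major classes
    `majors`, per (A.9): p_i = (1-R)*q_i, plus R*q_i/share if i is a
    major class, where share is the majors' total full-set proportion."""
    share = sum(q[j] for j in majors)
    return [(1 - R) * qi + (R * qi / share if i in majors else 0.0)
            for i, qi in enumerate(q)]

q = [0.4, 0.2, 0.3, 0.1]   # full-set class proportions, d = 4 (toy values)
R = 0.3
P1 = node_distribution(q, R, majors={0, 1})   # node 1: majors 0 and 1
P2 = node_distribution(q, R, majors={2, 3})   # node 2: majors 2 and 3
l1 = sum(abs(a - b) for a, b in zip(P1, P2))
assert abs(l1 - 2 * R) < 1e-9                 # matches (A.10)
```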
It is worth noting that, as the above proof shows, to generate datasets with non-IID degree R, the above steps require that the major classes on the slave nodes do not overlap, but they do not require the number of major classes on each node to be the same.
Step S204: construct a target training sample image set according to the target number of each kind of target recognition object.
Specifically, according to the above target numbers of the major and minor classes, images of the corresponding classes are selected from the image database to form the target training sample image set, which is allocated to the local node.
Step S205: train an image recognition model with the target training sample image set to obtain a trained image recognition model.
Specifically, the model used on the corresponding image recognition node is trained with the reconstructed target training sample image set above to obtain a trained image recognition model. The image recognition model may be a deep neural network model such as VGG (a deep neural network for image classification and localization problems originating from the article Very Deep Convolutional Networks for Large-Scale Image Recognition by Simonyan and Zisserman, where VGG is the abbreviation of the authors' Visual Geometry Group at the University of Oxford), ResNet or MobileNet.
In the above embodiment, a full training sample image set and a global non-IID degree value are acquired from the master node; the full training sample image set contains multiple kinds of recognition objects; the local target recognition objects are determined, and the initial number of each kind of recognition object in the full training sample image set is calculated; based on a preset data construction rule, the target number of each kind of target recognition object is calculated according to the global non-IID degree value and the initial number of each kind of recognition object; a target training sample image set is constructed according to the target number of each kind of target recognition object; and an image recognition model is trained with the target training sample image set to obtain a trained image recognition model. This embodiment can transform any overall dataset used for a classification task (i.e. the dataset on the master node above) into a set of distributed target training sample sets satisfying a certain non-IID degree. This set of distributed samples can simulate the heterogeneous datasets collected by edge ends or terminal devices in different environments in realistic distributed machine learning scenarios. Based on this idea, this embodiment takes the requirements of privacy protection and data security in real scenarios into account: the historical actual data of each scenario do not need to be collected, and training samples satisfying the corresponding global non-IID degree value can be simulated according to the actual situation for training, improving training accuracy and model adaptability while preserving privacy and security.
In one embodiment, the local target recognition objects include major-class recognition objects and minor-class recognition objects, and step S203 above includes: determining the local major-class recognition objects and minor-class recognition objects; and calculating, according to the preset data construction rule and based on the global non-IID degree value, the target number of the major-class recognition objects and the target number of the minor-class recognition objects respectively, wherein the major-class recognition objects account for the largest proportion among the local object classes.
Specifically, first, the number of slave nodes participating in the image recognition task in the whole distributed training system, num_party ∈ [2, K], and the global non-IID degree value R, R ∈ [0, 1], are determined; the major-class and minor-class recognition objects of each slave node are determined; the initial number n_i and the corresponding probability q_i of each kind of recognition object in the full training sample image set are calculated, i = 1, 2, …, d, where d is the number of kinds of recognition objects in the full training sample image set (for example, in a road-camera recognition system with the 3 classes pedestrian, non-motor vehicle and motor vehicle in total, d = 3); and the number num_major(k) of kinds of major-class recognition objects on the k-th slave node is determined (the formula image is not reproduced in the source; the d classes are divided among the num_party slave nodes so that the major classes of different nodes do not overlap):
where num_major(k) is the number of kinds of major-class recognition objects owned by the k-th slave node; d is the number of kinds of recognition objects in the full training sample image set; and num_party is the number of nodes participating in the image recognition task.
For any slave node k, the basic target number num_basic_part(i) of the i-th major-class recognition object is calculated according to the global non-IID degree value R and the initial number n_i of each kind of recognition object in the full training sample image set:
num_basic_part(i) = R·n_i
where R is the global non-IID degree value, and n_i is the initial number of the i-th kind of recognition object in the full training sample image set.
The remaining sample number num_label_remain(i) corresponding to each class of recognition object i is calculated:
num_label_remain(i) = n_i − num_basic_part(i) = n_i − R·n_i
Next, the additional target number num_label(i) of each class of target recognition object i is allocated to the local node k:
num_label(i) = q_i · Σ_{j=1}^{num_major(k)} num_label_remain(label_major(k, j))
where q_i denotes the probability of the i-th kind of recognition object in the full training sample image set; num_major(k) is the number of kinds of major-class recognition objects owned by the k-th slave node; label_major(k, j) denotes the j-th major-class recognition object of the k-th slave node; num_label_remain(label_major(k, j)) denotes the remaining sample number of the j-th major-class recognition object of the k-th slave node; and k ∈ [1, K), where k denotes the k-th slave node and K denotes the total number of nodes.
In summary, the target number of the i-th class on the k-th image recognition node is:
total_num_label(i) = num_basic_part(i)·1[i is a major class of node k] + num_label(i)
In the above embodiment, the preset data construction rule allocates to slave node k the target numbers corresponding to the different target recognition objects, laying the groundwork for the subsequent construction of a new target training sample image set.
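The per-node allocation steps above (num_basic_part, num_label_remain, num_label) can be combined into one routine. This is a sketch under assumed names (`allocate_node`, toy counts), not the disclosure's reference code:

```python
def allocate_node(n, R, major_labels):
    """Target count per class on one slave node:
    num_basic_part(i) = R*n_i for the node's major classes,
    num_label(i) = q_i * sum of num_label_remain over the node's majors,
    total_num_label(i) = basic part (majors only) + additional part."""
    total = sum(n)
    q = [ni / total for ni in n]                 # full-set probabilities
    remain = [(1 - R) * ni for ni in n]          # num_label_remain(i)
    pool = sum(remain[j] for j in major_labels)  # remaining pool of majors
    out = []
    for i, ni in enumerate(n):
        basic = R * ni if i in major_labels else 0.0
        out.append(basic + q[i] * pool)          # total_num_label(i)
    return out

n = [500, 300, 200]                  # initial counts, d = 3 (toy values)
alloc = allocate_node(n, R=0.4, major_labels={0})
# the node's total equals the initial count of its single major class
assert abs(sum(alloc) - 500) < 1e-9
```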
In one embodiment, determining the local major-class recognition objects and minor-class recognition objects includes:
receiving a user preset command, and generating the major-class recognition objects and the minor-class recognition objects according to the user preset command.
Specifically, the user may designate the major-class and minor-class recognition objects for each slave node.
Optionally, the major-class and minor-class recognition objects may also be generated randomly by the local node.
In the above embodiment, flexibly setting the major-class and minor-class recognition objects through user commands can improve the adaptability of the system.
In one embodiment, calculating, according to the preset data construction rule, the target number of each kind of major-class recognition object and the target number of each kind of minor-class recognition object respectively includes: calculating, according to the preset data construction rule, the target number of each kind of target recognition object as:
total_num_label(i) = num_basic_part(i) + num_label(i), if the i-th target recognition object is a major-class recognition object; total_num_label(i) = num_label(i), if the i-th target recognition object is a minor-class recognition object;
where i denotes the i-th kind of target recognition object, and total_num_label(i) denotes the target number of the i-th kind of target recognition object; when the i-th target recognition object is a major-class recognition object, its target number consists of two parts, num_basic_part(i) and num_label(i), where the former denotes the number of basic targets and the latter the number of additional targets of the i-th target recognition object; when the i-th target recognition object is a minor-class recognition object, its target number consists only of the number of additional targets num_label(i).
In the above embodiment, the preset data construction rule allocates target numbers to the different sample classes on image recognition node k, laying the groundwork for the subsequent construction of a new target training sample image set.
In one embodiment, as shown in FIG. 3, the present disclosure further provides an image recognition method, which includes:
Step S301: acquire an image to be recognized; the image to be recognized contains at least one of the target recognition objects.
Specifically, the slave node 101 acquires the image to be recognized, which contains a target object, for example a pedestrian.
Step S302: obtain a trained image recognition model by training with the above image recognition model training method.
Specifically, a trained image recognition model is obtained by training with the above image recognition model training method.
Step S303: input the image to be recognized into the trained image recognition model to recognize the class of the target recognition object.
Specifically, the image to be recognized is input into the trained image recognition model to recognize the class of the target recognition object; for example, if a pedestrian is recognized in the image to be recognized, the object is labeled as a pedestrian.
In the above embodiment, performing target object recognition with an image recognition model trained by the above image recognition model training method can improve the accuracy of target object recognition. It should be understood that, although the steps in the flowcharts of FIGS. 2-3 are shown sequentially in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 4, an image recognition model training apparatus 400 is provided, including: a training sample acquisition module 401, a target recognition object determination module 402, a target number calculation module 403, a target training sample construction module 404 and an image recognition model training module 405, wherein:
the training sample acquisition module 401 is used to acquire a full training sample image set and a global non-IID degree value from the master node; the full training sample image set contains multiple kinds of recognition objects;
the target recognition object determination module 402 is used to determine local target recognition objects and calculate the initial number of each kind of the recognition objects in the full training sample image set;
the target number calculation module 403 is used to calculate, based on a preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object;
the target training sample construction module 404 is used to construct a target training sample image set according to the target number of each kind of target recognition object;
the image recognition model training module 405 is used to train an image recognition model with the target training sample image set to obtain a trained image recognition model.
In one embodiment, the target recognition objects include major-class recognition objects and minor-class recognition objects; the target number calculation module 403 is further used to determine the local major-class recognition objects and minor-class recognition objects, and to calculate, according to the preset data construction rule and based on the global non-IID degree value, the target number of the major-class recognition objects and the target number of the minor-class recognition objects respectively, wherein the major-class recognition objects account for the largest proportion among the local object classes.
In one embodiment, the target number calculation module 403 is further used to receive a user preset command and generate the major-class recognition objects and the minor-class recognition objects according to the user preset command.
In one embodiment, the target number calculation module 403 is further used to calculate, according to the preset data construction rule, the target number of each kind of target recognition object as:
total_num_label(i) = num_basic_part(i) + num_label(i), if the i-th target recognition object is a major-class recognition object; total_num_label(i) = num_label(i), if the i-th target recognition object is a minor-class recognition object;
where i denotes the i-th kind of target recognition object, and total_num_label(i) denotes the target number of the i-th kind of target recognition object; when the i-th target recognition object is a major-class recognition object, its target number consists of two parts, num_basic_part(i) and num_label(i), where num_basic_part(i) denotes the number of basic targets and num_label(i) denotes the number of additional targets of the i-th target recognition object; when the i-th target recognition object is a minor-class recognition object, its target number consists only of the number of additional targets num_label(i).
In one embodiment, the value range of the global non-IID degree value is R ∈ [0, 1], where R is the global non-IID degree value.
In one embodiment, as shown in FIG. 5, an image recognition apparatus 500 is provided, including: an image-to-be-recognized acquisition module 501, a model training module 502 and a class recognition module 503, wherein:
the image-to-be-recognized acquisition module 501 is used to acquire an image to be recognized; the image to be recognized contains at least one of the target recognition objects;
the model training module 502 obtains a trained image recognition model by training with the method of the above image recognition model training method embodiment;
the class recognition module 503 is used to input the image to be recognized into the trained image recognition model to detect the class of the target recognition object.
For the specific limitations of the image recognition model training apparatus and the image recognition apparatus, reference may be made to the limitations of the image recognition model training method and the image recognition method above, which are not repeated here. Each module in the above image recognition model training apparatus and image recognition apparatus may be implemented in whole or in part by software, hardware or a combination thereof. Each module may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 6. The computer device includes a processor, a memory and a network interface connected through a system bus, wherein the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database; the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store training sample data and recognition results. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an image recognition model training method and an image recognition method.
Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of a partial structure related to the disclosed solution and does not constitute a limitation on the computer device to which the disclosed solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor; a computer program is stored in the memory, and when the processor executes the computer program, the steps in the above image recognition model training method embodiment and image recognition method embodiment are implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps in the above image recognition model training method embodiment and image recognition method embodiment are implemented.
In one embodiment, a computer program product is provided, including a computer program; when the computer program is executed by a processor, the steps in each of the above method embodiments are implemented.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory or optical memory, etc. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can be in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present disclosure, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present disclosure, all of which fall within its protection scope. Therefore, the protection scope of this disclosure patent shall be subject to the appended claims.
Industrial Applicability
The image recognition model training method provided by the present disclosure can transform any overall dataset used for a classification task into a set of distributed target training sample sets satisfying a certain global non-IID degree, making it possible to simulate the heterogeneous datasets collected by edge ends or terminal devices in different environments in realistic distributed machine learning scenarios. Therefore, the historical actual data of each scenario do not need to be collected; training samples satisfying the corresponding global non-IID degree value can be simulated according to the actual situation for training, improving training accuracy and model adaptability while preserving privacy and security, and the method thus has strong industrial practicability.

Claims (10)

  1. An image recognition model training method, characterized in that the method includes:
    acquiring a full training sample image set and a global non-IID degree value from a master node; the full training sample image set contains multiple kinds of recognition objects;
    determining local target recognition objects, and calculating the initial number of each kind of the recognition objects in the full training sample image set;
    calculating, based on a preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object;
    constructing a target training sample image set according to the target number of each kind of target recognition object;
    training an image recognition model with the target training sample image set to obtain a trained image recognition model.
  2. The method according to claim 1, characterized in that the target recognition objects include major-class recognition objects and minor-class recognition objects; calculating, based on the preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object includes:
    determining the local major-class recognition objects and the minor-class recognition objects;
    calculating, according to the preset data construction rule and based on the global non-IID degree value, the target number of the major-class recognition objects and the target number of the minor-class recognition objects respectively; wherein the major-class recognition objects account for the largest proportion among the local object classes.
  3. The method according to claim 2, characterized in that determining the local major-class recognition objects and the minor-class recognition objects includes:
    receiving a user preset command, and generating the major-class recognition objects and the minor-class recognition objects according to the user preset command.
  4. The method according to claim 2, characterized in that calculating, according to the preset data construction rule and based on the global non-IID degree value, the target number of the major-class recognition objects and the target number of the minor-class recognition objects respectively includes:
    calculating, according to the preset data construction rule, the target number of each kind of target recognition object as:
    total_num_label(i) = num_basic_part(i) + num_label(i), if the i-th target recognition object is a major-class recognition object; total_num_label(i) = num_label(i), if the i-th target recognition object is a minor-class recognition object;
    where i denotes the i-th kind of target recognition object, and total_num_label(i) denotes the target number of the i-th kind of target recognition object; when the i-th target recognition object is a major-class recognition object, its target number consists of two parts, num_basic_part(i) and num_label(i), where num_basic_part(i) denotes the number of basic targets and num_label(i) denotes the number of additional targets of the i-th target recognition object; when the i-th target recognition object is a minor-class recognition object, its target number consists only of the number of additional targets num_label(i).
  5. The method according to any one of claims 1 to 4, characterized in that the value range of the global non-IID degree value is R ∈ [0, 1], where R is the global non-IID degree value.
  6. An image recognition method, characterized in that the method includes:
    acquiring an image to be recognized; the image to be recognized contains at least one of the target recognition objects;
    obtaining a trained image recognition model by training with the image recognition model training method according to any one of claims 1 to 5;
    inputting the image to be recognized into the trained image recognition model to recognize the class of the target recognition object.
  7. An image recognition model training apparatus, characterized in that the apparatus includes:
    a training sample acquisition module, used to acquire a full training sample image set and a global non-IID degree value from a master node; the full training sample image set contains multiple kinds of recognition objects;
    a target recognition object determination module, used to determine local target recognition objects and calculate the initial number of each kind of the recognition objects in the full training sample image set;
    a target number calculation module, used to calculate, based on a preset data construction rule and according to the global non-IID degree value and the initial number of each kind of recognition object, the target number of each kind of target recognition object;
    a target training sample construction module, used to construct a target training sample image set according to the target number of each kind of target recognition object;
    an image recognition model training module, used to train an image recognition model with the target training sample image set to obtain a trained image recognition model.
  8. An image recognition apparatus, characterized in that the apparatus includes:
    an image-to-be-recognized acquisition module, used to acquire an image to be recognized; the image to be recognized contains at least one of the target recognition objects;
    a model training module, used to obtain a trained image recognition model by training with the image recognition model training method according to any one of claims 1 to 5;
    a class recognition module, used to input the image to be recognized into the trained image recognition model to recognize the class of the target recognition object.
  9. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that when the processor executes the computer program, the steps of the method according to any one of claims 1 to 6 are implemented.
  10. A computer-readable storage medium, on which a computer program is stored, characterized in that when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 6 are implemented.
PCT/CN2022/128994 2022-07-28 2022-11-01 Image recognition model training method and apparatus, computer device and storage medium WO2024021350A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210896895.6 2022-07-28
CN202210896895.6A CN115131631A (zh) 2022-07-28 2022-07-28 Image recognition model training method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2024021350A1 true WO2024021350A1 (zh) 2024-02-01

Family

ID=83386534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128994 WO2024021350A1 (zh) 2022-07-28 2022-11-01 图像识别模型训练方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN115131631A (zh)
WO (1) WO2024021350A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115131631A (zh) * 2022-07-28 2022-09-30 广州广电运通金融电子股份有限公司 图像识别模型训练方法、装置、计算机设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162627A * 2019-04-28 2019-08-23 Ping An Technology (Shenzhen) Co., Ltd. Data increment method and apparatus, computer device and storage medium
WO2020220220A1 (zh) * 2019-04-29 2020-11-05 Siemens (China) Co., Ltd. Classification model training method and apparatus, and computer-readable medium
US20210056456A1 (en) * 2019-08-19 2021-02-25 International Business Machines Corporation Tree-based associative data augmentation
US20210073671A1 (en) * 2019-09-09 2021-03-11 Adobe, Inc. Generating combined feature embedding for minority class upsampling in training machine learning models with imbalanced samples
US11182691B1 (en) * 2014-08-14 2021-11-23 Amazon Technologies, Inc. Category-based sampling of machine learning data
CN115131631A (zh) * 2022-07-28 2022-09-30 GRG Banking Equipment Co., Ltd. Image recognition model training method and apparatus, computer device and storage medium


Also Published As

Publication number Publication date
CN115131631A (zh) 2022-09-30

Similar Documents

Publication Publication Date Title
CN109978893B (zh) 图像语义分割网络的训练方法、装置、设备及存储介质
WO2023138300A1 (zh) 目标检测方法及应用其的移动目标跟踪方法
CN109087510B (zh) 交通监测方法及装置
US11797725B2 (en) Intelligent imagery
US20210192227A1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
Dai et al. Residential building facade segmentation in the urban environment
CN109493119B (zh) 一种基于poi数据的城市商业中心识别方法及系统
CN115797736B (zh) 目标检测模型的训练和目标检测方法、装置、设备和介质
Despotovic et al. Prediction and analysis of heating energy demand for detached houses by computer vision
WO2024021350A1 (zh) 图像识别模型训练方法、装置、计算机设备和存储介质
CN112613569A (zh) 图像识别方法、图像分类模型的训练方法及装置
Zhu et al. Spatial and visual data fusion for capturing, retrieval, and modeling of as-built building geometry and features
CN110909656B (zh) 一种雷达与摄像机融合的行人检测方法和系统
San Blas et al. A Platform for Swimming Pool Detection and Legal Verification Using a Multi-Agent System and Remote Image Sensing.
Kalfarisi et al. Detecting and geolocating city-scale soft-story buildings by deep machine learning for urban seismic resilience
CN115331199A (zh) 障碍物的检测方法、装置、电子设备及存储介质
Wang et al. Instance segmentation of soft‐story buildings from street‐view images with semiautomatic annotation
Yang et al. YOLOX with CBAM for insulator detection in transmission lines
Daudt et al. Learning to understand earth observation images with weak and unreliable ground truth
CN113514053B (zh) 生成样本图像对的方法、装置和更新高精地图的方法
Saadeldin et al. Real-time vehicle counting using custom YOLOv8n and DeepSORT for resource-limited edge devices
Zhai et al. Latent knowledge reasoning incorporated for multi-fitting decoupling detection on electric transmission line
Doménech-Asensi et al. On the use of Bayesian networks for real-time urban traffic measurements: a case study with low-cost devices
Tarasov et al. The developing of targets tracking complex
CN114639036B (zh) 确定交通拥堵等级的方法及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22952788

Country of ref document: EP

Kind code of ref document: A1