WO2021159633A1 - Method and system for training image recognition model, and image recognition method - Google Patents

Method and system for training image recognition model, and image recognition method

Info

Publication number
WO2021159633A1
WO2021159633A1 (PCT/CN2020/093033, CN2020093033W)
Authority
WO
WIPO (PCT)
Prior art keywords
image recognition
recognition model
yuv
trained
input branch
Prior art date
Application number
PCT/CN2020/093033
Other languages
French (fr)
Chinese (zh)
Inventor
朱禹萌
陆进
陈斌
宋晨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021159633A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/60 Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Definitions

  • The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to an image recognition model training method and system, and an image recognition method.
  • In the field of image recognition, the color space used by images in actual production equipment varies with the strengths of that equipment.
  • For example, video transmission equipment uses the YUV format to save bandwidth, and the corresponding image recognition model is a YUV image recognition model; equipment with an infrared probe uses the RGB+IR format, and the corresponding image recognition model is an RGB image recognition model.
  • An RGB image recognition model cannot recognize images in the YUV format, so a YUV image recognition model must be built from scratch and then trained with training data in the YUV data format. To improve the accuracy of the YUV image recognition model, a large amount of training data needs to be manually annotated, which is costly.
  • Knowledge distillation uses the prior knowledge contained in a high-compute, high-accuracy model to teach a smaller deep learning network, which compresses the network model and speeds it up.
  • However, the traditional knowledge distillation method only reduces network size and compute requirements, and it remains limited to training data of the same form.
  • For example, an RGB image recognition model can only be distilled into an RGB image recognition model with a smaller structure.
  • A YUV model cannot be obtained in this way, which limits the applicability of model distillation.
  • the embodiments of the present application provide an image recognition model training method, system, computer equipment, computer readable storage medium, and image recognition method, which are used to solve the problem of cumbersome steps and high cost of building a new image recognition model.
  • An image recognition model training method including:
  • Training an RGB image recognition model using the training set and the verification set, where the RGB image recognition model is used to train a YUV image recognition model;
  • the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
  • Using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
  • An image recognition model training system including:
  • Training set and validation set creation module used to create training set and validation set for image recognition based on RGB data format
  • the RGB image recognition model training module is used to train an RGB image recognition model using the training set and the verification set, and the RGB image recognition model is used to train a YUV image recognition model;
  • The to-be-trained YUV image recognition model construction module is used to build the YUV image recognition model to be trained.
  • the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer.
  • the input layer includes a luminance input branch and a chrominance input branch;
  • The YUV image recognition model training module is used to train, by a distillation method and using the trained RGB image recognition model, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain the YUV image recognition model.
  • the YUV image recognition model is used to recognize images in the YUV data format.
  • An embodiment of the present application further provides a computer device that includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor performs the following steps when executing the computer-readable instructions:
  • the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
  • using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain the YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
  • Embodiments of the present application also provide a computer-readable storage medium in which computer-readable instructions are stored, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to perform the following steps:
  • the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
  • using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain the YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
  • This application also provides an image recognition method, including the following steps:
  • The image recognition model training method, system, computer device, computer-readable storage medium, and image recognition method provided in this application train the input layer and the prediction layer of the YUV image recognition model by distilling the RGB image recognition model, which improves the training efficiency of the YUV image recognition model and reduces its training cost.
  • FIG. 1 is a flowchart of the steps of the image recognition model training method according to the first embodiment of the application
  • FIG. 2 is a schematic diagram of the input layer structure of an RGB image recognition model according to an embodiment of the application
  • FIG. 3 is a flowchart of the steps of using a trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained to obtain a YUV image recognition model used to recognize images in the YUV data format, according to an embodiment of the application;
  • FIG. 4 is a flowchart of steps for obtaining the overall target loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model according to the embodiment of the application;
  • FIG. 5 is a flowchart of steps in which the overall target loss function is minimized to obtain the YUV image recognition model according to an embodiment of the application, and the overall target loss function is adjusted by a learning rate;
  • FIG. 6 is a schematic diagram of the program modules of the second embodiment of the image recognition model training system of this application.
  • FIG. 7 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of the image recognition model training system according to the application.
  • FIG. 8 is a flowchart of steps of an image recognition method according to an embodiment of this application.
  • Fig. 9 is a flow chart of the steps of outputting the recognition result of the image to be recognized in the YUV data format through the YUV image recognition model according to an embodiment of the application.
  • FIG. 1 shows a flowchart of steps of an image recognition model training method according to an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps.
  • the following is an exemplary description with computer equipment as the main body of execution, and the details are as follows:
  • an image recognition model training method includes:
  • Creating a training set and a verification set for image recognition based on the RGB data format refers to using images in the RGB data format that have been manually annotated, where the training set is used to train the RGB image recognition model and the verification set is used to verify the recognition accuracy of the trained RGB image recognition model.
  • S200 Use the training set and the verification set to train an RGB image recognition model, where the RGB image recognition model is used to train a YUV image recognition model;
  • the network structure of the RGB image recognition model can be divided into an input layer and a prediction layer, as shown in Figure 2:
  • the input layer is a pre-trained classification model ResNet50, and the feature extraction layer has 5 groups of convolutional blocks.
  • The first group, conv1 (the first vector convolution operation), uses a 7x7 convolution kernel with 64 channels and 2x downsampling;
  • the second group, conv2 (the second vector convolution operation), includes one 3x3 max pooling layer and 3 residual modules, and expands the number of channels by 4 times; and so on, each subsequent group of vector convolution operations downsamples by 2x and doubles the number of channels.
  • the prediction layer uses the extracted picture features for label prediction.
  • the prediction layer is composed of a 1x1 convolutional layer with C channels and an average pooling layer.
  • the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer.
  • the input layer includes a luminance input branch and a chrominance input branch.
  • the input layer is used to extract the picture features of the picture to be recognized, and the input layer includes a luminance input branch and a chrominance input branch, and is used to extract the luminance characteristic and chrominance characteristic of the YUV image.
  • the prediction layer uses the extracted brightness features and chroma features to perform label prediction.
  • the output layer is used to output the classification category of the image.
  • S400 Use the trained RGB image recognition model to train the luminance input branch, the chroma input branch and the prediction layer of the YUV image recognition model to be trained using a distillation method to obtain a YUV image recognition model.
  • The YUV image recognition model is used to recognize images in the YUV data format.
  • Distillation refers to migrating the predictive ability of a well-trained complex model to a model with a simpler structure, so as to achieve the purpose of model compression.
  • the complex model is the distilled model
  • the simple model is the distillation model.
  • the image recognition capability of the RGB image recognition model is transferred to the YUV image recognition model.
  • The distilled model has excellent performance and high accuracy, but compared with the distillation model, the distilled model has a more complex structure, more parameter weights, and a slower computation speed.
  • The distillation model computes faster and is suitable for deployment as a single neural network where high real-time performance is required; compared with the distilled model, it has greater computational throughput, a simpler network structure, and fewer model parameters.
  • the RGB image recognition model is used as a distilled model, and its advantage is that a large public pre-training network and a considerable amount of RGB training data can be used to obtain model parameters with higher accuracy.
  • step S400 further includes:
  • the RGB image recognition model predicts C categories, and the target loss function of category c is
  • y_c refers to the value for category c of the correct label [y_1, y_2, ..., y_C],
  • c indexes the C categories predicted by the RGB image recognition model, whose outputs are denoted [x_1, x_2, ..., x_c, ..., x_C],
  • L_c^hard refers to the target loss function of category c when the temperature parameter T is not added, and
  • L^hard is the overall target function of the RGB image recognition model when the temperature parameter T is not added.
  • The model parameters that minimize L^hard, i.e., the loss function value of the RGB image recognition model, are learned from a large number of RGB images in the labeled training set, so that the recognition error of the RGB image recognition model is minimized.
  • step S401 further includes:
  • the soft target refers to the output result of the distilled model using the prediction layer loss function with the temperature parameter T.
  • By adding the temperature parameter T, when a misclassification passes through the prediction layer the erroneous outputs are amplified and the correct class is suppressed; that is, adding the temperature parameter T artificially increases the training difficulty. Once T is reset to 1, the classification result will be very close to that of the RGB image recognition model.
  • The soft target is expressed by the formula q_c = exp(x_c / T) / Σ_j exp(x_j / T).
  • the hard target refers to the target of normal network training with the temperature parameter set to 1.
  • q_c is the soft target,
  • c indexes the C categories predicted by the RGB image recognition model, whose outputs are denoted [x_1, x_2, ..., x_c, ..., x_C], and
  • T is the temperature parameter.
  • S4012 Obtain the overall target loss function of the YUV image recognition model to be trained according to the soft target of the RGB image recognition model.
  • the first objective loss function corresponds to the soft objective, and is a function that includes the temperature parameters learned by distillation.
  • y_soft is the value predicted by the RGB image recognition model under the condition of temperature T.
  • the second objective loss function of the YUV image recognition model is
  • the second objective loss function corresponds to the hard objective, and is a loss function that does not include the temperature parameter learned by distillation.
  • L_1 is the first objective loss function,
  • L_2 is the second objective loss function, and
  • L = L_1 + L_2 is the overall objective loss function.
  • step S402 further includes:
  • the deep learning model contains a large number of learnable parameters
  • Training the model is a process of continuously adjusting the parameters until the objective function value is the smallest.
  • the learning rate is an important indicator to measure the "step" of adjusting the parameters, that is, the training progress of the model can be controlled by adjusting the learning rate.
  • the learning rate is the control of the change of the model parameters, expressed by the formula:
  • updated parameter = current parameter - learning rate * gradient of the loss function.
  • step S4021 further includes:
  • S4021A Adjust the learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to the first learning rate, and perform preliminary training;
  • the first learning rate of the luminance input branch and the prediction layer is set to 0.01, while the chrominance input branch does not participate in training at this stage, so its first learning rate is 0.
  • S4021B Adjust the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to a second learning rate, and perform fine training;
  • After the first training stage is completed, the YUV image recognition model can already recognize the target, but its recognition accuracy is low because chrominance information is missing.
  • At this point, the chrominance input branch is added to supplement the model's capability.
  • Feature extraction for the luminance input branch was completed in the first stage, so the luminance input branch needs to be fixed, i.e., the second learning rate of the luminance input branch is set to 0.
  • When training the chrominance input branch and the prediction layer, the second learning rate of the chrominance input branch is set to 0.01,
  • and because the prediction layer has already been trained and its parameters are not randomly initialized, the step size needs to be reduced, so the second learning rate of the prediction layer is set to 0.001.
  • After the first stage of training, the chrominance input branch and the prediction layer learn the residual loss, which converges quickly and reduces the learning difficulty and training time.
  • S4021C Adjust the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to a third learning rate to obtain the YUV image recognition model.
  • Tuning in stages reduces the difficulty of model learning, but in the end joint adjustment is needed to obtain the overall optimal solution.
  • Set the third learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to 0.0005, and adjust the parameter values in small steps to obtain the best model parameters and thus the YUV image recognition model.
  • the embodiment of the present application proposes a method for constructing a YUV image recognition model, which can use different types of data formats for transfer learning.
  • This application adjusts the input module of the model according to the characteristics of the input data format by adding a luminance branch and a chrominance branch; at the same time, it takes advantage of the high-compute performance of the RGB image recognition model by adding "soft targets" to learn the distribution differences between different categories; in addition, after adjusting the model structure, the training process of the YUV image recognition model is refined with a staged training procedure, first using the luminance component to complete the prediction target and then using the chrominance component to learn the residual part, which reduces the difficulty of transfer learning and improves the accuracy of the model.
  • The embodiment of the application also provides an image recognition method that can use the YUV image recognition model to recognize YUV images directly, without first converting the YUV image into an RGB image for recognition, which improves the efficiency of YUV image recognition.
  • FIG. 6 shows a schematic diagram of program modules of the image recognition system of the present application.
  • The image recognition model training system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete this application and realize the above-mentioned image recognition method.
  • The program module referred to in the embodiments of the present application refers to a series of computer-readable instruction segments capable of completing specific functions, and is more suitable than the program itself for describing the execution process of the image recognition model training system 20 in the storage medium. The following description specifically introduces the functions of each program module in this embodiment:
  • Training set and validation set creation module 200 used to create training set and validation set for image recognition based on RGB data format;
  • RGB image recognition model training module 202 used to train an RGB image recognition model using the training set and the verification set, and the RGB image recognition model is used to train a YUV image recognition model;
  • To-be-trained YUV image recognition model construction module 204 used to construct a to-be-trained YUV image recognition model.
  • the to-be-trained YUV image recognition model includes an input layer, a prediction layer and an output layer.
  • the input layer includes a luminance input branch and a chrominance input branch;
  • YUV image recognition model training module 206 used to train, by a distillation method and using the trained RGB image recognition model, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain the YUV image recognition model.
  • the YUV image recognition model is used to recognize images in YUV data format.
  • The YUV image recognition model training module 206 is also used for:
  • the input layer and the prediction layer of the YUV image recognition model to be trained are trained by the overall target loss function to obtain the YUV image recognition model.
  • The YUV image recognition model training module 206 is also used for:
  • the overall target loss function of the YUV image recognition model to be trained is obtained according to the trained RGB image recognition model.
  • The YUV image recognition model training module 206 is also used for:
  • the overall target loss function is minimized to obtain the YUV image recognition model, and the overall target loss function is adjusted by a learning rate.
  • The YUV image recognition model training module 206 is also used for:
  • the learning rate of the luminance input branch, the chrominance input branch and the prediction layer is adjusted to a third learning rate to obtain the YUV image recognition model.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • The computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and an image recognition model training system 20 that can communicate with each other through a system bus, where:
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory ( RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • The memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the computer device 2.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store an operating system and various application software installed in the computer device 2, such as the program code of the image recognition model training system 20 described in the foregoing embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 22 is generally used to control the overall operation of the computer device 2.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the image recognition model training system 20, so as to implement the image recognition model training method of the foregoing embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
  • the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 7 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • The image recognition model training system 20 stored in the memory 21 can also be divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete this application.
  • FIG. 6 shows a schematic diagram of the program modules of the second embodiment of the image recognition model training system 20.
  • The image recognition model training system 20 can be divided into a training set and verification set creation module 200, an RGB image recognition model training module 202, a to-be-trained YUV image recognition model construction module 204, and a YUV image recognition model training module 206.
  • A program module referred to in this application refers to a series of computer-readable instruction segments that can complete specific functions, and is more suitable than a program for describing the execution process of the image recognition model training system 20 in the computer device 2.
  • The specific functions of the program modules from the training set and verification set creation module 200 to the YUV image recognition model training module 206 have been described in detail in the foregoing embodiment and will not be repeated here.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, an application marketplace, or the like, on which computer-readable instructions are stored that implement the corresponding functions when executed by a processor; the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
  • the computer-readable storage medium of this embodiment is used to store the image recognition model training system 20, and when executed by a processor, realizes the image recognition method of the foregoing embodiment.
  • FIG. 8 shows a flowchart of the steps of the image recognition method according to the fifth embodiment of the present application. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are performed. The details are as follows.
  • S310 Output the recognition result of the image to be recognized in the YUV data format through the YUV image recognition model.
  • step S310 further includes:
  • S311 Receive the image to be recognized in the YUV data format;
  • S312 Extract the chrominance features and luminance features of the image to be recognized in the YUV data format through the input layer of the YUV image recognition model, and after recognition, output the image recognition result through the output layer of the YUV image recognition model.
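A minimal sketch of the inference flow in S311 and S312 above, assuming a PyTorch model with separate luminance and chrominance branches; the attribute names (luma_branch, chroma_branch, predict_head), the fusion by summation, and the function name are illustrative assumptions rather than details from the patent.

```python
import torch

def recognize_yuv(yuv_model, y_plane, uv_planes):
    """Recognize one YUV image with a two-branch model (illustrative sketch).

    y_plane:   luminance tensor of shape (1, 1, H, W)
    uv_planes: chrominance tensor of shape (1, 2, H, W)
    """
    yuv_model.eval()
    with torch.no_grad():
        # Input layer: extract luminance and chrominance features separately.
        luma_feat = yuv_model.luma_branch(y_plane)
        chroma_feat = yuv_model.chroma_branch(uv_planes)
        # Prediction layer: fuse the two feature maps (summation is one option
        # the patent does not fix) and predict class scores.
        logits = yuv_model.predict_head(luma_feat + chroma_feat).flatten(1)
        # Output layer: return the predicted classification category.
        return logits.argmax(dim=1).item()
```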

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Provided is an image recognition method. The method comprises: creating a training set and a validation set for image recognition based on an RGB data format; training an RGB image recognition model by using the training set and the validation set; constructing a YUV image recognition model to be trained, wherein the YUV image recognition model to be trained comprises an input layer, a prediction layer and an output layer, and the input layer comprises a luminance input branch and a chrominance input branch; and training the luminance input branch, the chrominance input branch and the prediction layer of the YUV image recognition model to be trained by using the trained RGB image recognition model and using a distillation method to obtain a YUV image recognition model, wherein the YUV image recognition model is used for recognizing an image in a YUV data format. According to the present application, training an input layer and a prediction layer of a YUV image recognition model by means of distillation by using an RGB image recognition model improves the efficiency of training the YUV image recognition model, and reduces the training cost for the YUV image recognition model.

Description

Image recognition model training method and system, and image recognition method
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 13, 2020 with application number 202010090927.4 and the invention title "Image Recognition Model Training Method and System and Image Recognition Method", the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to an image recognition model training method and system, and an image recognition method.
Background
In the field of image recognition, the color space used by images in actual production equipment varies with the strengths of that equipment. For example, video transmission equipment uses the YUV format to save bandwidth, and the corresponding image recognition model is a YUV image recognition model; equipment with an infrared probe uses the RGB+IR format, and the corresponding image recognition model is an RGB image recognition model. An RGB image recognition model cannot recognize images in the YUV format, so a YUV image recognition model must be built from scratch and then trained with training data in the YUV data format. To improve the accuracy of the YUV image recognition model, a large amount of training data needs to be manually annotated, which is costly.
To lower the threshold for applying deep learning models, knowledge distillation uses the prior knowledge contained in a high-compute, high-accuracy model to teach a smaller deep learning network, which compresses the network model and speeds it up. However, the inventors found that the traditional knowledge distillation method only reduces network size and compute requirements and remains limited to training data of the same form; for example, an RGB image recognition model can only be distilled into an RGB image recognition model with a smaller structure, and a YUV model cannot be obtained, which limits the applicability of model distillation.
Summary of the invention
In view of this, the embodiments of the present application provide an image recognition model training method, system, computer device, computer-readable storage medium, and image recognition method, which are used to solve the problem that building a new image recognition model involves cumbersome steps and high cost.
The embodiments of this application solve the above technical problem through the following technical solutions:
An image recognition model training method, including:
creating a training set and a verification set for image recognition based on the RGB data format;
training an RGB image recognition model using the training set and the verification set, where the RGB image recognition model is used to train a YUV image recognition model;
building a YUV image recognition model to be trained, where the YUV image recognition model to be trained includes an input layer, a prediction layer, and an output layer, and the input layer includes a luminance input branch and a chrominance input branch; and
using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
An image recognition model training system, including:
a training set and verification set creation module, used to create a training set and a verification set for image recognition based on the RGB data format;
an RGB image recognition model training module, used to train an RGB image recognition model using the training set and the verification set, where the RGB image recognition model is used to train a YUV image recognition model;
a to-be-trained YUV image recognition model construction module, used to construct a YUV image recognition model to be trained, where the YUV image recognition model to be trained includes an input layer, a prediction layer, and an output layer, and the input layer includes a luminance input branch and a chrominance input branch; and
a YUV image recognition model training module, used to train, by a distillation method and using the trained RGB image recognition model, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
In order to achieve the foregoing objective, an embodiment of the present application further provides a computer device that includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor performs the following steps when executing the computer-readable instructions:
creating a training set and a verification set for image recognition based on the RGB data format;
training an RGB image recognition model using the training set and the verification set, where the trained RGB image recognition model is used to train a YUV image recognition model;
building a YUV image recognition model to be trained, where the YUV image recognition model to be trained includes an input layer, a prediction layer, and an output layer, and the input layer includes a luminance input branch and a chrominance input branch; and
using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
In order to achieve the foregoing objective, embodiments of the present application further provide a computer-readable storage medium in which computer-readable instructions are stored, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to perform the following steps:
creating a training set and a verification set for image recognition based on the RGB data format;
training an RGB image recognition model using the training set and the verification set, where the trained RGB image recognition model is used to train a YUV image recognition model;
building a YUV image recognition model to be trained, where the YUV image recognition model to be trained includes an input layer, a prediction layer, and an output layer, and the input layer includes a luminance input branch and a chrominance input branch; and
using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
This application also provides an image recognition method, including the following steps:
acquiring an image to be recognized in the YUV data format;
inputting the image to be recognized in the YUV data format into a YUV image recognition model, where the YUV image recognition model is obtained by training with the above image recognition model training method; and
outputting a recognition result of the image to be recognized in the YUV data format through the YUV image recognition model.
The image recognition model training method, system, computer device, computer-readable storage medium, and image recognition method provided in this application train the input layer and the prediction layer of the YUV image recognition model by distilling the RGB image recognition model, which improves the training efficiency of the YUV image recognition model and reduces its training cost.
Description of the drawings
FIG. 1 is a flowchart of the steps of the image recognition model training method according to the first embodiment of the application;
FIG. 2 is a schematic diagram of the input layer structure of the RGB image recognition model according to an embodiment of the application;
FIG. 3 is a flowchart of the steps of using the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained to obtain a YUV image recognition model used to recognize images in the YUV data format, according to an embodiment of the application;
FIG. 4 is a flowchart of the steps of obtaining the overall target loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model, according to an embodiment of the application;
FIG. 5 is a flowchart of the steps of minimizing the overall target loss function, which is adjusted by a learning rate, to obtain the YUV image recognition model, according to an embodiment of the application;
FIG. 6 is a schematic diagram of the program modules of the second embodiment of the image recognition model training system of this application;
FIG. 7 is a schematic diagram of the hardware structure of the computer device of the third embodiment of the image recognition model training system of this application;
FIG. 8 is a flowchart of the steps of the image recognition method according to an embodiment of this application;
FIG. 9 is a flowchart of the steps of outputting the recognition result of the image to be recognized in the YUV data format through the YUV image recognition model, according to an embodiment of the application.
Detailed description of the embodiments
In order to make the purpose, technical solutions, and advantages of this application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
The technical solutions of the various embodiments can be combined with each other, but only on the basis that they can be implemented by a person of ordinary skill in the art. When a combination of technical solutions is contradictory or cannot be implemented, it should be considered that such a combination does not exist and is not within the protection scope claimed by this application.
Embodiment 1
Please refer to FIG. 1, which shows a flowchart of the steps of the image recognition model training method according to an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are performed. The following is an exemplary description with a computer device as the execution subject, as follows:
As shown in FIG. 1, an image recognition model training method includes:
S100: Create a training set and a verification set for image recognition based on the RGB data format.
Specifically, in this embodiment, the training set and the verification set for image recognition based on the RGB data format consist of manually annotated images in the RGB data format, where the training set is used to train the RGB image recognition model and the verification set is used to verify the recognition accuracy of the trained RGB image recognition model.
S200: Use the training set and the verification set to train an RGB image recognition model, where the RGB image recognition model is used to train a YUV image recognition model.
The network structure of the RGB image recognition model can be divided into an input layer and a prediction layer, as shown in FIG. 2. The input layer is the pre-trained classification model ResNet50, whose feature extraction part has 5 groups of convolutional blocks: the first group, conv1 (the first vector convolution operation), uses a 7x7 convolution kernel with 64 channels and 2x downsampling; the second group, conv2 (the second vector convolution operation), includes one 3x3 max pooling layer and 3 residual modules, and expands the number of channels by 4 times; and so on, each subsequent group of convolution operations downsamples by 2x and doubles the number of channels.
The prediction layer uses the extracted image features for label prediction. For a C-class target classification task, the prediction layer is composed of a 1x1 convolutional layer with C channels and an average pooling layer.
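The teacher network described above can be sketched as follows, assuming PyTorch and torchvision; the class name RGBRecognizer and the use of torchvision's resnet50 are illustrative assumptions, while the C-channel 1x1 convolution and average pooling follow the text.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class RGBRecognizer(nn.Module):
    """RGB teacher: ResNet50 feature extractor + 1x1 conv prediction layer (sketch)."""

    def __init__(self, num_classes: int):
        super().__init__()
        backbone = resnet50(pretrained=True)
        # Input layer: the five convolutional groups of ResNet50 (conv1 ... conv5),
        # i.e. everything except the original average pooling and fully connected head.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Prediction layer: 1x1 convolution with C channels + global average pooling.
        self.predict = nn.Sequential(
            nn.Conv2d(2048, num_classes, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        feat = self.features(rgb)             # (N, 2048, H/32, W/32)
        return self.predict(feat).flatten(1)  # (N, C) logits [x_1, ..., x_C]
```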
S300: Build a YUV image recognition model to be trained, where the YUV image recognition model to be trained includes an input layer, a prediction layer, and an output layer, and the input layer includes a luminance input branch and a chrominance input branch.
The input layer is used to extract features of the image to be recognized; it includes a luminance input branch and a chrominance input branch, which extract the luminance features and chrominance features of the YUV image. The prediction layer uses the extracted luminance and chrominance features for label prediction. Taking image classification as an example, the recognition goal of the image recognition model is to accurately classify images of multiple categories. Specifically, there are N images to be recognized, belonging to C categories such as cats, dogs, cars, and trees; for any image to be recognized, the known correct label is [y_1, y_2, ..., y_c, ..., y_C], where y_i = 0 for i ≠ c, y_c = 1, and c is the category to which the image belongs. The output layer is used to output the classification category of the image.
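A minimal sketch of the student network of S300, assuming PyTorch and that the Y plane and the U/V planes are provided at the same resolution; the luminance branch processes the Y plane, the chrominance branch processes the U and V planes, their features are fused (summation is one possible choice the patent does not specify), and a C-channel 1x1 convolution with average pooling performs label prediction. Layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """A small conv-BN-ReLU block with 2x downsampling (illustrative)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class YUVRecognizer(nn.Module):
    """YUV student: luminance branch + chrominance branch + prediction layer (sketch)."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Input layer, luminance branch: extracts features from the Y plane.
        self.luma_branch = nn.Sequential(
            conv_block(1, 64), conv_block(64, 128), conv_block(128, 256))
        # Input layer, chrominance branch: extracts features from the U and V planes.
        self.chroma_branch = nn.Sequential(
            conv_block(2, 64), conv_block(64, 128), conv_block(128, 256))
        # Prediction layer: 1x1 convolution with C channels + average pooling.
        self.predict_head = nn.Sequential(
            nn.Conv2d(256, num_classes, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, y_plane: torch.Tensor, uv_planes: torch.Tensor) -> torch.Tensor:
        luma_feat = self.luma_branch(y_plane)        # (N, 256, H/8, W/8)
        chroma_feat = self.chroma_branch(uv_planes)  # (N, 256, H/8, W/8)
        fused = luma_feat + chroma_feat              # feature fusion (one option)
        return self.predict_head(fused).flatten(1)   # (N, C) class logits
```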
S400: Use the trained RGB image recognition model to train, by a distillation method, the luminance input branch, the chrominance input branch, and the prediction layer of the YUV image recognition model to be trained, to obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in the YUV data format.
Distillation refers to migrating the predictive ability of a well-trained complex model to a model with a simpler structure, so as to achieve model compression. The complex model is the distilled model and the simple model is the distillation model; in this embodiment, the image recognition capability of the RGB image recognition model is transferred to the YUV image recognition model. The distilled model has excellent performance and high accuracy, but compared with the distillation model it has a more complex structure, more parameter weights, and a slower computation speed. The distillation model computes faster and is suitable for deployment as a single neural network where high real-time performance is required; compared with the distilled model, it has greater computational throughput, a simpler network structure, and fewer model parameters.
Specifically, in this embodiment, the RGB image recognition model serves as the distilled model. Its advantage is that a large public pre-trained network and a considerable amount of RGB training data can be used to obtain model parameters with higher accuracy.
In an embodiment, as shown in FIG. 3, step S400 further includes:
S401: Obtain the overall target loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model.
Specifically, for an image to be classified, the RGB image recognition model predicts C categories, and the target loss function of category c is
$$L_c^{hard} = -y_c \log \frac{e^{x_c}}{\sum_{j=1}^{C} e^{x_j}}$$
Then the overall target loss function of the RGB image recognition model is
$$L^{hard} = \sum_{c=1}^{C} L_c^{hard}$$
where y_c is the value for category c of the correct label [y_1, y_2, ..., y_C] defined above, [x_1, x_2, ..., x_c, ..., x_C] are the outputs of the RGB image recognition model for the C predicted categories, L_c^hard is the target loss function of category c when the temperature parameter T is not added, and L^hard is the overall target function of the RGB image recognition model when the temperature parameter T is not added.
Specifically, the model parameters that minimize L^hard, i.e., the loss function value of the RGB image recognition model, can be learned from a large number of RGB images in the labeled training set, so that the recognition error of the RGB image recognition model is minimized.
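Under the cross-entropy reading of L^hard reconstructed above, the hard loss can be computed as in the following sketch, assuming PyTorch; the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def hard_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """L_hard = -sum_c y_c * log softmax(x)_c, averaged over the batch (sketch).

    logits: (N, C) model outputs [x_1, ..., x_C]
    labels: (N,) integer category indices c, i.e. the positions where y_c = 1
    """
    log_p = F.log_softmax(logits, dim=1)   # log(e^{x_c} / sum_j e^{x_j})
    return F.nll_loss(log_p, labels)       # picks -log_p[n, c_n] and averages
```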
In an embodiment, as shown in FIG. 4, step S401 further includes:
S4011: Obtain the soft target of the RGB image recognition model.
Specifically, the soft target refers to the output of the distilled model using the prediction layer loss function with the temperature parameter T. By adding the temperature parameter T, when a misclassification passes through the prediction layer the erroneous outputs are amplified and the correct class is suppressed; that is, adding the temperature parameter T artificially increases the training difficulty. Once T is reset to 1, the classification result will be very close to that of the RGB image recognition model.
The soft target is expressed by the formula
$$q_c = \frac{e^{x_c / T}}{\sum_{j=1}^{C} e^{x_j / T}}$$
When T = 1, this becomes
$$q_c = \frac{e^{x_c}}{\sum_{j=1}^{C} e^{x_j}}$$
and the hard target of the RGB image recognition model is obtained; the hard target refers to the target of normal network training, with the temperature parameter set to 1.
Here q_c is the soft target, c indexes the C categories predicted by the RGB image recognition model, whose outputs are denoted [x_1, x_2, ..., x_c, ..., x_C], and T is the temperature parameter.
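A sketch of the softened output q_c with temperature T as given above; at T = 1 it reduces to the ordinary softmax, i.e. the hard target. PyTorch is assumed and the example values are arbitrary.

```python
import torch
import torch.nn.functional as F

def soft_targets(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """q_c = exp(x_c / T) / sum_j exp(x_j / T) for each sample (sketch)."""
    return F.softmax(logits / temperature, dim=1)

# A higher temperature flattens the distribution, exposing inter-class similarity.
x = torch.tensor([[4.0, 1.0, 0.5]])
print(soft_targets(x, temperature=1.0))  # close to one-hot (hard-target behaviour)
print(soft_targets(x, temperature=5.0))  # much softer distribution
```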
S4012:根据所述RGB图像识别模型的软目标,获取所述待训练YUV图像识别模型的整体目标损失函数。S4012: Obtain the overall target loss function of the YUV image recognition model to be trained according to the soft target of the RGB image recognition model.
具体的,通过损失函数
Figure PCTCN2020093033-appb-000005
Figure PCTCN2020093033-appb-000006
得到YUV图像识别模型的第一目标损失函数为
Specifically, through the loss function
Figure PCTCN2020093033-appb-000005
and
Figure PCTCN2020093033-appb-000006
The first objective loss function of the YUV image recognition model is obtained as
Figure PCTCN2020093033-appb-000007
其中第一目标损失函数与软目标对应,是包含蒸馏学习的温度参数的函数。
Figure PCTCN2020093033-appb-000007
The first objective loss function corresponds to the soft objective, and is a function that includes the temperature parameters learned by distillation.
其中,y soft为RGB图像识别模型在温度T的条件下,预测出的值。 Among them, y soft is the value predicted by the RGB image recognition model under the condition of temperature T.
YUV图像识别模型的第二目标损失函数为The second objective loss function of the YUV image recognition model is
Figure PCTCN2020093033-appb-000008
Figure PCTCN2020093033-appb-000008
其中第二目标损失函数与硬目标对应,是不包含蒸馏学习的温度参数的损失函数。The second objective loss function corresponds to the hard objective, and is a loss function that does not include the temperature parameter learned by distillation.
具体的,所述蒸馏模型的整体目标损失函数为L = L_1 + L_2,因此,YUV图像识别模型的整体目标损失函数为:(Figure PCTCN2020093033-appb-000009)。Specifically, the overall objective loss function of the distillation model is L = L_1 + L_2; therefore, the overall objective loss function of the YUV image recognition model is (Figure PCTCN2020093033-appb-000009).
其中,L_1为第一目标损失函数,L_2为第二目标损失函数,L为整体目标损失函数。Among them, L_1 is the first objective loss function, L_2 is the second objective loss function, and L is the overall objective loss function.
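A minimal sketch of combining the two terms, assuming the common distillation choice of a temperature-scaled cross-entropy against the teacher's soft predictions for L_1 and an ordinary cross-entropy against the ground-truth labels for L_2 (the exact forms are those given by the referenced formulas); student_logits, teacher_logits and labels are placeholder names:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0):
    # L1: soft-target term at temperature T, cross-entropy against the teacher's
    # softened outputs y_soft (assumed form; see the referenced formulas).
    y_soft = F.softmax(teacher_logits / T, dim=1)
    log_q = F.log_softmax(student_logits / T, dim=1)
    l1 = -(y_soft * log_q).sum(dim=1).mean()
    # L2: hard-target term, ordinary cross-entropy against the true labels (T = 1).
    l2 = F.cross_entropy(student_logits, labels)
    return l1 + l2                            # overall objective L = L1 + L2

student_logits = torch.randn(8, 10)           # hypothetical YUV-model outputs (batch 8, C = 10)
teacher_logits = torch.randn(8, 10)           # hypothetical RGB-model outputs
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```

Some implementations also weight the two terms; the unweighted sum above follows the L = L_1 + L_2 form stated here.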
S402:通过所述整体目标损失函数对所述待训练YUV图像识别模型的输入层和预测层进行训练,得到所述YUV图像识别模型。S402: Train the input layer and the prediction layer of the YUV image recognition model to be trained by using the overall target loss function to obtain the YUV image recognition model.
在一实施方式中,步骤S402进一步包括:In an embodiment, step S402 further includes:
S4021:最小化所述整体目标损失函数,以得到所述YUV图像识别模型,所述整体目标损失函数通过学习率调整。S4021: Minimize the overall target loss function to obtain the YUV image recognition model, and the overall target loss function is adjusted by a learning rate.
具体的,深度学习模型包含大量的可学习参数,训练模型就是不断调整参数直到目标函数值最小的过程。学习率是衡量参数调整"步伐"大小的一个重要指标,即通过调整学习率可以控制模型的训练进度;具体地,学习率控制模型参数的变化幅度,用公式表示为:更新后的参数 = 当前参数 − 学习率 × 损失函数的梯度。针对不同的模型、每一层以及训练过程中的每个阶段,学习率都有不同的选择策略。Specifically, a deep learning model contains a large number of learnable parameters, and training the model is the process of continuously adjusting these parameters until the objective function value is minimized. The learning rate is an important indicator of the size of the "step" taken when adjusting parameters; that is, the training progress of the model can be controlled by adjusting the learning rate. More concretely, the learning rate controls how much the model parameters change, expressed by the formula: updated parameter = current parameter − learning rate × gradient of the loss function. Different selection strategies exist for different models, for the learning rate of each layer, and for the learning rate of each stage of training.
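A minimal sketch of this update rule on a toy one-parameter loss, chosen purely to show the role of the learning rate:

```python
# Plain gradient descent on a toy loss L(theta) = (theta - 3)^2,
# whose gradient is dL/dtheta = 2 * (theta - 3).
theta = 0.0
lr = 0.1                       # the learning rate sets the size of each "step"
for _ in range(50):
    grad = 2.0 * (theta - 3.0)
    theta = theta - lr * grad  # updated parameter = current parameter - lr * gradient
print(theta)                   # approaches the minimizer theta = 3
```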
在一实施方式中,如图5所示,步骤S4021进一步包括:In one embodiment, as shown in FIG. 5, step S4021 further includes:
S4021A:调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第一学习率,进行初步训练;S4021A: Adjust the learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to the first learning rate, and perform preliminary training;
在一实施方式中,调整亮度输入分支和预测层时,设置亮度输入分支和预测层的第一学习率为0.01,而此时色度输入分支不参与训练,其第一学习率为0。In one embodiment, when adjusting the luminance input branch and the prediction layer, their first learning rate is set to 0.01, while the chrominance input branch does not participate in training at this stage and its first learning rate is 0.
S4021B:调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练;S4021B: Adjust the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to a second learning rate, and perform fine training;
具体的,完成第一步训练后,YUV图像识别模型已经可以识别目标,只是由于缺少色度信息,识别精度较低,此时,加入色度输入分支补充模型能力。亮度输入分支的特征提取已经在第一步中完成,因此需要固定亮度输入分支,即将亮度输入分支的第二学习率设置为0。训练色度输入分支与预测层时,色度输入分支的第二学习率设为0.01,而由于预测层已经经过学习,不是随机初始化的参数,需要减小"步伐",因此将预测层的第二学习率设为0.001。经过第一步的训练,色度输入分支与预测层此时学习的是残差损失,可以快速收敛,降低了学习难度和训练时间。Specifically, after the first training stage is completed, the YUV image recognition model can already recognize the target, but its accuracy is low because chrominance information is missing; at this point the chrominance input branch is added to supplement the model's capability. Feature extraction for the luminance input branch was already completed in the first stage, so the luminance input branch needs to be fixed, i.e., its second learning rate is set to 0. When training the chrominance input branch and the prediction layer, the second learning rate of the chrominance input branch is set to 0.01; since the prediction layer has already been trained and its parameters are no longer randomly initialized, the "step" needs to be reduced, so the second learning rate of the prediction layer is set to 0.001. After the first stage of training, the chrominance input branch and the prediction layer now learn the residual loss, which converges quickly and reduces the learning difficulty and training time.
S4021C:调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第三学习率,得到所述YUV图像识别模型。S4021C: Adjust the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to a third learning rate to obtain the YUV image recognition model.
具体的,分步调参可以减小模型学习难度,但最后还是需要进行联合调整,得到整体最优解。将亮度输入分支、色度输入分支以及预测层的第三学习率都设为0.0005,小步伐地调整参数值,得到最佳模型参数,进而得到YUV图像识别模型。Specifically, tuning the parameters stage by stage reduces the difficulty of model learning, but a final joint adjustment is still needed to reach the overall optimum. The third learning rate of the luminance input branch, the chrominance input branch and the prediction layer is set to 0.0005, and the parameter values are adjusted in small steps to obtain the best model parameters and thus the YUV image recognition model.
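A minimal sketch of the three-stage schedule, assuming the learning rates are expressed with PyTorch parameter groups and that luma_branch, chroma_branch and head are placeholder names for the luminance input branch, chrominance input branch and prediction layer:

```python
import torch

def make_optimizer(model, lr_luma, lr_chroma, lr_head):
    # One parameter group per sub-module so each part gets its own learning rate;
    # a learning rate of 0 effectively freezes that part for the current stage.
    return torch.optim.SGD([
        {"params": model.luma_branch.parameters(),   "lr": lr_luma},
        {"params": model.chroma_branch.parameters(), "lr": lr_chroma},
        {"params": model.head.parameters(),          "lr": lr_head},
    ], momentum=0.9)

# Stage 1: luminance branch + prediction layer at 0.01, chrominance branch frozen.
# optimizer = make_optimizer(model, lr_luma=0.01, lr_chroma=0.0, lr_head=0.01)
# Stage 2: luminance branch frozen, chrominance branch at 0.01, prediction layer at 0.001.
# optimizer = make_optimizer(model, lr_luma=0.0, lr_chroma=0.01, lr_head=0.001)
# Stage 3: joint fine-tuning of all three parts at 0.0005.
# optimizer = make_optimizer(model, lr_luma=0.0005, lr_chroma=0.0005, lr_head=0.0005)
```

Setting a group's learning rate to 0 mirrors the freezing described above; setting requires_grad to False on that branch's parameters for the stage would have the same effect.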
本申请实施例提出了一种YUV图像识别模型构建方法,可以利用不同类型的数据格式进行迁移学习。相比于传统模型蒸馏,本申请针对输入数据格式的特性调整了模型的输入模块,增加了亮度分支与色度分支;同时,利用了RGB图像识别模型的高算力性能,通过加入"软目标",学习不同类别之间的分布差异;另外,调整了模型结构后,细化了YUV图像识别模型的训练过程,采用阶段式的训练步骤,先利用亮度分量完成预测目标,后利用色度分量学习残差部分,降低了迁移学习的难度,提升了模型精度。本申请实施例还提供一种图像识别方法,可以直接用YUV图像识别模型对YUV格式的图像进行识别,不需要先将YUV图像转换为RGB图像再进行识别,提高了YUV图像的识别效率。The embodiments of the present application propose a method for constructing a YUV image recognition model that can use different data formats for transfer learning. Compared with traditional model distillation, this application adjusts the input module of the model according to the characteristics of the input data format, adding a luminance branch and a chrominance branch; at the same time, it takes advantage of the high computing power of the RGB image recognition model and, by adding the "soft target", learns the distribution differences between different categories. In addition, after adjusting the model structure, the training process of the YUV image recognition model is refined into staged training steps: the luminance component is first used to meet the prediction target, and the chrominance component is then used to learn the residual part, which reduces the difficulty of transfer learning and improves model accuracy. The embodiments of the application also provide an image recognition method that can recognize YUV images directly with the YUV image recognition model, without first converting the YUV image to an RGB image, which improves the recognition efficiency of YUV images.
实施例二Example two
请继续参阅图6,示出了本申请图像识别系统的程序模块示意图。在本实施例中,图像识别模型训练系统20可以包括或被分割成一个或多个程序模块,一个或者多个程序模块被存储于存储介质中,并由一个或多个处理器所执行,以完成本申请,并可实现上述图像识别方法。本申请实施例所称的程序模块是指能够完成特定功能的一系列计算机可读指令段,比程序本身更适合于描述图像识别模型训练系统20在存储介质中的执行过程。以下描述将具体介绍本实施例各程序模块的功能:Please continue to refer to FIG. 6, which shows a schematic diagram of the program modules of the image recognition system of the present application. In this embodiment, the image recognition model training system 20 may include, or be divided into, one or more program modules; the one or more program modules are stored in a storage medium and executed by one or more processors to complete the present application and realize the above-mentioned image recognition method. The program module referred to in the embodiments of the present application refers to a series of computer-readable instruction segments capable of completing specific functions, and is more suitable than the program itself for describing the execution process of the image recognition model training system 20 in the storage medium. The following description will specifically introduce the functions of each program module in this embodiment:
训练集和验证集创建模块200:用于创建基于RGB数据格式的图像识别的训练集和验证集;Training set and validation set creation module 200: used to create training set and validation set for image recognition based on RGB data format;
RGB图像识别模型训练模块202:用于利用所述训练集和所述验证集训练RGB图像识别模型,所述RGB图像识别模型用于训练YUV图像识别模型;RGB image recognition model training module 202: used to train an RGB image recognition model using the training set and the verification set, and the RGB image recognition model is used to train a YUV image recognition model;
待训练YUV图像识别模型构建模块204:用于构建待训练YUV图像识别模型,所述待训练YUV图像识别模型包括输入层,预测层和输出层,所述输入层包括亮度输入分支和色度输入分支;To-be-trained YUV image recognition model construction module 204: used to construct the YUV image recognition model to be trained. The YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
YUV图像识别模型训练模块206:用于利用训练好的RGB图像识别模型使用蒸馏方法训练所述待训练YUV图像识别模型的亮度输入分支、色度输入分支和预测层,得到YUV图像识别模型,所述YUV图像识别模型用于识别YUV数据格式的图像。YUV image recognition model training module 206: used to train the luminance input branch, chrominance input branch and prediction layer of the YUV image recognition model to be trained using the distillation method using the trained RGB image recognition model to obtain the YUV image recognition model. The YUV image recognition model is used to recognize images in YUV data format.
进一步地,所述YUV数据格式图像训练模块206还用于:Further, the YUV data format image training module 206 is also used for:
根据训练好的RGB图像识别模型,获取所述待训练YUV图像识别模型的整体目标损失函数;Obtaining the overall objective loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model;
通过所述整体目标损失函数对所述待训练YUV图像识别模型的输入层和预测层进行训练,得到所述YUV图像识别模型。The input layer and the prediction layer of the YUV image recognition model to be trained are trained by the overall target loss function to obtain the YUV image recognition model.
进一步地,所述YUV数据格式图像训练模块206还用于:Further, the YUV data format image training module 206 is also used for:
获取所述RGB图像识别模型的软目标;Acquiring the soft target of the RGB image recognition model;
根据所述RGB图像识别模型的软目标,获取所述待训练YUV图像识别模型的整体目标损失函数。According to the soft target of the RGB image recognition model, the overall target loss function of the YUV image recognition model to be trained is obtained.
进一步地,所述YUV数据格式图像训练模块206还用于:Further, the YUV data format image training module 206 is also used for:
最小化所述整体目标损失函数,以得到所述YUV图像识别模型,所述整体目标损失函数通过学习率调整。The overall target loss function is minimized to obtain the YUV image recognition model, and the overall target loss function is adjusted by a learning rate.
进一步地,所述YUV数据格式图像训练模块206还用于:Further, the YUV data format image training module 206 is also used for:
调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第一学习率,进行初步训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to the first learning rate, and performing preliminary training;
调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to a second learning rate, and performing fine training;
调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第三学习率,得到所述YUV图像识别模型。The learning rate of the luminance input branch, the chrominance input branch and the prediction layer is adjusted to a third learning rate to obtain the YUV image recognition model.
实施例三Example three
参阅图7,是本申请实施例三之计算机设备的硬件架构示意图。本实施例中,所述计算机设备2是一种能够按照事先设定或者存储的指令,自动进行数值计算和/或信息处理的设备。该计算机设备2可以是机架式服务器、刀片式服务器、塔式服务器或机柜式服务器(包括独立的服务器,或者多个服务器所组成的服务器集群)等。如图7所示,所述计算机设备2至少包括,但不限于,可通过系统总线相互通信连接的存储器21、处理器22、网络接口23、以及图像识别模型训练系统20。其中:Refer to FIG. 7, which is a schematic diagram of the hardware architecture of the computer device according to the third embodiment of the present application. In this embodiment, the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers). As shown in FIG. 7, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and an image recognition model training system 20 that can communicate with each other through a system bus. Among them:
本实施例中,存储器21至少包括一种类型的计算机可读存储介质,所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等。在一些实施例中,存储器21可以是计算机设备2的内部存储单元,例如该计算机设备2的硬盘或内存。在另一些实施例中,存储器21也可以是计算机设备2的外部存储设备,例如该计算机设备2上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。当然,存储器21还可以既包括计算机设备2的内部存储单元也包括其外部存储设备。本实施例中,存储器21通常用于存储安装于计算机设备2的操作系统和各类应用软件,例如上述实施例所述的图像识别模型训练系统20的程序代码等。此外,存储器21还可以用于暂时地存储已经输出或者将要输出的各类数据。In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc., equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store an operating system and various application software installed in the computer device 2, such as the program code of the image recognition model training system 20 described in the foregoing embodiment. In addition, the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
处理器22在一些实施例中可以是中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器、或其他数据处理芯片。该处理器22通常用于控制计算机设备2的总体操作。本实施例中,处理器22用于运行存储器21中存储的程序代码或者处理数据,例如运行图像识别模型训练系统20,以实现上述实施例的图像识别模型训练方法。The processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the image recognition model training system 20, so as to implement the image recognition model training method of the foregoing embodiment.
所述网络接口23可包括无线网络接口或有线网络接口,该网络接口23通常用于在所述计算机设备2与其他电子装置之间建立通信连接。例如,所述网络接口23用于通过网络将所述计算机设备2与外部终端相连,在所述计算机设备2与外部终端之间的建立数据传输通道和通信连接等。所述网络可以是企业内部网(Intranet)、互联网(Internet)、全球移动通讯系统(Global System of Mobile communication,GSM)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、4G网络、5G网络、蓝牙(Bluetooth)、Wi-Fi等无线或有线网络。The network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be Intranet, Internet, Global System of Mobile Communication (GSM), Wideband Code Division Multiple Access (WCDMA), 4G network, 5G Network, Bluetooth (Bluetooth), Wi-Fi and other wireless or wired networks.
需要指出的是,图7仅示出了具有部件20-23的计算机设备2,但是应理解的是,并不要求实施所有示出的部件,可以替代的实施更多或者更少的部件。It should be pointed out that FIG. 7 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
在本实施例中,存储于存储器21中的所述图像识别模型训练系统20还可以被分割为一个或者多个程序模块,所述一个或者多个程序模块被存储于存储器21中,并由一个或多个处理器(本实施例为处理器22)所执行,以完成本申请。In this embodiment, the image recognition model training system 20 stored in the memory 21 may also be divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the present application.
例如,图6示出了所述实现图像识别模型训练系统20实施例二的程序模块示意图,该实施例中,所述图像识别模型训练系统20可以被划分为训练集和验证集创建模块200、RGB图像识别模型训练模块202、待训练YUV图像识别模型构建模块204和YUV图像识别模型训练模块206。其中,本申请所称的程序模块是指能够完成特定功能的一系列计算机可读指令段,比程序更适合于描述所述图像识别模型训练系统20在所述计算机设备2中的执行过程。所述训练集和验证集创建模块200至YUV图像识别模型训练模块206等程序模块的具体功能在上述实施例中已有详细描述,在此不再赘述。For example, FIG. 6 shows a schematic diagram of the program modules of the second embodiment of the image recognition model training system 20. In this embodiment, the image recognition model training system 20 can be divided into a training set and verification set creation module 200, an RGB image recognition model training module 202, a to-be-trained YUV image recognition model construction module 204, and a YUV image recognition model training module 206. The program module referred to in this application refers to a series of computer-readable instruction segments that can complete specific functions, and is more suitable than a program for describing the execution process of the image recognition model training system 20 in the computer device 2. The specific functions of the program modules, from the training set and verification set creation module 200 to the YUV image recognition model training module 206, have been described in detail in the foregoing embodiment and will not be repeated here.
实施例四Example four
本实施例还提供一种计算机可读存储介质,如闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘、服务器、App应用商城等等,其上存储有计算机可读指令,程序被处理器执行时实现相应功能,所述计算机可读存储介质可以是非易失性计算机可读存储介质,也可以是易失性计算机可读存储介质。本实施例的计算机可读存储介质用于存储图像识别模型训练系统20,被处理器执行时实现上述实施例的图像识别方法。This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, App application malls, and so on, on which computer-readable instructions are stored; the program implements the corresponding functions when executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium of this embodiment is used to store the image recognition model training system 20, which, when executed by a processor, implements the image recognition method of the foregoing embodiment.
实施例五Example five
参阅图8,示出了本申请实施例五之图像识别方法的步骤流程图。可以理解,本方法实施例中的流程图不用于对执行步骤的顺序进行限定。具体如下。Referring to FIG. 8, there is shown a flowchart of the steps of the image recognition method according to the fifth embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order in which the steps are executed. The details are as follows.
S110:获取YUV数据格式的待识别图像;S110: Obtain the image to be recognized in the YUV data format;
S210:将所述YUV数据格式的待识别图像输入YUV图像识别模型;S210: Input the to-be-recognized image in the YUV data format into the YUV image recognition model;
S310:通过所述YUV图像识别模型输出所述YUV数据格式的待识别图像的识别结果。S310: Output the recognition result of the image to be recognized in the YUV data format through the YUV image recognition model.
在一实施方式中,请参阅图9,步骤S310进一步包括:In one embodiment, referring to FIG. 9, step S310 further includes:
S311:接收所述YUV数据格式的待识别图像;S311: Receive the to-be-identified image in the YUV data format;
S312:通过所述YUV图像识别模型的输入层对所述YUV数据格式的待识别图像的色度特征和亮度特征进行提取,经过识别后将图像识别结果通过所述YUV图像识别模型的输出层输出。S312: Extract the chromaticity feature and brightness feature of the image to be recognized in the YUV data format through the input layer of the YUV image recognition model, and output the image recognition result through the output layer of the YUV image recognition model after recognition .
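A minimal sketch of the inference flow of steps S311–S312, assuming the trained model takes the luminance plane and the chrominance planes as two separate inputs and that the YUV frame has already been split into those planes; yuv_model, y_plane and uv_planes are placeholder names:

```python
import torch

@torch.no_grad()
def recognize_yuv(yuv_model, y_plane, uv_planes):
    # Feed the luminance plane and the chrominance planes to their respective
    # input branches; the model's prediction layer fuses the two feature streams
    # and the output layer returns class scores for the YUV image.
    yuv_model.eval()
    logits = yuv_model(y_plane, uv_planes)
    return logits.argmax(dim=1)          # predicted class index per image

# Example input shapes for a single 224x224 YUV 4:2:0 frame:
#   y_plane   = torch.randn(1, 1, 224, 224)   # Y (luminance)
#   uv_planes = torch.randn(1, 2, 112, 112)   # U and V (chrominance), subsampled
#   print(recognize_yuv(yuv_model, y_plane, uv_planes))
```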
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the foregoing embodiments of the present application are for description only, and do not represent the superiority or inferiority of the embodiments.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。Through the description of the above implementation manners, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by means of software plus a necessary general hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (20)

  1. 一种图像识别模型训练方法,其中,包括:An image recognition model training method, which includes:
    创建基于RGB数据格式的图像识别的训练集和验证集;Create training set and verification set for image recognition based on RGB data format;
    利用所述训练集和所述验证集训练RGB图像识别模型,所述训练好的RGB图像识别模型用于训练YUV图像识别模型;Training an RGB image recognition model using the training set and the verification set, and the trained RGB image recognition model is used to train a YUV image recognition model;
    搭建待训练YUV图像识别模型,所述待训练YUV图像识别模型包括输入层,预测层和输出层,所述输入层包括亮度输入分支和色度输入分支;Build a YUV image recognition model to be trained, the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
    利用训练好的所述RGB图像识别模型使用蒸馏方法训练所述待训练YUV图像识别模型的亮度输入分支、色度输入分支和预测层,得到YUV图像识别模型,所述YUV图像识别模型用于识别YUV数据格式的图像。Use the trained RGB image recognition model to train the luminance input branch, chrominance input branch and prediction layer of the YUV image recognition model to be trained using the distillation method to obtain the YUV image recognition model, and the YUV image recognition model is used to recognize images in the YUV data format.
  2. 根据权利要求1所述的图像识别模型训练方法,其中,所述利用训练好的RGB图像识别模型使用蒸馏方法训练所述待训练YUV图像识别模型的亮度输入分支、色度输入分支和预测层,得到YUV图像识别模型,所述YUV图像识别模型用于识别YUV数据格式的图像包括:The image recognition model training method according to claim 1, wherein said using the trained RGB image recognition model uses a distillation method to train the luminance input branch, chrominance input branch and prediction layer of the YUV image recognition model to be trained, Obtain a YUV image recognition model, where the YUV image recognition model is used to recognize images in YUV data format including:
    根据训练好的RGB图像识别模型,获取所述待训练YUV图像识别模型的整体目标损失函数;Obtaining the overall objective loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model;
    通过所述整体目标损失函数对所述待训练YUV图像识别模型的输入层和预测层进行训练,得到所述YUV图像识别模型。The input layer and the prediction layer of the YUV image recognition model to be trained are trained by the overall target loss function to obtain the YUV image recognition model.
  3. 根据权利要求2所述的图像识别模型训练方法,其中,所述根据训练好的RGB图像识别模型,获取所述待训练YUV图像识别模型的整体目标损失函数包括:The image recognition model training method according to claim 2, wherein the obtaining the overall target loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model comprises:
    获取所述RGB图像识别模型的软目标;Acquiring the soft target of the RGB image recognition model;
    根据所述RGB图像识别模型的软目标,获取所述待训练YUV图像识别模型的整体目标损失函数。According to the soft target of the RGB image recognition model, the overall target loss function of the YUV image recognition model to be trained is obtained.
  4. 根据权利要求2所述的图像识别模型训练方法,其中,所述通过所述整体目标损失函数对所述待训练YUV图像识别模型的输入层和预测层进行训练,得到所述YUV图像识别模型包括:The image recognition model training method according to claim 2, wherein the training the input layer and the prediction layer of the YUV image recognition model to be trained through the overall target loss function to obtain the YUV image recognition model comprises :
    最小化所述整体目标损失函数,以得到所述YUV图像识别模型,所述整体目标损失函数通过学习率调整。The overall target loss function is minimized to obtain the YUV image recognition model, and the overall target loss function is adjusted by a learning rate.
  5. 根据权利要求4所述的图像识别模型训练方法,其中,所述最小化所述整体目标损失函数,以得到所述YUV图像识别模型,所述整体目标损失函数通过学习率调整包括:The image recognition model training method according to claim 4, wherein the minimizing the overall target loss function to obtain the YUV image recognition model, and adjusting the overall target loss function through a learning rate comprises:
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第一学习率,进行初步训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to the first learning rate, and performing preliminary training;
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to a second learning rate, and performing fine training;
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第三学习率,得到所述YUV图像识别模型。The learning rate of the luminance input branch, the chrominance input branch and the prediction layer is adjusted to a third learning rate to obtain the YUV image recognition model.
  6. 根据权利要求5所述的图像识别模型训练方法,其中,所述调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练包括:The image recognition model training method according to claim 5, wherein the adjusting the learning rate of the luminance input branch, the chrominance input branch and the prediction layer is the second learning rate, and performing fine training comprises:
    固定所述亮度输入分支,调整所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练。Fixing the luminance input branch, adjusting the learning rate of the chrominance input branch and the prediction layer to a second learning rate, and performing fine training.
  7. 一种图像识别模型训练系统,其中,包括:An image recognition model training system, which includes:
    训练集和验证集创建模块,用于创建基于RGB数据格式的图像识别的训练集和验证集;Training set and validation set creation module, used to create training set and validation set for image recognition based on RGB data format;
    RGB图像识别模型训练模块,用于利用所述训练集和所述验证集训练RGB图像识别模型,所述RGB图像识别模型用于训练YUV图像识别模型;The RGB image recognition model training module is used to train an RGB image recognition model using the training set and the verification set, and the RGB image recognition model is used to train a YUV image recognition model;
    待训练YUV图像识别模型构建模块,用于构建待训练YUV图像识别模型,所述待训练 YUV图像识别模型包括输入层,预测层和输出层,所述输入层包括亮度输入分支和色度输入分支;The YUV image recognition model building module to be trained is used to build the YUV image recognition model to be trained. The YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer. The input layer includes a luminance input branch and a chrominance input branch ;
    YUV图像识别模型训练模块,用于利用训练好的RGB图像识别模型使用蒸馏方法训练所述待训练YUV图像识别模型的亮度输入分支、色度输入分支和预测层,得到YUV图像识别模型,所述YUV图像识别模型用于识别YUV数据格式的图像。The YUV image recognition model training module is used to train the luminance input branch, chrominance input branch and prediction layer of the YUV image recognition model to be trained using the distillation method using the trained RGB image recognition model to obtain the YUV image recognition model. The YUV image recognition model is used to recognize images in the YUV data format.
  8. 一种计算机设备,所述计算机设备包括存储器、处理器以及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时执行以下步骤:A computer device including a memory, a processor, and computer-readable instructions stored on the memory and capable of running on the processor, wherein, when the processor executes the computer-readable instructions, the following steps are performed:
    创建基于RGB数据格式的图像识别的训练集和验证集;Create training set and verification set for image recognition based on RGB data format;
    利用所述训练集和所述验证集训练RGB图像识别模型,所述训练好的RGB图像识别模型用于训练YUV图像识别模型;Training an RGB image recognition model using the training set and the verification set, and the trained RGB image recognition model is used to train a YUV image recognition model;
    搭建待训练YUV图像识别模型,所述待训练YUV图像识别模型包括输入层,预测层和输出层,所述输入层包括亮度输入分支和色度输入分支;Build a YUV image recognition model to be trained, the YUV image recognition model to be trained includes an input layer, a prediction layer and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
    利用训练好的所述RGB图像识别模型使用蒸馏方法训练所述待训练YUV图像识别模型的亮度输入分支、色度输入分支和预测层,得到YUV图像识别模型,所述YUV图像识别模型用于识别YUV数据格式的图像。Use the trained RGB image recognition model to train the luminance input branch, chrominance input branch and prediction layer of the YUV image recognition model to be trained using the distillation method to obtain the YUV image recognition model, and the YUV image recognition model is used to recognize images in the YUV data format.
  9. 根据权利要求8所述的计算机设备,其中,所述处理器执行所述计算机可读指令时执行以下步骤:8. The computer device according to claim 8, wherein the processor executes the following steps when executing the computer readable instruction:
    根据训练好的RGB图像识别模型,获取所述待训练YUV图像识别模型的整体目标损失函数;Obtaining the overall objective loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model;
    通过所述整体目标损失函数对所述待训练YUV图像识别模型的输入层和预测层进行训练,得到所述YUV图像识别模型。The input layer and the prediction layer of the YUV image recognition model to be trained are trained by the overall target loss function to obtain the YUV image recognition model.
  10. 根据权利要求9所述的计算机设备,其中,所述处理器执行所述计算机可读指令时执行以下步骤:The computer device according to claim 9, wherein the processor executes the following steps when executing the computer readable instruction:
    获取所述RGB图像识别模型的软目标;Acquiring the soft target of the RGB image recognition model;
    根据所述RGB图像识别模型的软目标,获取所述待训练YUV图像识别模型的整体目标损失函数。According to the soft target of the RGB image recognition model, the overall target loss function of the YUV image recognition model to be trained is obtained.
  11. 根据权利要求9所述的计算机设备,其中,所述处理器执行所述计算机可读指令时执行以下步骤:The computer device according to claim 9, wherein the processor executes the following steps when executing the computer readable instruction:
    最小化所述整体目标损失函数,以得到所述YUV图像识别模型,所述整体目标损失函数通过学习率调整。The overall target loss function is minimized to obtain the YUV image recognition model, and the overall target loss function is adjusted by a learning rate.
  12. 根据权利要求11所述的计算机设备,其中,所述处理器执行所述计算机可读指令时执行以下步骤:The computer device according to claim 11, wherein the processor executes the following steps when executing the computer readable instruction:
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第一学习率,进行初步训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to the first learning rate, and performing preliminary training;
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to a second learning rate, and performing fine training;
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第三学习率,得到所述YUV图像识别模型。The learning rate of the luminance input branch, the chrominance input branch and the prediction layer is adjusted to a third learning rate to obtain the YUV image recognition model.
  13. 一种计算机可读存储介质,其中,所述计算机可读存储介质内存储有计算机可读指令,所述计算机可读指令可被至少一个处理器所执行,以使所述至少一个处理器执行以下步骤:A computer-readable storage medium, wherein computer-readable instructions are stored in the computer-readable storage medium, and the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes the following step:
    创建基于RGB数据格式的图像识别的训练集和验证集;Create training set and verification set for image recognition based on RGB data format;
    利用所述训练集和所述验证集训练RGB图像识别模型,所述训练好的RGB图像识别模型用于训练YUV图像识别模型;Training an RGB image recognition model using the training set and the verification set, and the trained RGB image recognition model is used to train a YUV image recognition model;
    搭建待训练YUV图像识别模型,所述待训练YUV图像识别模型包括输入层,预测层和 输出层,所述输入层包括亮度输入分支和色度输入分支;Build a YUV image recognition model to be trained, the YUV image recognition model to be trained includes an input layer, a prediction layer, and an output layer, and the input layer includes a luminance input branch and a chrominance input branch;
    利用训练好的所述RGB图像识别模型使用蒸馏方法训练所述待训练YUV图像识别模型的亮度输入分支、色度输入分支和预测层,得到YUV图像识别模型,所述YUV图像识别模型用于识别YUV数据格式的图像。Use the trained RGB image recognition model to train the luminance input branch, chrominance input branch and prediction layer of the YUV image recognition model to be trained using the distillation method to obtain the YUV image recognition model, and the YUV image recognition model is used to recognize images in the YUV data format.
  14. 根据权利要求13所述的计算机可读存储介质,其中,所述至少一个处理器执行以下步骤:The computer-readable storage medium according to claim 13, wherein the at least one processor performs the following steps:
    根据训练好的RGB图像识别模型,获取所述待训练YUV图像识别模型的整体目标损失函数;Obtaining the overall objective loss function of the YUV image recognition model to be trained according to the trained RGB image recognition model;
    通过所述整体目标损失函数对所述待训练YUV图像识别模型的输入层和预测层进行训练,得到所述YUV图像识别模型。The input layer and the prediction layer of the YUV image recognition model to be trained are trained by the overall target loss function to obtain the YUV image recognition model.
  15. 根据权利要求14所述的计算机可读存储介质,其中,所述至少一个处理器执行以下步骤:The computer-readable storage medium of claim 14, wherein the at least one processor performs the following steps:
    获取所述RGB图像识别模型的软目标;Acquiring the soft target of the RGB image recognition model;
    根据所述RGB图像识别模型的软目标,获取所述待训练YUV图像识别模型的整体目标损失函数。According to the soft target of the RGB image recognition model, the overall target loss function of the YUV image recognition model to be trained is obtained.
  16. 根据权利要求14所述的计算机可读存储介质,其中,所述至少一个处理器执行以下步骤:The computer-readable storage medium of claim 14, wherein the at least one processor performs the following steps:
    最小化所述整体目标损失函数,以得到所述YUV图像识别模型,所述整体目标损失函数通过学习率调整。The overall target loss function is minimized to obtain the YUV image recognition model, and the overall target loss function is adjusted by a learning rate.
  17. 根据权利要求16所述的计算机可读存储介质,其中,所述至少一个处理器执行以下步骤:The computer-readable storage medium of claim 16, wherein the at least one processor performs the following steps:
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第一学习率,进行初步训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch and the prediction layer to the first learning rate, and performing preliminary training;
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练;Adjusting the learning rate of the luminance input branch, the chrominance input branch, and the prediction layer to a second learning rate, and performing fine training;
    调整所述亮度输入分支、所述色度输入分支与所述预测层的学习率为第三学习率,得到所述YUV图像识别模型。The learning rate of the luminance input branch, the chrominance input branch and the prediction layer is adjusted to a third learning rate to obtain the YUV image recognition model.
  18. 根据权利要求17所述的计算机可读存储介质,其中,所述至少一个处理器执行以下步骤:The computer-readable storage medium of claim 17, wherein the at least one processor performs the following steps:
    固定所述亮度输入分支,调整所述色度输入分支与所述预测层的学习率为第二学习率,进行精细训练。Fixing the luminance input branch, adjusting the learning rate of the chrominance input branch and the prediction layer to a second learning rate, and performing fine training.
  19. 一种图像识别方法,其中,包括以下步骤:An image recognition method, which includes the following steps:
    获取YUV数据格式的待识别图像;Obtain the image to be recognized in the YUV data format;
    将所述YUV数据格式的待识别图像输入YUV图像识别模型,其中,所述YUV图像识别模型通过所述权利要求1-6任一项所述的图像识别模型训练方法训练得到;Inputting the image to be recognized in the YUV data format into a YUV image recognition model, wherein the YUV image recognition model is obtained by training the image recognition model training method according to any one of claims 1-6;
    通过所述YUV图像识别模型输出所述YUV数据格式的待识别图像的识别结果。Outputting the recognition result of the image to be recognized in the YUV data format through the YUV image recognition model.
  20. 根据权利要求19所述的图像识别方法,其中,所述通过所述YUV图像识别模型输出所述YUV数据格式的待识别图像的识别结果包括:The image recognition method according to claim 19, wherein the output of the recognition result of the image to be recognized in the YUV data format through the YUV image recognition model comprises:
    接收所述YUV数据格式的待识别图像;Receiving the image to be recognized in the YUV data format;
    通过所述YUV图像识别模型的输入层对所述YUV数据格式的待识别图像的色度特征和亮度特征进行提取,经过识别后将图像识别结果通过所述YUV图像识别模型的输出层输出。The chromaticity feature and brightness feature of the image to be recognized in the YUV data format are extracted through the input layer of the YUV image recognition model, and the image recognition result is output through the output layer of the YUV image recognition model after recognition.
PCT/CN2020/093033 2020-02-13 2020-05-28 Method and system for training image recognition model, and image recognition method WO2021159633A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010090927.4A CN111275128B (en) 2020-02-13 2020-02-13 Image recognition model training method and system and image recognition method
CN202010090927.4 2020-02-13

Publications (1)

Publication Number Publication Date
WO2021159633A1 true WO2021159633A1 (en) 2021-08-19

Family

ID=70999464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093033 WO2021159633A1 (en) 2020-02-13 2020-05-28 Method and system for training image recognition model, and image recognition method

Country Status (2)

Country Link
CN (1) CN111275128B (en)
WO (1) WO2021159633A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150370A (en) * 2022-07-05 2022-10-04 广东魅视科技股份有限公司 Image processing method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661486B (en) * 2022-12-29 2023-04-07 有米科技股份有限公司 Intelligent image feature extraction method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109426858A (en) * 2017-08-29 2019-03-05 京东方科技集团股份有限公司 Neural network, training method, image processing method and image processing apparatus
US20190114809A1 (en) * 2017-10-12 2019-04-18 Sony Corporation Color leaking suppression in anchor point cloud compression
CN110163237A (en) * 2018-11-08 2019-08-23 腾讯科技(深圳)有限公司 Model training and image processing method, device, medium, electronic equipment
CN110189268A (en) * 2019-05-23 2019-08-30 西安电子科技大学 Underwater picture color correcting method based on GAN network
CN110503613A (en) * 2019-08-13 2019-11-26 电子科技大学 Based on the empty convolutional neural networks of cascade towards removing rain based on single image method
CN110659665A (en) * 2019-08-02 2020-01-07 深圳力维智联技术有限公司 Model construction method of different-dimensional features and image identification method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633263B2 (en) * 2012-10-09 2017-04-25 International Business Machines Corporation Appearance modeling for object re-identification using weighted brightness transfer functions
CN109815881A (en) * 2019-01-18 2019-05-28 成都旷视金智科技有限公司 Training method, the Activity recognition method, device and equipment of Activity recognition model
CN110188776A (en) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 Image processing method and device, the training method of neural network, storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109426858A (en) * 2017-08-29 2019-03-05 京东方科技集团股份有限公司 Neural network, training method, image processing method and image processing apparatus
US20190114809A1 (en) * 2017-10-12 2019-04-18 Sony Corporation Color leaking suppression in anchor point cloud compression
CN110163237A (en) * 2018-11-08 2019-08-23 腾讯科技(深圳)有限公司 Model training and image processing method, device, medium, electronic equipment
CN110189268A (en) * 2019-05-23 2019-08-30 西安电子科技大学 Underwater picture color correcting method based on GAN network
CN110659665A (en) * 2019-08-02 2020-01-07 深圳力维智联技术有限公司 Model construction method of different-dimensional features and image identification method and device
CN110503613A (en) * 2019-08-13 2019-11-26 电子科技大学 Based on the empty convolutional neural networks of cascade towards removing rain based on single image method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150370A (en) * 2022-07-05 2022-10-04 广东魅视科技股份有限公司 Image processing method

Also Published As

Publication number Publication date
CN111275128A (en) 2020-06-12
CN111275128B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
US11514261B2 (en) Image colorization based on reference information
US11967151B2 (en) Video classification method and apparatus, model training method and apparatus, device, and storage medium
CN111797893B (en) Neural network training method, image classification system and related equipment
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN109145759B (en) Vehicle attribute identification method, device, server and storage medium
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
US20230154142A1 (en) Fundus color photo image grading method and apparatus, computer device, and storage medium
WO2021027142A1 (en) Picture classification model training method and system, and computer device
CN112200062A (en) Target detection method and device based on neural network, machine readable medium and equipment
WO2021159633A1 (en) Method and system for training image recognition model, and image recognition method
WO2021103731A1 (en) Semantic segmentation method, and model training method and apparatus
US10733481B2 (en) Cloud device, terminal device, and method for classifying images
CN112417947B (en) Method and device for optimizing key point detection model and detecting face key points
US20230021551A1 (en) Using training images and scaled training images to train an image segmentation model
WO2023282569A1 (en) Method and electronic device for generating optimal neural network (nn) model
CN113204659A (en) Label classification method and device for multimedia resources, electronic equipment and storage medium
CN114998679A (en) Online training method, device and equipment for deep learning model and storage medium
WO2024046144A1 (en) Video processing method and related device thereof
CN111126501B (en) Image identification method, terminal equipment and storage medium
WO2022127603A1 (en) Model processing method and related device
CN113326832B (en) Model training method, image processing method, electronic device, and storage medium
CN113743448B (en) Model training data acquisition method, model training method and device
CN117037153A (en) Full-automatic training method, device, equipment and storage medium for object recognition model
CN115457352A (en) Method for acquiring network model, storage medium and electronic device
CN113705600A (en) Feature map determination method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918575

Country of ref document: EP

Kind code of ref document: A1