WO2024036847A1 - Image processing method and apparatus, and electronic device and storage medium - Google Patents

Image processing method and apparatus, and electronic device and storage medium Download PDF

Info

Publication number
WO2024036847A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
model
weight
output
label
Prior art date
Application number
PCT/CN2022/139730
Other languages
French (fr)
Chinese (zh)
Inventor
郭若愚
杜宇宁
赖宝华
于佃海
马艳军
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2024036847A1 publication Critical patent/WO2024036847A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Definitions

  • the present disclosure relates to the field of computer technology, and specifically to image processing methods and apparatuses, electronic devices, storage media, computer program products, and computer programs.
  • in the related art, model training methods based on knowledge distillation suffer from low model training accuracy.
  • the present disclosure provides image processing methods and devices, electronic devices, storage media, computer program products, and computer programs.
  • an image processing method including:
  • the image to be processed is input into a target image model, and the target image model outputs the processing result of the image to be processed, wherein the target image model is trained through the following steps:
  • the weight includes a first weight and a second weight corresponding to the teacher model
  • determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
  • the first weight is determined based on the target quantity, wherein the first weight is positively related to the target quantity,
  • determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
  • a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
  • an image processing device including:
  • an acquisition module configured to obtain the image to be processed
  • a processing module configured to input the image to be processed into a target image model, where the target image model outputs the processing result of the image to be processed, and the target image model is trained through the following modules: a first acquisition module, configured to input training samples into the student model and n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer;
  • a determination module configured to determine the weight corresponding to the training sample based on the label of the training sample and n second outputs
  • a second acquisition module configured to acquire the total loss function of the student model based on the first output and the weight
  • a training module configured to update the model parameters of the student model based on the total loss function to obtain the trained target model
  • the weight includes a first weight and a second weight corresponding to the teacher model
  • the second acquisition module is also used for:
  • the determination module is also used to:
  • the first weight is determined, wherein the first weight is positively related to the target quantity
  • the determination module is also used to:
  • a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
  • an electronic device including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor, so that the at least one processor can execute the image processing method described in any embodiment of the foregoing aspect.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the image processing method described in any embodiment of the foregoing aspect.
  • a computer program product including a computer program that, when executed by a processor, implements the image processing method described in any embodiment of the foregoing aspect.
  • a computer program including computer program code.
  • when the computer program code is run on a computer, it causes the computer to execute the image processing method described in any embodiment of the foregoing aspect.
  • Figure 1 is a schematic flowchart of a model training method according to the first embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a model training method according to a second embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of a model training method according to a third embodiment of the present disclosure.
  • Figure 4 is a schematic flowchart of an image processing method according to the first embodiment of the present disclosure
  • Figure 5 is a block diagram of a model training device according to the first embodiment of the present disclosure.
  • FIG. 6 is a block diagram of an image processing device according to the first embodiment of the present disclosure.
  • FIG. 7 is a block diagram of an electronic device used to implement the model training method or the image processing method according to the embodiment of the present disclosure.
  • AI (Artificial Intelligence)
  • AI technology has the advantages of high automation, high accuracy, and low cost, and has been widely used.
  • DL (Deep Learning)
  • ML (Machine Learning)
  • Image Processing refers to the technology of using computers to analyze images to achieve the desired results.
  • Image processing generally refers to digital image processing.
  • Digital image refers to a large two-dimensional array obtained by shooting with industrial cameras, video cameras, scanners and other equipment. The elements of the array are called pixels, and their values are called grayscale values.
  • Image processing technology generally includes three parts: image compression, enhancement and restoration, matching, description and recognition.
  • Computer Vision refers to the use of cameras and computers instead of human eyes to perform machine vision tasks such as target recognition, tracking and measurement, and to further perform graphics processing so that the processed images are more suitable for human eyes to observe or for transmission to instruments for detection.
  • Computer vision is a comprehensive discipline, including computer science and engineering, signal processing, physics, applied mathematics and statistics, neurophysiology and cognitive science, etc.
  • Figure 1 is a schematic flowchart of a model training method according to the first embodiment of the present disclosure.
  • the model training method according to the first embodiment of the present disclosure includes: S101-S104.
  • execution subject of the model training method in the embodiment of the present disclosure may be a hardware device with data information processing capabilities and/or the necessary software required to drive the hardware device to work.
  • execution subjects may include workstations, servers, computers, user terminals and other intelligent devices.
  • user terminals include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, etc.
  • one student model can correspond to n teacher models, and n is not particularly limited. In one embodiment, n can be 3 or 5.
  • training samples can be input into the student model and the n teacher models respectively; the student model outputs the first output, and the n teacher models output the second outputs. It can be understood that each teacher model outputs one second output, that is, n teacher models can output n second outputs.
  • Training samples have labels.
  • the student model and teacher model are speech models.
  • the speech models include but are not limited to speech recognition models, speech synthesis models, etc.
  • the training samples include sample speech
  • the labels of the training samples include the reference recognition text of the sample speech.
  • the training samples include sample text
  • the labels of the training samples include the reference synthesized speech of the sample text.
  • the student model and teacher model are image models
  • the image models include but are not limited to action recognition models, image classification models, face recognition models, text recognition models, etc.
  • the training samples include sample images
  • the labels of the training samples include reference recognition actions of the sample images.
  • the training samples include sample images
  • the labels of the training samples include the reference general category of the sample image and the reference subcategory of the pixels in the sample image.
  • the training samples include sample images
  • the labels of the training samples include reference face recognition results of the sample images.
  • the reference face recognition results include but are not limited to face position, face width, face height, face number, etc.
  • the training samples include sample images
  • the labels of the training samples include the reference recognition text of the sample images.
  • the student model and teacher model are language models, and language models include but are not limited to text classification models, text segmentation models, etc.
  • the training samples include sample texts
  • the labels of the training samples include reference classification results of the sample texts.
  • the reference classification results include but are not limited to emotion classification results, topic classification results, etc.
  • the training samples include sample texts
  • the labels of the training samples include reference segmentation results of the sample texts.
  • determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes obtaining the similarity between the label and the n second outputs, and determining the weight based on the similarity, where the weight is positively related to the similarity. In one embodiment, the higher the similarity between the label and the n second outputs, the more likely the label is correct, and the greater the weight corresponding to the training sample; conversely, the lower the similarity between the label and the n second outputs, the more likely the label is wrong, and the smaller the weight corresponding to the training sample.
  • determining the weight based on the similarity includes identifying the target setting range in which the similarity falls, and determining the setting weight corresponding to the target setting range as the weight corresponding to the training sample. It can be understood that the similarity can be divided into multiple setting ranges in advance, and different setting ranges correspond to different setting weights.
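  • The range-based weighting above can be sketched in Python; the range boundaries and setting weights below are illustrative placeholders, since the disclosure only says they are preset:

```python
def weight_from_similarity(similarity,
                           ranges=((0.0, 0.3, 0.1),
                                   (0.3, 0.7, 0.5),
                                   (0.7, 1.01, 1.0))):
    # Each (low, high, weight) triple is one preset setting range; the
    # boundaries and setting weights are hypothetical examples.
    for low, high, weight in ranges:
        if low <= similarity < high:
            return weight
    return 0.0
```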
  • the categories of the total loss function are not particularly limited.
  • the total loss function includes but is not limited to CE (Cross Entropy), BCE (Binary Cross Entropy), etc.
  • the total loss function is positively related to the weight. In one embodiment, the larger the weight corresponding to the training sample, the larger the total loss function; conversely, the smaller the weight corresponding to the training sample, the smaller the total loss function.
  • when the similarity between the label and the n second outputs is low, the weight corresponding to the training sample is also low, and the resulting total loss function is smaller; this avoids the total loss function becoming large in the case where the label is wrong.
  • a training sample x can be obtained, and the training sample x is input into the student model and the n teacher models respectively to obtain the first output of the student model for the training sample x.
  • the second outputs of the n teacher models for the training sample x are obtained.
  • based on the label of the training sample x and the n second outputs, the weight corresponding to the training sample x is determined; the total loss function of the student model in the xth training is obtained based on the first output of the student model for the training sample x and the weight corresponding to the training sample x.
  • x is a positive integer.
  • a training sample set Ax can be obtained during the xth training of the student model.
  • the training sample set Ax includes training samples 1 to m, and each training sample s in the training sample set Ax is input into the student model and the n teacher models respectively
  • to obtain the first output of the student model for the training sample s and the second outputs of the n teacher models for the training sample s.
  • 1≤s≤m, s and m are positive integers.
  • the student model's loss function for the training sample s is obtained; based on the student model's loss functions for training samples 1 to m, the total loss function of the student model in the xth training is obtained.
  • obtaining the total loss function of the student model in the xth training includes determining the sum or the average value of the loss functions of the student model for training samples 1 to m as the total loss function of the student model in the xth training.
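  • The sum-or-average reduction over the per-sample losses can be sketched as follows (a minimal illustration, not the disclosure's exact implementation):

```python
def total_loss(per_sample_losses, reduction="mean"):
    # Combine the student model's losses for training samples 1..m into
    # the total loss of this training round: either their sum or average.
    total = sum(per_sample_losses)
    return total / len(per_sample_losses) if reduction == "mean" else total
```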
  • updating the model parameters of the student model based on the total loss function includes obtaining gradient information of the total loss function, and updating the model parameters of the student model based on the gradient information.
  • backpropagation can be performed based on gradient information to update model parameters.
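  • A minimal sketch of updating model parameters from gradient information; plain gradient descent is used here as one illustrative update rule, since the disclosure does not fix the optimizer:

```python
def sgd_update(params, grads, lr=0.01):
    # One gradient-descent step: move each parameter against its gradient,
    # as backpropagation of the total loss would prescribe.
    return [p - lr * g for p, g in zip(params, grads)]
```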
  • updating the model parameters of the student model based on the total loss function to obtain the trained target model includes: updating the model parameters of the student model based on the total loss function; if the model training end conditions are not currently met, returning to obtain the next training sample and continuing
  • to update the model parameters of the student model until the model training end conditions are met, and determining the student model obtained in the last training as the target model.
  • the end conditions of model training are not particularly limited.
  • the end conditions of model training include but are not limited to the model accuracy reaching a set accuracy threshold, the number of model iterations reaching a set number threshold, the total loss function reaching a minimum value, etc.
  • the target model is obtained by distillation learning of n teacher models.
  • the target model is small in size, high in accuracy, and requires less computing resources than the teacher model. For example, if a teacher model is deployed on the user terminal, the teacher model deployed on the user terminal can be replaced with the target model, which helps to save storage space and computing resources of the user terminal.
  • the training samples are input into the student model and the n teacher models respectively; the first output of the student model and the second outputs of the n teacher models are obtained; based on the label of the training sample and the n second outputs, the weight corresponding to the training sample is determined; the total loss function of the student model is obtained based on the first output and the weight; and the model parameters of the student model are updated based on the total loss function to obtain the trained target model.
  • the label of the training sample and the second outputs of the teacher models can be comprehensively considered to determine the weight corresponding to the training sample, and the total loss function of the student model can then be obtained based on the first output and the weight. This avoids
  • the problem of an inaccurate total loss function caused by label errors in the training samples, improves the accuracy of the total loss function of the student model, and thereby improves the accuracy of model training.
  • the weight corresponding to the training sample includes a first weight and a second weight corresponding to the teacher model. It should be noted that one training sample can correspond to a first weight, and each teacher model can correspond to a second weight, that is, n teacher models can correspond to n second weights.
  • Figure 2 is a schematic flowchart of a model training method according to the second embodiment of the present disclosure.
  • the model training method according to the second embodiment of the present disclosure includes: S201-S207.
  • For the relevant content of step S201, please refer to the above embodiments; details are not repeated here.
  • the data format of the label and the data format of the second output may be different.
  • the data format of the label may be text composed of natural language, and the data format of the second output may be text composed of non-natural language.
  • the data format of the label may be Chinese text, and the data format of the second output may be a vector.
  • the n second outputs may be converted into n second conversion outputs in the data format of the tag, that is, the data format of the n second conversion outputs is the same as the data format of the tag.
  • the labels of the training samples include “road”
  • the second outputs include (1,0,0), (0,1,0) and (0,0,1), which can be converted into “road”, “landscape” and “building” respectively.
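  • The vector-to-text conversion in this example can be sketched as below; the class-name list is taken from the example above, and argmax is assumed as the conversion rule:

```python
def convert_output(vector, class_names):
    # Map a one-hot (or probability) vector to the class name with the
    # highest score, matching the label's text data format.
    index = max(range(len(vector)), key=lambda i: vector[i])
    return class_names[index]
```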
  • For the relevant content of step S203, please refer to the relevant content of step S102; details are not repeated here.
  • the n second outputs can be data-converted according to the data format of the labels to obtain n second converted outputs, and the weight can be determined based on the label and the n second converted outputs. Since the data formats of the label and the n second converted
  • outputs are the same, the weight is easy to determine.
  • obtaining the first loss function of the student model includes obtaining the first template loss function, and substituting the first output, the label, and the first weight into the first template loss function to obtain the first loss function.
  • the first template loss function is not particularly limited. In one embodiment, the first template loss function can be set in advance.
  • obtaining the first loss function of the student model based on the first output, the label and the first weight includes obtaining the first initial loss function of the student model based on the first output and the label, based on the first initial loss function and the first weight to obtain the first loss function. Therefore, in this method, the first initial loss function can be obtained based on the first output and the label, and the first loss function can be obtained based on the first initial loss function and the first weight.
  • obtaining the first loss function includes obtaining the first product of the first initial loss function and the first weight, and determining the first product as the first loss function.
  • the formula of the first loss function is as follows: Lgt = F1(Os, GTx) × B
  • Lgt is the first loss function
  • Os is the first output
  • GTx is the label
  • F1(Os, GTx) is the first initial loss function
  • B is the first weight.
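  • The first loss function can be sketched as follows; F1 is taken here as cross-entropy against the one-hot label, which is an illustrative assumption since the disclosure does not fix F1:

```python
import math

def first_loss(student_probs, label_index, first_weight):
    # Lgt = F1(Os, GTx) * B, with F1 chosen as cross-entropy between the
    # student's probabilities and the one-hot label (hypothetical choice).
    f1 = -math.log(max(student_probs[label_index], 1e-12))
    return f1 * first_weight
```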
  • S205 Obtain the second loss function of the student model based on the first output, n second outputs and the second weight.
  • obtaining the second loss function of the student model includes obtaining the second template loss function, and substituting the first output, the n second outputs and the second weight into the second template loss function to obtain the second loss function.
  • the second template loss function is not particularly limited. In one embodiment, the second template loss function can be preset.
  • obtaining the second loss function of the student model based on the first output, the n second outputs and the second weight includes: obtaining the i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model,
  • where 1≤i≤n and i is a positive integer; obtaining the i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtaining the second loss function based on the n third loss functions of the student model.
  • a third initial loss function can be obtained based on the first output and the second output, and a third loss function can be obtained based on the third initial loss function and the second weight, and based on n third loss functions, Get the second loss function.
  • obtaining the i-th third loss function of the student model includes obtaining the second product of the i-th third initial loss function and the second weight corresponding to the i-th teacher model, and determining the second product as the i-th third loss function.
  • obtaining the second loss function based on n third loss functions of the student model includes obtaining an average value of the n third loss functions of the student model, and determining the average value as the second loss function. Therefore, in this method, the average value of n third loss functions can be determined as the second loss function.
  • the formula of the second loss function is as follows: Ldist = (1/n) × Σi F2(Os, Oti) × Ci
  • Ldist is the second loss function
  • Os is the first output
  • Oti is the second output of the i-th teacher model
  • 1≤i≤n, i is a positive integer
  • F2(Os, Oti) is the i-th third initial loss function
  • Ci is the second weight corresponding to the i-th teacher model
  • F2(Os, Oti) × Ci is the i-th third loss function.
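  • The second loss function can be sketched as below; F2 is taken here as mean squared error, an illustrative stand-in since the disclosure does not fix F2:

```python
def second_loss(student_output, teacher_outputs, second_weights):
    # Ldist = (1/n) * sum_i F2(Os, Ot_i) * C_i, with F2 chosen as mean
    # squared error between student and teacher outputs (hypothetical choice).
    def f2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    terms = [f2(student_output, t) * c
             for t, c in zip(teacher_outputs, second_weights)]
    return sum(terms) / len(terms)
```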
  • obtaining the total loss function based on the first loss function and the second loss function may include performing a weighted sum of the first loss function and the second loss function to obtain the total loss function.
  • the weights of the first loss function and the second loss function are not particularly limited. In one embodiment, the weights of the first loss function and the second loss function can be preset. Therefore, in this method, the first loss function and the second loss function can be weighted and summed to obtain the total loss function, which improves the flexibility of the total loss function of the student model.
  • the first loss function or the second loss function can also be directly determined as the total loss function.
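  • The weighted sum of the two loss functions can be sketched as follows; the weights alpha and beta are preset hyperparameters, and the values here are illustrative:

```python
def combined_loss(l_gt, l_dist, alpha=0.5, beta=0.5):
    # Total loss = alpha * (first / label loss) + beta * (second /
    # distillation loss); alpha and beta are hypothetical preset weights.
    return alpha * l_gt + beta * l_dist
```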
  • For the relevant content of step S207, please refer to the above embodiments; details are not repeated here.
  • the first loss function of the student model can be obtained based on the first output, the label and the first weight
  • the second loss function of the student model can be obtained based on the first output, the n second outputs and the second weight
  • and the total loss function can be obtained based on the first loss function and the second loss function. From this, the first loss function of the student model relative to the label and the second loss function of the student model relative to the teacher models can be comprehensively considered to obtain the total loss function of the student model, which improves the accuracy of the total loss function of the student model and thereby improves the accuracy of model training.
  • Figure 3 is a schematic flowchart of a model training method according to the third embodiment of the present disclosure.
  • the model training method according to the third embodiment of the present disclosure includes: S301-S311.
  • S301 Input the training samples into the student model and n teacher models respectively, obtain the first output of the student model, and obtain the second output of the n teacher models, where n is a positive integer.
  • For the relevant content of step S301, please refer to the above embodiments; details are not repeated here.
  • the target quantity is a natural number, and the value range of the target quantity is [0,n].
  • when the target number is 0, it indicates that the label is inconsistent with all of the n second outputs.
  • when the target number is greater than or equal to 1, it indicates that there is a second output consistent with the label among the n second outputs.
  • when the target number is n, it indicates that the label is consistent with all of the n second outputs.
  • S303 Determine the first weight based on the number of targets, where the first weight is positively related to the number of targets.
  • the value range of the first weight is [0,1].
  • the first weight is positively related to the target number; that is, the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the first weight. Conversely, the smaller the target number of second outputs consistent with the label, the less likely the label is correct, and the smaller the first weight.
  • the first weight B = k/n.
  • k is the number of targets
  • n is the number of teacher models (that is, the number of second outputs).
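  • Determining the target number k and the first weight B = k/n can be sketched as follows (assuming the teacher outputs have already been converted to the label's data format):

```python
def first_weight(label, converted_outputs):
    # k = number of teacher outputs (after format conversion) equal to the
    # label; B = k / n, so B lies in [0, 1] and grows with agreement.
    n = len(converted_outputs)
    k = sum(1 for out in converted_outputs if out == label)
    return k / n
```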
  • S304 Obtain the first loss function of the student model based on the first output, label and first weight.
  • For the relevant content of step S304, please refer to the above embodiments; details are not repeated here.
  • the value range of the second weight corresponding to the i-th teacher model is [0,1), that is, the second weight is any number greater than or equal to 0 and less than 1.
  • the second weight corresponding to the i-th teacher model is positively related to the target number; that is, the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the second weight.
  • conversely, the smaller the target number of second outputs consistent with the label, the less likely the label is correct, and the smaller the second weight.
  • the second weight corresponding to the i-th teacher model is Ci = k/n.
  • k is the number of targets
  • n is the number of teacher models (that is, the number of second outputs).
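  • Following the surrounding description (Ci = 1 when the i-th teacher's converted output matches the label, otherwise Ci = k/n), the second weight can be sketched as:

```python
def second_weight(label, converted_outputs, i):
    # C_i = 1 when the i-th teacher's converted output matches the label;
    # otherwise C_i = k / n, the fraction of teachers agreeing with the label.
    n = len(converted_outputs)
    k = sum(1 for out in converted_outputs if out == label)
    return 1.0 if converted_outputs[i] == label else k / n
```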
  • For the relevant content of steps S309-S311, please refer to the above embodiments; details are not repeated here.
  • the first weight can be determined based on the target number of second outputs consistent with the label; in the case where the label is consistent with the second output of the i-th teacher model, the second weight
  • corresponding to the i-th teacher model is determined to be 1; or, in the case where the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is determined based on the target number of second outputs consistent with the label.
  • the present disclosure also provides an image processing method.
  • FIG. 4 is a schematic flowchart of an image processing method according to the first embodiment of the present disclosure.
  • the image processing method according to the first embodiment of the present disclosure includes: S401-S402.
  • execution subject of the image processing method in the embodiment of the present disclosure may be a hardware device with data information processing capabilities and/or the necessary software required to drive the hardware device to work.
  • execution subjects may include workstations, servers, computers, user terminals and other intelligent devices.
  • user terminals include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, etc.
  • the images to be processed include but are not limited to two-dimensional images, three-dimensional images, etc.
  • the user terminal can obtain the image to be processed from its own storage space, and/or by shooting with a camera, and/or from web pages and applications (APPs).
  • S402 Input the image to be processed into the target image model, and the target image model outputs the processing result of the image to be processed, where the target image model is obtained using a model training method.
  • the target image model can be obtained by using the model training method described in Figures 1 to 3, which will not be described again here.
  • the processing results include but are not limited to action recognition results, image classification results, face recognition results, text recognition results, etc.
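The deployment flow of steps S401-S402 can be sketched as below. The model here is a stand-in callable, since the disclosure does not fix a specific network architecture; `toy_model` and its mean-intensity rule are purely illustrative assumptions.

```python
def process_image(image, target_image_model):
    """S401-S402: obtain an image to be processed and run the target image model on it."""
    return target_image_model(image)

# Hypothetical stand-in for a distilled target image model: it "classifies"
# an image (nested list of pixel intensities in [0, 1]) by mean brightness.
def toy_model(image):
    flat = [p for row in image for p in row]
    return "bright" if sum(flat) / len(flat) > 0.5 else "dark"

result = process_image([[0.9, 0.8], [0.7, 1.0]], toy_model)  # -> "bright"
```

In practice the stand-in would be replaced by the distilled target image model obtained with the model training method described above.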
  • the target image model is obtained by distillation learning of n teacher models.
  • the target image model is small in size, high in accuracy, and requires fewer computing resources than the teacher models. For example, if a teacher model is deployed on the user terminal, it can be replaced with the target image model, which helps to save the storage space and computing resources of the user terminal.
  • the image to be processed is input into the target image model, and the target image model outputs the processing result of the image to be processed.
  • the target image model is obtained using a model training method; it is small in size, high in accuracy, and requires few computing resources, which helps improve image processing performance.
  • model training method of the embodiment of the present disclosure can also be applied to speech models, language models, etc.
  • the speech to be processed can be obtained, the speech to be processed is input into a target speech model, and the target speech model outputs the speech processing result of the speech to be processed.
  • the target speech model is obtained using a model training method, which helps to improve speech processing performance.
  • the text to be processed can be obtained, the text to be processed is input into the target language model, and the target language model outputs the text processing result of the text to be processed.
  • the target language model is obtained using a model training method, which helps to improve text processing performance.
  • the collection, storage, use, processing, transmission, provision and disclosure of user personal information are in compliance with relevant laws and regulations and do not violate public order and good customs.
  • the present disclosure also provides a model training device for implementing the above model training method.
  • FIG. 5 is a block diagram of a model training device according to the first embodiment of the present disclosure.
  • the model training device 500 of the embodiment of the present disclosure includes: a first acquisition module 501, a determination module 502, a second acquisition module 503 and a training module 504.
  • the first acquisition module 501 is used to input training samples into the student model and n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer.
  • the determining module 502 is configured to determine the weight corresponding to the training sample based on the label of the training sample and the n second outputs.
  • the second obtaining module 503 is used to obtain the total loss function of the student model based on the first output and the weight.
  • the training module 504 is used to update the model parameters of the student model based on the total loss function to obtain a trained target model.
  • the weight includes a first weight and a second weight corresponding to the teacher model; the second acquisition module 503 is further configured to: obtain the first loss function of the student model based on the first output, the label and the first weight; obtain the second loss function of the student model based on the first output, the n second outputs and the second weight; and obtain the total loss function based on the first loss function and the second loss function.
  • the second acquisition module 503 is further configured to: acquire a first initial loss function of the student model based on the first output and the label, and obtain the first loss function based on the first initial loss function and the first weight.
  • the second acquisition module 503 is further configured to: acquire the i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model, where 1 ≤ i ≤ n and i is a positive integer; obtain the i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtain the second loss function based on the n third loss functions of the student model.
  • the second acquisition module 503 is also used to: acquire the average value of the n third loss functions of the student model, and determine the average value as the second loss function.
  • the determination module 502 is further configured to: compare the label with the n second outputs to obtain a target number of second outputs consistent with the label; and determine the first weight based on the target number, wherein the first weight is positively related to the target number.
  • the determination module 502 is further configured to: compare the label with the second output of the i-th teacher model; in response to the label being consistent with the second output of the i-th teacher model, determine the second weight corresponding to the i-th teacher model to be 1; or, in response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label and determine the second weight corresponding to the i-th teacher model based on the target number, wherein the second weight is positively related to the target number.
  • the determination module 502 is further configured to: convert the n second outputs into n second converted outputs in the data format of the label according to the data format of the label, and determine the weight based on the label and the n second converted outputs.
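The format conversion performed by the determination module can be sketched as an argmax when the label is a class index and each second output is a probability (or logit) vector; both of these representations are assumptions, since the disclosure does not fix the data formats.

```python
def convert_to_label_format(second_outputs):
    """Convert each second output (a score vector over classes) into the
    label's data format (a single class index) by taking the argmax."""
    return [max(range(len(scores)), key=lambda j: scores[j])
            for scores in second_outputs]
```

For example, `convert_to_label_format([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])` yields `[1, 0]`, which can then be compared directly with an integer label.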
  • the second acquisition module 503 is further configured to: perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.
  • the model training device of the embodiment of the present disclosure inputs training samples into the student model and n teacher models respectively, obtains the first output of the student model, and obtains the second outputs of the n teacher models; determines the weight corresponding to the training sample based on the label of the training sample and the n second outputs; obtains the total loss function of the student model based on the first output and the weight; and updates the model parameters of the student model based on the total loss function to obtain the trained target model.
  • the label of the training sample and the second outputs of the teacher models can be considered together to determine the weight corresponding to the training sample, and the total loss function of the student model can then be obtained based on the first output and the weight. This avoids an inaccurate total loss function caused by label errors in the training samples, improves the accuracy of the total loss function of the student model, and thereby improves the accuracy of model training.
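Putting the pieces together, the loss computation of one training iteration can be sketched as follows. The squared-error form of the initial losses, the k/n choice of weights, and the fixed balance factor `alpha` are illustrative assumptions; backpropagation of the resulting total loss is omitted.

```python
def total_loss(student_output, label, label_onehot, teacher_outputs, alpha=0.5):
    """Compute the total loss for one training sample.

    student_output  -- first output of the student model (score vector)
    label           -- label of the training sample as a class index
    label_onehot    -- one-hot encoding of that label
    teacher_outputs -- n second outputs of the teacher models (score vectors)
    alpha           -- assumed balance between the first and second loss
    """
    n = len(teacher_outputs)
    preds = [max(range(len(t)), key=lambda j: t[j]) for t in teacher_outputs]
    k = sum(1 for p in preds if p == label)            # target number
    first_w = k / n                                    # first weight
    second_ws = [1.0 if p == label else k / n for p in preds]

    # First loss: first initial loss (student vs. label) scaled by the first weight.
    first_initial = sum((s - y) ** 2 for s, y in zip(student_output, label_onehot))
    first_loss = first_w * first_initial

    # Second loss: average of the n third losses (student vs. each teacher,
    # each scaled by that teacher's second weight).
    thirds = [w * sum((s - t) ** 2 for s, t in zip(student_output, teacher))
              for teacher, w in zip(teacher_outputs, second_ws)]
    second_loss = sum(thirds) / n

    # Total loss: weighted sum of the first and second loss.
    return alpha * first_loss + (1 - alpha) * second_loss
```

With a student output [1.0, 0.0], label 0, and two teachers [[1.0, 0.0], [0.0, 1.0]], only the first teacher agrees with the label, so the disagreeing teacher's contribution is down-weighted and the total loss is 0.25.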
  • the present disclosure also provides an image processing device for implementing the above image processing method.
  • FIG. 6 is a block diagram of an image processing device according to the first embodiment of the present disclosure.
  • the image processing device 600 includes: an acquisition module 601 and a processing module 602 .
  • the acquisition module 601 is used to acquire images to be processed.
  • the processing module 602 is configured to input the image to be processed into a target image model, and the target image model outputs the processing result of the image to be processed, wherein the target image model is obtained using a model training method.
  • the image processing device of the embodiment of the present disclosure inputs the image to be processed into the target image model, and the target image model outputs the processing result of the image to be processed.
  • the target image model is obtained by using a model training method.
  • the target image model is small in size, has high accuracy, and requires few computing resources, which helps improve image processing performance.
  • the present disclosure also provides an electronic device, a readable storage medium, a computer program product, and a computer program.
  • an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the image processing method described in any of the preceding embodiments.
  • FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the electronic device 700.
  • Computing unit 701, ROM 702 and RAM 703 are connected to each other via bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • the following components are connected to the I/O interface 705: an input unit 706, such as a keyboard, a mouse, etc.; an output unit 707, such as various types of displays, speakers, etc.; a storage unit 708, such as a magnetic disk, an optical disk, etc.; and a communication unit 709, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.
  • the computing unit 701 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 701 performs the various methods and processes described above, such as the model training method described in FIGS. 1 to 3 and the image processing method described in FIG. 4.
  • the model training method or the image processing method may be implemented as a computer software program, which is tangibly included in a machine-readable medium, such as the storage unit 708.
  • part or all of the computer program may be loaded and/or installed onto electronic device 700 via ROM 702 and/or communication unit 709 .
  • when the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the model training method described above, or one or more steps of the image processing method described above, may be performed.
  • the computing unit 701 may be configured to perform the model training method in any other suitable manner (eg, by means of firmware), or configured to perform the image processing method.
  • various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • these various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • the storage medium is a non-transitory computer-readable storage medium storing computer instructions, and the computer instructions are used to cause the computer to execute the image processing method described in any of the foregoing embodiments.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer program product includes a computer program that, when executed by a processor, implements the image processing method described in any of the foregoing embodiments.
  • the computer program includes computer program code.
  • when the computer program code is run on a computer, it causes the computer to execute the image processing method described in any of the foregoing embodiments.
  • the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability found in traditional physical host and VPS ("Virtual Private Server") services.
  • the server can also be a distributed system server or a server combined with a blockchain.


Abstract

Provided are an image processing method and apparatus, and an electronic device, a storage medium, a computer program product and a computer program. The image processing method comprises: acquiring an image to be processed; and inputting said image into a target image model, such that the target image model outputs a processing result of said image.

Description

Image processing method and apparatus, electronic device and storage medium
Cross-references to related applications
This application claims priority to Chinese Patent Application No. 202210981983.6, filed in China on August 16, 2022, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of computer technology, and specifically to image processing methods and apparatuses, electronic devices, storage media, computer program products and computer programs.
Background
At present, with the continuous development of artificial intelligence technology, models have been widely used in fields such as images, text, and speech, and have the advantages of a high degree of automation and low labor costs. To meet prediction requirements, models are often large; to compress a model, it can be trained based on knowledge distillation. However, in the related art, model training methods based on knowledge distillation suffer from low training accuracy.
Summary
The present disclosure provides image processing methods and apparatuses, electronic devices, storage media, computer program products, and computer programs.
According to an embodiment of one aspect of the present disclosure, an image processing method is provided, including:
acquiring an image to be processed; and
inputting the image to be processed into a target image model, the target image model outputting a processing result of the image to be processed, wherein the target image model is trained through the following steps:
inputting training samples into a student model and n teacher models respectively, obtaining a first output of the student model, and obtaining second outputs of the n teacher models, where n is a positive integer;
determining a weight corresponding to the training sample based on a label of the training sample and the n second outputs;
obtaining a total loss function of the student model based on the first output and the weight; and
updating model parameters of the student model based on the total loss function to obtain a trained target model,
wherein the weight includes a first weight and a second weight corresponding to the teacher model;
wherein obtaining the total loss function of the student model based on the first output and the weight includes:
obtaining a first loss function of the student model based on the first output, the label and the first weight;
obtaining a second loss function of the student model based on the first output, the n second outputs and the second weight; and
obtaining the total loss function based on the first loss function and the second loss function,
wherein determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes:
comparing the label with the n second outputs to obtain a target number of second outputs consistent with the label; and
determining the first weight based on the target number, wherein the first weight is positively related to the target number,
wherein determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes:
comparing the label with the second output of the i-th teacher model;
in response to the label being consistent with the second output of the i-th teacher model, determining the second weight corresponding to the i-th teacher model to be 1; or,
in response to the label being inconsistent with the second output of the i-th teacher model, obtaining the target number of second outputs consistent with the label;
determining the second weight corresponding to the i-th teacher model based on the target number, wherein the second weight is positively related to the target number.
According to an embodiment of another aspect of the present disclosure, an image processing apparatus is provided, including:
an acquisition module, configured to acquire an image to be processed; and
a processing module, configured to input the image to be processed into a target image model, the target image model outputting a processing result of the image to be processed, wherein the target image model is trained through the following modules:
a first acquisition module, configured to input training samples into a student model and n teacher models respectively, obtain a first output of the student model, and obtain second outputs of the n teacher models, where n is a positive integer;
a determination module, configured to determine a weight corresponding to the training sample based on a label of the training sample and the n second outputs;
a second acquisition module, configured to obtain a total loss function of the student model based on the first output and the weight; and
a training module, configured to update model parameters of the student model based on the total loss function to obtain a trained target model,
wherein the weight includes a first weight and a second weight corresponding to the teacher model;
wherein the second acquisition module is further configured to:
obtain a first loss function of the student model based on the first output, the label and the first weight;
obtain a second loss function of the student model based on the first output, the n second outputs and the second weight; and
obtain the total loss function based on the first loss function and the second loss function;
wherein the determination module is further configured to:
compare the label with the n second outputs to obtain a target number of second outputs consistent with the label; and
determine the first weight based on the target number, wherein the first weight is positively related to the target number;
wherein the determination module is further configured to:
compare the label with the second output of the i-th teacher model;
in response to the label being consistent with the second output of the i-th teacher model, determine the second weight corresponding to the i-th teacher model to be 1; or, in response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label; and
determine the second weight corresponding to the i-th teacher model based on the target number, wherein the second weight is positively related to the target number.
According to an embodiment of another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image processing method described in any embodiment of the foregoing aspect.
According to an embodiment of another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform the image processing method described in any embodiment of the foregoing aspect.
According to an embodiment of another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the image processing method described in any embodiment of the foregoing aspect.
According to an embodiment of another aspect of the present disclosure, a computer program is provided, including computer program code that, when run on a computer, causes the computer to perform the image processing method described in any embodiment of the foregoing aspect.
It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Description of the drawings
The accompanying drawings are used for a better understanding of the present solution and do not constitute a limitation of the present disclosure, wherein:
FIG. 1 is a schematic flowchart of a model training method according to the first embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a model training method according to a second embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a model training method according to a third embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of an image processing method according to the first embodiment of the present disclosure;
FIG. 5 is a block diagram of a model training device according to the first embodiment of the present disclosure;
FIG. 6 is a block diagram of an image processing device according to the first embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device used to implement the model training method or the image processing method of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
AI (Artificial Intelligence) is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. AI technology has the advantages of a high degree of automation, high accuracy, and low cost, and has been widely applied.
DL (Deep Learning) is a new research direction in the field of ML (Machine Learning). It learns the inherent laws and representation levels of sample data so that machines can analyze and learn like humans and can recognize data such as text, images, and sounds; it is widely applied in speech and image recognition.
Image processing refers to the technology of analyzing images with a computer to achieve a desired result. Image processing generally refers to digital image processing. A digital image is a large two-dimensional array captured by a device such as an industrial camera, video camera, or scanner; the elements of the array are called pixels, and their values are called grayscale values. Image processing technology generally includes three parts: image compression; enhancement and restoration; and matching, description, and recognition.
Computer vision refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. Computer vision is a comprehensive discipline spanning computer science and engineering, signal processing, physics, applied mathematics and statistics, neurophysiology, cognitive science, and other fields.
Figure 1 is a schematic flowchart of a model training method according to a first embodiment of the present disclosure.
As shown in Figure 1, the model training method of the first embodiment of the present disclosure includes steps S101 to S104.
S101: input a training sample into a student model and into n teacher models respectively, obtain a first output of the student model, and obtain second outputs of the n teacher models, where n is a positive integer.
It should be noted that the execution subject of the model training method of the embodiments of the present disclosure may be a hardware device with data processing capability and/or the software necessary to drive the hardware device. In some embodiments, the execution subject may include a workstation, a server, a computer, a user terminal, or another intelligent device. The user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like.
In the embodiments of the present disclosure, one student model may correspond to n teacher models, and n is not unduly limited; in one embodiment, n may be 3 or 5.
In the embodiments of the present disclosure, the training sample may be input into the student model and the n teacher models respectively; the student model outputs the first output, and the n teacher models output the second outputs. It can be understood that each teacher model outputs one second output; that is, the n teacher models may output n second outputs.
It should be noted that the training sample, the student model, and the teacher models are not unduly limited. The training sample has a label.
In one embodiment, the student model and the teacher models are speech models, which include, but are not limited to, a speech recognition model, a speech synthesis model, and the like.
In one embodiment, when the student model and the teacher models are speech recognition models, the training sample includes a sample speech, and the label of the training sample includes a reference recognized text of the sample speech.
In one embodiment, when the student model and the teacher models are speech synthesis models, the training sample includes a sample text, and the label of the training sample includes a reference synthesized speech of the sample text.
In one embodiment, the student model and the teacher models are image models, which include, but are not limited to, an action recognition model, an image classification model, a face recognition model, a text recognition model, and the like.
In one embodiment, when the student model and the teacher models are action recognition models, the training sample includes a sample image, and the label of the training sample includes a reference recognized action of the sample image.
In one embodiment, when the student model and the teacher models are image classification models, the training sample includes a sample image, and the label of the training sample includes a reference overall category of the sample image and reference subcategories of the pixels in the sample image.
In one embodiment, when the student model and the teacher models are face recognition models, the training sample includes a sample image, and the label of the training sample includes a reference face recognition result of the sample image. The reference face recognition result includes, but is not limited to, a face position, a face width, a face height, a number of faces, and the like.
In one embodiment, when the student model and the teacher models are text recognition models, the training sample includes a sample image, and the label of the training sample includes a reference recognized text of the sample image.
In one embodiment, the student model and the teacher models are language models, which include, but are not limited to, a text classification model, a text segmentation model, and the like.
In one embodiment, when the student model and the teacher models are text classification models, the training sample includes a sample text, and the label of the training sample includes a reference classification result of the sample text. The reference classification result includes, but is not limited to, a sentiment classification result, a topic classification result, and the like.
In one embodiment, when the student model and the teacher models are text segmentation models, the training sample includes a sample text, and the label of the training sample includes a reference segmentation result of the sample text.
S102: determine a weight corresponding to the training sample based on the label of the training sample and the n second outputs.
It can be understood that different training samples may correspond to different weights. The weight takes values in the range [0, 1].
In one implementation, determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes obtaining a similarity between the label and the n second outputs, and determining the weight based on the similarity, where the weight is positively correlated with the similarity. In one embodiment, the higher the similarity between the label and the n second outputs, the more likely the label is correct and the larger the weight corresponding to the training sample; conversely, the lower the similarity, the more likely the label is wrong and the smaller the weight corresponding to the training sample.
In one implementation, determining the weight based on the similarity includes identifying a target setting range in which the similarity falls, and determining the setting weight corresponding to the target setting range as the weight corresponding to the training sample. It can be understood that the similarity may be divided into a plurality of setting ranges in advance, with different setting ranges corresponding to different setting weights.
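The range-based scheme above can be sketched as follows. The similarity interval [0, 1] is partitioned into preset ranges, each mapped to a preset weight; the specific thresholds and weights below are illustrative assumptions, since the disclosure only requires that the weight increase with the similarity.

```python
def weight_from_similarity(similarity: float) -> float:
    """Map a label/teacher-output similarity in [0, 1] to a sample weight.

    The (lower_bound, weight) pairs are hypothetical preset ranges; any
    partition where a higher similarity yields a higher weight would do.
    """
    preset_ranges = [(0.8, 1.0), (0.5, 0.6), (0.2, 0.3), (0.0, 0.0)]
    for lower_bound, weight in preset_ranges:
        if similarity >= lower_bound:
            return weight
    return 0.0
```

For example, a similarity of 0.9 falls in the top range and yields the full weight 1.0, while a similarity of 0.1 yields weight 0.0, effectively discarding a sample whose label disagrees with every teacher.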
S103: obtain a total loss function of the student model based on the first output and the weight.
It should be noted that the category of the total loss function is not unduly limited; in one embodiment, the total loss function includes, but is not limited to, CE (Cross Entropy), BCE (Binary Cross Entropy), and the like.
In one implementation, the total loss function is positively correlated with the weight. In one embodiment, the larger the weight corresponding to the training sample, the larger the total loss function; conversely, the smaller the weight corresponding to the training sample, the smaller the total loss function.
In one embodiment, when the label is wrong, the similarity between the label and the n second outputs is low and the weight corresponding to the training sample is also low, so the resulting total loss function is small, which avoids the problem of the total loss function becoming large when the label is wrong.
In one implementation, during the x-th training of the student model, a training sample x may be obtained and input into the student model and the n teacher models respectively; the first output of the student model for the training sample x and the second outputs of the n teacher models for the training sample x are obtained; the weight corresponding to the training sample x is determined based on the label y of the training sample x and the second outputs of the n teacher models for the training sample x; and the total loss function of the student model at the x-th training is obtained based on the first output of the student model for the training sample x and the weight corresponding to the training sample x, where x is a positive integer.
In one implementation, during the x-th training of the student model, a training sample set Ax may be obtained, the training sample set Ax including training samples 1 to m. A training sample s in the training sample set Ax is input into the student model and the n teacher models respectively, and the first output of the student model for the training sample s and the second outputs of the n teacher models for the training sample s are obtained, where 1≤s≤m, and s and m are positive integers.
The weight corresponding to the training sample s is determined based on the label ys of the training sample s and the second outputs of the n teacher models for the training sample s; the loss function of the student model for the training sample s is obtained based on the first output of the student model for the training sample s and the weight corresponding to the training sample s; and the total loss function of the student model at the x-th training is obtained based on the loss functions of the student model for the training samples 1 to m.
In one embodiment, obtaining the total loss function of the student model at the x-th training based on the loss functions of the student model for the training samples 1 to m includes determining the sum or the average of the loss functions of the student model for the training samples 1 to m as the total loss function of the student model at the x-th training.
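The sum-or-average reduction over the per-sample weighted losses can be sketched as follows; the function name and interface are illustrative, not from the disclosure.

```python
def batch_total_loss(sample_losses: list, reduction: str = "mean") -> float:
    """Combine per-sample weighted losses (for samples 1..m) into the
    total loss at the x-th training; the disclosure allows either the
    sum or the average of the per-sample losses."""
    total = sum(sample_losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(sample_losses)
    raise ValueError(f"unknown reduction: {reduction!r}")
```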
S104: update model parameters of the student model based on the total loss function to obtain a trained target model.
In one implementation, updating the model parameters of the student model based on the total loss function includes obtaining gradient information of the total loss function, and updating the model parameters of the student model according to the gradient information. In one embodiment, backpropagation may be performed according to the gradient information to update the model parameters.
In one implementation, there may be a plurality of training samples. Updating the model parameters of the student model based on the total loss function to obtain the trained target model includes: updating the model parameters of the student model based on the total loss function; if it is identified that a model training end condition is not yet met, returning to continue updating the model parameters of the parameter-adjusted student model with the next training sample until the model training end condition is met; and determining the student model obtained in the last training as the target model. It should be noted that the model training end condition is not unduly limited; in one embodiment, the model training end condition includes, but is not limited to, the model accuracy reaching a set accuracy threshold, the number of model iterations reaching a set count threshold, the total loss function reaching a minimum value, and the like.
In the embodiments of the present disclosure, the target model is obtained by distillation learning from the n teacher models; the target model is small in size and high in accuracy, and requires fewer computing resources than the teacher models. For example, when a teacher model is deployed on a user terminal, the teacher model deployed on the user terminal may be replaced with the target model, which helps save the storage space and computing resources of the user terminal.
To sum up, according to the model training method of the embodiments of the present disclosure, the training sample is input into the student model and the n teacher models respectively; the first output of the student model and the second outputs of the n teacher models are obtained; the weight corresponding to the training sample is determined based on the label of the training sample and the n second outputs; the total loss function of the student model is obtained based on the first output and the weight; and the model parameters of the student model are updated based on the total loss function to obtain the trained target model. In this way, the label of the training sample and the second outputs of the teacher models are comprehensively considered to determine the weight corresponding to the training sample, and the total loss function of the student model is then obtained based on the first output of the student model and the weight, which avoids an inaccurate total loss function when the label of the training sample is wrong, improves the accuracy of the total loss function of the student model, and thereby improves the accuracy of model training.
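The update-until-done flow above can be sketched schematically. Here `run_step` is a hypothetical callable that performs one parameter update and returns the total loss; the two end conditions shown (a loss threshold and an iteration cap) are taken from the examples in the disclosure.

```python
def train_until_done(run_step, max_iterations: int = 100,
                     loss_threshold: float = 1e-3) -> float:
    """Repeat parameter updates until a model-training end condition is
    met: the total loss reaches the threshold, or the iteration count
    reaches its cap. Returns the last total loss observed."""
    last_loss = float("inf")
    for _ in range(max_iterations):
        last_loss = run_step()  # one update of the student's parameters
        if last_loss <= loss_threshold:
            break
    return last_loss
```

The student model as it stands after the final `run_step` call is the trained target model.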
On the basis of any of the above embodiments, the weight corresponding to the training sample includes a first weight and second weights corresponding to the teacher models. It should be noted that one training sample may correspond to one first weight, and each teacher model may correspond to one second weight; that is, the n teacher models may correspond to n second weights.
Figure 2 is a schematic flowchart of a model training method according to a second embodiment of the present disclosure.
As shown in Figure 2, the model training method of the second embodiment of the present disclosure includes steps S201 to S207.
S201: input the training sample into the student model and the n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer.
For the details of step S201, reference may be made to the foregoing embodiments, which will not be repeated here.
S202: convert, according to the data format of the label of the training sample, the n second outputs into n second converted outputs in the data format of the label.
It can be understood that the data format of the label and the data format of the second outputs may differ. In one embodiment, the data format of the label may be text in a natural language, and the data format of the second outputs may be text not in a natural language. In one embodiment, the data format of the label may be Chinese text, and the data format of the second outputs may be a vector.
In the embodiments of the present disclosure, the n second outputs may be converted into n second converted outputs in the data format of the label; that is, the data format of the n second converted outputs is the same as the data format of the label.
In one embodiment, when the student model and the teacher models are image classification models, the label of the training sample includes "road", and the second outputs include (1,0,0), (0,1,0), and (0,0,1); then (1,0,0), (0,1,0), and (0,0,1) may be converted into "road", "landscape", and "building", respectively.
S203: determine the weight of the training sample based on the label and the n second converted outputs.
For the details of step S203, reference may be made to the description of step S102, which will not be repeated here.
Thus, in this method, data conversion can be performed on the n second outputs according to the data format of the label to obtain the n second converted outputs, and the weight is determined based on the label and the n second converted outputs; since the label and the n second converted outputs share the same data format, the weight is easy to determine.
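The format conversion in this example (a one-hot style vector to the label's text format) can be sketched as follows; the class-name table is taken from the example above, and the argmax rule is an assumption about how a vector maps to a class name.

```python
def to_label_format(one_hot, class_names):
    """Convert a teacher's vector output into the label's text format by
    taking the class with the largest component, e.g. (1,0,0) -> "road"."""
    best_index = max(range(len(one_hot)), key=lambda i: one_hot[i])
    return class_names[best_index]
```

For instance, `to_label_format((0, 1, 0), ["road", "landscape", "building"])` yields "landscape", so the converted output can be compared directly against the text label.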
S204: obtain a first loss function of the student model based on the first output, the label, and the first weight.
In one implementation, obtaining the first loss function of the student model based on the first output, the label, and the first weight includes obtaining a first template loss function, and substituting the first output, the label, and the first weight into the first template loss function to obtain the first loss function. It should be noted that the first template loss function is not unduly limited; in one embodiment, the first template loss function may be preset.
In one implementation, obtaining the first loss function of the student model based on the first output, the label, and the first weight includes obtaining a first initial loss function of the student model based on the first output and the label, and obtaining the first loss function based on the first initial loss function and the first weight. Thus, in this method, the first initial loss function can be obtained based on the first output and the label, and the first loss function can be obtained based on the first initial loss function and the first weight.
In one implementation, obtaining the first loss function based on the first initial loss function and the first weight includes obtaining a first product of the first initial loss function and the first weight, and determining the first product as the first loss function.
In one embodiment, the formula of the first loss function is as follows:
Lgt = F1(Os, GTx)·B
where Lgt is the first loss function, Os is the first output, GTx is the label, F1(Os, GTx) is the first initial loss function, and B is the first weight.
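As a sketch of Lgt = F1(Os, GTx)·B, cross entropy is used below to stand in for F1; this choice is an assumption, since the disclosure leaves F1 open (it mentions CE and BCE as options).

```python
import math

def cross_entropy(probabilities, target_index: int) -> float:
    """F1: per-sample cross-entropy loss for a predicted distribution."""
    return -math.log(probabilities[target_index])

def first_loss(probabilities, target_index: int, first_weight: float) -> float:
    """Lgt = F1(Os, GTx) * B, where B is the sample's first weight."""
    return cross_entropy(probabilities, target_index) * first_weight
```

A first weight of 0 (label contradicted by every teacher) zeroes out the label term entirely, while a weight of 1 leaves the ordinary cross-entropy loss.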
S205: obtain a second loss function of the student model based on the first output, the n second outputs, and the second weights.
In one implementation, obtaining the second loss function of the student model based on the first output, the n second outputs, and the second weights includes obtaining a second template loss function, and substituting the first output, the n second outputs, and the second weights into the second template loss function to obtain the second loss function. It should be noted that the second template loss function is not unduly limited; in one embodiment, the second template loss function may be preset.
In one implementation, obtaining the second loss function of the student model based on the first output, the n second outputs, and the second weights includes: obtaining an i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model, where 1≤i≤n and i is a positive integer; obtaining an i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtaining the second loss function based on the n third loss functions of the student model. Thus, in this method, a third initial loss function can be obtained based on the first output and a second output, a third loss function can be obtained based on the third initial loss function and a second weight, and the second loss function can be obtained based on the n third loss functions.
In one implementation, obtaining the i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model includes obtaining a second product of the i-th third initial loss function and the second weight corresponding to the i-th teacher model, and determining the second product as the i-th third loss function.
In one implementation, obtaining the second loss function based on the n third loss functions of the student model includes obtaining an average of the n third loss functions of the student model, and determining the average as the second loss function. Thus, in this method, the average of the n third loss functions can be determined as the second loss function.
In one embodiment, the formula of the second loss function is as follows:
Ldist = (1/n)·∑_{i=1}^{n} F2(Os, Oti)·C_i
where Ldist is the second loss function, Os is the first output, Oti is the second output of the i-th teacher model with 1≤i≤n, F2(Os, Oti) is the i-th third initial loss function, C_i is the second weight corresponding to the i-th teacher model, and F2(Os, Oti)·C_i is the i-th third loss function.
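The averaged, weighted distillation loss can be sketched as follows, with mean squared error standing in for F2; this is an assumption, since the disclosure leaves F2 open.

```python
def mse(student_out, teacher_out) -> float:
    """F2: mean squared error between a student and a teacher output."""
    return sum((s - t) ** 2
               for s, t in zip(student_out, teacher_out)) / len(student_out)

def second_loss(student_out, teacher_outs, second_weights) -> float:
    """Ldist = (1/n) * sum_i F2(Os, Oti) * C_i over the n teachers."""
    n = len(teacher_outs)
    return sum(mse(student_out, teacher_out) * weight
               for teacher_out, weight in zip(teacher_outs, second_weights)) / n
```

Each teacher's contribution is scaled by its own second weight C_i before the average over the n teachers is taken.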
S206: obtain the total loss function based on the first loss function and the second loss function.
In one implementation, obtaining the total loss function based on the first loss function and the second loss function may include performing a weighted summation of the first loss function and the second loss function to obtain the total loss function. It should be noted that the weights of the first loss function and the second loss function are not unduly limited; in one embodiment, these weights may be preset. Thus, in this method, the first loss function and the second loss function can be weighted and summed to obtain the total loss function, which improves the flexibility of the total loss function of the student model.
As another possible implementation, the first loss function or the second loss function may also be directly determined as the total loss function.
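The weighted summation can be sketched directly; the default weight values below are placeholders, since the disclosure only says the weights may be preset.

```python
def weighted_total_loss(l_gt: float, l_dist: float,
                        w_gt: float = 0.5, w_dist: float = 0.5) -> float:
    """Total loss as a weighted sum of the first (label) loss Lgt and the
    second (distillation) loss Ldist."""
    return w_gt * l_gt + w_dist * l_dist
```

Setting one weight to zero recovers the alternative of using only the first or only the second loss function as the total loss.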
S207: update the model parameters of the student model based on the total loss function to obtain the trained target model.
For the details of step S207, reference may be made to the foregoing embodiments, which will not be repeated here.
To sum up, according to the model training method of the embodiments of the present disclosure, the first loss function of the student model can be obtained based on the first output, the label, and the first weight; the second loss function of the student model can be obtained based on the first output, the n second outputs, and the second weights; and the total loss function can be obtained based on the first loss function and the second loss function. In this way, the first loss function of the student model relative to the label and the second loss function of the student model relative to the teacher models are comprehensively considered to obtain the total loss function of the student model, which improves the accuracy of the total loss function of the student model and thereby improves the accuracy of model training.
Figure 3 is a schematic flowchart of a model training method according to a third embodiment of the present disclosure.
As shown in Figure 3, the model training method of the third embodiment of the present disclosure includes steps S301 to S311.
S301: input the training sample into the student model and the n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer.
For the details of step S301, reference may be made to the foregoing embodiments, which will not be repeated here.
S302: Compare the label with the n second outputs to obtain a target number of second outputs consistent with the label.

It can be understood that the target number is a natural number in the range [0, n]. In one embodiment, a target number of 0 indicates that none of the n second outputs is consistent with the label; a target number greater than or equal to 1 indicates that at least one of the n second outputs is consistent with the label; and a target number of n indicates that all n second outputs are consistent with the label.

S303: Determine a first weight based on the target number, where the first weight is positively correlated with the target number.

It should be noted that the first weight takes values in the range [0, 1].

In the embodiments of the present disclosure, the first weight is positively correlated with the target number: the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the first weight; conversely, the smaller the target number, the less likely the label is correct, and the smaller the first weight.

In one embodiment, the first weight B = k/n, where k is the target number and n is the number of teacher models (that is, the number of second outputs).
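As a minimal sketch of steps S302 and S303 with the rule B = k/n (the function and variable names are illustrative and not taken from the disclosure; outputs are assumed to already be in the label's data format so that equality comparison is meaningful):

```python
def first_weight(label, teacher_outputs):
    """Weight the ground-truth loss by how many teachers agree with the label.

    k is the target number of teacher outputs consistent with the label
    (step S302); n is the number of teacher models. The weight B = k / n
    lies in [0, 1] and is positively correlated with k (step S303).
    """
    n = len(teacher_outputs)
    k = sum(1 for out in teacher_outputs if out == label)  # target number
    return k / n
```

For example, if two of three teachers predict the label, the label loss is kept at weight 2/3; if no teacher agrees, the (likely wrong) label contributes nothing.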
S304: Obtain a first loss function of the student model based on the first output, the label, and the first weight.

For details of step S304, refer to the foregoing embodiments; they are not repeated here.

S305: Compare the label with the second output of the i-th teacher model.

S306: In response to the label being consistent with the second output of the i-th teacher model, determine that the second weight corresponding to the i-th teacher model is 1.

In the embodiments of the present disclosure, when the label is consistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model can be determined as Ci = 1.

S307: In response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label.

S308: Determine the second weight corresponding to the i-th teacher model based on the target number, where the second weight is positively correlated with the target number.

It should be noted that for details of the target number, refer to the foregoing embodiments; they are not repeated here.

In the embodiments of the present disclosure, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model takes values in the range [0, 1); that is, the second weight is any value less than 1 and greater than or equal to 0.

In the embodiments of the present disclosure, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is positively correlated with the target number: the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the second weight; conversely, the smaller the target number, the less likely the label is correct, and the smaller the second weight.

In one embodiment, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is Ci = k/n, where k is the target number and n is the number of teacher models (that is, the number of second outputs).
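Steps S305 to S308 can be sketched in the same illustrative style (names are assumptions, not from the disclosure): teacher i gets weight 1 when its output matches the label, and k/n otherwise.

```python
def second_weights(label, teacher_outputs):
    """Per-teacher weights Ci for the distillation loss (steps S305-S308).

    A teacher whose second output is consistent with the label gets
    Ci = 1; an inconsistent teacher gets Ci = k / n, where k is the
    target number of teachers that do match the label, so the weight
    stays in [0, 1) and is positively correlated with k.
    """
    n = len(teacher_outputs)
    k = sum(1 for out in teacher_outputs if out == label)  # target number
    return [1.0 if out == label else k / n for out in teacher_outputs]
```

With this rule, a teacher that disagrees with a label that most other teachers confirm is down-weighted only mildly, while a teacher that disagrees with a label no one confirms is suppressed entirely.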
S309: Obtain a second loss function of the student model based on the first output, the n second outputs, and the second weights.

S310: Obtain a total loss function based on the first loss function and the second loss function.

S311: Update the model parameters of the student model based on the total loss function to obtain the trained target model.

For details of steps S309 to S311, refer to the foregoing embodiments; they are not repeated here.
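Putting steps S304, S309, and S310 together, one possible sketch is shown below. The concrete loss terms and the coefficients a and b of the weighted sum are illustrative assumptions; the disclosure only fixes the structure (weighted label loss, mean of the weighted per-teacher losses, weighted sum of the two):

```python
def total_loss(first_initial_loss, third_initial_losses,
               first_weight, second_weights, a=1.0, b=1.0):
    """Combine the label loss and the distillation losses.

    first_initial_loss: student-vs-label loss before weighting (S304).
    third_initial_losses: one student-vs-teacher loss per teacher (S309).
    The first loss is the first initial loss scaled by the first weight;
    the second loss is the average of the per-teacher losses scaled by
    their second weights; the total loss is a weighted sum of the two
    (S310), with assumed coefficients a and b.
    """
    first_loss = first_weight * first_initial_loss
    third_losses = [w * l for w, l in zip(second_weights, third_initial_losses)]
    second_loss = sum(third_losses) / len(third_losses)
    return a * first_loss + b * second_loss
```

The scalar returned here would then drive an ordinary gradient update of the student parameters in step S311.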
In summary, according to the model training method of the embodiments of the present disclosure, the first weight can be determined based on the target number of second outputs consistent with the label; when the label is consistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is determined to be 1, or, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is determined based on the target number of second outputs consistent with the label.
According to embodiments of the present disclosure, the present disclosure further provides an image processing method.

FIG. 4 is a schematic flowchart of an image processing method according to a first embodiment of the present disclosure.

As shown in FIG. 4, the image processing method according to the first embodiment of the present disclosure includes steps S401 and S402.

S401: Obtain an image to be processed.

It should be noted that the execution subject of the image processing method in the embodiments of the present disclosure may be a hardware device with data-processing capabilities and/or the software necessary to drive that hardware device. In some embodiments, the execution subject may include workstations, servers, computers, user terminals, and other intelligent devices, where user terminals include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, and vehicle-mounted terminals.

It should be noted that the image to be processed is not particularly limited. In one embodiment, the image to be processed includes but is not limited to a two-dimensional image, a three-dimensional image, and the like.

In one implementation, taking a user terminal as the execution subject, the user terminal may obtain the image to be processed from its own storage space, and/or capture the image to be processed with a camera, and/or obtain the image to be processed from a web page or an application (APP).

S402: Input the image to be processed into a target image model, and obtain the processing result of the image to be processed output by the target image model, where the target image model is obtained using the model training method described above.

It should be noted that the target image model can be obtained using the model training method described with reference to FIG. 1 to FIG. 3, which is not repeated here.

It should be noted that the processing result is not particularly limited. In one embodiment, the processing result includes but is not limited to an action recognition result, an image classification result, a face recognition result, a text recognition result, and the like.

In the embodiments of the present disclosure, the target image model is obtained by distillation learning from the n teacher models; it is small in size, high in accuracy, and requires fewer computing resources than a teacher model. For example, if a teacher model is deployed on a user terminal, it can be replaced with the target image model, which helps save the storage space and computing resources of the user terminal.
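As an illustrative sketch only of step S402 for the image-classification case (the model object and its call signature are placeholders, not a concrete framework's interface; a dummy model stands in for a trained target model):

```python
def process_image(model, image):
    """Run the distilled target image model on one preprocessed image and
    return the index of the highest-scoring class as the processing result."""
    scores = model(image)  # the model's output: one score per class
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical stand-in for a trained target model over 3 classes.
def dummy_model(image):
    return [0.1, 0.7, 0.2]
```

In a real deployment the `model` would be the distilled student loaded on the terminal, and `image` would come from the terminal's storage, camera, or a web page/APP as described in step S401.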
In summary, according to the image processing method of the embodiments of the present disclosure, the image to be processed is input into the target image model, and the target image model outputs the processing result of the image to be processed. The target image model, obtained using the model training method described above, is small in size, high in accuracy, and requires few computing resources, which helps improve image processing performance.

It should be noted that the model training method of the embodiments of the present disclosure can also be applied to speech models, language models, and the like.

In one embodiment, speech to be processed can be obtained and input into a target speech model, and the target speech model outputs a speech processing result of the speech to be processed. The target speech model is obtained using the model training method described above, which helps improve speech processing performance.

In one embodiment, text to be processed can be obtained and input into a target language model, and the target language model outputs a text processing result of the text to be processed. The target language model is obtained using the model training method described above, which helps improve text processing performance.

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user personal information involved all comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure further provides a model training apparatus for implementing the above model training method.

FIG. 5 is a block diagram of a model training apparatus according to a first embodiment of the present disclosure.

As shown in FIG. 5, the model training apparatus 500 of the embodiment of the present disclosure includes a first acquisition module 501, a determination module 502, a second acquisition module 503, and a training module 504.

The first acquisition module 501 is configured to input a training sample into a student model and n teacher models respectively, obtain a first output of the student model, and obtain second outputs of the n teacher models, where n is a positive integer.

The determination module 502 is configured to determine a weight corresponding to the training sample based on the label of the training sample and the n second outputs.

The second acquisition module 503 is configured to obtain a total loss function of the student model based on the first output and the weight.

The training module 504 is configured to update the model parameters of the student model based on the total loss function to obtain a trained target model.
In one embodiment of the present disclosure, the weight includes a first weight and a second weight corresponding to each teacher model, and the second acquisition module 503 is further configured to: obtain a first loss function of the student model based on the first output, the label, and the first weight; obtain a second loss function of the student model based on the first output, the n second outputs, and the second weights; and obtain the total loss function based on the first loss function and the second loss function.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to: obtain a first initial loss function of the student model based on the first output and the label; and obtain the first loss function based on the first initial loss function and the first weight.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to: obtain an i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model, where 1 ≤ i ≤ n and i is a positive integer; obtain an i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtain the second loss function based on the n third loss functions of the student model.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to obtain the average of the n third loss functions of the student model and determine the average as the second loss function.

In one embodiment of the present disclosure, the determination module 502 is further configured to: compare the label with the n second outputs to obtain a target number of second outputs consistent with the label; and determine the first weight based on the target number, where the first weight is positively correlated with the target number.

In one embodiment of the present disclosure, the determination module 502 is further configured to: compare the label with the second output of the i-th teacher model; in response to the label being consistent with the second output of the i-th teacher model, determine that the second weight corresponding to the i-th teacher model is 1; or, in response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label, and determine the second weight corresponding to the i-th teacher model based on the target number, where the second weight is positively correlated with the target number.

In one embodiment of the present disclosure, the determination module 502 is further configured to: convert the n second outputs into n second converted outputs in the data format of the label according to the data format of the label; and determine the weight based on the label and the n second converted outputs.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.

In summary, the model training apparatus of the embodiments of the present disclosure inputs a training sample into a student model and n teacher models respectively, obtains a first output of the student model and second outputs of the n teacher models, determines a weight corresponding to the training sample based on the label of the training sample and the n second outputs, obtains a total loss function of the student model based on the first output and the weight, and updates the model parameters of the student model based on the total loss function to obtain a trained target model. The label of the training sample and the second outputs of the teacher models can thus be jointly considered to determine the weight corresponding to the training sample, and the total loss function of the student model can then be obtained based on the first output of the student model and the weight. This avoids an inaccurate total loss function when the label of the training sample is wrong, improves the accuracy of the total loss function of the student model, and in turn improves the accuracy of model training.
According to embodiments of the present disclosure, the present disclosure further provides an image processing apparatus for implementing the above image processing method.

FIG. 6 is a block diagram of an image processing apparatus according to a first embodiment of the present disclosure.

As shown in FIG. 6, the image processing apparatus 600 of the embodiment of the present disclosure includes an acquisition module 601 and a processing module 602.

The acquisition module 601 is configured to obtain an image to be processed.

The processing module 602 is configured to input the image to be processed into a target image model, and obtain the processing result of the image to be processed output by the target image model, where the target image model is obtained using the model training method described above.

In summary, the image processing apparatus of the embodiments of the present disclosure inputs the image to be processed into the target image model, and the target image model outputs the processing result of the image to be processed. The target image model, obtained using the model training method described above, is small in size, high in accuracy, and requires few computing resources, which helps improve image processing performance.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, a computer program product, and a computer program.

According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image processing method described in any of the foregoing embodiments.
FIG. 7 shows a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 7, the electronic device 700 includes a computing unit 701, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the electronic device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Multiple components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disc; and a communication unit 709, such as a network card, a modem, or a wireless communication transceiver. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.

The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The computing unit 701 performs the methods and processing described above, such as the model training method described with reference to FIG. 1 to FIG. 3 and the image processing method described with reference to FIG. 4. For example, in some embodiments, the model training method or the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the model training method described above, or one or more steps of the image processing method described above, can be performed. Alternatively, in other embodiments, the computing unit 701 may be configured, in any other appropriate manner (for example, by means of firmware), to perform the model training method or the image processing method.
Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
According to an embodiment of the present disclosure, the storage medium is a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the image processing method described in any of the foregoing embodiments.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to an embodiment of the present disclosure, a computer program product includes a computer program that, when executed by a processor, implements the image processing method described in any of the foregoing embodiments.

According to an embodiment of the present disclosure, a computer program includes computer program code that, when run on a computer, causes the computer to perform the image processing method described in any of the foregoing embodiments.

It should be noted that the foregoing explanations of the image processing method embodiments also apply to the image processing apparatus, storage medium, electronic device, computer program product, and computer program of the embodiments, and are not repeated here.
为了提供与用户的交互,可以在计算机上实施此处描述的系统和技术,该计算机具有:用于向用户显示信息的显示装置(例如,CRT(阴极射线管)或者LCD(液晶显示器)监视器);以及键盘和指向装置(例如,鼠标或者轨迹球),用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互;例如,提供给用户的反馈可以是任何形式的传感反馈(例如,视觉反馈、听觉反馈、或者触觉反馈);并且可以用任何形式(包括声输入、语音输入或者、触觉输入)来接收来自用户的输入。To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (eg, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user ); and a keyboard and pointing device (eg, a mouse or a trackball) through which a user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and may be provided in any form, including Acoustic input, voice input or tactile input) to receive input from the user.
可以将此处描述的系统和技术实施在包括后台部件的计算系统(例如,作为数据服务器)、或者包括中间件部件的计算系统(例如,应用服务器)、或者包括前端部件的计算系统(例如,具有图形用户界面或者网络浏览器的用户计算机,用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信(例如,通信网络)来将系统的部件相互连接。通信网络的示例包括:局域网(LAN)、广域网(WAN)和互联网。The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., A user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and technologies described herein), or including such backend components, middleware components, or any combination of front-end components in a computing system. The components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
计算机系统可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。服务器可以是云服务器，又称为云计算服务器或云主机，是云计算服务体系中的一项主机产品，以解决传统物理主机与VPS服务（"Virtual Private Server"，或简称"VPS"）中存在的管理难度大、业务扩展性弱的缺陷。服务器也可以为分布式系统的服务器，或者是结合了区块链的服务器。A computer system may include a client and a server. The client and server are generally remote from each other and typically interact over a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the defects of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
应该理解，可以使用上面所示的各种形式的流程，重新排序、增加或删除步骤。例如，本公开中记载的各步骤可以并行地执行、也可以顺序地执行、也可以以不同的次序执行，只要能够实现本公开的技术方案所期望的结果，本文在此不进行限制。It should be understood that the various forms of processes shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, no limitation is imposed herein.
上述具体实施方式，并不构成对本公开保护范围的限制。本领域技术人员应该明白的是，根据设计要求和其他因素，可以进行各种修改、组合、子组合和替代。任何在本公开的精神和原则之内所作的修改、等同替换和改进等，均应包含在本公开保护范围之内。The specific embodiments described above do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art will understand that various modifications, combinations, sub-combinations, and substitutions are possible depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.
本公开所有实施例均可以单独被执行，也可以与其他实施例相结合被执行，均视为本公开要求的保护范围。Each embodiment of the present disclosure may be implemented alone or in combination with other embodiments, all of which are considered within the protection scope claimed by the present disclosure.

Claims (16)

  1. 一种图像处理方法，包括：An image processing method, comprising:
    获取待处理图像;和acquiring an image to be processed; and
    将所述待处理图像输入目标图像模型中，由所述目标图像模型输出所述待处理图像的处理结果，inputting the image to be processed into a target image model, and outputting, by the target image model, a processing result of the image to be processed,
    其中，通过下述步骤训练所述目标图像模型：wherein the target image model is trained through the following steps:
    将训练样本分别输入至学生模型和n个教师模型中，获取所述学生模型的第一输出，并获取n个所述教师模型的第二输出，其中，n为正整数；inputting a training sample into a student model and n teacher models respectively, acquiring a first output of the student model, and acquiring second outputs of the n teacher models, where n is a positive integer;
    基于所述训练样本的标签和n个所述第二输出，确定所述训练样本对应的权重；determining, based on a label of the training sample and the n second outputs, a weight corresponding to the training sample;
    基于所述第一输出和所述权重,获取所述学生模型的总损失函数;和Obtaining a total loss function of the student model based on the first output and the weight; and
    基于所述总损失函数对所述学生模型的模型参数进行更新，得到训练后的目标模型，updating model parameters of the student model based on the total loss function to obtain the trained target model,
    其中,所述权重包括第一权重和所述教师模型对应的第二权重;Wherein, the weight includes a first weight and a second weight corresponding to the teacher model;
    其中,所述基于所述第一输出和所述权重,获取所述学生模型的总损失函数,包括:Wherein, obtaining the total loss function of the student model based on the first output and the weight includes:
    基于所述第一输出、所述标签和所述第一权重,获取所述学生模型的第一损失函数;Obtain a first loss function of the student model based on the first output, the label, and the first weight;
    基于所述第一输出、n个所述第二输出和所述第二权重,获取所述学生模型的第二损失函数;和Obtaining a second loss function of the student model based on the first output, n second outputs and the second weight; and
    基于所述第一损失函数和所述第二损失函数,获取所述总损失函数,Based on the first loss function and the second loss function, the total loss function is obtained,
    其中,所述基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重,包括:Wherein, determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
    将所述标签和n个所述第二输出进行比对,获取与所述标签一致的第二输出的目标数量;和Compare the label with n second outputs to obtain a target number of second outputs consistent with the label; and
    基于所述目标数量,确定所述第一权重,其中,所述第一权重与所述目标数量正相关,The first weight is determined based on the target quantity, wherein the first weight is positively related to the target quantity,
    其中,所述基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重,包括:Wherein, determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
    将所述标签和第i个教师模型的第二输出进行比对;Compare the label with the second output of the i-th teacher model;
    响应于所述标签与所述第i个教师模型的第二输出一致，确定所述第i个教师模型对应的第二权重为1；或者，In response to the label being consistent with the second output of the i-th teacher model, it is determined that the second weight corresponding to the i-th teacher model is 1; or,
    响应于所述标签与所述第i个教师模型的第二输出不一致,获取与所述标签一致的第二输出的目标数量;In response to the label being inconsistent with the second output of the i-th teacher model, obtaining a target number of second outputs that are consistent with the label;
    基于所述目标数量,确定所述第i个教师模型对应的第二权重,其中,所述第二权重与所述目标数量正相关。Based on the target number, a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
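The weighting scheme in claim 1 — counting how many teacher outputs agree with the label, then deriving a shared first weight and per-teacher second weights — can be illustrated with a minimal sketch. The claim only requires the weights to be positively related to the agreement count; the ratio m/n used below is an assumption for illustration, not part of the claim:

```python
def sample_weights(label, teacher_preds):
    """Sketch of the claim-1 weighting for one training sample.

    label: ground-truth class index of the sample.
    teacher_preds: predicted class index from each of the n teachers.
    Returns (first_weight, second_weights).
    """
    n = len(teacher_preds)
    # "Target number": how many second outputs are consistent with the label.
    m = sum(1 for p in teacher_preds if p == label)
    # First weight: positively related to the agreement count (m/n is one choice).
    first_weight = m / n
    # Second weight: 1 when the teacher agrees with the label, otherwise a
    # value positively related to the agreement count.
    second_weights = [1.0 if p == label else m / n for p in teacher_preds]
    return first_weight, second_weights
```

A teacher that disagrees with the label on a sample where most teachers agree is thus down-weighted only mildly, while on a sample where few teachers agree its influence shrinks toward zero.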
  2. 根据权利要求1所述的方法,其中,所述基于所述第一输出、所述标签和所述第一权重,获取所述学生模型的第一损失函数,包括:The method according to claim 1, wherein said obtaining the first loss function of the student model based on the first output, the label and the first weight includes:
    基于所述第一输出和所述标签,获取所述学生模型的第一初始损失函数;和Obtaining a first initial loss function for the student model based on the first output and the label; and
    基于所述第一初始损失函数和所述第一权重,获取所述第一损失函数。The first loss function is obtained based on the first initial loss function and the first weight.
  3. 根据权利要求1或2所述的方法,其中,所述基于所述第一输出、n个所述第二输出和所述第二权重,获取所述学生模型的第二损失函数,包括:The method according to claim 1 or 2, wherein obtaining the second loss function of the student model based on the first output, n second outputs and the second weight includes:
    基于所述第一输出和第i个教师模型的第二输出,获取所述学生模型的第i个第三初始损失函数,其中,1≤i≤n,i为正整数;Based on the first output and the second output of the i-th teacher model, obtain the i-th third initial loss function of the student model, where 1≤i≤n, i is a positive integer;
    基于所述第i个第三初始损失函数和所述第i个教师模型对应的第二权重,获取所述学生模型的第i个第三损失函数;和Based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model, obtain the i-th third loss function of the student model; and
    基于所述学生模型的n个第三损失函数,获取所述第二损失函数。The second loss function is obtained based on n third loss functions of the student model.
  4. 根据权利要求3所述的方法,其中,所述基于所述学生模型的n个第三损失函数,获取所述第二损失函数,包括:The method according to claim 3, wherein said obtaining the second loss function based on n third loss functions of the student model includes:
    获取所述学生模型的n个第三损失函数的平均值,并将所述平均值确定为所述第二损失函数。Obtain the average value of n third loss functions of the student model, and determine the average value as the second loss function.
  5. 根据权利要求1至4中任一项所述的方法,其中,所述基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重,包括:The method according to any one of claims 1 to 4, wherein determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
    按照所述标签的数据格式，将n个所述第二输出转换为所述标签的数据格式的n个第二转换输出；converting the n second outputs into n second converted outputs in the data format of the label, according to the data format of the label;
    基于所述标签和n个所述第二转换输出，确定所述权重。determining the weight based on the label and the n second converted outputs.
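The format conversion in claim 5 can be sketched as below. The claim fixes only the target format (that of the label); the specific assumption here — that each second output is a probability vector and the label a class index, so conversion is an argmax — goes beyond the claim text:

```python
def to_label_format(second_outputs):
    """Claim-5 sketch: convert each teacher's raw second output into the
    label's data format before comparison.

    Assumed for illustration: each output is a probability vector and the
    label format is a class index, so conversion is an argmax."""
    return [max(range(len(probs)), key=probs.__getitem__)
            for probs in second_outputs]
```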
  6. 根据权利要求1至5中任一项所述的方法,其中,所述基于所述第一损失函数和所述第二损失函数,获取所述总损失函数,包括:The method according to any one of claims 1 to 5, wherein said obtaining the total loss function based on the first loss function and the second loss function includes:
    对所述第一损失函数和所述第二损失函数进行加权求和,获取所述总损失函数。Perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.
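Claims 2 through 6 together define how the total loss is assembled: a weighted hard-label loss, per-teacher distillation losses scaled by the second weights and averaged, and a weighted sum of the two. A hedged sketch, assuming cross-entropy for the hard-label loss, KL divergence for the per-teacher losses, and a blending factor alpha — none of which are fixed by the claims:

```python
import math

def total_loss(student_probs, label, teacher_probs_list,
               first_weight, second_weights, alpha=0.5):
    """Sketch of claims 2-6 for one sample.

    student_probs: student's softmax distribution over classes.
    teacher_probs_list: one distribution per teacher (n entries).
    alpha and the choice of cross-entropy/KL are illustrative assumptions.
    """
    # Claim 2: first loss = first weight * initial hard-label loss.
    first_loss = first_weight * -math.log(student_probs[label])
    # Claims 3-4: i-th third loss = second weight_i * initial distillation
    # loss against teacher i; the second loss is their average.
    third_losses = []
    for w, t in zip(second_weights, teacher_probs_list):
        kl = sum(tp * math.log(tp / sp)
                 for tp, sp in zip(t, student_probs) if tp > 0)
        third_losses.append(w * kl)
    second_loss = sum(third_losses) / len(third_losses)
    # Claim 6: total loss = weighted sum of the first and second losses.
    return alpha * first_loss + (1 - alpha) * second_loss
```

When a teacher's distribution matches the student's, its KL term vanishes and only the hard-label term contributes.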
  7. 一种图像处理装置,包括:An image processing device, including:
    获取模块,用于获取待处理图像;Acquisition module, used to obtain images to be processed;
    处理模块，用于将所述待处理图像输入目标图像模型中，由所述目标图像模型输出所述待处理图像的处理结果，其中，通过以下模块训练所述目标图像模型：第一获取模块，用于将训练样本分别输入至学生模型和n个教师模型中，获取所述学生模型的第一输出，并获取n个所述教师模型的第二输出，其中，n为正整数；a processing module, configured to input the image to be processed into a target image model, the target image model outputting a processing result of the image to be processed, wherein the target image model is trained through the following modules: a first acquisition module, configured to input a training sample into a student model and n teacher models respectively, acquire a first output of the student model, and acquire second outputs of the n teacher models, where n is a positive integer;
    确定模块,用于基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重;A determination module configured to determine the weight corresponding to the training sample based on the label of the training sample and n second outputs;
    第二获取模块,用于基于所述第一输出和所述权重,获取所述学生模型的总损失函数;和A second acquisition module, configured to acquire the total loss function of the student model based on the first output and the weight; and
    训练模块,用于基于所述总损失函数对所述学生模型的模型参数进行更新,得到训练后的目标模型,A training module, used to update the model parameters of the student model based on the total loss function to obtain the trained target model,
    其中,所述权重包括第一权重和所述教师模型对应的第二权重;Wherein, the weight includes a first weight and a second weight corresponding to the teacher model;
    其中,所述第二获取模块,还用于:Wherein, the second acquisition module is also used for:
    基于所述第一输出、所述标签和所述第一权重,获取所述学生模型的第一损失函数;Obtain a first loss function of the student model based on the first output, the label, and the first weight;
    基于所述第一输出、n个所述第二输出和所述第二权重,获取所述学生模型的第二损失函数;Obtain a second loss function of the student model based on the first output, n second outputs and the second weight;
    基于所述第一损失函数和所述第二损失函数,获取所述总损失函数;Based on the first loss function and the second loss function, obtain the total loss function;
    其中,所述确定模块,还用于:Among them, the determination module is also used to:
    将所述标签和n个所述第二输出进行比对,获取与所述标签一致的第二输出的目标数量;Compare the label with n second outputs to obtain a target number of second outputs consistent with the label;
    基于所述目标数量,确定所述第一权重,其中,所述第一权重与所述目标数量正相关;Based on the target quantity, the first weight is determined, wherein the first weight is positively related to the target quantity;
    其中,所述确定模块,还用于:Among them, the determination module is also used to:
    将所述标签和第i个教师模型的第二输出进行比对;Compare the label with the second output of the i-th teacher model;
    响应于所述标签与所述第i个教师模型的第二输出一致,确定所述第i个教师模型对应的第二权重为1;或者,In response to the label being consistent with the second output of the i-th teacher model, determine the second weight corresponding to the i-th teacher model to be 1; or,
    响应于所述标签与所述第i个教师模型的第二输出不一致，获取与所述标签一致的第二输出的目标数量；In response to the label being inconsistent with the second output of the i-th teacher model, obtaining a target number of second outputs consistent with the label;
    基于所述目标数量,确定所述第i个教师模型对应的第二权重,其中,所述第二权重与所述目标数量正相关。Based on the target number, a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
  8. 根据权利要求7所述的装置,其中,所述第二获取模块,还用于:The device according to claim 7, wherein the second acquisition module is also used for:
    基于所述第一输出和所述标签,获取所述学生模型的第一初始损失函数;Obtaining a first initial loss function of the student model based on the first output and the label;
    基于所述第一初始损失函数和所述第一权重,获取所述第一损失函数。The first loss function is obtained based on the first initial loss function and the first weight.
  9. 根据权利要求7或8所述的装置,其中,所述第二获取模块,还用于:The device according to claim 7 or 8, wherein the second acquisition module is also used for:
    基于所述第一输出和第i个教师模型的第二输出,获取所述学生模型的第i个第三初始损失函数,其中,1≤i≤n,i为正整数;Based on the first output and the second output of the i-th teacher model, obtain the i-th third initial loss function of the student model, where 1≤i≤n, i is a positive integer;
    基于所述第i个第三初始损失函数和所述第i个教师模型对应的第二权重,获取所述学生模型的第i个第三损失函数;和Based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model, obtain the i-th third loss function of the student model; and
    基于所述学生模型的n个第三损失函数,获取所述第二损失函数。The second loss function is obtained based on n third loss functions of the student model.
  10. 根据权利要求9所述的装置,其中,所述第二获取模块,还用于:The device according to claim 9, wherein the second acquisition module is also used for:
    获取所述学生模型的n个第三损失函数的平均值,并将所述平均值确定为所述第二损失函数。Obtain the average value of n third loss functions of the student model, and determine the average value as the second loss function.
  11. 根据权利要求7至10中任一项所述的装置,其中,所述确定模块,还用于:The device according to any one of claims 7 to 10, wherein the determining module is also used to:
    按照所述标签的数据格式，将n个所述第二输出转换为所述标签的数据格式的n个第二转换输出；和converting the n second outputs into n second converted outputs in the data format of the label, according to the data format of the label; and
    基于所述标签和n个所述第二转换输出，确定所述权重。determining the weight based on the label and the n second converted outputs.
  12. 根据权利要求7至11中任一项所述的装置,其中,所述第二获取模块,还用于:The device according to any one of claims 7 to 11, wherein the second acquisition module is also used to:
    对所述第一损失函数和所述第二损失函数进行加权求和,获取所述总损失函数。Perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.
  13. 一种电子设备,包括:An electronic device including:
    至少一个处理器;以及at least one processor; and
    与所述至少一个处理器通信连接的存储器;其中,a memory communicatively connected to the at least one processor; wherein,
    所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行如权利要求1-6中任一项所述的图像处理方法。The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image processing method according to any one of claims 1-6.
  14. 一种存储有计算机指令的非瞬时计算机可读存储介质,其中,所述计算机指令用于使所述计算机执行如权利要求1-6中任一项所述的图像处理方法。A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the image processing method according to any one of claims 1-6.
  15. 一种计算机程序产品,包括计算机程序,所述计算机程序在被处理器执行时实现如权利要求1-6中任一项所述的图像处理方法。A computer program product, including a computer program that implements the image processing method according to any one of claims 1-6 when executed by a processor.
  16. 一种计算机程序,包括计算机程序代码,当所述计算机程序代码在计算机上运行时,使得所述计算机执行如权利要求1-6中任一项所述的图像处理方法。A computer program, including computer program code, when the computer program code is run on a computer, causes the computer to perform the image processing method according to any one of claims 1-6.
PCT/CN2022/139730 2022-08-16 2022-12-16 Image processing method and apparatus, and electronic device and storage medium WO2024036847A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210981983.6A CN115063875B (en) 2022-08-16 2022-08-16 Model training method, image processing method and device and electronic equipment
CN202210981983.6 2022-08-16

Publications (1)

Publication Number Publication Date
WO2024036847A1 true WO2024036847A1 (en) 2024-02-22

Family

ID=83207480

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139730 WO2024036847A1 (en) 2022-08-16 2022-12-16 Image processing method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115063875B (en)
WO (1) WO2024036847A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063875B (en) * 2022-08-16 2022-12-16 北京百度网讯科技有限公司 Model training method, image processing method and device and electronic equipment
CN115578614B (en) * 2022-10-21 2024-03-12 北京百度网讯科技有限公司 Training method of image processing model, image processing method and device
CN116416500B (en) * 2023-03-24 2024-04-05 北京百度网讯科技有限公司 Image recognition model training method, image recognition device and electronic equipment
CN116361658A (en) * 2023-04-07 2023-06-30 北京百度网讯科技有限公司 Model training method, task processing method, device, electronic equipment and medium
CN117350354B (en) * 2023-09-21 2024-06-18 摩尔线程智能科技(北京)有限责任公司 Training method and device for large model, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126573A (en) * 2019-12-27 2020-05-08 深圳力维智联技术有限公司 Model distillation improvement method and device based on individual learning and storage medium
CN111582500A (en) * 2020-05-07 2020-08-25 支付宝(杭州)信息技术有限公司 Method and system for improving model training effect
CN112749728A (en) * 2020-08-13 2021-05-04 腾讯科技(深圳)有限公司 Student model training method and device, computer equipment and storage medium
US20210158126A1 (en) * 2019-11-25 2021-05-27 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and device for compressing a neural network model for machine translation and storage medium
CN113705362A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Training method and device of image detection model, electronic equipment and storage medium
CN114037052A (en) * 2021-10-29 2022-02-11 北京百度网讯科技有限公司 Training method and device for detection model, electronic equipment and storage medium
CN115063875A (en) * 2022-08-16 2022-09-16 北京百度网讯科技有限公司 Model training method, image processing method, device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200128938A (en) * 2019-05-07 2020-11-17 삼성전자주식회사 Model training method and apparatus, and data recognizing method
CN110826344B (en) * 2019-10-24 2022-03-01 北京小米智能科技有限公司 Neural network model compression method, corpus translation method and apparatus thereof
CN111090756B (en) * 2020-03-24 2020-07-17 腾讯科技(深圳)有限公司 Artificial intelligence-based multi-target recommendation model training method and device
CN113361572B (en) * 2021-05-25 2023-06-27 北京百度网讯科技有限公司 Training method and device for image processing model, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN115063875A (en) 2022-09-16
CN115063875B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
JP7331171B2 (en) Methods and apparatus for training image recognition models, methods and apparatus for recognizing images, electronic devices, storage media, and computer programs
WO2024036847A1 (en) Image processing method and apparatus, and electronic device and storage medium
US20220253631A1 (en) Image processing method, electronic device and storage medium
US20220415072A1 (en) Image processing method, text recognition method and apparatus
EP3933708A2 (en) Model training method, identification method, device, storage medium and program product
EP4116861A2 (en) Method and apparatus for pre-training semantic representation model and electronic device
US20220391587A1 (en) Method of training image-text retrieval model, method of multimodal image retrieval, electronic device and medium
EP3955216A2 (en) Method and apparatus for recognizing image, electronic device and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
US20230177326A1 (en) Method and apparatus for compressing neural network model
WO2023093014A1 (en) Bill recognition method and apparatus, and device and storage medium
WO2022227759A1 (en) Image category recognition method and apparatus and electronic device
CN117746125A (en) Training method and device of image processing model and electronic equipment
US20230215203A1 (en) Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium
CN113657411A (en) Neural network model training method, image feature extraction method and related device
US20230081015A1 (en) Method and apparatus for acquiring information, electronic device and storage medium
US20220343662A1 (en) Method and apparatus for recognizing text, device and storage medium
KR20230133808A (en) Method and apparatus for training roi detection model, method and apparatus for detecting roi, device, and medium
US20230070966A1 (en) Method for processing question, electronic device and storage medium
CN114881227B (en) Model compression method, image processing device and electronic equipment
WO2023087667A1 (en) Sorting model training method and apparatus for intelligent recommendation, and intelligent recommendation method and apparatus
WO2023159819A1 (en) Visual processing and model training methods, device, storage medium and program product
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115497112B (en) Form recognition method, form recognition device, form recognition equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22955614

Country of ref document: EP

Kind code of ref document: A1