WO2021042857A1 - Processing method and processing apparatus for image segmentation model - Google Patents

Processing method and processing apparatus for image segmentation model

Info

Publication number
WO2021042857A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
feature extraction
image segmentation
sub-model
extraction sub-model
Prior art date
Application number
PCT/CN2020/100058
Other languages
English (en)
Chinese (zh)
Inventor
韩凯
闻长远
舒晗
陈翼翼
苏霞
王云鹤
许春景
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021042857A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Definitions

  • This application relates to the field of image processing, and more specifically, to a processing method and processing device of an image segmentation model.
  • Image segmentation also has application requirements on edge devices, such as vehicles or mobile phones.
  • the present application provides a processing method and processing device for an image segmentation model, which helps to improve the segmentation accuracy of the image segmentation model, so that image segmentation can be realized on edge devices.
  • In a first aspect, a method for processing an image segmentation model is provided, where the image segmentation model includes a feature extraction sub-model and an image segmentation sub-model. The feature extraction sub-model is used to extract features of an image, and the image segmentation sub-model is used to segment the image according to the extracted features. The method includes: adjusting the layer width of the feature extraction sub-model to obtain a first feature extraction sub-model; and obtaining a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model.
  • the image segmentation model may be trained or untrained.
  • adjusting the layer width of the feature extraction sub-model is helpful to improve the accuracy of the feature extraction sub-model, thereby improving the segmentation accuracy of the target image segmentation model, and making the image segmentation model easier to apply on edge devices.
  • the accuracy loss caused by the feature extraction sub-model being a binarized neural network model can be reduced, thereby improving the segmentation accuracy of the image segmentation model.
  • In some implementations, the adjusting the layer width of the feature extraction sub-model includes: increasing the number of channels of the feature extraction sub-model to obtain a second feature extraction sub-model; generating K different first binarization codes for the second feature extraction sub-model, where each first binarization code includes multiple binarization values, the multiple binarization values are in one-to-one correspondence with the multiple channels of the second feature extraction sub-model, each binarization value is used to indicate whether the channel corresponding to it is retained or removed, and K is an integer greater than 1; retaining or removing the channels of the second feature extraction sub-model according to the K different first binarization codes to obtain K third feature extraction sub-models; and selecting M third feature extraction sub-models from the K third feature extraction sub-models according to the intersection-over-union ratio and computation amount when each third feature extraction sub-model performs feature extraction on the image, where M is an integer greater than 1.
  • In some implementations, the probability that the j-th third feature extraction sub-model among the M third feature extraction sub-models is selected satisfies the following formula: Pr(b_j) = f(b_j) / Σ_k f(b_k), where the sum runs over the K third feature extraction sub-models, Pr(b_j) represents the probability that the j-th third feature extraction sub-model is selected, and f(b_j) is a fitness that increases with the intersection-over-union ratio and decreases with the calculation amount, for example f(b_j) = mIoU(j) − λ·N(j), where mIoU(j) represents the intersection-over-union ratio when the j-th third feature extraction sub-model performs feature extraction on the image, N(j) represents the calculation amount when the j-th third feature extraction sub-model performs feature extraction on the image, and λ is a preset parameter.
  • In some implementations, the obtaining a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model includes: performing knowledge distillation on the first feature extraction sub-model by using a teacher feature extraction model to obtain a fifth feature extraction sub-model; and obtaining the target image segmentation model according to the fifth feature extraction sub-model and the image segmentation sub-model.
  • the teacher model may include a feature extraction model trained by a conventional method.
  • knowledge distillation is performed on the feature extraction sub-models in the image segmentation model through the trained teacher model, which can reduce the accuracy loss caused by binarization, thereby reducing the accuracy loss of image segmentation.
  • In some implementations, the loss function of the fifth feature extraction sub-model satisfies the following relationship: L_i = ||G_ti − G_si||, for i = 1, ..., T, where G_ti represents the i-th scale feature among the T scale features output by the teacher feature extraction model, G_si represents the i-th scale feature among the T scale features output by the fifth feature extraction sub-model, T is a positive integer, the scale of the i-th scale feature output by the teacher feature extraction model is the same as the scale of the i-th scale feature output by the fifth feature extraction sub-model, and ||·|| represents the matrix norm.
  • In some implementations, the obtaining the target image segmentation model according to the fifth feature extraction sub-model and the image segmentation sub-model includes: performing knowledge distillation on the image segmentation sub-model according to a teacher image segmentation model to obtain a target image segmentation sub-model, where the target image segmentation model includes the fifth feature extraction sub-model and the target image segmentation sub-model.
  • the teacher model may include an image segmentation model trained by a conventional method.
  • the image segmentation sub-model is knowledge distilled through the trained teacher model, which can improve the accuracy of the image segmentation model.
  • the accuracy loss caused by the feature extraction sub-model being a binarized neural network model can be reduced, thereby improving the accuracy of the image segmentation model.
  • In some implementations, the loss function of the target image segmentation sub-model satisfies the following relationship:
  σ(P_T) = softmax(a_T / τ)
  σ(P_S) = softmax(a_S / τ)
  L = H(y, P_S) + λ·H(σ(P_T), σ(P_S))
  where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation sub-model, a_T and a_S are the corresponding outputs before the softmax, H represents the cross-entropy loss function, y is used to indicate whether the segmentation result of the target image segmentation sub-model is correct, λ is the preset trade-off coefficient, τ is a preset temperature coefficient, and softmax represents the softmax function.
  • the feature extraction sub-model may be a binarized neural network model, which can reduce the amount of parameters and calculations of the image segmentation model, thereby facilitating the application of the image segmentation model to edge devices.
  • In a second aspect, an image segmentation model processing device is provided, where the image segmentation model includes a feature extraction sub-model and an image segmentation sub-model. The feature extraction sub-model is used to extract features of an image, and the image segmentation sub-model is used to segment the image according to the extracted features. The processing device includes a layer width adjustment module and a knowledge distillation module.
  • The layer width adjustment module is used to adjust the layer width of the feature extraction sub-model in the image segmentation model to obtain a first feature extraction sub-model.
  • The knowledge distillation module is used to obtain a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model.
  • the device adjusts the layer width of the feature extraction submodel, which helps to improve the accuracy of the feature extraction submodel, thereby improving the segmentation accuracy of the image segmentation model, and further helps the image segmentation model to be applied to edge devices.
  • the accuracy loss caused by the feature extraction sub-model being a binarized neural network model can be reduced, thereby improving the segmentation accuracy of the image segmentation model.
  • In some implementations, the layer width adjustment module is specifically configured to: increase the number of channels of the feature extraction sub-model to obtain a second feature extraction sub-model; generate K different first binarization codes for the second feature extraction sub-model, where each first binarization code includes multiple binarization values in one-to-one correspondence with the multiple channels of the second feature extraction sub-model, each binarization value is used to indicate whether the channel corresponding to it is retained or removed, and K is an integer greater than 1; retain or remove the channels of the second feature extraction sub-model according to the K different first binarization codes to obtain K third feature extraction sub-models; select M third feature extraction sub-models from the K third feature extraction sub-models according to the intersection-over-union ratio and computation amount when each third feature extraction sub-model performs feature extraction on the image, where M is an integer greater than 1; perform crossover and/or mutation processing on the M first binarization codes corresponding to the M third feature extraction sub-models to obtain S second binarization codes; retain or remove the channels of the second feature extraction sub-model according to the S second binarization codes to obtain S fourth feature extraction sub-models; and determine the first feature extraction sub-model according to the S fourth feature extraction sub-models.
  • In some implementations, the probability that the j-th third feature extraction sub-model among the M third feature extraction sub-models is selected satisfies the following formula: Pr(b_j) = f(b_j) / Σ_k f(b_k), where the sum runs over the K third feature extraction sub-models, Pr(b_j) represents the probability that the j-th third feature extraction sub-model is selected, and f(b_j) is a fitness that increases with the intersection-over-union ratio and decreases with the calculation amount, for example f(b_j) = mIoU(j) − λ·N(j), where mIoU(j) represents the intersection-over-union ratio when the j-th third feature extraction sub-model performs feature extraction on the image, N(j) represents the calculation amount when the j-th third feature extraction sub-model performs feature extraction on the image, and λ is a preset parameter.
  • In some implementations, the knowledge distillation module is specifically configured to: use a teacher feature extraction model to perform knowledge distillation on the first feature extraction sub-model to obtain a fifth feature extraction sub-model; and obtain the target image segmentation model according to the fifth feature extraction sub-model and the image segmentation sub-model.
  • the teacher model may include a feature extraction model trained by a conventional method.
  • knowledge distillation is performed on the feature extraction sub-models in the image segmentation model through the trained teacher model, which can improve the accuracy of image segmentation.
  • the accuracy loss caused by the feature extraction sub-model being a binarized neural network model can be reduced, thereby improving the accuracy of the image segmentation model.
  • In some implementations, the loss function of the fifth feature extraction sub-model satisfies the following relationship: L_i = ||G_ti − G_si||, for i = 1, ..., T, where G_ti represents the i-th scale feature among the T scale features output by the teacher feature extraction model, G_si represents the i-th scale feature among the T scale features output by the fifth feature extraction sub-model, T is a positive integer, the scale of the i-th scale feature output by the teacher feature extraction model is the same as the scale of the i-th scale feature output by the fifth feature extraction sub-model, and ||·|| represents the matrix norm.
  • In some implementations, the knowledge distillation module is specifically configured to: perform knowledge distillation on the image segmentation sub-model according to a teacher image segmentation model to obtain a target image segmentation sub-model, where the target image segmentation model includes the fifth feature extraction sub-model and the target image segmentation sub-model.
  • the teacher model may include an image segmentation model trained by a conventional method.
  • knowledge distillation of the image segmentation sub-model through the trained teacher model can reduce the accuracy loss caused by the binarization, thereby reducing the accuracy loss of the image segmentation.
  • In some implementations, the loss function of the target image segmentation sub-model satisfies the following relationship:
  σ(P_T) = softmax(a_T / τ)
  σ(P_S) = softmax(a_S / τ)
  L = H(y, P_S) + λ·H(σ(P_T), σ(P_S))
  where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the target image segmentation sub-model, a_T and a_S are the corresponding outputs before the softmax, H represents the cross-entropy loss function, y is used to indicate whether the segmentation result of the target image segmentation sub-model is correct, λ is the preset trade-off coefficient, τ is a preset temperature coefficient, and softmax represents the softmax function.
  • the feature extraction sub-model may be a binarized neural network model, which can reduce the amount of parameters and calculations of the image segmentation model, thereby facilitating the application of the image segmentation model to edge devices.
  • In a third aspect, an image segmentation model processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in any one of the implementations of the first aspect.
  • The processor in the third aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network processing unit. The neural network processing unit here may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on. Among them, the TPU is an artificial intelligence accelerator ASIC fully customized by Google for machine learning.
  • In a fourth aspect, a computer-readable medium is provided. The computer-readable medium stores program code for device execution, and the program code includes instructions for executing the method in any one of the implementations of the first aspect.
  • In a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer executes the method in any one of the implementations of the first aspect.
  • In a sixth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the method in any one of the implementations of the first aspect.
  • Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored on the memory. When the instructions are executed, the processor is configured to execute the method in any one of the implementations of the first aspect.
  • the aforementioned chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • an electronic device which includes the processing device in the second aspect described above, or the electronic device includes the processing device in the third aspect described above.
  • Figure 1 is a schematic diagram of the deployment of the processing device of the present application.
  • Figure 2 is a schematic structural diagram of the computing device of the present application.
  • Fig. 3 is a schematic structural diagram of a processing device according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a processing method according to an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • the embodiments of the present application involve a large number of related applications of neural networks.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs. The output of the arithmetic unit can be as shown in formula (1-1): h_{W,b}(x) = f(W^T x) = f(Σ_s W_s·x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
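  • As an illustration of formula (1-1), the following is a minimal Python sketch of a single neural unit; the function and variable names are illustrative, not from this application:

```python
import numpy as np

def neural_unit(x, w, b, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    # Formula (1-1): output = f(sum_s W_s * x_s + b), with a sigmoid
    # activation f by default, as described above.
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs x_1 ... x_n (n = 3 here)
w = np.array([0.1, 0.4, -0.3])   # weights W_s, one per input x_s
print(neural_unit(x, w, b=0.2))  # a single scalar activation in (0, 1)
```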
  • Deep neural network (DNN), also known as multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • According to the positions of the different layers, the layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in the middle are all hidden layers.
  • The layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated. Simply put, each layer computes the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficients W and offset vectors b is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer of the coefficient W, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
  • In summary, the coefficient from the k-th neuron in the (L−1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}.
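  • The per-layer computation y = α(W·x + b) and the W^L_jk indexing can be sketched as follows; this is a minimal illustration with hypothetical names, not code from this application:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(x, W, b, alpha=np.tanh):
    # One layer of a DNN: y = alpha(W x + b). Entry W[j, k] is the
    # coefficient from neuron k of layer L-1 to neuron j of layer L,
    # matching the W^L_jk indexing described above.
    return alpha(W @ x + b)

x = rng.normal(size=4)        # outputs of 4 neurons in layer L-1
W = rng.normal(size=(2, 4))   # weights into 2 neurons of layer L
b = np.zeros(2)               # offset vector
print(dense_layer(x, W, b))
```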
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to a part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • Weight sharing can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • Taking the loss function as an example: the higher the output value (loss) of the loss function, the greater the difference between the prediction and the desired value, so the training of the deep neural network becomes a process of reducing this loss as much as possible.
  • The neural network can use an error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is passed forward until the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation motion dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
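  • A minimal numerical sketch of this forward-then-backward update, assuming a single linear unit with squared error loss rather than the networks of this application:

```python
# Toy backpropagation: one linear unit y_hat = w*x + b, squared loss.
x, y = 2.0, 3.0               # one training sample
w, b, lr = 0.0, 0.0, 0.1      # initial parameters and learning rate
for _ in range(50):
    y_hat = w * x + b                  # forward pass to the output
    loss = 0.5 * (y_hat - y) ** 2      # error loss at the output
    grad = y_hat - y                   # dL/dy_hat, propagated backward
    w -= lr * grad * x                 # dL/dw = grad * x
    b -= lr * grad                     # dL/db = grad
print(w, b, loss)  # loss shrinks as the error converges
```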
  • Binarized neural network is a neural network that binarizes the weights and activation values of some or all layers into 1 or -1. Binarized neural networks usually only binarize the weights and activation values of floating-point neural networks, without changing the structure of the network.
  • The weights and activation values in the binarized neural network occupy less storage space; in theory, memory consumption is reduced to 1/32 of that of the floating-point neural network.
  • the binary neural network uses bit operations to replace the multiplication and addition operations in the floating-point neural network, which greatly reduces the computing time.
  • A binarized neural network can also be called a binary neural network or a binary network.
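  • A minimal sketch of sign-based binarization to +1/−1; the bit operations mentioned above are only emulated with floating-point values here, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(t):
    # Binarize to +1 / -1 with the sign function (0 mapped to +1).
    return np.where(t >= 0, 1.0, -1.0)

w = rng.normal(size=8)   # floating-point weights
a = rng.normal(size=8)   # floating-point activations
wb, ab = binarize(w), binarize(a)

# With +/-1 operands, a multiply-accumulate can be computed with
# XNOR plus popcount on bit-packed values; it is emulated in floats here.
print(np.sum(wb * ab))
```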
  • A complex neural network model is a collection of several individual neural network models, or a larger and more complex network model trained under some strong constraints (such as a high dropout rate).
  • The method of training a smaller and simpler network model (for example, a reduced model that can be deployed on the application side) under the guidance of the larger and more complex network model is called knowledge distillation.
  • the larger and more complex network model is called the teacher network
  • the smaller and simpler network model is called the student network.
  • the teacher network can provide more accurate supervision information.
  • the student network has greater computational throughput and fewer model parameters.
  • A neural network model is a kind of mathematical calculation model that imitates the structure and function of a biological neural network (the central nervous system of animals).
  • a neural network model can include a variety of neural network layers with different functions, and each layer includes parameters and calculation formulas. According to different calculation formulas or different functions, different layers in the neural network model have different names. For example, the layer that performs convolution calculations is called a convolutional layer.
  • The convolutional layer is often used to perform feature extraction on input signals (for example, images).
  • a neural network model can also be composed of a combination of multiple existing neural network models.
  • Neural network models with different structures can be used in different scenes (for example, classification, recognition or image segmentation) or provide different effects when used in the same scene.
  • Different neural network model structures specifically include one or more of the following: the number of network layers in the neural network model is different, the order of each network layer is different, and the weights, parameters or calculation formulas in each network layer are different.
  • There are neural network models in the industry with high accuracy for application scenarios such as recognition, classification, or image segmentation.
  • some neural network models can be trained by a specific training set to complete a task alone or combined with other neural network models (or other functional modules) to complete a task.
  • Some neural network models can also be used directly to complete a task alone or in combination with other neural network models (or other functional modules) to complete a task.
  • Edge device refers to any device with computing resources and network resources between the source of data generation and the cloud center.
  • a mobile phone is an edge device between a person and a cloud center
  • a gateway is an edge device between a smart home and a cloud center.
  • In other words, edge devices refer to devices that analyze or process data near the source of the data. Since data does not need to flow to a remote cloud, network traffic and response time are reduced.
  • The edge devices in the embodiments of this application may be mobile phones with computing capabilities, tablet personal computers (TPC), media players, smart home devices, laptop computers (LC), personal digital assistants (PDA), personal computers (PC), cameras, video cameras, smart watches, wearable devices (WD), self-driving vehicles, and so on. It is understandable that the embodiments of the present application do not limit the specific form of the edge device.
  • Image processing, such as image segmentation or image classification (recognition), also has application requirements on edge devices such as vehicles or mobile phones. However, due to the limited computing performance and cache of edge devices, it is difficult to use neural networks to process images on these edge devices. To use a neural network to process images on edge devices, it is necessary to reduce the size of the image processing model composed of the neural network, which reduces the accuracy of the image processing model.
  • An example of the image processing model in the embodiment of the present application is an image segmentation model or an image classification model.
  • this application proposes an image segmentation model processing method and processing device.
  • the processing method and processing device improve the accuracy of the feature extraction model by adjusting the layer width of the feature extraction sub-model in the image segmentation model. Further, the processing method and processing device also improve the accuracy by performing knowledge distillation processing on the feature extraction sub-model and the image segmentation sub-model in the image segmentation model.
  • the processing method and processing device proposed in this application help to improve the segmentation accuracy of the image segmentation model, thereby helping to apply the image segmentation model on edge devices.
  • FIG. 1 is a schematic diagram of the deployment of a processing device provided by an embodiment of the present application.
  • the processing device can be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode.
  • the cloud environment includes a cloud data center and a cloud service platform.
  • the cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider.
  • The computing resources included in the cloud data center can be a large number of computing devices (for example, servers).
  • The processing device may be a server used to adjust the image segmentation model in the cloud data center; the processing device may also be a virtual machine created in the cloud data center for adjusting the image processing model; the processing device may also be a software device deployed on a server or a virtual machine in the cloud data center.
  • The software device is used to adjust the image segmentation model. The software device may be deployed on multiple servers, on multiple virtual machines, or on both virtual machines and servers in a distributed manner.
  • The processing device can be abstracted by the cloud service provider into a cloud service for adjusting the image processing model on the cloud service platform and provided to the user; the cloud environment provides the user with this cloud service for adjusting the image processing model.
  • The user can upload the image processing model to be adjusted to the cloud environment through the application program interface (API) or through the web interface provided by the cloud service platform. The processing device receives the image processing model to be adjusted and adjusts it; the adjustment result is returned by the processing device to the edge device where the user is located, or stored in the cloud environment, for example, presented on the web interface of the cloud service platform for the user to view.
  • When the processing device is a software device, the processing device may also be separately deployed on a computing device in any environment, for example, separately deployed on an edge device or on a computing device in a data center.
  • the computing device 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204.
  • the processor 202, the memory 204, and the communication interface 203 communicate through a bus 201.
  • the processor 202 may be a central processing unit (CPU).
  • the memory 204 may include a volatile memory (volatile memory), such as a random access memory (random access memory, RAM).
  • The memory 204 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • The memory 204 stores the executable code included in the processing device, and the processor 202 reads the executable code in the memory 204 to execute the processing method of the image segmentation model.
  • the memory 204 may also include an operating system and other software modules required for running processes.
  • the operating system can be LINUX TM , UNIX TM , WINDOWS TM etc.
  • FIG. 3 is a schematic structural diagram of a processing device 300 according to an embodiment of the application.
  • the processing device is used to process the image segmentation model, thereby obtaining a target image segmentation model with a relatively small amount of calculation and a small loss of accuracy. This can make the target image segmentation model more suitable for application on edge devices.
  • the processing device 300 includes a layer width adjustment module 310 and a knowledge distillation module 320.
  • the image segmentation model to be adjusted may include a feature extraction sub-model and an image segmentation sub-model.
  • the feature extraction sub-model is used to extract features of the image, and the image segmentation sub-model is used to perform the process on the image according to the extracted features. segmentation.
  • An example of the feature extraction sub-model is a convolutional neural network; an example of the image segmentation sub-model is a convolutional neural network.
  • FIG. 4 is a schematic flowchart of a processing method of an image segmentation model according to an embodiment of the present application.
  • the method shown in FIG. 4 includes S410 and S420.
  • The processing device shown in FIG. 3 is taken as an example below to introduce the processing method of the present application.
  • S410: Adjust the layer width of the feature extraction sub-model in the image segmentation model to obtain a first feature extraction sub-model. This step may be executed by the layer width adjustment module 310.
  • The layer width adjustment of the feature extraction sub-model described here can be understood as adjusting the channels of all or some layers of the feature extraction sub-model, for example, first adding some channels and then removing some channels in a reasonable manner, so that the accuracy loss of the feature extraction sub-model when extracting features is reduced and the segmentation accuracy of the image segmentation model can be improved.
  • the feature extraction sub-model can perform feature extraction of different scales on the image.
  • For example, genetic algorithms can be used to adjust the layer width of the feature extraction sub-model. One implementation includes the following operations.
  • First operation: increase the number of channels of the feature extraction sub-model to obtain a second feature extraction sub-model.
  • Second operation: generate K different first binarization codes for the second feature extraction sub-model. Each first binarization code includes multiple binarization values in one-to-one correspondence with the multiple channels of the second feature extraction sub-model, each binarization value is used to indicate whether its corresponding channel is retained or removed, and K is an integer greater than 1.
  • Third operation: according to the K different first binarization codes, retain or remove the channels of the second feature extraction sub-model to obtain K third feature extraction sub-models.
  • Fourth operation: according to the intersection-over-union ratio and computation amount when each of the K third feature extraction sub-models performs feature extraction on the image (for example, images in a verification image set), select M third feature extraction sub-models from the K third feature extraction sub-models, where M is a positive integer. If M is 1, that is, only one third feature extraction sub-model is selected, that third feature extraction sub-model can be regarded as the first feature extraction sub-model. If M is greater than 1, the following operations can be continued.
  • Fifth operation: perform crossover and/or mutation processing on the M first binarization codes corresponding to the M third feature extraction sub-models to obtain S second binarization codes, where S is an integer greater than 1. The value of S can be set in advance based on experience.
  • Sixth operation: according to the S second binarization codes, retain or remove the channels of the second feature extraction sub-model to obtain S fourth feature extraction sub-models.
  • Binarizing the multiple channels of the second feature extraction sub-model to obtain a first binarization code can be understood as follows: a string of values is generated, where each value is one of two preset values, the number of values in the string is the same as the total number of channels of the second feature extraction sub-model, the values in the string are in one-to-one correspondence with the channels of the second feature extraction sub-model, and each value indicates whether its corresponding channel is retained or removed. This string of values is the binarization code.
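  • As an illustration only, applying one such binarization code to retain or remove channels can be sketched as follows (sizes and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
num_channels = 16
code = rng.integers(0, 2, size=num_channels)      # one binarization code
features = rng.normal(size=(num_channels, 8, 8))  # per-channel feature maps
kept = features[code == 1]                        # retain channels coded 1
print(code, kept.shape)                           # removed channels are dropped
```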
  • The number of repeated executions of the fourth operation to the sixth operation can be preset. When all executions are completed, if multiple fourth feature extraction sub-models are obtained, one of them can be selected as the first feature extraction sub-model; for example, the fourth feature extraction sub-model with the highest probability of being selected is used as the first feature extraction sub-model.
  • The fitness of each third feature extraction sub-model can be calculated according to the intersection-over-union ratio and calculation amount when that third feature extraction sub-model performs feature extraction on the image, and the probability of the third feature extraction sub-model being selected can then be calculated according to the fitness.
  • For example, the fitness f(b_j) of the j-th third feature extraction sub-model may take the form f(b_j) = mIoU(j) − λ·N(j), where mIoU(j) represents the intersection-over-union ratio when the j-th third feature extraction sub-model performs feature extraction on the image, N(j) represents the calculation amount when the j-th third feature extraction sub-model performs feature extraction on the image, and λ is a preset parameter, also called a hyperparameter.
  • The probability Pr(b_j) of the j-th third feature extraction sub-model being selected satisfies the following formula: Pr(b_j) = f(b_j) / Σ_k f(b_k), where the sum runs over the K third feature extraction sub-models.
  • For example, the K third feature extraction sub-models can be sorted in descending order of probability value, and the first M third feature extraction sub-models are selected; the size of M can be preset based on experience.
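  • The selection step can be sketched as follows; this minimal illustration assumes the example fitness form given above and roulette-wheel normalization, with toy numbers and hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, lam = 10, 4, 0.01

miou = rng.uniform(0.5, 0.8, size=K)    # mIoU(j) on a verification set
flops = rng.uniform(1.0, 5.0, size=K)   # N(j): computation amount

f = miou - lam * flops                  # fitness f(b_j) = mIoU(j) - lam*N(j)
pr = f / f.sum()                        # Pr(b_j) = f(b_j) / sum_k f(b_k)

top_m = np.argsort(pr)[::-1][:M]        # keep the M highest-probability models
sampled = rng.choice(K, size=M, replace=False, p=pr)  # or roulette sampling
print(top_m, sampled)
```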
  • In the crossover processing, any two first binarization codes are selected from the M first binarization codes, and then segments of the same length at the same positions in the two first binarization codes are exchanged to obtain two new binarization codes.
  • For example, one first binarization code is 0101011100100101 and the other is 0101101010110110. Exchanging the sixth to twelfth values of the two codes yields 0101001010110101 and 0101111100100110, respectively.
  • In the mutation processing, a segment of any length in a binarization code is replaced with other values, so as to obtain a different binarization code. For example, in the binarization code 10010010101101010, the fourth to eleventh values 10010101 are selected and replaced with 01101010 to obtain the binarization code 10001101010101010.
  • The binarization codes obtained after crossover and/or mutation are called second binarization codes.
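  • The crossover and mutation operations can be sketched as follows; this minimal illustration reproduces the two worked examples above, and the helper names are hypothetical:

```python
import random

def crossover(code_a, code_b, start, end):
    # Exchange the segment [start:end) at the same positions of two codes.
    return (code_a[:start] + code_b[start:end] + code_a[end:],
            code_b[:start] + code_a[start:end] + code_b[end:])

def mutate(code, start, end, rng=random.Random(0)):
    # Replace the segment [start:end) with freshly drawn binary values.
    segment = ''.join(rng.choice('01') for _ in range(end - start))
    return code[:start] + segment + code[end:]

a, b = '0101011100100101', '0101101010110110'
# Exchanging the 6th to 12th values (1-indexed) reproduces the example
# above: 0101001010110101 and 0101111100100110.
print(crossover(a, b, 5, 12))
print(mutate('10010010101101010', 3, 11))
```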
  • In some implementations, the M first binarization codes corresponding to the M third feature extraction sub-models may be directly cross-processed to obtain M second binarization codes. In this case, M is equal to S.
  • In some implementations, after the M first binarization codes corresponding to the M third feature extraction sub-models are cross-processed to obtain S new binarization codes, the mutation processing may be skipped. In this case, the S new binarization codes are the S second binarization codes.
  • S420: Obtain a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model in the image segmentation model. This step may be executed by the knowledge distillation module 320.
  • the image segmentation model composed of the first feature extraction submodel and the image segmentation submodel may be used as the target image segmentation model.
  • the image segmentation model can be knowledge distilled through the trained teacher image segmentation model, and the image segmentation model obtained by distillation can be used as the target image segmentation model. This implementation can improve the segmentation accuracy of the target image segmentation model.
  • For example, the first feature extraction sub-model can be knowledge-distilled according to the feature extraction sub-model in the teacher image segmentation model, and the image segmentation sub-model in the image segmentation model to be trained can be knowledge-distilled according to the image segmentation sub-model in the teacher image segmentation model.
  • the feature extraction sub-model in the teacher model can be called the teacher feature extraction sub-model
  • the image segmentation sub-model in the teacher model can be called the teacher image segmentation sub-model.
  • The following takes the case in which the first feature extraction sub-model performs multi-scale feature extraction on images as an example to introduce how the teacher image segmentation model performs knowledge distillation on the image segmentation model. Knowledge distillation is also called coaching training.
  • The feature map of the i-th scale extracted by the teacher feature extraction sub-model is recorded as G_ti, with c_ti channels and height and width h_i and w_i; the feature map of the i-th scale extracted by the first feature extraction sub-model is recorded as G_si, with c_si channels and the same height and width h_i and w_i, where i is taken from 1 to 4. The loss function of the i-th scale satisfies L_i = ||G_ti − G_si||, where ||·|| can be any norm of the matrix.
  • The value of i is taken from 1 to 4, and four loss functions of different scales are obtained. Based on these four loss functions, the first feature extraction sub-model is coached and trained, that is, knowledge distillation is performed, to obtain the fifth feature extraction sub-model.
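  • A minimal sketch of this multi-scale loss, assuming the Frobenius norm (any matrix norm is allowed above) and matching teacher and student feature shapes at each scale; shapes and names are illustrative:

```python
import torch

def feature_distill_loss(teacher_feats, student_feats):
    # Sum of per-scale terms L_i = ||G_ti - G_si||, Frobenius norm here.
    return sum(torch.norm(g_t - g_s)
               for g_t, g_s in zip(teacher_feats, student_feats))

# Four scales (i = 1..4); each feature map is channels x height x width.
teacher = [torch.randn(64, 2 ** k, 2 ** k) for k in (6, 5, 4, 3)]
student = [g + 0.1 * torch.randn_like(g) for g in teacher]
print(feature_distill_loss(teacher, student))
```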
  • When knowledge distillation is performed on the image segmentation sub-model according to the teacher image segmentation model, the loss function satisfies the following relationship:
  σ(P_T) = softmax(a_T / τ)
  σ(P_S) = softmax(a_S / τ)
  L = H(y, P_S) + λ·H(σ(P_T), σ(P_S))
  where P_T represents the segmentation result of the teacher image segmentation model, P_S represents the segmentation result of the image segmentation sub-model, a_T and a_S are the corresponding outputs before the softmax, H represents the cross-entropy loss function, y is used to indicate whether the segmentation result of the image segmentation sub-model is correct, λ is the preset trade-off coefficient, τ is a preset temperature coefficient, and softmax represents the softmax function.
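  • A minimal sketch of this distillation loss; the values of the trade-off coefficient λ and temperature τ are assumed for illustration, and the shapes and names are hypothetical:

```python
import torch
import torch.nn.functional as F

def segmentation_distill_loss(a_s, a_t, y, lam=0.5, tau=4.0):
    # L = H(y, P_S) + lam * H(sigma(P_T), sigma(P_S)),
    # with sigma(P) = softmax(a / tau) as temperature-softened outputs.
    hard = F.cross_entropy(a_s, y)                 # H(y, P_S)
    p_t = F.softmax(a_t / tau, dim=1)              # sigma(P_T)
    log_p_s = F.log_softmax(a_s / tau, dim=1)      # log sigma(P_S)
    soft = -(p_t * log_p_s).sum(dim=1).mean()      # H(sigma(P_T), sigma(P_S))
    return hard + lam * soft

a_s = torch.randn(16, 21)        # student logits (pixels x classes)
a_t = torch.randn(16, 21)        # teacher logits
y = torch.randint(0, 21, (16,))  # labels y
print(segmentation_distill_loss(a_s, a_t, y))
```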
  • the fifth feature extraction sub-model and the target image segmentation sub-model constitute the target image segmentation model.
  • the target image segmentation model obtained by knowledge distillation can greatly improve the accuracy of the output result, that is, the accuracy of the segmentation result.
  • knowledge distillation may be performed on only the first feature extraction submodel, and the image segmentation model composed of the feature extraction submodel obtained by distillation and the image segmentation submodel can be used as the target image segmentation model.
  • This implementation can also improve the segmentation accuracy of the target image segmentation model.
  • knowledge distillation may be performed on only the image segmentation sub-model, and the image segmentation model composed of the image segmentation sub-model obtained by distillation and the first feature extraction sub-model can be used as the target image segmentation model.
  • This implementation can also improve the segmentation accuracy of the target image segmentation model.
  • the feature extraction sub-model in the embodiment of the present application may be a binary neural network model. This can reduce the parameter amount and calculation amount of the feature extraction sub-model, thereby reducing the parameter amount and calculation amount of the target image segmentation model, and is helpful for applying the image segmentation model on edge devices.
  • Although making the feature extraction sub-model a binarized neural network model may reduce its accuracy, because the layer width of the feature extraction sub-model is adjusted in the embodiments of the present application, this loss of accuracy can be reduced.
  • the teacher model is used to perform knowledge distillation on the image segmentation model, which can further reduce the loss of accuracy.
  • the present application also provides an image segmentation method.
  • The image segmentation method includes: using the target image segmentation model obtained by the foregoing processing method (for example, the method including S410 and S420) to segment an image to be processed to obtain a segmentation result.
  • the present application also provides a computing device 200 as shown in FIG. 2.
  • the processor 202 in the computing device 200 reads the executable code stored in the memory 204 to execute the processing method described in FIG. 4.
  • the present application also provides a chip 500 as shown in FIG. 5.
  • The chip 500 may include a processor 502. The processor 502 reads executable code stored in a memory to execute the steps performed by the layer width adjustment module 310 and the knowledge distillation module 320, so as to implement the processing method described in FIG. 4.
  • the chip 500 may also include a memory 504 for storing executable codes.
  • the chip 500 may also include a communication interface 503 for inputting the image segmentation model to be trained and/or outputting the target image segmentation model. Optionally, it can also be used to input a teacher model.
  • This application also provides a processing method of a neural network model, where the neural network model may be an image classification model, an image recognition model, a speech recognition model, or the like.
  • the neural network model may include a binary neural network sub-model.
  • the processing method includes: adjusting the layer width of the neural network model.
  • the method of adjusting the layer width of the neural network model can refer to S410.
  • processing method may further include performing knowledge distillation on the neural network model according to the teacher model. Refer to S420 for the way to realize knowledge distillation.
  • the present application also provides a computing device similar to the computing device 200.
  • the processor in the computing device reads executable codes stored in the memory to execute the aforementioned neural network model processing method.
  • the present application also provides a processing device similar to the processing device 300, and the processing device is used to execute the aforementioned neural network model processing method.
  • the present application also provides a chip similar to the chip 500, which is used to execute the aforementioned neural network model processing method.
  • The computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, the processes or functions according to FIG. 4 of the embodiment of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center via a wired connection (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless connection (such as infrared, radio, or microwave).
  • The computer-readable storage medium stores the computer program instructions.
  • the computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc., integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to image segmentation technology in the field of artificial intelligence, and provides a processing method and a processing apparatus for an image segmentation model. The image segmentation model comprises a feature extraction sub-model and an image segmentation sub-model, the feature extraction sub-model being used to extract a feature of an image, and the image segmentation sub-model being used to segment the image according to the extracted feature. The processing method comprises the steps of: performing layer width adjustment on the feature extraction sub-model to obtain a first feature extraction sub-model (S410); and obtaining a target image segmentation model according to the first feature extraction sub-model and the image segmentation sub-model (S420). The processing method and the processing apparatus for the image segmentation model according to the present invention help improve the segmentation accuracy of the image segmentation model, so as to facilitate the implementation of image segmentation technology on an edge device.
PCT/CN2020/100058 2019-09-02 2020-07-03 Processing method and processing apparatus for image segmentation model WO2021042857A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910845625.0 2019-09-02
CN201910845625.0A CN112446888A (zh) 2019-09-02 2019-09-02 Processing method and processing apparatus for image segmentation model

Publications (1)

Publication Number Publication Date
WO2021042857A1 (fr)

Family

ID=74732997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100058 WO2021042857A1 (fr) 2020-07-03 Processing method and processing apparatus for image segmentation model

Country Status (2)

Country Link
CN (1) CN112446888A (fr)
WO (1) WO2021042857A1 (fr)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120420B (zh) * 2021-12-01 2024-02-13 北京百度网讯科技有限公司 Image detection method and apparatus


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170032285A1 (en) * 2014-04-09 2017-02-02 Entrupy Inc. Authenticating physical objects using machine learning from microscopic variations
CN109544556A (zh) * 2017-09-21 2019-03-29 江苏华夏知识产权服务有限公司 Image feature extraction method
CN108492286A (zh) * 2018-03-13 2018-09-04 成都大学 Medical image segmentation method based on a dual-path U-shaped convolutional neural network
CN109741348A (zh) * 2019-01-07 2019-05-10 哈尔滨理工大学 Diabetic retinal image segmentation method
CN110189334A (zh) * 2019-05-28 2019-08-30 南京邮电大学 Medical image segmentation method using an attention-based residual fully convolutional neural network

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114358206A (zh) * 2022-01-12 2022-04-15 合肥工业大学 Binary neural network model training method and system, and image processing method and system
CN114549296A (zh) * 2022-04-21 2022-05-27 北京世纪好未来教育科技有限公司 Training method for image processing model, image processing method, and electronic device
CN114549296B (zh) * 2022-04-21 2022-07-12 北京世纪好未来教育科技有限公司 Training method for image processing model, image processing method, and electronic device
CN115906651A (zh) * 2022-12-06 2023-04-04 中电金信软件有限公司 Update method and apparatus for binary neural network, and electronic device
CN115906651B (zh) * 2022-12-06 2024-05-31 中电金信软件有限公司 Update method and apparatus for binary neural network, and electronic device
CN117726541A (zh) * 2024-02-08 2024-03-19 北京理工大学 Low-light video enhancement method and apparatus based on a binarized neural network

Also Published As

Publication number Publication date
CN112446888A (zh) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2021042857A1 (fr) Procédé de traitement et appareil de traitement pour modèle de segmentation d'image
WO2021042828A1 (fr) Procédé et appareil de compression de modèle de réseau neuronal, ainsi que support de stockage et puce
US20210089922A1 (en) Joint pruning and quantization scheme for deep neural networks
WO2022083536A1 (fr) Procédé et appareil de construction de réseau neuronal
US20190236440A1 (en) Deep convolutional neural network architecture and system and method for building the deep convolutional neural network architecture
WO2022042713A1 (fr) Procédé d'entraînement d'apprentissage profond et appareil à utiliser dans un dispositif informatique
WO2021057056A1 (fr) Procédé de recherche d'architecture neuronale, procédé et dispositif de traitement d'image, et support de stockage
US20180018555A1 (en) System and method for building artificial neural network architectures
CN112288086B (zh) 一种神经网络的训练方法、装置以及计算机设备
EP4163831A1 (fr) Procédé et dispositif de distillation de réseau neuronal
WO2021218517A1 (fr) Procédé permettant d'acquérir un modèle de réseau neuronal et procédé et appareil de traitement d'image
CN113705769A (zh) 一种神经网络训练方法以及装置
CN112418292B (zh) 一种图像质量评价的方法、装置、计算机设备及存储介质
CN111898703B (zh) 多标签视频分类方法、模型训练方法、装置及介质
CN114549913B (zh) 一种语义分割方法、装置、计算机设备和存储介质
WO2022012668A1 (fr) Procédé et appareil de traitement d'ensemble d'apprentissage
WO2021175278A1 (fr) Procédé de mise à jour de modèle et dispositif associé
CN112561028A (zh) 训练神经网络模型的方法、数据处理的方法及装置
CN111105017A (zh) 神经网络量化方法、装置及电子设备
CN114266897A (zh) 痘痘类别的预测方法、装置、电子设备及存储介质
JP2023546582A (ja) 個人化ニューラルネットワークプルーニング
CN113536970A (zh) 一种视频分类模型的训练方法及相关装置
WO2022156475A1 (fr) Procédé et appareil de formation de modèle de réseau neuronal, et procédé et appareil de traitement de données
CN115238909A (zh) 一种基于联邦学习的数据价值评估方法及其相关设备
WO2022246986A1 (fr) Procédé, appareil et dispositif de traitement de données, et support de stockage lisible par ordinateur

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20860874

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20860874

Country of ref document: EP

Kind code of ref document: A1