CN114648762A - Semantic segmentation method and device, electronic equipment and computer-readable storage medium - Google Patents

Semantic segmentation method and device, electronic equipment and computer-readable storage medium

Info

Publication number
CN114648762A
Authority
CN
China
Prior art keywords
target
image
training
area
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210272072.6A
Other languages
Chinese (zh)
Inventor
聂聪冲 (Nie Congchong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210272072.6A
Publication of CN114648762A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a semantic segmentation method and apparatus, an electronic device, and a computer-readable storage medium. An image to be segmented is acquired and divided to obtain initial region images corresponding to the image. Target pixel context information within each initial region image is determined, and a target region image corresponding to each initial region image is determined according to the target pixel context information and the initial region image. Target region context information between the target region images is then determined, and the target pixel features of the image to be segmented are determined according to the target region images and the target region context information. Finally, the image to be segmented is segmented according to the target pixel features to obtain a segmentation result. Embodiments of the present application can reduce computational complexity and can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and autonomous driving.

Description

Semantic segmentation method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a semantic segmentation method, apparatus, electronic device, and computer-readable storage medium.
Background
Semantic segmentation classifies each pixel in an image and groups pixels of the same class into the same region, thereby segmenting the image.
To improve segmentation precision, an attention mechanism is commonly added to semantic segmentation to compute context information over the image, but this approach has high computational complexity.
Disclosure of Invention
Embodiments of the present application provide a semantic segmentation method, an apparatus, an electronic device, and a computer-readable storage medium, which can solve the technical problem of high computational complexity in semantic segmentation.
A method of semantic segmentation, comprising:
acquiring an image to be segmented, and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented;
determining context information of target pixels in the initial area image, and determining a target area image corresponding to the initial area image according to the context information of the target pixels and the initial area image;
determining target area context information between the target area images, and determining target pixel characteristics of the image to be segmented according to the target area images and the target area context information;
and segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
Accordingly, an embodiment of the present application provides a semantic segmentation apparatus, including:
the image acquisition module is used for acquiring an image to be segmented and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented;
a first determining module, configured to determine context information of a target pixel in the initial area image, and determine a target area image corresponding to the initial area image according to the context information of the target pixel and the initial area image;
a second determining module, configured to determine context information of a target area between the target area images, and determine a target pixel feature of the image to be segmented according to the target area image and the context information of the target area;
and the image segmentation module is used for segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
Optionally, the image acquisition module is specifically configured to perform:
extracting the features of the image to be segmented to obtain a feature image;
acquiring a preset grid, and performing displacement prediction on the vertex of the preset grid according to the characteristic image and the preset grid to obtain a target deformation grid;
and dividing the characteristic image according to the target deformation grid to obtain an initial area image corresponding to the image to be segmented.
Optionally, the image acquisition module is specifically configured to perform:
determining the vertex coordinates in the preset grid;
according to the vertex coordinates, performing vertex feature search in the feature image to obtain initial features corresponding to the vertex coordinates;
determining a target feature corresponding to the vertex coordinate according to the initial feature and the context information of the vertex coordinate;
and predicting the target displacement of the vertex coordinates according to the target characteristics, and moving the vertex coordinates according to the target displacement to obtain a target deformation grid.
Optionally, the second determining module is specifically configured to perform:
determining initial area characteristics corresponding to the target area image according to the average value of the pixels of the target area image;
determining target area context information between the target area images according to the initial area characteristics;
determining the target area characteristics according to the initial area characteristics and the target area context information;
and mapping the target region characteristics to the target deformation grid to obtain the target pixel characteristics of the image to be segmented.
Optionally, the semantic segmentation apparatus further includes:
a training module to:
acquiring a training sample set, and determining a target weight corresponding to the category of a training pixel according to the number of the training pixels in a training image of the training sample set;
dividing the training image to obtain a training area image;
determining initial pixel context information in the training area image through a pixel context layer in a neural network model to be trained, and determining a first area image corresponding to the training image according to the initial pixel context information and the training area image;
determining initial region context information between the first region images through a region context layer in the neural network model to be trained, and determining initial pixel characteristics of the training images according to the first region images and the initial region context information;
determining the target class of the training pixel corresponding to the initial pixel characteristic through a segmentation layer in the neural network model to be trained;
determining a target loss value according to the target type, the label of the training pixel and the target weight;
acquiring the training times of the neural network model to be trained;
and training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model.
Optionally, the training module is specifically configured to perform:
extracting the features of the training images through a feature extraction layer in the neural network model to be trained to obtain training feature images;
through the deformation network layer in the neural network model to be trained, performing displacement prediction on the vertexes of the preset grids according to the training characteristic image and the preset grids to obtain initial deformation grids;
dividing the training characteristic image according to the initial deformation grid through the pixel context layer in the neural network model to be trained to obtain a training area image;
determining a first target loss value according to the target type, the label of the training pixel and the target weight;
determining a second target loss value according to the initial pixel characteristics and the average value of the initial pixel characteristics;
determining a target loss value according to the first target loss value and the second target loss value;
if the target loss value does not meet a preset condition and/or the training times of the neural network model to be trained are smaller than a preset threshold, adding 1 to the training times, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer, and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value, and returning to the step of extracting features of the training image through the feature extraction layer in the neural network model to be trained to obtain a training feature image;
and if the target loss value meets a preset condition and/or the training times of the neural network model to be trained are equal to a preset threshold value, stopping training to obtain the trained neural network model.
Optionally, the training module is specifically configured to perform:
determining the sub-area of each grid in the initial deformation grid and the total area of the image to be segmented;
determining a third target loss value according to the sub-area and the total area;
determining a target loss value based on the first target loss value, the second target loss value, and the third target loss value;
and if the target loss value does not meet a preset condition and/or the training times of the neural network model to be trained are smaller than a preset threshold, adding 1 to the training times, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer, and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value and the third target loss value, and returning to the step of extracting features of the training image through the feature extraction layer in the neural network model to be trained to obtain a training feature image.
In addition, an electronic device is further provided in an embodiment of the present application, and includes a processor and a memory, where the memory stores a computer program, and the processor is configured to run the computer program in the memory to implement the semantic segmentation method provided in the embodiment of the present application.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program is suitable for being loaded by a processor to perform any one of the semantic segmentation methods provided in the embodiment of the present application.
In addition, the present application also provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements any one of the semantic segmentation methods provided in the present application.
In the embodiment of the application, an image to be segmented is obtained first, and the image to be segmented is divided to obtain initial region images corresponding to the image to be segmented. Target pixel context information within each initial region image is then determined, and a target region image corresponding to each initial region image is determined according to the target pixel context information and the initial region image. Target region context information between the target region images is then determined, and the target pixel features of the image to be segmented are determined according to the target region images and the target region context information. Finally, the image to be segmented is segmented according to the target pixel features to obtain a segmentation result.
In other words, in the embodiment of the present application, target pixel context information within each initial region image is first determined, and a target region image corresponding to each initial region image is determined according to that information and the initial region image. Target region context information between the target region images is then determined, and the target pixel features of the image to be segmented are determined according to the target region images and the target region context information. Because context is computed between target region images, there is no need to compute context information between each pixel of a target region image and all pixels of the other target region images, that is, between each pixel in the image to be segmented and every other pixel in it, thereby reducing computational complexity.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a scene schematic diagram of a semantic segmentation process provided by an embodiment of the present application;
FIG. 2 is a flow chart of a semantic segmentation method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an initial region image provided by an embodiment of the present application;
FIG. 4 is a diagram of a preset mesh and a target deformed mesh provided in an embodiment of the present application;
FIG. 5 is a flowchart illustrating a training method of a neural network model to be trained according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a neural network model to be trained according to an embodiment of the present application;
FIG. 7 is a schematic flow chart diagram illustrating a method for applying a trained neural network model provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a segmentation result of a trained neural network model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a semantic segmentation apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a semantic segmentation method, a semantic segmentation device, electronic equipment and a computer-readable storage medium. The semantic segmentation apparatus may be integrated in an electronic device, and the electronic device may be a server or a terminal.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
A plurality of servers may also form a blockchain, with each server serving as a node on the blockchain.
The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, and the like, but is not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
For example, as shown in fig. 1, the terminal may obtain an image to be segmented, and divide the image to be segmented to obtain an initial region image corresponding to the image to be segmented; determining target pixel context information in the initial area image, and determining a target area image corresponding to the initial area image according to the target pixel context information and the initial area image; determining target area context information between target area images, and determining target pixel characteristics of the image to be segmented according to the target area images and the target area context information; and segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
In addition, "a plurality" in the embodiments of the present application means two or more. "first" and "second" and the like in the embodiments of the present application are used for distinguishing the description, and are not to be construed as implying relative importance.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to identify and measure targets, and further processing images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of capturing information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The embodiments of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, autonomous driving, and remote sensing. Autonomous driving typically involves technologies such as high-precision maps, environment perception, behavior decision-making, path planning, and motion control.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
Referring to fig. 2, fig. 2 is a schematic flow chart illustrating a semantic segmentation method according to an embodiment of the present disclosure. The semantic segmentation method can comprise the following steps:
s201, obtaining an image to be segmented, and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented.
The image to be segmented refers to an image containing a region of interest of the user. For example, the image a includes a vehicle, a tree, and a sky, and if the user wants to obtain the vehicle image in the image a, the image a is an image to be segmented.
The image to be segmented may be an image of an automatic driving scene, or may be a remote sensing image, and the type of the image to be segmented is not limited herein.
The terminal may capture the image to be segmented by using a camera thereof when receiving the acquisition instruction. Or, the terminal may locally acquire the image to be segmented when receiving the acquisition instruction.
Or, when receiving the acquisition instruction, the terminal may forward the acquisition instruction to another terminal, and the other terminal acquires the image to be segmented locally based on the acquisition instruction, or performs shooting based on the acquisition instruction, thereby acquiring the image to be segmented. And then other terminals send the image to be segmented to the terminal, and the terminal acquires the image to be segmented.
For the way of acquiring the image to be segmented by the terminal, the user may select the way according to the actual situation, which is not limited in this embodiment.
After the terminal acquires the image to be segmented, the image to be segmented can be divided based on the preset grid, and an initial area image corresponding to the image to be segmented is obtained.
Division means partitioning the image into individual region images, which nevertheless retain connection relationships with one another. For example, for the image to be segmented shown in fig. 3, the initial region images may be as shown in fig. 3; a target refers to an object in the image to be segmented, such as a vehicle, a person, or a tree.
Because the position of the target differs from one image to be segmented to another, directly dividing the image to be segmented based on the preset mesh would yield an inaccurate initial region image, that is, the edges of the preset mesh would not match the boundaries of the targets in the image to be segmented. Therefore, in order to improve the accuracy of the initial region image, in some embodiments, dividing the image to be segmented to obtain the initial region image corresponding to the image to be segmented includes:
performing feature extraction on an image to be segmented to obtain a feature image;
acquiring a preset grid, and performing displacement prediction on the vertex of the preset grid according to the characteristic image and the preset grid to obtain a target deformation grid;
and dividing the characteristic image according to the target deformation grid to obtain an initial area image corresponding to the image to be segmented.
In this embodiment, a mesh may be initialized to obtain the preset mesh, following three principles:
In the first aspect, the topological structure of the mesh remains unchanged before and after deformation, i.e., the preset mesh and the target deformation mesh share the same topology.
In the second aspect, the edges of the preset mesh are flexible and diverse, so that the target deformation mesh can be obtained from the preset mesh with as little vertex displacement as possible.
In the third aspect, the number of cells in the preset mesh is constant, which facilitates subsequent structured batch processing of images.
The mesh is initialized according to the three aspects described above, and the obtained preset mesh may be as shown in fig. 4.
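To make the three principles concrete, the following is a minimal sketch, not part of the patent text, of how a uniform preset mesh with a constant number of cells could be initialized; PyTorch and the normalized-coordinate convention are assumptions.

```python
import torch

def make_preset_mesh(rows: int, cols: int) -> torch.Tensor:
    """Create a uniform preset mesh of (rows+1) x (cols+1) vertex
    coordinates, normalized to [0, 1] over the image plane."""
    ys = torch.linspace(0.0, 1.0, rows + 1)
    xs = torch.linspace(0.0, 1.0, cols + 1)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    # Shape: (rows+1, cols+1, 2); each entry is an (x, y) vertex.
    return torch.stack([grid_x, grid_y], dim=-1)

mesh = make_preset_mesh(8, 8)  # 8 x 8 cells, hence a constant 81 vertices
```

Because the vertex count is fixed, batches of such meshes stack into a single tensor, which is what the third principle is after.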
After the preset mesh is obtained, the terminal can displace its vertices according to the feature image to obtain the target deformation mesh, so that the edges of the target deformation mesh match the boundaries of the targets in the feature image. In other words, the same target in the feature image falls within the same cell of the target deformation mesh, so that each cell of the target deformation mesh is homogeneous in color (same color) or in semantics (same category).
For example, after the vertices of the preset mesh in fig. 4 are displaced, the obtained target deformation mesh may be as shown in fig. 4.
And after the terminal obtains the target deformation grid, dividing the characteristic image according to the target deformation grid to obtain an initial area image corresponding to the image to be segmented.
In this embodiment, after the feature of the image to be segmented is extracted to obtain the feature image, the vertex of the preset mesh is subjected to displacement prediction according to the feature image and the preset mesh to obtain a target deformation mesh, and then the feature image is divided according to the target deformation mesh to obtain an initial region image corresponding to the image to be segmented, so that the mesh edge of the target deformation mesh is matched with the boundary of the target in the image.
In some embodiments, performing displacement prediction on vertices of a preset mesh according to the feature image and the preset mesh to obtain a target deformation mesh, includes:
determining vertex coordinates in a preset grid;
according to the vertex coordinates, performing vertex feature search in the feature image to obtain initial features corresponding to the vertex coordinates;
determining target characteristics corresponding to the vertex coordinates according to the initial characteristics and the context information of the vertex coordinates;
and predicting the target displacement of the vertex coordinates according to the target characteristics, and moving the vertex coordinates according to the target displacement to obtain the target deformation mesh.
In object detection, the positional offset of the center point, top-left vertex, or bottom-right vertex of a detection box is predicted by a fully connected or convolutional layer, and the vertices or center points of different detection boxes in the same image do not interfere with each other. Semantic segmentation, by contrast, is a dense prediction task: the vertices of the mesh are not isolated from each other but are connected by mesh edges, forming a set of points with context dependencies. Therefore, in this embodiment, after the initial feature corresponding to a vertex is obtained, the target feature corresponding to the vertex coordinates is determined from the initial feature and the vertex's context information, so that the target displacement, and thus the target deformation mesh, can be obtained more accurately.
Determining a target feature corresponding to the vertex coordinates according to the initial feature and the context information of the vertex, including:
and performing point-by-point feature transformation on the initial features to obtain intermediate features, and then determining target features according to the intermediate features and the context information of the vertexes.
It should be noted that the vertex context information may be obtained through a self-attention layer within the deformation network layer of the trained neural network; the point-by-point transformation of the initial features may be implemented by pointwise convolution layers within that deformation network layer; and the target feature corresponding to the vertex coordinates may then be determined from the initial feature and the vertex context information through a prediction convolution layer of the deformation network layer.
Moreover, the deformation network layer may include multiple pointwise convolution layers and self-attention layers, so that the initial features undergo several feature transformations and the vertex context information is aggregated several times. For example, the deformation network layer may include six pointwise convolution layers and a self-attention layer.
In order to obtain the intermediate features more accurately, after the initial features are obtained, the vertex coordinates and the initial features may be fused (for example, concatenated) to obtain candidate features, and the candidate features may then be transformed point by point to obtain the intermediate features.
The vertex coordinates can be fused with the initial features through a CoordConv layer of the deformation network layer in the trained neural network.
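As an illustrative sketch of the deformation network layer described above, assuming PyTorch, the following module combines vertex feature lookup, CoordConv-style coordinate fusion, self-attention among vertices, and displacement prediction; the dimensions and layer counts are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationLayer(nn.Module):
    """Hypothetical sketch: look up vertex features from the feature
    image, fuse coordinates (CoordConv-style), aggregate vertex context
    with self-attention, then predict per-vertex displacements."""

    def __init__(self, feat_dim: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim + 2, hidden)   # +2 for (x, y)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.pred = nn.Linear(hidden, 2)              # (dx, dy) per vertex

    def forward(self, feat: torch.Tensor, verts: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); verts: (B, V, 2) with coords in [0, 1].
        sample = verts.unsqueeze(2) * 2.0 - 1.0       # grid_sample wants [-1, 1]
        init = F.grid_sample(feat, sample, align_corners=True)  # (B, C, V, 1)
        init = init.squeeze(-1).transpose(1, 2)       # initial features (B, V, C)
        x = self.proj(torch.cat([init, verts], dim=-1))  # CoordConv-style fusion
        ctx, _ = self.attn(x, x, x)                   # context among vertices
        delta = self.pred(ctx)                        # predicted target displacement
        return verts + delta                          # deformed vertex coordinates
```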
In other embodiments, extracting features of an image to be segmented to obtain a feature image includes: and performing feature extraction on the image to be segmented through a feature extraction layer in the trained neural network model to obtain a feature image.
The feature extraction layer in the trained neural network model may be a residual neural network (ResNet) or a convolutional network. Optionally, when the feature extraction layer is a residual neural network, part of the pooling layers in the residual neural network may be replaced with dilated convolution layers in order to reduce the loss of resolution while expanding the receptive field; in that case, the feature extraction layer is a residual neural network containing dilated convolution layers.
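For illustration only, torchvision's residual networks support exactly this kind of stride-to-dilation substitution; whether the patent's feature extraction layer is configured this way is an assumption.

```python
import torchvision

# Replace the strided downsampling of the last two stages with dilated
# convolutions, keeping an output stride of 8 instead of 32, so that
# resolution loss is reduced while the receptive field still grows.
backbone = torchvision.models.resnet50(
    weights=None,
    replace_stride_with_dilation=[False, True, True],
)
```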
S202, determining context information of target pixels in the initial area image, and determining a target area image corresponding to the initial area image according to the context information of the target pixels and the initial area image.
After obtaining the initial area image, the terminal acquires context information of target pixels among pixels in the initial area image, and then determines a target area image corresponding to the initial area image according to the context information of the target pixels and the initial area image.
For example, if there are pixel 1, pixel 2, and pixel 3 in the initial region image a, the target pixel context information between pixel 1 and pixel 2 is calculated, the target pixel context information between pixel 1 and pixel 3 is calculated, and the target pixel context information between pixel 2 and pixel 3 is calculated.
The target pixel context information in the initial region image can be determined through a pixel context layer in the trained neural network model, and the target region image corresponding to the initial region image is determined according to the target pixel context information and the initial region image, wherein the pixel context layer can be a self-attention mechanism layer.
Or, the context information of the target pixel in the initial area image can be determined through an Euclidean distance algorithm or a cosine similarity algorithm, and the target area image corresponding to the initial area image is determined according to the context information of the target pixel and the initial area image.
Context information refers to interaction information between different objects. The pixel context information refers to interaction information between different pixels in the same image.
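A minimal sketch of such a pixel context layer is given below, assuming PyTorch and a per-region list of pixel-feature tensors; the residual connection is an assumption.

```python
import torch
import torch.nn as nn

class PixelContextLayer(nn.Module):
    """Sketch: self-attention computed independently inside each region,
    so pixels only attend to pixels of the same initial region image."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, regions: list[torch.Tensor]) -> list[torch.Tensor]:
        out = []
        for r in regions:            # r: (P, C) pixel features of one region
            r = r.unsqueeze(0)       # (1, P, C) as one attention batch
            ctx, _ = self.attn(r, r, r)
            out.append((r + ctx).squeeze(0))   # target region image features
        return out
```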
S203, determining the context information of the target area between the target area images, and determining the target pixel characteristics of the image to be segmented according to the target area images and the context information of the target area.
In the related art, by means of the self-attention mechanism layer, not only the context information of the target pixels among the pixels in the initial area image but also the context information among the pixels in different initial area images need to be calculated, which results in higher calculation complexity.
For example, if there are pixels 1 and 2 in the initial area image a and pixels 4 and 5 in the initial area image b, then through the attention mechanism layer, not only the target pixel context information between pixels 1 and 2 and the target pixel context information between pixels 4 and 5 are calculated, but also the target pixel context information between pixels 1 and 4, the target pixel context information between pixels 1 and 5, the target pixel context information between pixels 2 and 4, and the target pixel context information between pixels 2 and 5 are calculated.
In this embodiment, after the target pixel context between pixels within each initial region image is obtained, the target region context information between target region images is calculated, and the target pixel features of the image to be segmented are then determined from the target region images and that context information. Pixel-level context between pixels of different target region images never needs to be computed, which reduces computational complexity and redundancy while maintaining high semantic segmentation accuracy.
For example, if there are pixels 1 and 2 in the initial region image a, pixels 4 and 5 in the initial region image b, pixels 6 and 7 in the target region image a1, and pixels 8 and 9 in the target region image b1, in the present embodiment, the region context information between the target region image a1 and the target region image b1 is calculated, so that there is no need to calculate the target pixel context information between the pixels 1 and 4, the target pixel context information between the pixels 1 and 5, the target pixel context information between the pixels 2 and 4, and the target pixel context information between the pixels 2 and 5.
The context information of the target area between the images of the target area can be determined through the context layer of the area in the trained neural network model, and the target pixel characteristics of the image to be segmented are determined according to the image of the target area and the context information of the target area.
Or, the context information of the target areas between the target area images can be determined through an Euclidean distance algorithm or a cosine similarity algorithm, and the target pixel characteristics of the image to be segmented are determined according to the target area images and the context information of the target areas.
In some embodiments, determining target area context information between target area images, and determining target pixel characteristics of an image to be segmented according to the target area images and the target area context information includes:
determining initial area characteristics corresponding to the target area image according to the mean value of the pixels of the target area image;
determining target area context information between the target area images according to the initial area characteristics;
determining the target area characteristics according to the initial area characteristics and the target area context information;
and mapping the target region characteristics to a target deformation grid to obtain the target pixel characteristics of the image to be segmented.
In this embodiment, the mean value of the pixels of the target area image is used as the initial area feature corresponding to the target area image, and then the target area context information between the target area images is calculated according to the initial area feature corresponding to the target area image, so that the target area context information between two target area images only needs to be calculated once, and does not need to be calculated for many times, thereby greatly reducing the calculation complexity.
When the target region context information between the target region images is determined through the region context layer in the trained neural network model, the initial region features corresponding to each target region image may be determined by a region pooling layer within the region context layer, using the mean of the pixels of the target region image. A region-level self-attention layer within the region context layer may then determine the target region context information from the initial region features, and the target region features may be determined from the initial region features and the target region context information. Finally, a region unpooling layer within the region context layer may map the target region features back onto the target deformation mesh to obtain the target pixel features of the image to be segmented.
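The following sketch, under the same PyTorch assumptions as above, illustrates region pooling, region-level self-attention, and region unpooling; the residual connections are assumptions.

```python
import torch
import torch.nn as nn

class RegionContextLayer(nn.Module):
    """Sketch of the region context layer: pool each target region image
    to its pixel mean, run region-level self-attention, then broadcast
    (unpool) the resulting region features back to their pixels."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, regions: list[torch.Tensor]) -> list[torch.Tensor]:
        # Region pooling: one feature per region (mean over its pixels).
        pooled = torch.stack([r.mean(dim=0) for r in regions]).unsqueeze(0)
        # Region-level self-attention: context between N regions only,
        # i.e. N x N interactions instead of pixel-by-pixel interactions.
        ctx, _ = self.attn(pooled, pooled, pooled)
        target = (pooled + ctx).squeeze(0)           # target region features
        # Region unpooling: map each region feature back onto its pixels.
        return [r + target[j].expand_as(r) for j, r in enumerate(regions)]
```

Because the attention here runs over N region features rather than all pixels, the cost of cross-region context drops from quadratic in the pixel count to quadratic in the region count, which is the complexity reduction the embodiment describes.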
And S204, segmenting the image to be segmented according to the target pixel characteristics to obtain segmentation results.
And after the terminal obtains the target pixel characteristics, segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
In some embodiments, segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result, including: and segmenting the image to be segmented according to the target pixel characteristics through the segmentation layer in the trained neural network model to obtain a segmentation result.
Optionally, since the target region image contains detail information while the feature image retains the full information of the image, the segmentation layer in the trained neural network model may segment the image to be segmented according to the feature image, the target region image, and the target pixel features together, so as to obtain a more accurate segmentation result.
In other embodiments, when the target pixel context information and the target region images are determined through the pixel context layer in the trained neural network model, the target region context information and the target pixel features are determined through the region context layer in the trained neural network model, and the segmentation result is determined through the segmentation layer in the trained neural network model, the method further includes:
acquiring a training sample set, and determining target weights corresponding to the categories of training pixels according to the number of the training pixels in training images of the training sample set;
dividing the training image to obtain a training area image;
determining initial pixel context information in a training area image through a pixel context layer in a neural network model to be trained, and determining a first area image corresponding to the training image according to the initial pixel context information and the training area image;
determining initial region context information between first region images through a region context layer in a neural network model to be trained, and determining initial pixel characteristics of a training image according to the first region images and the initial region context information;
determining a target class of a training pixel corresponding to the initial pixel characteristic through a segmentation layer in a neural network model to be trained;
determining a target loss value according to the target category, the label of the training pixel and the target weight;
acquiring the training times of a neural network model to be trained;
and training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model.
The number of training pixels in the training image of the training sample set can be substituted into the following formula to obtain the target weight corresponding to the category of the training pixels:
qi = ni / (n1 + n2 + … + nz)
wi = qm / qi
where i denotes the i-th class of training pixels, ni the number of pixels of the i-th class, z the number of classes, qi the frequency corresponding to pixels of the i-th class, qm the median of the frequencies, and wi the target weight corresponding to pixels of the i-th class.
Substituting the target category, the label of the training pixel and the target weight into the following formula to obtain a target loss value:
Lc = -Σi wi yi log(pi)
where yi denotes the label of the training pixel, pi the predicted probability of the target class, and Lc the target loss value.
Because the numbers of pixels of the different categories in the training sample set differ greatly, the target loss value is calculated with a weighted cross-entropy loss function whose target weights are determined from the numbers of training pixels; this avoids overfitting and gives the trained neural network model higher semantic segmentation accuracy.
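A hedged sketch of this weighting scheme, which matches median frequency balancing as reconstructed above, together with the weighted cross-entropy loss; the pixel counts below are illustrative, not from the patent.

```python
import torch
import torch.nn.functional as F

def class_weights(pixel_counts: torch.Tensor) -> torch.Tensor:
    """Median-frequency balancing: wi = median(q) / qi with
    qi = ni / sum_k(nk), matching the formulas above."""
    freq = pixel_counts.float() / pixel_counts.sum()
    return freq.median() / freq

# Counts of training pixels per class (illustrative numbers only).
counts = torch.tensor([5_000_000, 120_000, 30_000])
w = class_weights(counts)

# Weighted cross entropy over per-pixel logits and integer labels.
logits = torch.randn(4, 3, 64, 64)          # (batch, classes, H, W)
labels = torch.randint(0, 3, (4, 64, 64))   # per-pixel class labels
loss_c = F.cross_entropy(logits, labels, weight=w)
```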
Optionally, determining initial region context information between the first region images through a region context layer in the neural network model to be trained, and determining an initial pixel feature of the training image according to the first region images and the initial region context information, including:
determining a first area characteristic corresponding to the first area image according to the mean value of the pixels of the first area image;
determining initial area context information between the first area images according to the first area characteristics, and obtaining second area characteristics according to the initial area context information and the first area characteristics;
and mapping the second region characteristics to the initial deformation grid to obtain the initial pixel characteristics of the training image.
Optionally, training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model includes:
if the training times are smaller than a preset threshold, adding 1 to the training times, updating the network parameters of the neural network model to be trained according to the target loss value, and then returning to the step of dividing the training image to obtain a training region image; and if the training times are equal to the preset threshold, taking the neural network model to be trained as the trained neural network model.
Alternatively, training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model may include:
if the target loss value does not meet a preset condition, updating the network parameters of the neural network model to be trained according to the target loss value, and then returning to the step of dividing the training image to obtain a training region image; and if the target loss value meets the preset condition, taking the neural network model to be trained as the trained neural network model.
Alternatively, training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model may further include:
if the target loss value does not meet the preset condition and the training times are smaller than the preset threshold, adding 1 to the training times, updating the network parameters of the neural network model to be trained according to the target loss value, and then returning to the step of dividing the training image to obtain a training region image;
and if the target loss value meets the preset condition and the training times are equal to the preset threshold, taking the neural network model to be trained as the trained neural network model.
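For orientation only, the third stopping rule above could be organized as the following sketch; model, loader, optimizer, and loss_fn are placeholders rather than the patent's components.

```python
from itertools import cycle

def train(model, loader, optimizer, loss_fn, max_iters, loss_threshold):
    """Sketch of the combined stopping rule from the third variant:
    training stops once the target loss satisfies the preset condition
    and the training count reaches the preset threshold."""
    iters = 0
    for images, labels in cycle(loader):
        loss = loss_fn(model(images), labels)
        if loss.item() <= loss_threshold and iters >= max_iters:
            break                      # model is the trained network
        iters += 1                     # add 1 to the training times
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```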
When the target deformation mesh is obtained through the deformation network layer in the trained neural network model, dividing the training image to obtain a training region image includes:
extracting the features of the training images through a feature extraction layer in the neural network model to be trained to obtain training feature images;
performing displacement prediction on the vertex of a preset grid according to a training characteristic image and the preset grid through a deformation network layer in a neural network model to be trained to obtain an initial deformation grid;
dividing the training characteristic image according to the initial deformation grid through a pixel context layer in the neural network model to be trained to obtain a training area image;
Determining the target loss value according to the target class, the label of the training pixel, and the target weight includes:
determining a first target loss value according to the target category, the label of the training pixel and the target weight;
determining a second target loss value according to the initial pixel characteristics and the average value of the initial pixel characteristics;
determining a target loss value according to the first target loss value and the second target loss value;
training the neural network model to be trained based on the target loss value and the training times to obtain a trained neural network model, comprising:
if the target loss value does not meet the preset condition and/or the training times of the neural network model to be trained are smaller than the preset threshold, adding 1 to the training times, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer, and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value, and returning to the step of extracting features of the training image through the feature extraction layer in the neural network model to be trained to obtain a training feature image;
and if the target loss value meets the preset condition and/or the training times of the neural network model to be trained are equal to the preset threshold value, stopping training to obtain the trained neural network model.
The positions to which vertices of the preset mesh should be displaced are mostly boundary points or abrupt-change points in the image, that is, the displaced vertices of the preset mesh should coincide with such points, and predicting the target displacement of the vertices directly is therefore of low accuracy. Moreover, optimizing the position of each vertex in the preset mesh independently ignores the topological relations among the vertices and the overall distribution of the preset mesh.
Therefore, in this embodiment, a second target loss value is introduced: when the target loss value does not satisfy the preset condition and/or the training times of the neural network model to be trained are smaller than the preset threshold, the deformation network layer in the neural network model to be trained is updated according to the second target loss value, so that the target displacement of the vertices in the preset mesh predicted through the deformation network layer becomes more accurate.
The second target loss value may be obtained by substituting the initial pixel feature and the average value of the initial pixel feature into the following formula to calculate:
Lvar = (1/N) Σj Σ(pm ∈ Rj) ‖fm - fj‖2
where Lvar denotes the second target loss value, N the number of first region images, j the index of the j-th first region image, pm the coordinates of the m-th initial pixel feature in the j-th first region image, fm the m-th initial pixel feature in the j-th first region image, fj the mean of the initial pixel features of the j-th first region image, and ‖·‖2 the 2-norm.
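A sketch of this loss under the reconstruction above, assuming each first region image is represented as a (pixels, channels) feature tensor; the exact normalization is an assumption.

```python
import torch

def variance_loss(regions: list[torch.Tensor]) -> torch.Tensor:
    """Second target loss: the mean, over the N first region images, of
    the summed 2-norm distance between each initial pixel feature and
    the mean feature of its region (reconstructed from the garbled
    formula; encourages feature homogeneity within each mesh cell)."""
    terms = []
    for feats in regions:             # feats: (P_j, C) pixel features
        mean = feats.mean(dim=0, keepdim=True)     # fj: region mean
        terms.append((feats - mean).norm(p=2, dim=1).sum())
    return torch.stack(terms).mean()  # average over the N regions
```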
Performing displacement prediction on the vertices of the preset mesh according to the training feature image and the preset mesh through the deformation network layer in the neural network model to be trained, to obtain the initial deformation mesh, includes:
determining vertex coordinates in a preset grid;
according to the vertex coordinates, performing vertex feature search in the training feature image to obtain first features corresponding to the vertex coordinates;
determining a second feature corresponding to the vertex coordinate according to the first feature and the context information of the vertex coordinate;
and predicting the initial displacement of the vertex coordinates according to the second characteristics, and moving the vertex coordinates according to the initial displacement to obtain the initial deformation mesh.
In other embodiments, determining the target loss value based on the first target loss value and the second target loss value includes:
determining the sub-area of each grid in the initial deformation grid and the total area of the image to be segmented;
determining a third target loss value according to the sub-area and the total area;
determining a target loss value according to the first target loss value, the second target loss value and the third target loss value;
If the target loss value does not meet the preset condition and/or the training times of the neural network model to be trained are smaller than the preset threshold, adding 1 to the training times, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer, and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer according to the second target loss value, and returning to the step of extracting features of the training image through the feature extraction layer to obtain a training feature image, includes:
if the target loss value does not meet the preset condition and/or the training times of the neural network model to be trained are smaller than the preset threshold, adding 1 to the training times, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer, and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer according to the second target loss value and the third target loss value, and returning to the step of extracting features of the training image through the feature extraction layer to obtain a training feature image.
Adjacent cells in the initial deformation mesh may cross, that is, cells of the initial deformation mesh may overlap, so that the sum of the sub-areas of all cells exceeds the total area of the image to be segmented. To avoid this, the present embodiment further updates the deformation network layer according to the third target loss value, which prevents the cells of the initial deformation mesh from crossing. As a result, when the target displacement of the vertices in the preset mesh is predicted through the deformation network layer, the predicted displacement is more accurate, and the target deformation mesh predicted by the deformation network layer does not self-intersect.
Wherein, the sub-areas and the total area may be substituted into the following formula to obtain the third target loss value:

L_area = max(0, (∑_j area_j − area_img) / area_img)

where L_area represents the third target loss value, area_j represents the sub-area of the jth first region image, i.e. the sub-area of the jth grid, and area_img represents the total area of the image to be segmented.
Substituting the first target loss value, the second target loss value and the third target loss value into the following formula to obtain a target loss value L:
L = L_c + α·L_var + β·L_area
where α represents the weight of the second target loss value L_var, and β represents the weight of the third target loss value L_area.
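Putting the pieces together, the combined target loss might be computed as follows. This is a minimal sketch: the hinge form of L_area is an assumption consistent with the stated purpose of penalizing overlapping grids, since the exact formula appears only as an image in the original publication:

```python
import torch

def total_loss(l_c, l_var, grid_areas, total_area, alpha=1.0, beta=1.0):
    """Combine the three target loss values: L = L_c + alpha*L_var + beta*L_area.

    l_c:        first target loss value (weighted pixel classification loss)
    l_var:      second target loss value (pixel-feature variance loss)
    grid_areas: (J,) tensor of sub-areas of the grids in the deformed mesh
    total_area: scalar total area of the image (H * W)
    """
    # Third target loss value: penalize the excess of the summed grid
    # sub-areas over the total image area (i.e. overlapping/crossed grids).
    l_area = torch.clamp((grid_areas.sum() - total_area) / total_area, min=0.0)
    return l_c + alpha * l_var + beta * l_area
```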
It should be noted that, when the number of training times is used as the condition for deciding whether to terminate training of the neural network model to be trained, determining the target loss value according to the first target loss value and the second target loss value may also mean taking the first target loss value and the second target loss value together as the target loss value, that is, the target loss value includes the first target loss value and the second target loss value.
Similarly, when the number of training times is used as the termination condition, determining the target loss value according to the first target loss value, the second target loss value and the third target loss value may also mean taking the three loss values together as the target loss value, that is, the target loss value includes the first target loss value, the second target loss value and the third target loss value.
As can be seen from the above, in the embodiment of the present application, an image to be segmented is obtained first, and the image to be segmented is divided to obtain an initial area image corresponding to the image to be segmented. Target pixel context information in the initial area image is then determined, and the target area image corresponding to the initial area image is determined according to the target pixel context information and the initial area image. Next, target area context information between the target area images is determined, and the target pixel features of the image to be segmented are determined according to the target area images and the target area context information. Finally, the image to be segmented is segmented according to the target pixel features to obtain the segmentation result.
In other words, in the embodiment of the present application, the target pixel context information in the initial area image is determined first, and the target area image corresponding to the initial area image is determined according to the target pixel context information and the initial area image. The target area context information between the target area images is then determined, and the target pixel features of the image to be segmented are determined according to the target area images and the target area context information. Because context is modeled between target area images, there is no need to compute context information between each pixel in a target area image and all pixels in the other target area images, that is, between each pixel in the image to be segmented and all other pixels in the image to be segmented, thereby reducing the computational complexity.
The method described in the above embodiments is further illustrated in detail by way of example.
In this embodiment, the semantic segmentation apparatus is described as being integrated in the terminal by way of example. The semantic segmentation method may include an application method of the trained neural network model and a training method of the trained neural network model. Specifically, fig. 5 is a schematic flow chart of the training method of the trained neural network model, and fig. 7 is a schematic flow chart of the application method of the trained neural network model.
Referring to fig. 5, the training method of the trained neural network model includes:
S501, the terminal obtains a training sample set, and determines target weights corresponding to the categories of training pixels according to the number of the training pixels in the training images of the training sample set.
S502, the terminal extracts the features of the training image through a feature extraction layer in the neural network model to be trained to obtain a training feature image.
In this embodiment, the structure of the neural network model to be trained may be as shown in fig. 6, where the neural network model to be trained includes a feature extraction layer, a pixel context layer, a region context layer, a deformation network layer, and a segmentation layer, where the feature extraction layer may be a residual neural network, and then a training feature image F is obtained through the feature extraction layer.
It should be understood that after the training feature image is obtained, the training feature image may be subjected to dimension reduction to obtain a training feature image X after dimension reduction, and then the training feature image X after dimension reduction is input to the deformation network layer and the pixel context layer.
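For example, the dimension reduction may be implemented as a 1×1 convolution; the channel counts in the following sketch are illustrative assumptions:

```python
import torch.nn as nn

# Reduce the training feature image F (e.g. 2048 channels from a residual
# backbone) to a lighter feature image X before the deformation network
# layer and the pixel context layer. The channel sizes are assumptions.
reduce_dim = nn.Conv2d(in_channels=2048, out_channels=256, kernel_size=1)
# x = reduce_dim(f)   # f: (B, 2048, H', W') -> x: (B, 256, H', W')
```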
S503, the terminal determines vertex coordinates in a preset grid through a deformation network layer in the neural network model to be trained, and performs vertex feature search in the training feature image according to the vertex coordinates to obtain first features corresponding to the vertex coordinates.
For example, as shown in fig. 6, the deformation network layer includes an extraction layer, a CoordConv layer, six point-by-point convolutional layers, a self-attention layer and a prediction convolutional layer. The terminal determines the vertex coordinates (t_r, s_r) in the preset mesh through the extraction layer in the deformation network layer, and searches for vertex features in the training feature image according to the vertex coordinates to obtain the first features corresponding to the vertex coordinates.
S504, the terminal determines a second feature corresponding to the vertex coordinate according to the first feature and context information of the vertex coordinate through a deformation network layer in the neural network model to be trained, predicts the initial displacement of the vertex coordinate according to the second feature, and moves the vertex coordinate according to the initial displacement to obtain an initial deformation grid.
The terminal may fuse the first features and the vertex coordinates through the CoordConv layer in the deformation network layer to obtain initial candidate features. The terminal then performs point-by-point convolution on the initial candidate features through the point-by-point convolutional layers in the deformation network layer to obtain initial intermediate features, obtains the context information of the vertex coordinates through the self-attention layer in the deformation network layer, and determines the second features corresponding to the vertex coordinates according to the initial intermediate features and the context information of the vertex coordinates; the point-by-point convolution process and the self-attention process are executed 6 times. Finally, the terminal predicts the initial displacement (Δt_r, Δs_r) of the vertex coordinates according to the second features through the prediction convolutional layer, and moves the vertex coordinates according to the initial displacement to obtain the initial deformation mesh.
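A minimal sketch of such a deformation network layer follows, assuming PyTorch; the hidden width, the use of `nn.MultiheadAttention` for the self-attention step, and the replacement of 1×1 convolutions by per-vertex `nn.Linear` layers (equivalent for point-by-point convolution over a set of vertices) are assumptions:

```python
import torch
import torch.nn as nn

class DeformationHead(nn.Module):
    """Sketch of the deformation network layer: CoordConv-style fusion of
    vertex features and coordinates, six pointwise + self-attention stages,
    and a final displacement prediction per vertex."""

    def __init__(self, in_dim, hidden_dim=64, num_stages=6):
        super().__init__()
        # CoordConv step: concatenate each vertex's (x, y) to its features.
        self.coord_fuse = nn.Linear(in_dim + 2, hidden_dim)
        # A 1x1 (point-by-point) convolution over a set of vertices is
        # equivalent to a per-vertex Linear layer.
        self.pointwise = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_stages))
        self.attention = nn.ModuleList(
            nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
            for _ in range(num_stages))
        # Prediction convolutional layer: one (dx, dy) offset per vertex.
        self.predict = nn.Linear(hidden_dim, 2)

    def forward(self, vertex_feats, vertex_coords):
        # vertex_feats: (N, C) first features; vertex_coords: (N, 2)
        x = self.coord_fuse(torch.cat([vertex_feats, vertex_coords], dim=-1))
        x = x.unsqueeze(0)                       # (1, N, D), batch of one
        for pointwise, attention in zip(self.pointwise, self.attention):
            x = pointwise(x)                     # initial intermediate features
            context, _ = attention(x, x, x)      # context of vertex coordinates
            x = x + context                      # second features
        displacement = self.predict(x).squeeze(0)   # initial displacement
        return vertex_coords + displacement          # moved vertex coordinates
```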
And S505, the terminal divides the training characteristic image according to the initial deformation grid through the pixel context layer in the neural network model to be trained to obtain a training area image.
S506, the terminal determines initial pixel context information in the training area image through a pixel context layer in the neural network model to be trained, and determines a first area image corresponding to the training image according to the initial pixel context information and the training area image.
For a training area image and the corresponding first region image (the defining formulas are given only as embedded images in the original publication), K_j represents the pixels in the jth training area image. The respective first region images can constitute a new feature map X' ∈ R^{C×H×W}, where R denotes the real number space, C represents the number of channels, H represents the height of the training image, and W represents the width of the training image.
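A hedged sketch of this pixel context step follows, assuming the grid regions are available as boolean masks and that the pixel-level context is computed with `nn.MultiheadAttention` (both assumptions; the text does not fix these details):

```python
import torch
import torch.nn as nn

def pixel_context(features, region_masks, attn):
    """Sketch of the pixel context layer (S505-S506): self-attention among
    the pixels of each training area image, written back into a new map X'.

    features:     (C, H, W) training feature image, divided by the mesh
    region_masks: list of (H, W) boolean masks, one per grid region
    attn:         nn.MultiheadAttention with embed_dim == C, batch_first=True
    """
    C, H, W = features.shape
    flat = features.reshape(C, -1).t()           # (H*W, C) pixel features
    out = flat.clone()
    for mask in region_masks:
        idx = mask.reshape(-1).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        region = flat[idx].unsqueeze(0)          # (1, K_j, C) pixels of region j
        # Restricting attention to within a region is what avoids computing
        # context between every pixel pair of the whole image.
        ctx, _ = attn(region, region, region)
        out[idx] = (region + ctx).squeeze(0)     # first region image pixels
    return out.t().reshape(C, H, W)              # new feature map X'
```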
And S507, the terminal determines a first area characteristic corresponding to the first area image according to the mean value of the pixels of the first area image through an area pooling layer of an area context layer in the neural network model to be trained.
S508, the terminal determines initial area context information between first area images according to the first area features through an area-level self-attention mechanism layer of an area context layer in the neural network model to be trained, and obtains second area features according to the initial area context information and the first area features.
And S509, the terminal maps the second region characteristic to the initial deformation grid through a region inverse pooling layer of a region context layer in the neural network model to be trained to obtain the initial pixel characteristic of the training image.
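Similarly, a hedged sketch of the region pooling, region-level self-attention and region inverse pooling of steps S507–S509, under the same mask and attention assumptions:

```python
import torch
import torch.nn as nn

def region_context(features, region_masks, attn):
    """Sketch of the region context layer (S507-S509): region pooling by
    mean, region-level self-attention, then region inverse pooling.

    features:     (C, H, W) feature map X' from the pixel context layer
    region_masks: list of (H, W) boolean masks, one per grid region
    attn:         nn.MultiheadAttention with embed_dim == C, batch_first=True
    """
    C, H, W = features.shape
    flat = features.reshape(C, -1)                            # (C, H*W)
    # Region pooling: first area feature = mean of each region's pixels.
    pooled = torch.stack(
        [flat[:, m.reshape(-1)].mean(dim=1) for m in region_masks])  # (J, C)
    # Region-level self-attention: context information between region images.
    ctx, _ = attn(pooled.unsqueeze(0), pooled.unsqueeze(0), pooled.unsqueeze(0))
    second = pooled + ctx.squeeze(0)                          # second features
    # Region inverse pooling: broadcast each region feature to its pixels.
    out = torch.zeros_like(flat)
    for feat, m in zip(second, region_masks):
        out[:, m.reshape(-1)] = feat.unsqueeze(1)
    return out.reshape(C, H, W)                               # pixel features
```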
S5010, the terminal determines the target class of the training pixels corresponding to the initial pixel features according to the initial pixel features, the first area images and the training feature images through the segmentation layer in the neural network model to be trained.
S5011, the terminal determines a first target loss value according to the target type, the label of the training pixel and the target weight.
S5012, the terminal determines a second target loss value according to the initial pixel characteristics and the average value of the initial pixel characteristics.
S5013, the terminal determines the sub-areas of the grids in the initial deformation grid and the total area of the image to be segmented, and determines a third target loss value according to the sub-areas and the total area.
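The sub-area of a (possibly deformed) quadrilateral grid cell can be computed from its four vertices with the shoelace formula; a small sketch, where the consecutive vertex ordering is an assumption:

```python
import torch

def quad_area(corners):
    """Area of one quadrilateral grid cell via the shoelace formula.

    corners: (4, 2) tensor of (x, y) vertices in consecutive (e.g.
             counter-clockwise) order around the cell.
    """
    x, y = corners[:, 0], corners[:, 1]
    x_next, y_next = torch.roll(x, -1), torch.roll(y, -1)
    return 0.5 * torch.abs(torch.sum(x * y_next - x_next * y))
```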
S5014, the terminal obtains the training times of the neural network model to be trained.
And S5015, if the number of training times is smaller than the preset threshold, the terminal increases the number of training times by 1, updates the network parameters of the feature extraction layer, the pixel context layer, the area context layer and the segmentation layer in the neural network model to be trained according to the first target loss value, updates the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value and the third target loss value, and returns to execute step S502.
S5016, if the training times are equal to the preset threshold, the terminal takes the neural network model to be trained as the trained neural network model.
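The split parameter update of steps S5014–S5015 might be realized as follows. This is a hedged sketch: the `model.losses` helper, the parameter groupings, the SGD optimizers, and the application of the weights α and β to the deformation-layer update are all assumptions, not details fixed by the text:

```python
import torch

def train_step(model, batch, opt_backbone, opt_deform,
               backbone_params, deform_params, alpha=1.0, beta=1.0):
    """One training iteration with the two-way update described above.

    backbone_params: parameters of the feature extraction, pixel context,
                     region context and segmentation layers (assumed handles)
    deform_params:   parameters of the deformation network layer
    """
    l_c, l_var, l_area = model.losses(batch)   # the three target loss values

    # The first target loss value drives the feature extraction, pixel
    # context, region context and segmentation layers...
    g_backbone = torch.autograd.grad(
        l_c, backbone_params, retain_graph=True, allow_unused=True)
    # ...while the second and third drive the deformation network layer.
    g_deform = torch.autograd.grad(
        alpha * l_var + beta * l_area, deform_params, allow_unused=True)

    opt_backbone.zero_grad()
    opt_deform.zero_grad()
    for p, g in zip(backbone_params, g_backbone):
        p.grad = g
    for p, g in zip(deform_params, g_deform):
        p.grad = g
    opt_backbone.step()
    opt_deform.step()
```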
Referring to fig. 7, the method of applying the trained neural network model includes:
S701, the terminal obtains an image to be segmented, and performs feature extraction on the image to be segmented through the feature extraction layer in the trained neural network model to obtain a feature image.
S702, the terminal obtains a preset grid, determines a vertex coordinate in the preset grid through a deformation network layer in the trained neural network model, and performs vertex feature search in the feature image according to the vertex coordinate to obtain an initial feature corresponding to the vertex coordinate.
S703, the terminal determines a target feature corresponding to the vertex coordinate according to the initial feature and the context information of the vertex coordinate through a deformation network layer in the trained neural network model, predicts the target displacement of the vertex coordinate according to the target feature, and moves the vertex coordinate according to the target displacement to obtain the target deformation grid.
And S704, the terminal divides the characteristic image according to the target deformation grid through the pixel context layer in the trained neural network model to obtain an initial area image corresponding to the image to be segmented.
S705, the terminal determines target pixel context information in the initial area image through the trained pixel context layer in the neural network model, and determines a target area image corresponding to the initial area image according to the target pixel context information and the initial area image.
S706, the terminal determines the initial region characteristics corresponding to the target region image according to the mean value of the pixels of the target region image through the region pooling layer of the region context layer in the trained neural network model.
S707, the terminal determines the target area context information between the target area images through the area level self-attention layer of the area context layer in the trained neural network model, and determines the target area characteristics according to the initial area characteristics and the target area context information.
And S708, mapping the target region characteristics to a target deformation grid by the terminal through a region inverse pooling layer of a region context layer in the trained neural network model to obtain the target pixel characteristics of the image to be segmented.
For example, the obtained target pixel features may be written as X ∈ R^{C×H×W}.
And S709, the terminal segments the image to be segmented according to the target pixel feature, the target area image and the feature image through the segmentation layer in the trained neural network model to obtain a segmentation result.
The segmentation layer can be a softmax layer. For example, the segmentation result obtained by the trained neural network model can be as shown in fig. 8, where the dashed box in fig. 8 represents the region of interest.
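At inference time the trained model runs as a single forward pass; a hedged usage sketch, where the `model` handle and the input preprocessing are assumptions:

```python
import torch

# `model` is an assumed handle to the trained neural network model, and
# `image` an assumed (C, H, W) tensor holding the image to be segmented.
model.eval()
with torch.no_grad():
    logits = model(image.unsqueeze(0))        # (1, num_classes, H, W)
    probs = torch.softmax(logits, dim=1)      # segmentation (softmax) layer
    segmentation = probs.argmax(dim=1)        # (1, H, W) per-pixel classes
```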
The beneficial effects and other realizable manners in this embodiment may specifically refer to the above semantic segmentation method embodiment, which is not described herein again.
In order to better implement the semantic segmentation method provided by the embodiment of the present application, the embodiment of the present application further provides a device based on the semantic segmentation method. The terms used below have the same meanings as in the above semantic segmentation method, and specific implementation details can refer to the description in the method embodiment.
For example, as shown in fig. 9, the semantic segmentation means may include:
the image obtaining module 901 is configured to obtain an image to be segmented, and divide the image to be segmented to obtain an initial region image corresponding to the image to be segmented.
A first determining module 902, configured to determine context information of a target pixel in an initial area image, and determine a target area image corresponding to the initial area image according to the context information of the target pixel and the initial area image.
A second determining module 903, configured to determine context information of a target area between the target area images, and determine a target pixel feature of the image to be segmented according to the target area image and the context information of the target area.
And an image segmentation module 904, configured to segment the image to be segmented according to the target pixel feature to obtain a segmentation result.
Optionally, the image obtaining module 901 is specifically configured to perform:
performing feature extraction on an image to be segmented to obtain a feature image;
acquiring a preset grid, and performing displacement prediction on the vertex of the preset grid according to the characteristic image and the preset grid to obtain a target deformation grid;
and dividing the characteristic image according to the target deformation grid to obtain an initial area image corresponding to the image to be segmented.
Optionally, the image obtaining module 901 is specifically configured to perform:
determining vertex coordinates in a preset grid;
according to the vertex coordinates, performing vertex feature search in the feature image to obtain initial features corresponding to the vertex coordinates;
determining a target feature corresponding to the vertex coordinate according to the initial feature and the context information of the vertex coordinate;
and predicting the target displacement of the vertex coordinates according to the target characteristics, and moving the vertex coordinates according to the target displacement to obtain the target deformation mesh.
Optionally, the second determining module 903 is specifically configured to perform:
determining initial area characteristics corresponding to the target area image according to the mean value of the pixels of the target area image;
determining target area context information between target area images according to the initial area characteristics;
determining the target area characteristics according to the initial area characteristics and the target area context information;
and mapping the target region characteristics to a target deformation grid to obtain the target pixel characteristics of the image to be segmented.
Optionally, the semantic segmentation apparatus further includes:
a training module to perform:
acquiring a training sample set, and determining target weights corresponding to the categories of training pixels according to the number of the training pixels in training images of the training sample set;
dividing the training image to obtain a training area image;
determining initial pixel context information in a training area image through a pixel context layer in a neural network model to be trained, and determining a first area image corresponding to the training image according to the initial pixel context information and the training area image;
determining initial region context information between first region images through a region context layer in a neural network model to be trained, and determining initial pixel characteristics of a training image according to the first region images and the initial region context information;
determining a target class of a training pixel corresponding to the initial pixel characteristic through a segmentation layer in a neural network model to be trained;
determining a target loss value according to the target category, the label of the training pixel and the target weight;
acquiring the training times of a neural network model to be trained;
and training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model.
Optionally, the training module is specifically configured to perform:
extracting the features of the training images through a feature extraction layer in the neural network model to be trained to obtain training feature images;
performing displacement prediction on the vertex of a preset grid according to a training characteristic image and the preset grid through a deformation network layer in a neural network model to be trained to obtain an initial deformation grid;
dividing the training characteristic image according to the initial deformation grid through a pixel context layer in the neural network model to be trained to obtain a training area image;
determining a first target loss value according to the target category, the label of the training pixel and the target weight;
determining a second target loss value according to the initial pixel characteristics and the average value of the initial pixel characteristics;
determining a target loss value according to the first target loss value and the second target loss value;
if the target loss value does not meet the preset condition and/or the number of training times of the neural network model to be trained is smaller than the preset threshold, increasing the number of training times by 1, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer according to the second target loss value, and returning to perform feature extraction on the training image through the feature extraction layer to obtain a training feature image;
and if the target loss value meets a preset condition and/or the training times of the neural network model to be trained are equal to a preset threshold value, stopping training to obtain the trained neural network model.
Optionally, the training module is specifically configured to perform:
determining the sub-area of each grid in the initial deformation grid and the total area of the image to be segmented;
determining a third target loss value according to the sub-area and the total area;
determining a target loss value according to the first target loss value, the second target loss value and the third target loss value;
if the target loss value does not meet the preset condition and/or the number of training times of the neural network model to be trained is smaller than the preset threshold, increasing the number of training times by 1, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer according to the second target loss value and the third target loss value, and returning to perform feature extraction on the training image through the feature extraction layer to obtain the training feature image.
In specific implementation, the above modules may be implemented as independent entities, or may be combined arbitrarily, and implemented as the same or several entities, and the specific implementation manner and the corresponding beneficial effects of the above modules may refer to the foregoing method embodiments, which are not described herein again.
An embodiment of the present application further provides an electronic device, where the electronic device may be a server or a terminal, and as shown in fig. 10, a schematic structural diagram of the electronic device according to the embodiment of the present application is shown, specifically:
the electronic device may include components such as a processor 1001 of one or more processing cores, memory 1002 of one or more computer-readable storage media, a power source 1003, and an input unit 1004. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 10 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 1001 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by operating or executing computer programs and/or modules stored in the memory 1002 and calling data stored in the memory 1002. Optionally, processor 1001 may include one or more processing cores; preferably, the processor 1001 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1001.
The memory 1002 may be used to store computer programs and modules, and the processor 1001 executes various functional applications and data processing by operating the computer programs and modules stored in the memory 1002. The memory 1002 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a computer program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 1002 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 1002 may also include a memory controller to provide the processor 1001 access to the memory 1002.
The electronic device further includes a power source 1003 for supplying power to each component, and preferably, the power source 1003 may be logically connected to the processor 1001 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are implemented through the power management system. The power source 1003 may also include any component including one or more of a dc or ac power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further include an input unit 1004, and the input unit 1004 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 1001 in the electronic device loads the executable file corresponding to the process of one or more computer programs into the memory 1002 according to the following instructions, and the processor 1001 runs the computer programs stored in the memory 1002, so as to implement various functions, such as:
acquiring an image to be segmented, and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented;
determining target pixel context information in the initial area image, and determining a target area image corresponding to the initial area image according to the target pixel context information and the initial area image;
determining target area context information between target area images, and determining target pixel characteristics of the image to be segmented according to the target area images and the target area context information;
and segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
The above detailed embodiments of the operations and the corresponding beneficial effects can be referred to the above detailed description of the semantic segmentation method, which is not repeated herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by a computer program, or by related hardware controlled by the computer program; the computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program can be loaded by a processor to execute the steps in any one of the semantic segmentation methods provided in the present application. For example, the computer program may perform the steps of:
acquiring an image to be segmented, and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented;
determining target pixel context information in the initial area image, and determining a target area image corresponding to the initial area image according to the target pixel context information and the initial area image;
determining target area context information between target area images, and determining target pixel characteristics of the image to be segmented according to the target area images and the target area context information;
and segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
The specific implementation of the above operations and the corresponding beneficial effects can be referred to the foregoing embodiments, and are not described herein again.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the computer program stored in the computer-readable storage medium can execute the steps in any semantic segmentation method provided in the embodiments of the present application, beneficial effects that can be achieved by any semantic segmentation method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the semantic segmentation method.
The semantic segmentation method, the semantic segmentation device, the electronic device, and the computer-readable storage medium provided in the embodiments of the present application are described in detail above, and a specific example is applied in the description to explain the principles and implementations of the present application, and the description of the embodiments is only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method of semantic segmentation, comprising:
acquiring an image to be segmented, and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented;
determining target pixel context information in the initial area image, and determining a target area image corresponding to the initial area image according to the target pixel context information and the initial area image;
determining target area context information between the target area images, and determining target pixel characteristics of the image to be segmented according to the target area images and the target area context information;
and segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
2. The semantic segmentation method according to claim 1, wherein the step of dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented comprises:
performing feature extraction on the image to be segmented to obtain a feature image;
acquiring a preset grid, and performing displacement prediction on the vertex of the preset grid according to the characteristic image and the preset grid to obtain a target deformation grid;
and dividing the characteristic image according to the target deformation grid to obtain an initial area image corresponding to the image to be segmented.
3. The semantic segmentation method according to claim 2, wherein the determining of the context information of the target area between the target area images and the determining of the target pixel feature of the image to be segmented according to the target area images and the context information of the target area comprise:
determining initial area characteristics corresponding to the target area image according to the mean value of the pixels of the target area image;
determining target area context information between the target area images according to the initial area characteristics;
determining the target area characteristics according to the initial area characteristics and the target area context information;
and mapping the target region characteristics to the target deformation grid to obtain the target pixel characteristics of the image to be segmented.
4. The semantic segmentation method according to any one of claims 1-3, wherein the target pixel context information and the target area image are determined by a pixel context layer in a trained neural network model, the target area context information and the target pixel feature are determined by a region context layer in the trained neural network model, and the segmentation result is determined by a segmentation layer in the trained neural network model, the method further comprising:
acquiring a training sample set, and determining a target weight corresponding to the category of a training pixel according to the number of the training pixels in a training image of the training sample set;
dividing the training image to obtain a training area image;
determining initial pixel context information in the training area image through a pixel context layer in a neural network model to be trained, and determining a first area image corresponding to the training image according to the initial pixel context information and the training area image;
determining initial region context information between the first region images through a region context layer in the neural network model to be trained, and determining initial pixel characteristics of the training images according to the first region images and the initial region context information;
determining the target class of the training pixel corresponding to the initial pixel feature through a segmentation layer in the neural network model to be trained;
determining a target loss value according to the target category, the label of the training pixel and the target weight;
acquiring the training times of the neural network model to be trained;
and training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model.
5. The semantic segmentation method according to claim 4, wherein the target deformation mesh is obtained through a deformation network layer in the trained neural network model;
the dividing the training image to obtain a training area image includes:
performing feature extraction on the training image through a feature extraction layer in the neural network model to be trained to obtain a training feature image;
performing displacement prediction on the vertex of the preset mesh according to the training characteristic image and the preset mesh through the deformation network layer in the neural network model to be trained to obtain an initial deformation mesh;
dividing the training characteristic image according to the initial deformation grid through the pixel context layer in the neural network model to be trained to obtain a training area image;
determining a target loss value according to the target class, the label of the training pixel and the target weight, including:
determining a first target loss value according to the target category, the label of the training pixel and the target weight;
determining a second target loss value according to the initial pixel characteristic and the average value of the initial pixel characteristic;
determining a target loss value according to the first target loss value and the second target loss value;
the training the neural network model to be trained based on the target loss value and the training times to obtain the trained neural network model, including:
if the target loss value does not meet a preset condition and/or the number of training times of the neural network model to be trained is smaller than a preset threshold, increasing the number of training times by 1, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value, and returning to execute the feature extraction of the training image through the feature extraction layer in the neural network model to be trained to obtain a training feature image;
and if the target loss value meets a preset condition and/or the training times of the neural network model to be trained are equal to a preset threshold value, stopping training to obtain the trained neural network model.
6. The semantic segmentation method according to claim 5, wherein determining a target loss value based on the first target loss value and the second target loss value comprises:
determining the sub-area of each grid in the initial deformation grid and the total area of the image to be segmented;
determining a third target loss value according to the sub-area and the total area;
determining a target loss value according to the first target loss value, the second target loss value and the third target loss value;
if the target loss value does not meet a preset condition and/or the number of training times of the neural network model to be trained is smaller than a preset threshold, increasing the number of training times by 1, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value, and returning to execute the feature extraction of the training image through the feature extraction layer in the neural network model to be trained to obtain a training feature image, includes:
if the target loss value does not meet a preset condition and/or the number of training times of the neural network model to be trained is smaller than a preset threshold, increasing the number of training times by 1, updating the network parameters of the feature extraction layer, the pixel context layer, the region context layer and the segmentation layer in the neural network model to be trained according to the first target loss value, updating the network parameters of the deformation network layer in the neural network model to be trained according to the second target loss value and the third target loss value, and returning to execute the feature extraction of the training image through the feature extraction layer in the neural network model to be trained to obtain a training feature image.
7. A semantic segmentation apparatus, comprising:
the image acquisition module is used for acquiring an image to be segmented and dividing the image to be segmented to obtain an initial region image corresponding to the image to be segmented;
a first determining module, configured to determine context information of a target pixel in the initial area image, and determine a target area image corresponding to the initial area image according to the context information of the target pixel and the initial area image;
the second determining module is used for determining target area context information between the target area images and determining target pixel characteristics of the image to be segmented according to the target area images and the target area context information;
and the image segmentation module is used for segmenting the image to be segmented according to the target pixel characteristics to obtain a segmentation result.
8. An electronic device comprising a processor and a memory, the memory storing a computer program, the processor being configured to execute the computer program in the memory to perform the semantic segmentation method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor for performing the semantic segmentation method according to any one of claims 1 to 6.
10. A computer program product, characterized in that it stores a computer program adapted to be loaded by a processor for performing the semantic segmentation method according to any one of claims 1 to 6.
CN202210272072.6A 2022-03-18 2022-03-18 Semantic segmentation method and device, electronic equipment and computer-readable storage medium Pending CN114648762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210272072.6A CN114648762A (en) 2022-03-18 2022-03-18 Semantic segmentation method and device, electronic equipment and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN114648762A true CN114648762A (en) 2022-06-21

Family

ID=81995141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210272072.6A Pending CN114648762A (en) 2022-03-18 2022-03-18 Semantic segmentation method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114648762A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063800A (en) * 2022-08-16 2022-09-16 阿里巴巴(中国)有限公司 Text recognition method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120008299A (en) * 2010-07-16 2012-01-30 광운대학교 산학협력단 Adaptive filtering apparatus and method for intra prediction based on characteristics of prediction block regions
CN105957066A (en) * 2016-04-22 2016-09-21 北京理工大学 CT image liver segmentation method and system based on automatic context model
CN110473159A (en) * 2019-08-20 2019-11-19 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment, computer readable storage medium
US20200201344A1 (en) * 2018-12-21 2020-06-25 Here Global B.V. Method and apparatus for the detection and labeling of features of an environment through contextual clues
CN113240687A (en) * 2021-05-17 2021-08-10 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VIJAY BADRINARAYANAN等: "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 39, no. 12, 1 December 2017 (2017-12-01), pages 2481, XP055942927, DOI: 10.1109/TPAMI.2016.2644615 *
余航等: "基于上下文分析的无监督分层迭代算法用于SAR图像分割", 自动化学报, vol. 40, no. 1, 15 January 2014 (2014-01-15), pages 100 - 116 *


Similar Documents

Publication Publication Date Title
CN110555481B (en) Portrait style recognition method, device and computer readable storage medium
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
CN112419368A (en) Method, device and equipment for tracking track of moving target and storage medium
CN111709497B (en) Information processing method and device and computer readable storage medium
CN111666919B (en) Object identification method and device, computer equipment and storage medium
KR102462934B1 (en) Video analysis system for digital twin technology
CN112052837A (en) Target detection method and device based on artificial intelligence
CN111339343A (en) Image retrieval method, device, storage medium and equipment
CN112232355B (en) Image segmentation network processing method, image segmentation device and computer equipment
CN111709471B (en) Object detection model training method and object detection method and device
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
EP4404148A1 (en) Image processing method and apparatus, and computer-readable storage medium
CN113033507B (en) Scene recognition method and device, computer equipment and storage medium
CN114283351A (en) Video scene segmentation method, device, equipment and computer readable storage medium
CN113076963B (en) Image recognition method and device and computer readable storage medium
CN112052771A (en) Object re-identification method and device
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN114611692A (en) Model training method, electronic device, and storage medium
CN114764870A (en) Object positioning model processing method, object positioning device and computer equipment
CN112883827B (en) Method and device for identifying specified target in image, electronic equipment and storage medium
CN114648762A (en) Semantic segmentation method and device, electronic equipment and computer-readable storage medium
Wang et al. Salient object detection using biogeography-based optimization to combine features
CN111008622B (en) Image object detection method and device and computer readable storage medium
CN113705293A (en) Image scene recognition method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination