CN113592876A - Training method and device for segmentation network, computer equipment and storage medium - Google Patents

Training method and device for segmentation network, computer equipment and storage medium

Info

Publication number
CN113592876A
Authority
CN
China
Prior art keywords
geometric
region
target
area
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110049449.7A
Other languages
Chinese (zh)
Inventor
胡一凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110049449.7A
Publication of CN113592876A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training method and device for a segmentation network, computer equipment and a storage medium. The method comprises the following steps: acquiring a sample image comprising a target geometric figure and a corresponding annotated geometric region; performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric region corresponding to the target geometric figure and corresponding predicted vertex information; determining a corresponding region feature loss according to the predicted geometric region and the annotated geometric region; determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information; and training the segmentation network to be trained based on a target loss function constructed from the region feature loss and the geometric feature loss until a training stop condition is reached, to obtain a trained target segmentation network. The target segmentation network is used for segmenting the target geometric figure from an image to be processed. By adopting the method, the segmentation accuracy and precision of the network can be improved.

Description

Training method and device for segmentation network, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a training method and apparatus for a segmentation network, a computer device, and a storage medium.
Background
With the development of computer technology, deep learning is widely used in various fields, such as image recognition and image segmentation. Regions or information required by a user can be segmented from an image through image recognition and image segmentation.
However, image segmentation is pixel-level prediction and usually requires multiple downsampling and upsampling operations, with multi-scale information used in the recognition and segmentation process. Traditional methods reduce parameters by reducing the number of sampling operations and the number of channels, but this easily reduces the precision of image segmentation.
Disclosure of Invention
In view of the above, it is necessary to provide a training method, an apparatus, a computer device and a storage medium for a segmentation network, which can improve the accuracy and precision of image segmentation.
A training method for a segmentation network, the method comprising:
obtaining a sample image comprising a target geometric figure, and determining an annotation geometric area corresponding to the target geometric figure based on the sample image;
performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
determining corresponding regional characteristic loss according to the predicted geometric region and the labeled geometric region;
determining corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information;
constructing an objective loss function based on the regional characteristic loss and the geometric characteristic loss;
training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
In one embodiment, the target vertex information includes target vertex coordinates; the method further comprises the following steps:
determining a target polygon area formed by the target vertex coordinates;
determining the intersection ratio between the area of the target polygonal area and the area of a preset area;
when the intersection ratio is larger than or equal to a threshold value, segmenting a target polygon area formed by the target vertex coordinates from the image to be processed;
and performing corresponding service processing based on the target polygon area.
A training apparatus for a segmentation network, the apparatus comprising:
the acquisition module is used for acquiring a sample image comprising a target geometric figure and determining an annotated geometric region corresponding to the target geometric figure based on the sample image;
the prediction module is used for carrying out target segmentation processing on the sample image through a segmentation network to be trained to obtain a prediction geometric area corresponding to the target geometric figure, and determining prediction vertex information corresponding to the target geometric figure based on image features in the target segmentation processing;
a region characteristic loss determining module, configured to determine a corresponding region characteristic loss according to the predicted geometric region and the labeled geometric region;
a geometric feature loss determining module, configured to determine a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information;
a construction module for constructing a target loss function based on the regional characteristic loss and the geometric characteristic loss;
the training module is used for training the segmentation network to be trained through the target loss function until a training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
obtaining a sample image comprising a target geometric figure, and determining an annotation geometric area corresponding to the target geometric figure based on the sample image;
performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
determining corresponding regional characteristic loss according to the predicted geometric region and the labeled geometric region;
determining corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information;
constructing an objective loss function based on the regional characteristic loss and the geometric characteristic loss;
training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
obtaining a sample image comprising a target geometric figure, and determining an annotation geometric area corresponding to the target geometric figure based on the sample image;
performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
determining corresponding regional characteristic loss according to the predicted geometric region and the labeled geometric region;
determining corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information;
constructing an objective loss function based on the regional characteristic loss and the geometric characteristic loss;
training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
According to the training method and device for the segmentation network, the computer equipment and the storage medium, the segmentation network to be trained performs target segmentation processing on the sample image comprising the target geometric figure, so that the predicted geometric region corresponding to the target geometric figure predicted by the segmentation network can be obtained. Based on the image features in the target segmentation processing, the predicted vertex information of the target geometric figure in the sample image can be obtained. A target loss function is constructed from the region feature loss between the predicted geometric region and the annotated geometric region and the geometric feature loss between the predicted geometric region and the predicted polygon region determined by the predicted vertex information, so that the target loss function includes loss features from multiple aspects. The segmentation network to be trained is trained based on these losses, and the influence of each loss on the recognition and segmentation performed by the network can be fully considered, so that the recognition and segmentation accuracy of the segmentation network can be improved through training. The target geometric figure can be accurately recognized and segmented from an image through the trained target segmentation network, and the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image can be accurately positioned.
Drawings
FIG. 1 is a diagram of an application environment of a training method for a segmentation network in one embodiment;
FIG. 2 is a schematic flow chart of a training method for a segmentation network in one embodiment;
FIG. 3 is a flow diagram illustrating the steps for determining predicted vertex information corresponding to a target geometry in one embodiment;
FIG. 4 is a flowchart illustrating the steps of determining a loss of a feature of a corresponding region based on a predicted geometric region and an annotated geometric region according to one embodiment;
FIG. 5 is a flowchart illustrating the steps of determining a first color value corresponding to a predicted geometric region and a second color value corresponding to an annotated geometric region according to one embodiment;
FIG. 6 is a flowchart illustrating the steps for determining a corresponding geometric feature penalty based on the predicted geometric region and the predicted polygon region determined from the predicted vertex information, in one embodiment;
FIG. 7 is a flow diagram illustrating the testing of a segmentation network in one embodiment;
FIG. 8 is a training architecture diagram of a segmentation network in one embodiment;
FIG. 9 is a flow diagram illustrating an application of a segmentation network in one embodiment;
FIG. 10 is a block diagram of a training apparatus for a segmentation network in one embodiment;
FIG. 11 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The present application relates to the field of Artificial Intelligence (AI) technology, which is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The scheme provided by the embodiment of the application relates to a training method of an artificial intelligence segmentation network, and is specifically explained by the following embodiments.
The training method for the segmentation network provided by the application can be applied to the segmentation network training system shown in fig. 1. As shown in fig. 1, the segmentation network training system includes a terminal 110 and a server 120. In one embodiment, the terminal 110 and the server 120 may each separately perform the training method for the segmentation network provided in the embodiment of the present application. The terminal 110 and the server 120 may also cooperate to perform the training method for the segmentation network provided in the embodiment of the present application. When the terminal 110 and the server 120 cooperate to execute the training method, the terminal 110 obtains a sample image including a target geometric figure, and determines an annotated geometric region corresponding to the target geometric figure based on the sample image. The terminal 110 sends the sample image and the corresponding annotated geometric region to the server 120. The server 120 performs target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric region corresponding to the target geometric figure, and determines predicted vertex information corresponding to the target geometric figure based on image features in the target segmentation processing. The server 120 determines the corresponding region feature loss according to the predicted geometric region and the annotated geometric region. The server 120 determines the corresponding geometric feature loss based on the predicted geometric region and the predicted polygon region determined by the predicted vertex information. The server 120 constructs a target loss function based on the region feature loss and the geometric feature loss. The server 120 trains the segmentation network to be trained through the target loss function until the training stop condition is reached, obtaining a trained target segmentation network; the target segmentation network is used for segmenting the target geometric figure from an image to be processed.
The terminal 110 uploads the image to be processed to the server 120, and the server 120 performs target segmentation processing on the image to be processed through a trained segmentation network to obtain a target geometric figure and target vertex information in the image to be processed. The server 120 returns the target geometry and target vertex information to the terminal 110. The sample image and the image to be processed are shown at 112 in fig. 1 and the target geometry is shown at 114 in fig. 1.
The terminal 110 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, and the application is not limited thereto.
In an embodiment, as shown in fig. 2, a training method for a segmentation network is provided, which is described by taking an example that the method is applied to a computer device (the computer device may specifically be the terminal or the server in fig. 1), and includes the following steps:
step S202, a sample image including the target geometric figure is obtained, and an annotation geometric area corresponding to the target geometric figure is determined based on the sample image.
The target geometric figure refers to a vector graphic formed by the outer contour lines of a target region in the sample image, and the target region refers to a region of interest to the user. For example, but not limited to, the target geometric figure may be a rectangle, a trapezoid, a circle, an ellipse, a polygon, or a document in the shape of a geometric figure. The sample image may be an image containing an identity document, the target region is the region where the identity document is located, and the target geometric figure is the figure formed by the outer contour lines of the identity document. The identity document is used for representing the identity of a user, such as a resident identity card, a Hong Kong and Macau pass, or a temporary residence card corresponding to different regions.
The labeling geometric area refers to an area where a target geometric figure is labeled in advance. The labeling geometric region may be a region where a target geometric figure labeled in advance in the sample image is located, or may be a mask map corresponding to the sample image labeled in advance. The mask image is an image filter template used for identifying the target geometric figure in the image, and can shield other parts of the image and screen out the target geometric figure in the image. For example, a sample image is binarized to represent each pixel point in a region where a target geometric figure is located in the sample image by 1, and each pixel point in a region where a non-target geometric figure is located by 0, so as to obtain a mask image. The region formed by each pixel point represented by 1 is the labeling geometric region.
Specifically, the computer device obtains a sample image including the target geometry, and the target geometry is labeled in advance in the sample image. Or, the computer device acquires a sample image including the target geometric figure and acquires a corresponding mask map, wherein the mask map marks the region where the target geometric figure is located.
For example, the sample image is an image including a target certificate, an area where the target certificate is located in the sample image is marked in advance, and the area where the target certificate is located is the marked geometric area. Or, carrying out binarization processing on the image containing the target certificate to obtain a corresponding mask image, wherein pixel points of the region where the target certificate is located in the mask image are represented by 1, and pixel points of the region where the non-target certificate is located are represented by 0, so that the region where the target certificate is located in the mask image is used as a labeling geometric region.
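As an illustrative sketch only (not part of the application), the binarization described above can be expressed in Python as follows; the function name, the use of OpenCV for rasterizing the annotated contour, and the example coordinates are assumptions:

    import numpy as np
    import cv2  # assumption: OpenCV is available for rasterizing the annotated contour

    def build_label_mask(image_shape, annotated_polygon):
        # Build a mask map from a pre-annotated target region: pixels inside the
        # annotated region are set to 1 and all other pixels to 0, matching the
        # binarization of the sample image described above.
        # image_shape: (height, width) of the sample image.
        # annotated_polygon: (N, 2) list of the annotated outer-contour vertices (x, y).
        mask = np.zeros(image_shape[:2], dtype=np.uint8)
        cv2.fillPoly(mask, [np.asarray(annotated_polygon, dtype=np.int32)], 1)
        return mask

    # Hypothetical usage: a 480x640 sample image with a quadrilateral certificate region.
    mask = build_label_mask((480, 640), [(50, 60), (580, 80), (560, 420), (40, 400)])
    print(int(mask.sum()))  # number of pixels labeled as the target geometric figure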
Step S204, carrying out target segmentation processing on the sample image through a segmentation network to be trained to obtain a prediction geometric area corresponding to the target geometric figure, and determining prediction vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing.
The predicted vertex information refers to the key points, predicted by the segmentation network, that constitute the outer contour of the target geometric figure in the sample image. The predicted vertex information may include vertex positions, which may include vertex coordinates, and may also include the number of vertices, vertex pixels, and the like. The shape and size of the figure can be determined from the predicted vertex information.
Specifically, when the labeled geometric region is included in the sample image, the computer device inputs the sample image into the segmentation network to be trained. When the target geometric figure is not labeled in the sample image, the computer equipment inputs the sample image comprising the target geometric figure and the labeled geometric area corresponding to the target geometric figure into the segmentation network to be trained.
And the segmentation network to be trained performs feature extraction on the sample image to obtain a corresponding feature map, and predicts the region of the target geometric figure in the sample image based on the feature map to obtain a predicted geometric region. Further, the segmentation network to be trained segments the predicted geometric region from the sample image.
And predicting the vertex information of the external contour of the target geometric figure formed in the sample image by the segmentation network to be trained based on the extracted characteristic diagram to obtain predicted vertex information.
And step S206, determining corresponding area characteristic loss according to the predicted geometric area and the labeled geometric area.
Wherein the region feature loss comprises at least one of a region color loss between the prediction geometric region and the labeling geometric region, and a region segmentation loss between the prediction geometric region and the labeling geometric region.
Specifically, the computer device may determine a color of the predicted geometric region and a color of the labeled geometric region, and determine a region color loss between the predicted geometric region and the labeled geometric region based on the color of the predicted geometric region and the color of the labeled geometric region. The computer device treats the region color loss as a region feature loss between the predicted geometric region and the annotated geometric region.
In one embodiment, the computer device may obtain a first region area corresponding to the predicted geometric region and a second region area corresponding to the labeled geometric region, and calculate a region segmentation loss between the predicted geometric region and the labeled geometric region based on the first region area and the second region area. The computer device treats the region segmentation loss as a region feature loss between the predicted geometric region and the annotated geometric region.
In one embodiment, the computer device treats a region color loss and a region segmentation loss between the predicted geometric region and the labeled geometric region as a region feature loss between the predicted geometric region and the labeled geometric region.
Step S208, according to the predicted geometric area and the predicted polygonal area determined by the predicted vertex information, determining the corresponding geometric feature loss.
Wherein the geometric feature loss comprises at least one of a geometric area loss between the predicted geometric region and the predicted polygon region, and a geometric centroid loss between the predicted geometric region and the predicted polygon region.
Specifically, the computer device predicts a polygonal region determined in the sample image based on the prediction vertex information. The computer device may determine a third region area corresponding to the predicted polygon region, and calculate a geometric area penalty between the predicted geometric region and the predicted polygon region based on the first region area and the third region area. The computer device may treat the geometric area loss as a geometric feature loss between the predicted geometric region and the predicted polygon region.
In one embodiment, the computer device may determine a second barycentric location corresponding to the predicted polygonal region based on the predicted vertex information, and determine a first barycentric location of the predicted geometric region. The computer device determines a geometric barycentric loss between the predicted geometric region and the predicted polygonal region based on the second barycentric location and the first barycentric location. The computer device may treat the geometric centroid loss as a geometric feature loss between the predicted geometric region and the predicted polygon region.
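As a minimal sketch of this centroid comparison (not part of the application; the Euclidean distance and the vertex-mean approximation of the polygon centroid are assumptions), the two barycentric locations and their loss could be computed as follows:

    import numpy as np

    def region_centroid(region_map):
        # First barycentric location: pixel coordinates weighted by the gray values
        # of the predicted geometric region map.
        h, w = region_map.shape
        ys, xs = np.mgrid[0:h, 0:w]
        total = region_map.sum() + 1e-8
        return np.array([(xs * region_map).sum() / total, (ys * region_map).sum() / total])

    def polygon_centroid(predicted_vertices):
        # Second barycentric location, approximated here as the mean of the
        # predicted vertex coordinates.
        return np.asarray(predicted_vertices, dtype=float).mean(axis=0)

    def geometric_centroid_loss(region_map, predicted_vertices):
        # Distance between the two barycentric locations (assumed Euclidean).
        return float(np.linalg.norm(region_centroid(region_map) - polygon_centroid(predicted_vertices)))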
In one embodiment, the computer device treats the geometric area loss and the geometric centroid loss between the predicted geometric region and the predicted polygon region as the geometric feature loss between the predicted geometric region and the predicted polygon region.
And step S210, constructing an objective loss function based on the regional characteristic loss and the geometric characteristic loss.
Specifically, the computer device may obtain weights corresponding to the regional characteristic loss and the geometric characteristic loss, respectively, and construct the target loss function according to the regional characteristic loss, the geometric characteristic loss, and the corresponding weights.
In one embodiment, the computer device performs a weighted calculation on the region feature loss and the geometric feature loss to obtain the target loss function. Alternatively, the computer device may also multiply the region feature loss and the geometric feature loss, take their logarithms and add them, or perform other mathematical operations, to obtain the target loss function.
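The following is a small sketch of such a weighted combination (not part of the application; the weight values and function names are hypothetical):

    def target_loss(region_color_loss, region_seg_loss, geometric_area_loss,
                    geometric_centroid_loss, weights=(1.0, 1.0, 1.0, 1.0)):
        # Weighted sum of the region feature losses and the geometric feature losses.
        # The application only states that the losses and their corresponding weights
        # are combined into a single target loss; equal weights are assumed here.
        w1, w2, w3, w4 = weights
        return (w1 * region_color_loss + w2 * region_seg_loss
                + w3 * geometric_area_loss + w4 * geometric_centroid_loss)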
Step S212, training the segmentation network to be trained through a target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
Specifically, the computer device may train the segmentation network to be trained through the target loss function, adjust parameters of the segmentation network in the training process, and continue training until the segmentation network meets the training stop condition, so as to obtain the trained target segmentation network.
In this embodiment, the training stopping condition may be at least one of that the loss value of the segmented network is less than or equal to the loss threshold, that the preset iteration number is reached, that the preset iteration time is reached, that the segmentation performance of the network reaches the preset index, and the like.
For example, the loss value generated in each training is calculated through the target loss function, the parameters of the segmentation network are adjusted based on the difference between the loss value and the loss threshold value, and the training is continued until the training is stopped, so that the trained target segmentation network is obtained.
The terminal counts the number of iterations of the segmentation network during training, and stops training when the number of iterations reaches the preset number of iterations, obtaining the trained segmentation network.
In one embodiment, each training iteration of the segmentation network may use a preset number of sample images, for example, 32 sample images. During training, the computer device may update the parameters of the segmentation network using the Adam gradient descent method, with the initial learning rate set to 0.05 and the Adam betas set to (0.95, 0.9995). The classification of each pixel can be predicted through the segmentation network to obtain the predicted geometric region, where the size of the predicted geometric region is the same as that of the target geometric figure. The segmentation network can also output the predicted vertex information corresponding to the target geometric figure. The error gradient is calculated based on the target loss function and the segmentation network is updated through back propagation, so as to obtain the trained target segmentation network.
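A hedged PyTorch sketch of this training loop is given below (not part of the application); seg_net, target_loss_fn and sample_loader are hypothetical stand-ins for the segmentation network to be trained, the target loss function constructed above, and a data loader yielding batches of 32 sample images with their annotations:

    import torch

    def train(seg_net, target_loss_fn, sample_loader, max_iterations=10000):
        # Adam gradient descent with the settings mentioned above.
        optimizer = torch.optim.Adam(seg_net.parameters(), lr=0.05, betas=(0.95, 0.9995))
        for iteration, (images, label_masks) in enumerate(sample_loader):
            pred_region, pred_vertices = seg_net(images)        # per-pixel prediction + vertex coordinates
            loss = target_loss_fn(pred_region, pred_vertices, label_masks)
            optimizer.zero_grad()
            loss.backward()                                      # error gradient via back propagation
            optimizer.step()                                     # gradient update of the segmentation network
            if iteration + 1 >= max_iterations:                  # one possible training stop condition
                break
        return seg_net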
In the above training method for the segmentation network, the segmentation network to be trained performs target segmentation processing on the sample image comprising the target geometric figure, so that the predicted geometric region corresponding to the target geometric figure predicted by the segmentation network can be obtained. Based on the image features in the target segmentation processing, the predicted vertex information of the target geometric figure in the sample image can be obtained. A target loss function is constructed from the region feature loss between the predicted geometric region and the annotated geometric region and the geometric feature loss between the predicted geometric region and the predicted polygon region determined by the predicted vertex information, so that the target loss function includes loss features from multiple aspects. The segmentation network to be trained is trained based on these losses, and the influence of each loss on the recognition and segmentation performed by the network can be fully considered, so that the recognition and segmentation accuracy of the segmentation network can be improved through training. The target geometric figure can be accurately recognized and segmented from an image through the trained target segmentation network, and the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image can be accurately positioned.
In one embodiment, as shown in fig. 3, performing a target segmentation process on a sample image through a segmentation network to be trained to obtain a predicted geometric region corresponding to a target geometry, and determining predicted vertex information corresponding to the target geometry based on image features in the target segmentation process includes:
step S302, sampling processing is carried out on the sample image, and a corresponding sample image is obtained.
The sampling process includes upsampling and downsampling. Upsampling is a process of interpolating an image to enlarge it, and the enlarged image has a higher resolution. Downsampling, also known as subsampling, refers to scaling down an image to obtain the desired image resolution. The sample sampling image refers to an image obtained by upsampling or downsampling, and includes a sample upsampled image and a sample downsampled image. When the sample image is downsampled, the sample sampling image is a sample downsampled image; when the sample image is upsampled, the sample sampling image is a sample upsampled image.
Specifically, the computer device performs downsampling processing on the sample image to obtain a corresponding sample downsampled image. Or the computer equipment performs up-sampling processing on the sample image to obtain a corresponding sample up-sampled image.
In one embodiment, the computer device may perform sampling processing on the sample image through the segmentation network to be trained to obtain a corresponding sample sampling image. Further, the up-sampling processing or the down-sampling processing is carried out on the sample image through the segmentation network to be trained, and a corresponding sample up-sampling image or a corresponding sample down-sampling image is obtained.
Step S304, respectively carrying out feature extraction on the sample image and the sample sampling image through a segmentation network to be trained to obtain a first feature map corresponding to the sample image and a second feature map corresponding to the sample sampling image.
Specifically, the segmentation network to be trained performs multi-layer feature extraction on the sample image to obtain a first feature map corresponding to the sample image. Further, the segmentation network to be trained performs feature extraction on the sample image through the multilayer convolution layers, and the first feature map output by the previous convolution layer is used as the input of the next convolution layer, so that the first feature map output by each convolution layer is obtained.
And the segmentation network to be trained performs multi-layer feature extraction on the sample sampling image to obtain a second feature map corresponding to the sample sampling image. Further, the segmentation network to be trained performs feature extraction on the sample sampling image through the multilayer convolution layers, and the second feature map output by the previous convolution layer is used as the input of the next convolution layer, so that the second feature map output by each convolution layer is obtained.
In one embodiment, the segmentation network to be trained includes two encoders, each of which includes a plurality of convolutional layers. And respectively inputting the sample image and the sample sampling image into two encoders by the segmentation network to be trained, and respectively extracting the characteristics of the sample image and the sample sampling image to obtain a corresponding first characteristic diagram and a corresponding second characteristic diagram.
And step S306, carrying out fusion processing on the first feature map and the second feature map, and obtaining a predicted geometric area corresponding to the target geometric figure based on the sample fusion feature map after the fusion processing.
Specifically, the segmentation network to be trained performs fusion processing on the first feature map and the second feature map to obtain a sample fusion feature map. And carrying out 1 × 1 convolution processing on the sample fusion feature map by the segmentation network to be trained, and carrying out sampling processing on the feature map subjected to the 1 × 1 convolution processing to obtain a prediction geometric region in the sample image.
In this embodiment, the process of fusing the first feature map and the second feature map may be a process of splicing or superimposing the first feature map and the second feature map. The splicing processing refers to splicing the first characteristic diagram and the second characteristic diagram to obtain a characteristic diagram. The superposition processing refers to summing and averaging the gray values of corresponding pixel points in the first characteristic diagram and the second characteristic diagram to obtain a characteristic diagram.
It is to be understood that the sampling process performed after the 1 × 1 convolution processing is opposite to the sampling process used to obtain the sample sampling image: when the sample image is downsampled to obtain the sample sampling image, the sampling process after the 1 × 1 convolution processing is an upsampling process; when the sample image is upsampled to obtain the sample sampling image, the sampling process after the 1 × 1 convolution processing is a downsampling process.
And step S308, performing convolution and full-connection processing on the second feature graph to obtain prediction vertex information corresponding to the target geometric figure.
Specifically, the segmentation network to be trained performs 1 × 1 convolution processing on the second feature map output by the last convolution layer, and outputs the feature map obtained by the 1 × 1 convolution processing to the full connection layer. And carrying out full connection processing on the feature map through a full connection layer to obtain the corresponding predicted vertex information of the target geometric figure in the sample sampling image.
In one embodiment, the segmentation network to be trained may perform convolution and full join processing on the first feature map to obtain predicted vertex information corresponding to the target geometry.
In one embodiment, the predicted vertex information includes predicted vertex coordinates. And the segmentation network to be trained performs convolution and full-connection processing on the first feature map or the second feature map output by the last convolution layer to obtain a predicted vertex coordinate corresponding to the target geometric figure.
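A minimal PyTorch sketch of the two-branch structure described in steps S302 to S308 is shown below (not part of the application; the layer sizes, channel counts and fusion by concatenation are illustrative assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DualBranchSegNet(nn.Module):
        def __init__(self, channels=32, num_vertices=4):
            super().__init__()
            # Two encoders: one for the sample image, one for the sample sampling image.
            self.encoder_full = nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            self.encoder_down = nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            # 1x1 convolution on the fused feature map for the segmentation branch.
            self.seg_head = nn.Conv2d(2 * channels, 1, kernel_size=1)
            # 1x1 convolution followed by a fully connected layer for the vertex branch.
            self.vertex_conv = nn.Conv2d(channels, 8, kernel_size=1)
            self.vertex_fc = nn.LazyLinear(2 * num_vertices)  # x, y for each predicted vertex

        def forward(self, x):
            x_down = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
            f1 = self.encoder_full(x)                            # first feature map
            f2 = self.encoder_down(x_down)                       # second feature map
            f2_up = F.interpolate(f2, size=f1.shape[-2:], mode='bilinear', align_corners=False)
            fused = torch.cat([f1, f2_up], dim=1)                # fusion by concatenation (splicing)
            pred_region = torch.sigmoid(self.seg_head(fused))    # predicted geometric region map
            v = self.vertex_conv(f2)
            pred_vertices = self.vertex_fc(v.flatten(1))         # predicted vertex coordinates
            return pred_region, pred_vertices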
In this embodiment, a sample image is obtained by sampling the sample image, and feature extraction is performed on the sample image and the sample image through a segmentation network to be trained, so that features of images with different resolutions can be obtained. The fusion processing is carried out based on the extracted first characteristic diagram and the second characteristic diagram, and the prediction geometric area where the target geometric figure is located in the sample image can be more accurately predicted based on the image characteristics with different resolutions. And based on the full-connection processing of the second characteristic diagram, outputting the predicted vertex information corresponding to the target geometric figure, and training the segmentation network based on the predicted geometric area and the predicted vertex information obtained by different processing.
In one embodiment, as shown in fig. 4, determining the corresponding region feature loss according to the predicted geometric region and the labeled geometric region includes:
step S402, determining a first color value corresponding to the predicted geometric region and a second color value corresponding to the labeled geometric region.
Specifically, the computer device may determine each pixel point in the prediction geometric region, and obtain a first gray value corresponding to each pixel point. And the computer equipment acquires the color value corresponding to each pixel point, and calculates the first color value corresponding to the predicted geometric area based on the color value corresponding to each pixel point and the first gray value.
And the computer equipment acquires second gray values respectively corresponding to all the pixel points in the labeling geometric region and acquires color values respectively corresponding to all the pixel points. And the computer equipment acquires the color value corresponding to each pixel point, and calculates a second color value corresponding to the labeling geometric area based on the color value corresponding to each pixel point and the second gray value.
Step S404, determining a region color loss between the predicted geometric region and the annotated geometric region based on the difference between the first color value and the second color value.
Wherein the difference between the first color value and the second color value may be characterized by a contrast value between the first color value and the second color value, which may be at least one of a difference, an absolute value of a difference, a logarithm of a difference, a ratio, and a percentage.
Specifically, the computer device calculates a difference between the first color value and the second color value, and determines the difference between the first color value and the second color value as a regional color loss between the predicted geometric region and the annotated geometric region. Further, the difference between the first color value and the second color value may be a contrast value between the first color value and the second color value.
In one embodiment, the contrast value refers to a difference between the first color value and the second color value. The computer device calculates a difference between the first color value and the second color value as a regional color loss between the predicted geometric region and the annotated geometric region.
In one embodiment, the contrast value refers to an absolute value of a difference between the first color value and the second color value. The computer device calculates a difference between the first color value and the second color value, and uses an absolute value of the difference as a region color loss between the predicted geometric region and the annotated geometric region. The regional color loss between the predicted geometric region and the labeled geometric region is calculated, for example, by the following formula:
Loss_1 = |C_0 - C|  (1)
where Loss_1 is the region color loss, C_0 is the second color value corresponding to the annotated geometric region, and C is the first color value corresponding to the predicted geometric region.
In one embodiment, the contrast value refers to a ratio between the first color value and the second color value. The computer device calculates a ratio between the first color value and the second color value as a regional color loss between the predicted geometric region and the annotated geometric region.
Step S406, determining a first region area corresponding to the predicted geometric region and a second region area corresponding to the labeled geometric region.
Specifically, the computer device may obtain first gray values corresponding to the respective pixel points in the predicted geometric region, and calculate a first region area corresponding to the predicted geometric region based on the first gray values corresponding to the respective pixel points.
And the computer equipment acquires second gray values respectively corresponding to all the pixel points in the labeling geometric area, and calculates the area of the second area corresponding to the labeling geometric area based on the second gray values respectively corresponding to all the pixel points.
Step S408, determining the region segmentation loss between the prediction geometric region and the labeling geometric region based on the first region area and the second region area.
Specifically, the computer device may determine an area product of the first region area and the second region area, and a sum of the areas of the first region area and the second region area, and determine a region segmentation penalty between the predicted geometric region and the annotated geometric region based on the area product and the sum of the areas. Further, the computer device may use a ratio of the area product to the sum of the areas as a region segmentation penalty between the predicted geometric region and the annotated geometric region.
In one embodiment, the computer device may use the ratio of a preset multiple of the area product to the area sum as the region segmentation loss between the predicted geometric region and the annotated geometric region. For example, the region segmentation loss is obtained by dividing twice the area product by the area sum.
The region segmentation penalty between the predicted geometric region and the annotated geometric region is calculated, for example, by the following formula:
Loss_2 = 2·Σ_{i,j}(GT_{i,j}(x)·I_{i,j}(x)) / (Σ_{i,j}GT_{i,j}(x) + Σ_{i,j}I_{i,j}(x))  (2)
where GT_{i,j}(x) is the gray value of the pixel in row i, column j of the annotated geometric region, Σ_{i,j}GT_{i,j}(x) is the second region area of the annotated geometric region, and Σ_{i,j}I_{i,j}(x) is the first region area of the predicted geometric region.
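A numpy sketch of formula (2) is given below (not part of the application); it reads the two region maps as gray-value arrays, and the small epsilon is added only to avoid division by zero:

    import numpy as np

    def region_segmentation_loss(pred_region, label_region):
        # Twice the pixel-wise product of the predicted region map I(x) and the
        # annotated mask GT(x), divided by the sum of the two region areas
        # (the sums of their gray values), per formula (2).
        pred_region = np.asarray(pred_region, dtype=float)
        label_region = np.asarray(label_region, dtype=float)
        numerator = 2.0 * (pred_region * label_region).sum()
        denominator = pred_region.sum() + label_region.sum() + 1e-8
        return numerator / denominator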
And step S410, determining corresponding regional characteristic loss according to the regional color loss and the regional segmentation loss.
Specifically, the computer device may obtain weights corresponding to the region color loss and the region segmentation loss, and perform weighted summation processing on the region color loss and the region segmentation loss to obtain corresponding region characteristic losses.
In one embodiment, the computer device may sum the regional color loss and the regional segmentation loss to obtain the corresponding regional feature loss.
In this embodiment, the color loss of the region is determined based on the color difference between the predicted geometric region and the labeled geometric region, and the color loss between the geometric region and the real geometric region predicted by the segmentation network can be determined. Based on the area of the predicted geometric region and the area of the labeled geometric region, the segmentation difference between the geometric region predicted by the segmentation network and the real geometric region can be determined, so that the color loss and the area segmentation loss are used as conditions for training the segmentation network, and the trained segmentation network has higher accuracy and segmentation precision.
In one embodiment, as shown in fig. 5, the determining a first color value corresponding to the predicted geometric region and a second color value corresponding to the labeled geometric region includes:
step S502, obtaining the value of the color channel corresponding to each pixel point in the prediction geometric area, and obtaining the value of the color channel corresponding to each pixel point in the labeling geometric area.
Wherein, the color channel refers to a red channel, a green channel and a blue channel of the image. The red channel, i.e., R (Red), the green channel, i.e., G (Green), and the blue channel, i.e., B (blue).
Specifically, the computer device may obtain each pixel point in the prediction geometric region, and for each pixel point, the computer device may obtain a value of a red channel, a value of a green channel, and a value of a blue channel corresponding to the pixel point.
The computer equipment can obtain each pixel point in the labeling geometric area, and for each pixel point, the computer equipment can obtain the value of the red channel, the value of the green channel and the value of the blue channel corresponding to the pixel point.
Step S504, a first color value corresponding to the predicted geometric region is determined according to the value of the color channel corresponding to each pixel point in the predicted geometric region, the first gray value corresponding to the corresponding pixel point and the first region area.
Specifically, the computer device obtains a first gray value corresponding to each pixel point in the prediction geometric region and a first region area corresponding to the prediction geometric region. And aiming at each pixel point, the computer equipment respectively calculates the product of the value of each color channel and the first gray value of the corresponding pixel point to obtain the first product respectively corresponding to each color channel.
The computer device calculates the sum of the first products of the same color channel, i.e., calculates the sum of the first products corresponding to each red channel, the sum of the first products corresponding to each green channel, and the sum of the first products corresponding to each blue channel. And the computer equipment calculates a first ratio of the sum of the first products of the same color channel to the area of the first region, and uses the first ratio as the color value of the color channel, so as to obtain the color value corresponding to the red channel, the color value corresponding to the green channel and the color value corresponding to the blue channel.
And the computer equipment calculates a first color value corresponding to the prediction geometric area according to the color value corresponding to each color channel. Further, the computer device sums and averages the color values corresponding to each color channel, and the average value is used as the first color value corresponding to the predicted geometric area.
For example, the computer device may calculate a first color value corresponding to the predicted geometric region according to the following formula,
color_R1 = Σ_{i,j}(R_{i,j}(x)·I_{i,j}(x)) / Σ_{i,j}I_{i,j}(x)  (3)
color_G1 = Σ_{i,j}(G_{i,j}(x)·I_{i,j}(x)) / Σ_{i,j}I_{i,j}(x)  (4)
color_B1 = Σ_{i,j}(B_{i,j}(x)·I_{i,j}(x)) / Σ_{i,j}I_{i,j}(x)  (5)
C = (color_R1 + color_G1 + color_B1) / 3  (6)
where C is the first color value corresponding to the predicted geometric region; R_{i,j}(x), G_{i,j}(x) and B_{i,j}(x) are the values of the pixel in row i, column j of the predicted geometric region in the red, green and blue channels, respectively; I_{i,j}(x) is the gray value corresponding to the pixel in row i, column j of the predicted geometric region; Σ_{i,j}I_{i,j}(x) is the first region area corresponding to the predicted geometric region; and color_R1, color_G1 and color_B1 are the values of the pixels in the predicted geometric region in the red, green and blue channels, respectively.
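For illustration (not part of the application), formulas (1) and (3) to (6) can be sketched in numpy as follows; the function names are hypothetical, and the same sample image is assumed to supply the channel values for both regions:

    import numpy as np

    def region_mean_color(image_rgb, region_map):
        # Per-channel average of the pixels in a region, weighted by the region
        # map's gray values (formulas (3) to (5)), then averaged over the three
        # channels (formula (6)).
        image_rgb = np.asarray(image_rgb, dtype=float)    # (H, W, 3) sample image
        region_map = np.asarray(region_map, dtype=float)  # (H, W) region gray values
        area = region_map.sum() + 1e-8                    # region area, as in formula (11)
        channel_means = [(image_rgb[..., c] * region_map).sum() / area for c in range(3)]
        return sum(channel_means) / 3.0

    def region_color_loss(image_rgb, pred_region, label_region):
        # Formula (1): absolute difference between the two average color values.
        return abs(region_mean_color(image_rgb, label_region)
                   - region_mean_color(image_rgb, pred_region))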
Step S506, determining a second color value corresponding to the geometric labeling area according to the color channel value corresponding to each pixel point in the geometric labeling area, the second gray value corresponding to the corresponding pixel point and the second area.
Specifically, the computer device obtains a second gray value corresponding to each pixel point in the labeling geometric region and a second region area corresponding to the labeling geometric region. And aiming at each pixel point, the computer equipment respectively calculates the product of the value of each color channel and the second gray value of the corresponding pixel point to obtain a second product respectively corresponding to each color channel.
The computer device calculates the sum of the second products of the same color channel, i.e., calculates the sum of the second products corresponding to each red channel, the sum of the second products corresponding to each green channel, and the sum of the second products corresponding to each blue channel. And calculating a second ratio of the sum of the second products of the same color channel to the area of the second region by the computer equipment, and taking the second ratio as the color value of the color channel so as to obtain the color value corresponding to the red channel, the color value corresponding to the green channel and the color value corresponding to the blue channel.
And the computer equipment calculates a second color value corresponding to the labeling geometric area according to the color value corresponding to each color channel. Further, the computer device sums the color values corresponding to each color channel and takes the average value, and the average value is used as the second color value corresponding to the labeling geometric area.
For example, the computer device can calculate the second color value corresponding to the labeling geometric region according to the following formula,
color_R2 = Σ_{i,j}(R_{i,j}(y)·GT_{i,j}(x)) / Σ_{i,j}GT_{i,j}(x)  (7)
color_G2 = Σ_{i,j}(G_{i,j}(y)·GT_{i,j}(x)) / Σ_{i,j}GT_{i,j}(x)  (8)
color_B2 = Σ_{i,j}(B_{i,j}(y)·GT_{i,j}(x)) / Σ_{i,j}GT_{i,j}(x)  (9)
C_0 = (color_R2 + color_G2 + color_B2) / 3  (10)
where C_0 is the second color value corresponding to the annotated geometric region; R_{i,j}(y), G_{i,j}(y) and B_{i,j}(y) are the values of the pixel in row i, column j of the annotated geometric region in the red, green and blue channels, respectively; GT_{i,j}(x) is the gray value corresponding to the pixel in row i, column j of the annotated geometric region; Σ_{i,j}GT_{i,j}(x) is the second region area corresponding to the annotated geometric region; and color_R2, color_G2 and color_B2 are the values of the pixels in the annotated geometric region in the red, green and blue channels, respectively.
In this embodiment, based on the values of the color channels corresponding to the respective pixel points in the predicted geometric region, the average color value of the predicted geometric region can be accurately calculated according to the first gray value and the first region area corresponding to the corresponding pixel point. Based on the value of the color channel corresponding to each pixel point in the labeling geometric region, the second gray value and the second region area corresponding to the corresponding pixel point, the average color value of the labeling geometric region can be accurately calculated. Based on the difference between the average color value of the predicted geometric area and the average color value of the labeled geometric area, the color loss between the predicted geometric area and the real geometric area predicted by the segmentation network can be accurately determined, so that the color loss is used as a training condition of the segmentation network, and the precision of the segmentation network can be improved.
In one embodiment, determining a first region area corresponding to the predicted geometric region and a second region area corresponding to the labeled geometric region includes:
acquiring first gray values corresponding to all pixel points in the prediction geometric region respectively, and acquiring second gray values corresponding to all pixel points in the labeling geometric region respectively; taking the sum of the first gray values of all the pixel points in the predicted geometric area as the first area of the predicted geometric area; and taking the sum of the second gray values of all the pixel points in the labeling geometric area as the area of the second area of the labeling geometric area.
Specifically, the computer device may obtain first gray values corresponding to each pixel point in the predicted geometric region, sum the first gray values corresponding to each pixel point, and use the sum of the first gray values of each pixel point as the first region area of the predicted geometric region.
The computer equipment can obtain second gray values respectively corresponding to all the pixel points in the labeling geometric area, sum the second gray values respectively corresponding to all the pixel points, and use the sum of the second gray values of all the pixel points as the second region area of the labeling geometric area.
For example, the computer device may calculate a first region area of the predicted geometric region and a second region area of the labeled geometric region according to the following formulas:
S1=∑i,jIi,j(x) (11)
wherein S1 is a region area, such as the first region area or the second region area. Ii,j(x) is the gray value of the pixel point in the ith row and the jth column, such as the gray value of the pixel point in the ith row and the jth column in the predicted geometric region, or the gray value of the pixel point in the ith row and the jth column in the labeling geometric region.
In this embodiment, the sum of the gray values corresponding to the pixels is used as the area, so that the area formed by the pixels in the predicted geometric region can be accurately calculated, and the area formed by the pixels in the labeled geometric region can be accurately calculated.
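A minimal sketch of this area computation, assuming a numpy array of per-pixel gray values:

import numpy as np

def region_area(mask):
    # mask: H x W array of gray values (predicted probabilities or 0/1 labels); formula (11)
    return float(np.sum(mask))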
In one embodiment, as shown in fig. 6, determining a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information includes:
in step S602, a first region area of the predicted geometric region is determined, and a third region area of the predicted polygonal region determined by the predicted vertex information is calculated.
Specifically, the computer device obtains first gray values corresponding to all pixel points in the prediction geometric area, and the sum of the first gray values of all the pixel points in the prediction geometric area is used as the first area of the prediction geometric area.
The computer device may obtain the predicted vertex coordinates in the predicted vertex information, and determine the predicted polygon determined by the predicted vertex coordinates in the sample image. The computer device calculates the third region area of the predicted polygon region based on the predicted vertex coordinates. For example, the computer device may calculate the third region area of the predicted polygon region according to the Shoelace theorem (shoelace formula):
S2=1/2·|P11(P22-P42)+P21(P32-P12)+P31(P42-P22)+P41(P12-P32)| (12)
wherein S2 is the third region area, and [P11,P12], [P21,P22], [P31,P32], [P41,P42] are the predicted vertex coordinates.
Step S604, determining a geometric area loss according to a difference between the first region area and the third region area.
Wherein the difference between the first region area and the third region area may be characterized by an area contrast value between the first region area and the third region area, which may be at least one of a difference, an absolute value of the difference, a logarithm of the difference, a ratio, a percentage.
Specifically, the computer device calculates a difference between the first region area and the third region area, and determines the difference between the first region area and the third region area as the geometric area loss between the predicted geometric region and the predicted polygonal region. Further, the difference between the first region area and the third region area may be an area contrast value between the first region area and the third region area.
In one embodiment, the area contrast value refers to the difference between the area of the first region and the area of the third region. The computer device calculates a difference between the first region area and the third region area as a geometric area loss between the predicted geometric region and the predicted polygon region.
In one embodiment, the area contrast value refers to an absolute value of a difference between the first region area and the third region area. The computer device calculates a difference between the area of the first region and the area of the third region, and takes an absolute value of the difference as a geometric area loss between the predicted geometric region and the predicted polygonal region. The geometric area loss between the predicted geometric region and the predicted polygon region is calculated, for example, by the following formula:
Loss1=|S1-S2| (13)
wherein Loss1 is the geometric area loss between the predicted geometric region and the predicted polygonal region, S1 is the first region area, and S2 is the third region area.
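For illustration, a numpy sketch of the Shoelace computation and of the absolute-difference form of the geometric area loss is given below; the function names and the assumed vertex ordering (the four predicted vertices listed in order around the quadrilateral) are not prescribed by the text.

import numpy as np

def polygon_area(vertices):
    # vertices: sequence of (x, y) pairs listed in order around the polygon, formula (12)
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def geometric_area_loss(mask, vertices):
    s1 = float(mask.sum())        # first region area, formula (11)
    s2 = polygon_area(vertices)   # third region area, formula (12)
    return abs(s1 - s2)           # formula (13)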
In one embodiment, the area contrast value refers to a ratio between the area of the first region and the area of the third region. The computer device calculates a ratio between the area of the first region and the area of the third region as a geometric area loss between the predicted geometric region and the predicted polygon region.
In step S606, a first barycentric position corresponding to the predicted geometric region and a second barycentric position corresponding to the predicted polygonal region are determined.
Specifically, the computer device obtains first gray values corresponding to all pixel points in the predicted geometric area respectively, and determines a first gravity center position of the predicted geometric area according to all the first gray values. The computer device determines a second barycentric location of the predicted polygon area based on the predicted vertex information.
Step S608, determining the geometric gravity center loss according to the distance between the first gravity center position and the second gravity center position.
Specifically, the computer device calculates the distance between the first barycentric position and the second barycentric position as the geometric barycentric loss between the predicted geometric region and the predicted polygonal region.
In one embodiment, the computer device may obtain weights corresponding to the first barycentric location and the second barycentric location, and perform weighted summation processing on the first barycentric location, the second barycentric location, and the corresponding weights to obtain the geometric barycentric loss.
And step S610, determining geometric feature loss based on the geometric area loss and the geometric barycentric loss.
Specifically, the computer device may obtain weights corresponding to the geometric area loss and the geometric gravity center loss, respectively, and perform weighted summation processing on the geometric area loss and the geometric gravity center loss to obtain corresponding geometric feature loss.
In one embodiment, the computer device may sum the geometric area loss and the geometric centroid loss to obtain a corresponding geometric feature loss.
In this embodiment, the geometric area loss is determined based on the area difference between the predicted geometric area and the predicted polygonal area, and the area loss between the geometric area and the polygonal area predicted by the segmentation network can be determined, where a large area loss indicates that the prediction and the segmentation of the segmentation network are inaccurate. Based on the gravity center position of the predicted geometric region and the gravity center position of the predicted polygonal region, gravity center loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined, and the fact that the gravity center loss is large indicates that the prediction and segmentation of the segmentation network are inaccurate. Therefore, the area loss and the gravity center loss are used as conditions for training the segmentation network, and the trained segmentation network has higher accuracy and segmentation precision.
In one embodiment, the first center of gravity position comprises a first center of gravity coordinate and the second center of gravity position comprises a second center of gravity coordinate; determining a geometric center of gravity loss based on a distance between the first center of gravity position and the second center of gravity position, comprising:
determining a first barycentric coordinate of the predicted geometric area according to a first gray value corresponding to each pixel point in the predicted geometric area; determining a second center of gravity coordinate of a prediction polygon region formed by the prediction vertex coordinates based on the prediction vertex coordinates in the prediction vertex information;
determining a geometric center of gravity loss based on a distance between the first center of gravity position and the second center of gravity position, comprising: a geometric barycentric loss between the predicted geometric region and the predicted polygonal region is determined based on a distance between the first barycentric coordinate and the second barycentric coordinate.
Specifically, the computer device determines the abscissa and the ordinate of the first barycentric coordinate respectively according to the first gray value corresponding to each pixel point in the predicted geometric area, so that the first barycentric coordinate of the predicted geometric area is obtained according to the abscissa and the ordinate.
The computer device acquires each predicted vertex coordinate in the predicted vertex information, averages the sum of the abscissas of the predicted vertex coordinates, and takes the average of the sum of the abscissas as the abscissa of the second barycentric coordinate of the predicted polygon region. The computer device averages the sum of the ordinates of the predicted vertex coordinates, and takes the average of the sum of the ordinates as the ordinate of the second barycentric coordinate of the predicted polygon region. The computer device obtains the second barycentric coordinate of the predicted polygon region according to the abscissa and the ordinate. For example, the computer device may calculate the second barycentric coordinate according to the following formula:
x2=(P11+P21+P31+P41)/4, y2=(P12+P22+P32+P42)/4 (14)
wherein [P11,P12], [P21,P22], [P31,P32], [P41,P42] are the predicted vertex coordinates, and (x2,y2) is the second barycentric coordinate.
The computer device calculates a distance between an abscissa in the first barycentric coordinate and an abscissa in the second barycentric coordinate, calculates a distance between an ordinate in the first barycentric coordinate and an ordinate in the second barycentric coordinate, and determines a geometric barycentric loss between the predicted geometric region and the predicted polygonal region based on the distance between the abscissas and the distance between the ordinates. Further, the computer device takes the sum of the square of the distance between the abscissas and the square of the distance between the ordinates as the geometric barycentric loss between the predicted geometric region and the predicted polygonal region. For example, the computer device may calculate the geometric gravity center loss according to the following formula:
Loss2=(x1-x2)^2+(y1-y2)^2 (15)
wherein Loss2 is the geometric barycentric loss, and (x1,y1) is the first barycentric coordinate.
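A numpy sketch of formulas (14) and (15) follows; the mask barycenter (x1, y1) is assumed to have been computed already (see the sketch after the barycentric-coordinate formulas below), and the function names are illustrative.

import numpy as np

def polygon_centroid(vertices):
    # mean of the predicted vertex coordinates, formula (14)
    v = np.asarray(vertices, dtype=float)
    return float(v[:, 0].mean()), float(v[:, 1].mean())

def geometric_centroid_loss(mask_centroid, vertices):
    x1, y1 = mask_centroid               # first barycentric coordinate of the predicted region
    x2, y2 = polygon_centroid(vertices)  # second barycentric coordinate
    return (x1 - x2) ** 2 + (y1 - y2) ** 2   # formula (15)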
In one embodiment, the computer device may acquire weights respectively corresponding to the first barycentric coordinate and the second barycentric coordinate, multiply the first barycentric coordinate by the corresponding weight, multiply the second barycentric coordinate by the corresponding weight, and take the sum of the two products as the geometric barycentric loss.
In this embodiment, based on the barycentric coordinates of the predicted geometric region and the barycentric coordinates of the predicted polygonal region, the barycentric loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined, so that the barycentric difference between the prediction results corresponding to the two modes is determined when the same segmentation network predicts the target geometric figure by using the two modes, and the barycentric loss is used as a condition for training the segmentation network, so that the trained segmentation network has higher accuracy and segmentation precision. Moreover, the trained segmentation network can simultaneously have two prediction modes, namely segmenting the target geometric figure and predicting the vertex coordinates of the target geometric figure.
In one embodiment, determining a first barycentric coordinate of the prediction geometric region according to a first gray value corresponding to each pixel point in the prediction geometric region includes:
constructing a gray value matrix based on first gray values respectively corresponding to all pixel points in the prediction geometric region; taking the sum of the first gray values of each row in the gray value matrix and a contrast value of the first area of the prediction geometric area as an abscissa in the first barycentric coordinate of the prediction geometric area; and taking the sum of the first gray values of each column in the gray value matrix and a contrast value of the area of the first region of the prediction geometric region as a vertical coordinate in the first barycentric coordinate.
Wherein the contrast value includes a ratio, a difference, a percentage, and the like.
Specifically, the computer device obtains a first gray value corresponding to each pixel point in the prediction geometric area, and constructs a gray value matrix according to the first gray value. And the computer equipment calculates the first area of the predicted geometric area according to the first gray value respectively corresponding to each pixel point. The computer device may sum the first gray values of all rows in the gray value matrix, calculate a contrast value between the sum of the first gray values of all rows and the first region area of the prediction geometry, and take the contrast value as the abscissa in the first barycentric coordinate.
The computer device may sum the first gray values of all columns in the gray value matrix, calculate a contrast value between the sum of the first gray values of all columns and the first region area of the prediction geometry, and take the contrast value as the ordinate in the first barycentric coordinate. And obtaining a first barycentric coordinate of the prediction geometric area according to the abscissa and the ordinate.
In one embodiment, the contrast value is a ratio. The computer device calculates a ratio between the sum of the first gray values of all rows in the gray value matrix and the first region area of the prediction geometry, taking this ratio as the abscissa. The computer device calculates a ratio between the sum of the first gray values of all columns in the gray value matrix and the first region area of the prediction geometry, taking this ratio as the ordinate. For example, the computer device may calculate the first barycentric coordinate according to the following formula:
x1=∑i,j i·Ii,j(x)/∑i,jIi,j(x), y1=∑i,j j·Ii,j(x)/∑i,jIi,j(x) (16)
wherein x1 is the abscissa and y1 is the ordinate of the first barycentric coordinate. i is a row index and j is a column index in the gray value matrix. ∑i,jIi,j(x) is the first region area of the predicted geometric region, and Ii,j(x) is the gray value of the pixel point in the ith row and the jth column.
In one embodiment, the contrast value is a difference value. The computer device calculates the difference between the sum of the first gray values of all rows in the gray value matrix and the first region area of the predicted geometric region, taking this difference as the abscissa. The computer device calculates the difference between the sum of the first gray values of all columns in the gray value matrix and the first region area of the predicted geometric region, taking this difference as the ordinate.
In this embodiment, the abscissa of the barycentric coordinate is taken as the sum of the first gray values of all rows in the gray value matrix and the contrast value of the area of the first region, and the ordinate is taken as the sum of the first gray values of all columns and the contrast value of the area of the first region, so that the barycentric coordinate of the prediction geometric region can be calculated based on the gray values of the pixels, and the calculation of the barycentric coordinate at the pixel level is realized.
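The barycentric coordinates of the predicted region can be sketched as follows, assuming the row-index/column-index convention used above (row sums weighted by the row index give the abscissa, column sums weighted by the column index give the ordinate):

import numpy as np

def mask_centroid(mask):
    # mask: H x W gray value matrix Ii,j(x) of the predicted geometric region, formula (16)
    area = mask.sum()
    rows = np.arange(mask.shape[0])[:, None]  # row indices i
    cols = np.arange(mask.shape[1])[None, :]  # column indices j
    x1 = (rows * mask).sum() / area           # gray-value-weighted mean row index
    y1 = (cols * mask).sum() / area           # gray-value-weighted mean column index
    return float(x1), float(y1)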
FIG. 7 is a flow diagram illustrating testing of a segmentation network in one embodiment. The computer device can collect sample videos containing the target geometric figure under different environments and acquire, from the sample videos, each frame of sample image containing the target geometric figure. The computer device inputs each frame of acquired sample image into the trained segmentation network to predict the predicted geometric region and predicted vertex information corresponding to the target geometric figure in each frame of sample image, and the segmentation network is tested through the predicted geometric region and the predicted vertex information.
FIG. 8 is a diagram of a training architecture of the segmentation network in one embodiment.
And the computer equipment performs downsampling processing on the sample image to obtain a corresponding sample sampling image. And inputting the sample images and the sample sampling images into a segmentation network to be trained, wherein the segmentation network to be trained can comprise segmentation branches and vertex prediction branches. The segmentation network to be trained performs convolution processing on the sample image through a plurality of convolution layers of the segmentation branch, for example, 4 convolution layers, so as to extract features, and obtain a first feature map output by each convolution layer.
The segmentation network to be trained performs convolution processing on the sample sampling image through a plurality of convolution layers of the vertex prediction branch, for example, 4 convolution layers, so as to extract features, and obtain a second feature map output by each convolution layer.
A 1 × 1 convolution processing is performed on the first feature map output by the first convolution layer of the segmentation branch, and up-sampling processing is performed on the feature map after the convolution processing.
Starting from the second convolutional layer of the segmentation branch, the first feature map of the second convolutional layer is spliced with the second feature map of the first convolutional layer of the vertex prediction branch to obtain a sample fusion feature map, and so on until the sample fusion feature map output by the last convolutional layer of the segmentation branch is obtained. A 1 × 1 convolution processing is then performed on each sample fusion feature map, up-sampling processing is performed on the feature maps obtained after the 1 × 1 convolution processing, and each up-sampled feature map is fused with the up-sampled feature map corresponding to the first convolution layer to obtain the predicted geometric region in the sample image.
And performing 1 × 1 convolution processing on the second feature map output by the last convolution layer of the vertex prediction branch, and outputting the feature map obtained by the 1 × 1 convolution processing to the full-connection layer. And carrying out full connection processing on the feature map through a full connection layer to obtain the corresponding predicted vertex information of the target geometric figure in the sample sampling image.
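The two-branch architecture described above can be sketched roughly as follows in PyTorch; every channel count, the number of stages, the fusion rule and all names are illustrative assumptions rather than the configuration claimed in the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchSegNet(nn.Module):
    def __init__(self, in_ch=3, base=16, num_vertices=4):
        super().__init__()
        self.num_vertices = num_vertices
        # four convolutional stages per branch (stride-2 downsampling)
        def stage(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.ReLU(inplace=True))
        self.seg_stages = nn.ModuleList([stage(in_ch, base), stage(base, base * 2),
                                         stage(base * 2, base * 4), stage(base * 4, base * 8)])
        self.vtx_stages = nn.ModuleList([stage(in_ch, base), stage(base, base * 2),
                                         stage(base * 2, base * 4), stage(base * 4, base * 8)])
        # 1 x 1 convolutions projecting each (fused) feature map to a 1-channel mask
        self.proj = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in
                                   (base, base * 3, base * 6, base * 12)])
        # fully connected head regressing the vertex coordinates
        self.fc = nn.Linear(base * 8, num_vertices * 2)

    def forward(self, image, sampled_image):
        seg_feats, vtx_feats = [], []
        x, y = image, sampled_image
        for s, v in zip(self.seg_stages, self.vtx_stages):
            x = s(x)
            seg_feats.append(x)
            y = v(y)
            vtx_feats.append(y)
        h, w = image.shape[-2:]
        mask_logits = 0
        for i, f in enumerate(seg_feats):
            if i > 0:
                # splice the segmentation feature map with the preceding vertex-branch feature map
                prev = F.interpolate(vtx_feats[i - 1], size=f.shape[-2:],
                                     mode='bilinear', align_corners=False)
                f = torch.cat([f, prev], dim=1)
            m = self.proj[i](f)
            mask_logits = mask_logits + F.interpolate(m, size=(h, w),
                                                      mode='bilinear', align_corners=False)
        mask = torch.sigmoid(mask_logits)          # predicted geometric region
        pooled = vtx_feats[-1].mean(dim=(2, 3))    # global pooling before the fully connected layer
        vertices = self.fc(pooled).view(-1, self.num_vertices, 2)  # predicted vertex coordinates
        return mask, vertices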
And determining corresponding region color loss and region segmentation loss according to the predicted geometric region and the labeled geometric region. And determining corresponding geometric area loss and geometric barycentric loss according to the predicted geometric region and the predicted polygonal region determined by the predicted vertex information.
And constructing an objective loss function based on the region color loss, the region segmentation loss, the geometric area loss and the geometric barycentric loss. And training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining the trained target segmentation network.
For example, the target loss function is: Loss=Loss0+Loss1+Loss2+Loss3 (17)
wherein Loss3 is the region color loss, Loss0 is the region segmentation loss, Loss1 is the geometric area loss, and Loss2 is the geometric barycentric loss.
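For illustration, the following PyTorch sketch assembles the four terms of formula (17) for a single sample; the tensor layouts are assumptions, the region segmentation term Loss0 is stood in for by a simple L1 difference because its exact formula is not repeated here, and the remaining terms follow formulas (1), (13) and (15).

import torch

def shoelace_area(v):
    # v: (4, 2) predicted vertex coordinates, formula (12)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * torch.abs(torch.dot(x, torch.roll(y, -1)) - torch.dot(y, torch.roll(x, -1)))

def combined_loss(pred_mask, gt_mask, pred_vertices, rgb):
    # pred_mask, gt_mask: (H, W); rgb: (H, W, 3); pred_vertices: (4, 2)
    area_pred = pred_mask.sum()
    area_gt = gt_mask.sum()
    # region color loss, formula (1): difference of mask-weighted mean colors
    c_pred = (rgb * pred_mask.unsqueeze(-1)).sum(dim=(0, 1)) / area_pred
    c_gt = (rgb * gt_mask.unsqueeze(-1)).sum(dim=(0, 1)) / area_gt
    loss3 = torch.abs(c_gt.mean() - c_pred.mean())
    # region segmentation loss: stand-in L1 term (the text's exact formula is not reproduced here)
    loss0 = torch.abs(pred_mask - gt_mask).mean()
    # geometric area loss, formula (13)
    loss1 = torch.abs(area_pred - shoelace_area(pred_vertices))
    # geometric barycentric loss, formula (15)
    h, w = pred_mask.shape
    rows = torch.arange(h, dtype=pred_mask.dtype, device=pred_mask.device).unsqueeze(1)
    cols = torch.arange(w, dtype=pred_mask.dtype, device=pred_mask.device).unsqueeze(0)
    x1 = (rows * pred_mask).sum() / area_pred
    y1 = (cols * pred_mask).sum() / area_pred
    x2, y2 = pred_vertices[:, 0].mean(), pred_vertices[:, 1].mean()
    loss2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    # formula (17)
    return loss0 + loss1 + loss2 + loss3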
In one embodiment, the method further comprises: acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed; respectively extracting the features of the image to be processed and the sampled image to be processed through a trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed; and performing fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
Specifically, the computer device performs downsampling processing on the image to be processed to obtain a corresponding downsampled image to be processed. Or the computer equipment performs up-sampling processing on the image to be processed to obtain a corresponding up-sampled image to be processed.
In one embodiment, the computer device may perform sampling processing on the image to be processed through the trained segmentation network to obtain a corresponding sampled image to be processed. Further, the image to be processed is subjected to up-sampling processing or down-sampling processing through a segmentation network, and a corresponding up-sampling image to be processed or down-sampling image to be processed is obtained.
Specifically, the segmentation network performs multi-layer feature extraction on the image to be processed to obtain a third feature map corresponding to the image to be processed. Further, the segmentation network performs feature extraction on the image to be processed through the multilayer convolution layers, and uses the third feature map output by the previous convolution layer as the input of the next convolution layer to obtain the third feature map output by each convolution layer.
And the segmentation network carries out multi-layer feature extraction on the to-be-processed sampling image to obtain a fourth feature map corresponding to the to-be-processed sampling image. Further, the segmentation network performs feature extraction on the to-be-processed sample image through the multilayer convolution layers, and obtains a fourth feature map output by each convolution layer by taking a fourth feature map output by a previous convolution layer as an input of a next convolution layer.
In one embodiment, the partitioning network includes two encoders, each of which includes a plurality of convolutional layers. And the segmentation network respectively inputs the image to be processed and the sampling image to be processed into the two encoders, and respectively performs feature extraction on the image to be processed and the sampling image to be processed to obtain a corresponding third feature map and a corresponding fourth feature map.
Specifically, the segmentation network performs fusion processing on the third feature map and the fourth feature map to obtain a target fusion feature map. And the segmentation network performs 1 × 1 convolution processing on the target fusion feature map, and performs sampling processing on the feature map subjected to 1 × 1 convolution processing to obtain a target geometric figure in the image to be processed.
In this embodiment, the process of fusing the third feature map and the fourth feature map may be stitching or overlapping the third feature map and the fourth feature map.
In this embodiment, images to be processed are sampled and processed to obtain images of different sizes, the images of different sizes are respectively subjected to feature extraction through a trained segmentation network, so as to obtain feature maps of different resolutions, the feature maps of different resolutions are subjected to fusion processing, a target geometric figure in the image to be processed can be more accurately identified, and the accuracy of identification and segmentation of the target geometric figure can be improved.
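An inference sketch under the same assumptions as the dual-branch model above: the image to be processed is downsampled to form the second input, and the fused prediction yields the target region mask and the vertex coordinates. The scale factor and function name are illustrative.

import torch
import torch.nn.functional as F

def segment(model, image, scale=0.5):
    # image: (N, 3, H, W) tensor of the image to be processed; scale is an assumed sampling factor
    sampled = F.interpolate(image, scale_factor=scale, mode='bilinear', align_corners=False)
    with torch.no_grad():
        mask, vertices = model(image, sampled)
    return mask, vertices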
In one embodiment, the method further comprises: determining the intersection ratio between the area of the target geometric figure and the area of a preset area; when the intersection ratio is larger than or equal to a threshold value, segmenting a target geometric figure from the image to be processed; and performing corresponding business processing based on the content included by the target geometric figure.
Specifically, the computer device may obtain gray values corresponding to respective pixel points in the target geometric figure, and determine the area of the target geometric figure according to the gray values. The computer equipment can obtain the area of the preset area, and calculate the intersection between the area of the target geometric figure and the area of the preset area and the union between the area of the target geometric figure and the area of the preset area, thereby calculating the ratio of the intersection to the union and obtaining the intersection ratio of the areas.
When the intersection ratio of the areas is larger than a threshold value, the computer equipment can segment the target geometric figure from the image to be processed. The computer device can obtain the content in the target geometric figure and perform corresponding business processing based on the content in the target geometric figure. The service processing may be, but is not limited to, a service data query, a service data modification, a service data update, a service data deletion, and the like.
In this embodiment, when the intersection ratio between the area of the target geometric figure and the preset area is greater than or equal to the threshold, it indicates that the target geometric figure is successfully verified, so that after the verification is successful, service processing is performed based on information in the target geometric figure, and the security of service processing can be improved.
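A hedged sketch of this verification step is given below: the intersection ratio between the segmented target region and a preset region is compared with a threshold before any business processing is allowed; the masks, the threshold value and the callback are illustrative assumptions.

import numpy as np

def verify_and_process(target_mask, preset_mask, process, threshold=0.8):
    target = target_mask > 0.5
    preset = preset_mask > 0.5
    union = np.logical_or(target, preset).sum()
    if union == 0:
        return False
    iou = np.logical_and(target, preset).sum() / union   # intersection ratio of the areas
    if iou >= threshold:
        process(target)   # e.g. query, modify, update or delete business data
        return True
    return False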
In one embodiment, the method further comprises: acquiring an image to be processed, and extracting the characteristics of the image to be processed through a trained target segmentation network; and performing convolution and full connection processing on the extracted image features to output target vertex information corresponding to the target geometric figure in the image to be processed.
Specifically, the computer device obtains an image to be processed and inputs the image to be processed into a trained segmentation network. The segmentation network extracts the features of the image to be processed through the multilayer convolution layers, and the feature graph output by the previous convolution layer is used as the input of the next convolution layer until the feature graph output by the last convolution layer is obtained. And the segmentation network performs 1 × 1 convolution processing on the feature map output by the last convolution layer, and outputs the feature map obtained by the 1 × 1 convolution processing to the full connection layer. And carrying out full connection processing on the feature graph through a full connection layer to obtain the corresponding predicted vertex information of the target geometric figure in the image to be processed.
In the embodiment, the feature extraction is performed on the image to be processed based on the target segmentation network, and the convolution and full connection processing is performed on the extracted image feature, so that the target vertex information corresponding to the target geometric figure in the image to be processed can be accurately identified, the target geometric figure can be segmented from the image to be processed based on the target vertex information, and the accuracy and precision of identification and segmentation are improved.
In one embodiment, the target vertex information includes target vertex coordinates; the method further comprises the following steps:
determining a target polygon area formed by target vertex coordinates; determining the intersection ratio between the area of the target polygonal area and the area of a preset area; when the intersection ratio is larger than or equal to a threshold value, a target polygon area formed by target vertex coordinates is segmented from the image to be processed; and performing corresponding business processing based on the target polygon area.
Specifically, the computer device may calculate the area of the target polygon region from the target vertex coordinates. The computer device may obtain the area of the preset region, and calculate an intersection between the area of the target polygon region and the area of the preset region, and a union between the area of the target polygon region and the area of the preset region, thereby calculating a ratio of the intersection to the union, and obtaining an intersection ratio of the areas.
When the intersection ratio of the areas is greater than the threshold, the computer device may segment a target polygon region formed by the target vertex coordinates from the image to be processed. The computer device can obtain the content in the target polygon area and perform corresponding business processing based on the content in the target polygon area.
In this embodiment, when the intersection ratio between the area of the target polygon region and the preset region area is greater than or equal to the threshold, it indicates that the target polygon region is successfully verified, so that after the verification is successful, service processing is performed based on information in the target geometric figure, and the security of service processing can be improved.
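The same gating idea can be sketched for the vertex-based output; shapely is used here only as an assumed convenience for polygon intersection and union, and any polygon clipping routine would do.

from shapely.geometry import Polygon

def polygon_intersection_ratio(target_vertices, preset_vertices):
    # target_vertices, preset_vertices: sequences of (x, y) pairs describing the two quadrilaterals
    target = Polygon(target_vertices)
    preset = Polygon(preset_vertices)
    union = target.union(preset).area
    if union == 0:
        return 0.0
    return target.intersection(preset).area / union

# Business processing would then be allowed only when
# polygon_intersection_ratio(target_vertices, preset_vertices) >= threshold.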
In one embodiment, the sample image is an image containing a target document, the target geometry is a quadrilateral formed based on the boundary of the target document, the vertex information corresponding to the target geometry includes vertex coordinates corresponding to each of the four vertices of the target document, and the segmentation network is a document segmentation network.
Wherein the target document is a quadrilateral document used for characterizing user information, including but not limited to an identity card, a residence permit, a driver's license, and a passport.
Specifically, the computer device acquires a document image including the target document, and determines an annotated document region corresponding to the target document based on the document image. And performing target segmentation processing on the certificate image through a certificate segmentation network to be trained to obtain a predicted certificate area corresponding to the target certificate, and determining predicted vertex coordinates corresponding to the target certificate based on image characteristics in the target segmentation processing. And determining the corresponding area characteristic loss according to the predicted certificate area and the marked certificate area. And determining the corresponding geometric feature loss according to the predicted certificate area and the predicted polygon area determined by the predicted vertex coordinates. The computer device constructs an objective loss function based on the regional characteristic loss and the geometric characteristic loss. Training the certificate segmentation network to be trained through a target loss function until a training stopping condition is reached, and obtaining a trained certificate segmentation network; the document segmentation network is used to segment a target document from a document image.
In this embodiment, the certificate image including the target certificate is subjected to the target segmentation processing by the certificate segmentation network to be trained, so that the predicted certificate area corresponding to the target certificate, which is predicted by the certificate segmentation network, can be obtained. Based on the image features in the target segmentation process, the predicted vertex coordinates of the target certificate in the certificate image predicted by the certificate segmentation network can be obtained. And constructing a target loss function according to the regional characteristic loss between the predicted certificate region and the marked certificate region and the geometric characteristic loss between the predicted certificate region and the predicted polygon region determined by the predicted vertex coordinates, so that the target loss function comprises multi-aspect loss characteristics. The certificate segmentation network to be trained is trained based on losses in various aspects, and the influence of the losses in various aspects on the identification and segmentation of the certificate segmentation network can be fully considered, so that the identification and segmentation accuracy and accuracy of the certificate segmentation network can be improved through training. The target certificate can be accurately identified and segmented from the image through the trained target certificate segmentation network. And the vertex coordinates of the target certificate in the image can be accurately output, so that the target certificate in the image can be accurately positioned.
Fig. 9 shows an application scenario of the target segmentation network in one embodiment.
The front end collects certificate videos through an SDK (software development kit) and sends the certificate videos to the background. The background uses the target segmentation network to segment the target geometric figure from the certificate video and outputs the target vertex coordinates of the target geometric figure in the certificate video. The background returns the target geometric figure and the target vertex coordinates to the front end. The front end calculates the intersection ratio between the area of the target geometric figure and the area of the preset area, and when the intersection ratio is greater than or equal to a threshold value, the target certificate is judged to be successfully verified, and the user is allowed to perform corresponding service processing.
Or the front end determines a target polygon region formed by the target vertex coordinates, calculates the intersection ratio between the area of the target polygon region and the area of the preset region, and when the intersection ratio is greater than or equal to the threshold value, judges that the target polygon region is successfully verified, and allows the user to perform corresponding service processing.
In one embodiment, a training method for a split network is provided, including:
and (S1) acquiring a sample image including the target geometric figure, and determining an annotation geometric area corresponding to the target geometric figure based on the sample image.
And (S2) sampling the sample image to obtain a corresponding sample sampling image.
And (S3) respectively performing feature extraction on the sample image and the sample sampling image through a segmentation network to be trained to obtain a first feature map corresponding to the sample image and a second feature map corresponding to the sample sampling image.
And a step (S4) of performing fusion processing on the first feature map and the second feature map, and obtaining a predicted geometric region corresponding to the target geometric figure based on the sample fusion feature map after the fusion processing.
And (S5) performing convolution and full-concatenation processing on the second feature map to obtain predicted vertex information corresponding to the target geometry.
And (S6) obtaining the values of the color channels corresponding to the pixel points in the prediction geometric area respectively, and obtaining the values of the color channels corresponding to the pixel points in the labeling geometric area respectively.
And (S7) determining a first color value corresponding to the predicted geometric region according to the color channel value corresponding to each pixel point in the predicted geometric region, the first gray value corresponding to the corresponding pixel point and the first region area.
And (S8) determining a second color value corresponding to the labeling geometric area according to the color channel value corresponding to each pixel point in the labeling geometric area, the second gray value corresponding to the corresponding pixel point and the second area.
A step (S9) of determining a regional color loss between the predicted geometric region and the annotated geometric region based on the difference of the first color value and the second color value.
And (S10) acquiring first gray values corresponding to the pixels in the prediction geometric area, and taking the sum of the first gray values of the pixels in the prediction geometric area as the first area of the prediction geometric area.
And (S11) acquiring second gray values corresponding to the pixel points in the labeling geometric area, and taking the sum of the second gray values of the pixel points in the labeling geometric area as the second area of the labeling geometric area.
A step (S12) of determining a region segmentation loss between the prediction geometric region and the annotation geometric region based on the first region area and the second region area; determining a first region area of the predicted geometric region and calculating a third region area of the predicted polygonal region determined by the predicted vertex information; determining a geometric area loss based on a difference between the area of the first region and the area of the third region.
And (S13) constructing a gray value matrix based on the first gray values respectively corresponding to the pixel points in the prediction geometric area.
And (S14) taking the sum of the first gray values of each row in the gray value matrix and the contrast value of the first area of the prediction geometric area as the abscissa in the first barycentric coordinate of the prediction geometric area.
And (S15) taking the sum of the first gray values of each column in the gray value matrix and the contrast value of the first area of the prediction geometric area as the ordinate in the first barycentric coordinate.
A step (S16) of determining a second center of gravity coordinate of a prediction polygon region formed by the prediction vertex coordinates based on the prediction vertex coordinates in the prediction vertex information; a geometric barycentric loss between the predicted geometric region and the predicted polygonal region is determined based on a distance between the first barycentric coordinate and the second barycentric coordinate.
And a step (S17) for constructing a target loss function on the basis of the region color loss, the region segmentation loss, the geometric area loss, and the geometric centroid loss.
And (S18) training the segmentation network to be trained through the target loss function until the training stop condition is reached, and obtaining the trained target segmentation network.
And (S19) acquiring the image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed.
And (S20) respectively extracting the features of the image to be processed and the sampled image to be processed through the trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed.
And (S21) performing fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
And (S22) performing convolution and full-connection processing on the fourth feature map to output target vertex information corresponding to the target geometric figure in the image to be processed.
A step (S23) of determining the intersection ratio between the area of the target geometric figure and the area of a preset area; and when the intersection ratio is larger than or equal to the threshold value, segmenting the target geometric figure from the image to be processed, and performing corresponding business processing based on the content included by the target geometric figure.
In this embodiment, the target segmentation processing is performed on the sample image including the target geometric figure through the segmentation network to be trained, so that a predicted geometric region corresponding to the target geometric figure predicted by the segmentation network can be obtained. And determining the color loss of the region based on the color difference between the predicted geometric region and the labeled geometric region, and determining the color loss between the geometric region predicted by the segmentation network and the real geometric region. Based on the area of the predicted geometric region and the area of the labeled geometric region, the segmentation difference between the geometric region predicted by the segmentation network and the real geometric region can be determined.
And determining the geometric area loss based on the area difference between the predicted geometric area and the predicted polygonal area, and determining the area loss between the geometric area and the polygonal area predicted by the segmentation network, wherein the large area loss represents the inaccuracy of prediction and segmentation of the segmentation network. Based on the gravity center position of the predicted geometric region and the gravity center position of the predicted polygonal region, gravity center loss between the geometric region predicted by the segmentation network and the predicted polygonal region can be determined, and the fact that the gravity center loss is large indicates that the prediction and segmentation of the segmentation network are inaccurate.
The color loss, the segmentation loss, the area loss and the gravity center loss are used as conditions for training the segmentation network, the segmentation network to be trained is trained based on the loss in various aspects, the influence of the loss in various aspects on the identification and segmentation of the segmentation network can be fully considered, and the identification and segmentation accuracy of the segmentation network can be improved through training. The target geometric figure can be accurately identified and segmented from the image through the trained target segmentation network. And the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image to be processed can be accurately positioned.
The application also provides an application scenario, and the application scenario applies the method for training the certificate segmentation network. Specifically, the application of the training method of the certificate segmentation network in the application scenario is as follows:
the computer device obtains a sample credential image that includes a target credential and determines an annotated credential region corresponding to the target credential based on the sample credential image. And the computer equipment samples the sample certificate image to obtain a corresponding sample sampling image. And respectively extracting the characteristics of the sample certificate image and the sample sampling image by the computer equipment through a certificate segmentation network to be trained to obtain a first characteristic diagram corresponding to the sample certificate image and a second characteristic diagram corresponding to the sample sampling image.
And the computer equipment performs fusion processing on the first characteristic diagram and the second characteristic diagram, and obtains a predicted certificate area corresponding to the target certificate on the basis of the sample fusion characteristic diagram after the fusion processing.
The computer equipment performs convolution and full-connection processing on the second feature map to obtain the predicted vertex coordinates [P11,P12], [P21,P22], [P31,P32], [P41,P42] corresponding to the target certificate.
And the computer equipment acquires the values of the color channels corresponding to the pixel points in the predicted certificate area and acquires the values of the color channels corresponding to the pixel points in the marked certificate area. Substituting the value of the color channel corresponding to each pixel point in the predicted certificate area, the first gray value corresponding to the corresponding pixel point and the first region area into the following formulas (3) to (6), and calculating the first color value C corresponding to the predicted certificate area:
color_R1=∑i,jRi,j(x)·Ii,j(x)/∑i,jIi,j(x) (3)
color_G1=∑i,jGi,j(x)·Ii,j(x)/∑i,jIi,j(x) (4)
color_B1=∑i,jBi,j(x)·Ii,j(x)/∑i,jIi,j(x) (5)
C=(color_R1+color_G1+color_B1)/3 (6)
Similarly, according to the value of the color channel corresponding to each pixel point in the marked certificate area, the second gray value corresponding to the corresponding pixel point and the second region area, the second color value C0 corresponding to the marked certificate area is determined.
Substituting the first color value C and the second color value C0 into formula (1) to obtain the region color loss Loss3 between the predicted certificate area and the marked certificate area:
Loss3=|C0-C| (1)
Acquiring first gray values corresponding to all pixel points in the predicted certificate area, and taking the sum of the first gray values of all the pixel points in the predicted certificate area as the first area of the predicted certificate area, namely:
S1=∑i,jIi,j(x) (11)
Acquiring second gray values corresponding to all pixel points in the marked certificate area, and taking the sum of the second gray values of all the pixel points in the marked certificate area as the second region area ∑i,jGTi,j(x) of the marked certificate area.
Based on the first region area ∑i,jIi,j(x) and the second region area ∑i,jGTi,j(x), the region segmentation loss Loss0 between the predicted certificate area and the marked certificate area is calculated, namely:
Figure BDA0002898470200000321
Determining the first region area of the predicted certificate area, and calculating the third region area S2 of the predicted polygon region determined by the predicted vertex coordinates, namely:
S2=1/2·|P11(P22-P42)+P21(P32-P12)+P31(P42-P22)+P41(P12-P32)| (12)
According to the first region area S1 and the third region area S2, the certificate area loss Loss1 is calculated:
Loss1=|S1-S2| (13)
A gray value matrix is constructed based on the first gray values respectively corresponding to the pixel points in the predicted certificate area. The contrast value between the sum of the first gray values of each row in the gray value matrix and the first region area of the predicted certificate area is taken as the abscissa x1 of the first barycentric coordinate (x1,y1) of the predicted certificate area. The contrast value between the sum of the first gray values of each column in the gray value matrix and the first region area of the predicted certificate area is taken as the ordinate y1 of the first barycentric coordinate, as follows:
x1=∑i,j i·Ii,j(x)/∑i,jIi,j(x), y1=∑i,j j·Ii,j(x)/∑i,jIi,j(x) (16)
A second barycentric coordinate (x2,y2) of the predicted polygon region formed by the predicted vertex coordinates is determined based on the predicted vertex coordinates in the predicted vertex information:
x2=(P11+P21+P31+P41)/4, y2=(P12+P22+P32+P42)/4 (14)
Based on the first barycentric coordinate (x1,y1) and the second barycentric coordinate (x2,y2), the certificate barycentric loss Loss2 between the predicted certificate area and the predicted polygon region is determined:
Loss2=(x1-x2)^2+(y1-y2)^2 (15)
Based on the region color loss Loss3, the region segmentation loss Loss0, the certificate area loss Loss1 and the certificate barycentric loss Loss2, the target loss function Loss is constructed:
Loss=Loss0+Loss1+Loss2+Loss3 (17)
and training the certificate segmentation network to be trained through the target loss function until the loss value of the certificate segmentation network is less than or equal to the loss threshold value, and obtaining the trained target certificate segmentation network.
When the user needs to transact related business of the bank, the front end of the computer equipment shoots the identity document image of the user and transmits the collected identity document image to the background.
The background carries out target segmentation processing through the trained target certificate segmentation network to obtain an identity card area in the identity certificate image, and can also output 4 vertex coordinates of the identity card in the identity certificate image. And the background returns the identity card area and the vertex coordinates to the front end.
The front end calculates the intersection ratio between the area of the target certificate and the area of the preset area, and when the intersection ratio is larger than or equal to the threshold value, the user is allowed to conduct banking business related to the user, such as bank card opening, bank reservation information inquiring and modifying and the like.
Or the front end calculates the intersection ratio between the area of the quadrilateral region formed by the 4 vertex coordinates and the area of the preset region, and when the intersection ratio is greater than or equal to the threshold value, the user is allowed to conduct banking business related to the individual, such as opening a bank card, inquiring and modifying bank reservation information and the like.
It should be understood that although the various steps in the flowcharts of fig. 2-9 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in fig. 2-9 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, there is provided a training apparatus for a segmentation network, which may be a part of a computer device implemented as a software module or a hardware module, or a combination of the two, and the training apparatus 1000 for the segmentation network specifically includes: an obtaining module 1002, a predicting module 1004, a regional feature loss determining module 1006, a geometric feature loss determining module 1008, a constructing module 1010, and a training module 1012, wherein:
an obtaining module 1002, configured to obtain a sample image including a target geometry, and determine, based on the sample image, an annotated geometric region corresponding to the target geometry.
The prediction module 1004 is configured to perform target segmentation processing on the sample image through a segmentation network to be trained to obtain a prediction geometric region corresponding to the target geometric figure, and determine prediction vertex information corresponding to the target geometric figure based on image features in the target segmentation processing.
A region feature loss determining module 1006, configured to determine a corresponding region feature loss according to the predicted geometric region and the labeled geometric region.
And a geometric feature loss determining module 1008, configured to determine a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information.
A building module 1010, configured to build an objective loss function based on the regional characteristic loss and the geometric characteristic loss.
A training module 1012, configured to train the segmentation network to be trained through the target loss function until a training stop condition is reached, and obtain a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
In this embodiment, the target segmentation processing is performed on the sample image including the target geometric figure through the segmentation network to be trained, so that a predicted geometric region corresponding to the target geometric figure predicted by the segmentation network can be obtained. Based on the image features in the target segmentation process, prediction vertex information of the prediction target geometry predicted by the segmentation network in the sample image can be obtained. And constructing an objective loss function according to the area characteristic loss between the prediction geometric area and the labeling geometric area and the geometric characteristic loss between the prediction geometric area and the prediction polygon area determined by the prediction vertex information, so that the objective loss function comprises multi-aspect loss characteristics. The segmented network to be trained is trained based on losses in various aspects, and the influence of the losses in various aspects on the identification and segmentation of the segmented network can be fully considered, so that the identification and segmentation accuracy of the segmented network can be improved through training. The target geometric figure can be accurately identified and segmented from the image through the trained target segmentation network. And the vertex information of the target geometric figure in the image can be accurately output, so that the target geometric figure in the image to be processed can be accurately positioned.
In one embodiment, the prediction module 1004 is further configured to: sampling the sample image to obtain a corresponding sample image; respectively extracting the characteristics of the sample image and the sample sampling image through a segmentation network to be trained to obtain a first characteristic diagram corresponding to the sample image and a second characteristic diagram corresponding to the sample sampling image; performing fusion processing on the first feature map and the second feature map, and acquiring a predicted geometric area corresponding to the target geometric figure based on the sample fusion feature map after the fusion processing; and performing convolution and full connection processing on the second feature graph to obtain the predicted vertex information corresponding to the target geometric figure.
In this embodiment, a sample image is obtained by sampling the sample image, and feature extraction is performed on the sample image and the sample image through a segmentation network to be trained, so that features of images with different resolutions can be obtained. The fusion processing is carried out based on the extracted first characteristic diagram and the second characteristic diagram, and the prediction geometric area where the target geometric figure is located in the sample image can be more accurately predicted based on the image characteristics with different resolutions. And based on the full-connection processing of the second characteristic diagram, outputting the predicted vertex information corresponding to the target geometric figure, and training the segmentation network based on the predicted geometric area and the predicted vertex information obtained by different processing.
In one embodiment, the regional characteristic loss determination module 1006 is further configured to: determining a first color value corresponding to the predicted geometric area and a second color value corresponding to the labeled geometric area; determining a region color loss between the predicted geometric region and the annotated geometric region based on a difference of the first color value and the second color value; determining a first region area corresponding to the prediction geometric region and a second region area corresponding to the labeling geometric region; determining a region segmentation loss between the predicted geometric region and the labeled geometric region based on the first region area and the second region area; and determining corresponding regional characteristic loss according to the regional color loss and the regional segmentation loss.
In this embodiment, the region color loss is determined from the color difference between the predicted geometric region and the labeled geometric region, which measures the color deviation between the region predicted by the segmentation network and the real geometric region. Based on the area of the predicted geometric region and the area of the labeled geometric region, the segmentation difference between the predicted region and the real region can be determined. Using the color loss and the region segmentation loss together as training conditions gives the trained segmentation network higher accuracy and segmentation precision.
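A minimal NumPy sketch, under assumed formulas, of this regional characteristic loss: the mean color of each region is the gray-value-weighted average of the color channels normalized by the region area (as the following embodiments describe), the color loss is the difference of the two mean colors, and the segmentation loss here is a normalized difference of the two region areas. These exact expressions are assumptions, not the literal formulas of this application.

```python
import numpy as np

def region_feature_loss(pred_mask, gt_mask, image, eps=1e-6):
    # pred_mask, gt_mask: (H, W) gray-value masks in [0, 1]; image: (H, W, C) color image
    area_pred = pred_mask.sum()                  # first region area (sum of gray values)
    area_gt = gt_mask.sum()                      # second region area

    color_pred = (image * pred_mask[..., None]).sum(axis=(0, 1)) / (area_pred + eps)
    color_gt = (image * gt_mask[..., None]).sum(axis=(0, 1)) / (area_gt + eps)
    region_color_loss = np.abs(color_pred - color_gt).mean()

    region_seg_loss = np.abs(area_pred - area_gt) / (area_pred + area_gt + eps)
    return region_color_loss + region_seg_loss
```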
In one embodiment, the regional characteristic loss determination module 1006 is further configured to: obtaining the values of the color channels corresponding to the respective pixel points in the predicted geometric region, and obtaining the values of the color channels corresponding to the respective pixel points in the labeled geometric region; determining the first color value corresponding to the predicted geometric region according to the color channel values of the pixel points in the predicted geometric region, the first gray values corresponding to those pixel points, and the first region area; and determining the second color value corresponding to the labeled geometric region according to the color channel values of the pixel points in the labeled geometric region, the second gray values corresponding to those pixel points, and the second region area.
In this embodiment, the average color value of the predicted geometric region can be accurately calculated from the color channel values of the pixel points in the predicted geometric region, the first gray values of those pixel points, and the first region area. Likewise, the average color value of the labeled geometric region can be accurately calculated from the color channel values of its pixel points, the corresponding second gray values, and the second region area. The color loss between the region predicted by the segmentation network and the real geometric region can then be accurately determined from the difference between these two average color values, and using this color loss as a training condition improves the precision of the segmentation network.
In one embodiment, the regional characteristic loss determination module 1006 is further configured to: acquiring the first gray values corresponding to the respective pixel points in the predicted geometric region, and acquiring the second gray values corresponding to the respective pixel points in the labeled geometric region; taking the sum of the first gray values of the pixel points in the predicted geometric region as the first region area of the predicted geometric region; and taking the sum of the second gray values of the pixel points in the labeled geometric region as the second region area of the labeled geometric region.
In this embodiment, the sum of the gray values of the pixel points is used as the region area, so that the areas formed by the pixel points in the predicted geometric region and in the labeled geometric region can both be calculated accurately.
In one embodiment, the geometric feature loss determination module 1008 is further configured to: determining a first region area of the predicted geometric region and calculating a third region area of the predicted polygonal region determined by the predicted vertex information; determining geometric area loss according to the difference between the area of the first region and the area of the third region; determining a first barycentric position corresponding to the predicted geometric region and a second barycentric position corresponding to the predicted polygonal region; determining geometric gravity center loss according to the distance between the first gravity center position and the second gravity center position; determining a geometric feature loss based on the geometric area loss and the geometric center of gravity loss.
In this embodiment, the geometric area loss is determined from the area difference between the predicted geometric region and the predicted polygon region; a large area loss indicates that the prediction and segmentation of the segmentation network are inaccurate. Similarly, the gravity-center loss between the region predicted by the segmentation network and the predicted polygon region can be determined from the gravity-center position of the predicted geometric region and that of the predicted polygon region, and a large gravity-center loss likewise indicates inaccurate prediction and segmentation. Using the area loss and the gravity-center loss as training conditions therefore gives the trained segmentation network higher accuracy and segmentation precision.
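A minimal sketch, under assumed formulas, of this geometric feature loss: the area term compares the soft area of the predicted geometric region with the area of the predicted polygon (computed here with the shoelace formula), and the gravity-center term is the distance between the two centroids. The helpers mask_centroid and polygon_centroid are hypothetical and are sketched in the two embodiments that follow.

```python
import numpy as np

def shoelace_area(vertices):
    # vertices: (N, 2) predicted vertex coordinates, ordered around the polygon
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def geometric_feature_loss(pred_mask, pred_vertices, eps=1e-6):
    area_mask = pred_mask.sum()                      # first region area (soft mask area)
    area_poly = shoelace_area(pred_vertices)         # third region area (polygon area)
    geometric_area_loss = abs(area_mask - area_poly) / (area_mask + area_poly + eps)

    c_mask = mask_centroid(pred_mask)                # first gravity-center position
    c_poly = polygon_centroid(pred_vertices)         # second gravity-center position
    geometric_center_loss = float(np.linalg.norm(c_mask - c_poly))
    return geometric_area_loss + geometric_center_loss
```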
In one embodiment, the first gravity center position comprises a first barycentric coordinate, and the second gravity center position comprises a second barycentric coordinate; the geometric feature loss determination module 1008 is further configured to: determining the first barycentric coordinate of the predicted geometric region according to the first gray values corresponding to the respective pixel points in the predicted geometric region; determining the second barycentric coordinate of the predicted polygon region formed by the predicted vertex coordinates based on the predicted vertex coordinates in the predicted vertex information; and determining the geometric gravity-center loss between the predicted geometric region and the predicted polygon region based on the distance between the first barycentric coordinate and the second barycentric coordinate.
In this embodiment, the gravity-center loss between the geometric region predicted by the segmentation network and the predicted polygon region can be determined from the barycentric coordinate of the predicted geometric region and the barycentric coordinate of the predicted polygon region. This captures the barycentric difference between the results of the two prediction modes when the same segmentation network predicts the target geometric figure in both ways, so using the gravity-center loss as a training condition gives the trained segmentation network higher accuracy and segmentation precision. Moreover, the trained segmentation network supports two prediction modes at the same time: segmenting the target geometric figure and predicting the vertex coordinates of the target geometric figure.
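A hypothetical computation of the second barycentric coordinate from the predicted vertex coordinates; for the quadrilateral case the mean of the vertex coordinates is one simple choice, since the exact formula is not specified here.

```python
import numpy as np

def polygon_centroid(vertices):
    # vertices: (N, 2) predicted vertex coordinates of the predicted polygon region
    return np.asarray(vertices, dtype=float).mean(axis=0)
```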
In one embodiment, the geometric feature loss determination module 1008 is further configured to: constructing a gray value matrix based on the first gray values corresponding to the respective pixel points in the predicted geometric region; taking the ratio of the sum of the first gray values of each row in the gray value matrix to the first region area of the predicted geometric region as the abscissa of the first barycentric coordinate of the predicted geometric region; and taking the ratio of the sum of the first gray values of each column in the gray value matrix to the first region area of the predicted geometric region as the ordinate of the first barycentric coordinate.
In this embodiment, the ratio of the row-wise sums of the first gray values in the gray value matrix to the first region area is used as the abscissa of the barycentric coordinate, and the ratio of the column-wise sums to the first region area is used as the ordinate, so that the barycentric coordinate of the predicted geometric region can be computed directly from the gray values of the pixel points, realizing a pixel-level barycenter calculation.
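A sketch of this pixel-level barycenter: with the gray-value matrix of the predicted geometric region, each coordinate is a gray-value-weighted index sum divided by the first region area. The mapping of rows and columns to abscissa and ordinate below is an assumption.

```python
import numpy as np

def mask_centroid(mask, eps=1e-6):
    g = np.asarray(mask, dtype=float)   # gray-value matrix of the predicted geometric region
    area = g.sum() + eps                # first region area (sum of gray values)
    rows, cols = np.indices(g.shape)
    x = (g * cols).sum() / area         # abscissa of the first barycentric coordinate
    y = (g * rows).sum() / area         # ordinate of the first barycentric coordinate
    return np.array([x, y])
```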
In one embodiment, the apparatus further comprises: and (5) partitioning the module. The segmentation module is also used for acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed; respectively extracting the features of the image to be processed and the sampled image to be processed through a trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed; and performing fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
In this embodiment, the image to be processed is sampled to obtain images of different sizes, feature extraction is performed on each of them through the trained segmentation network to obtain feature maps of different resolutions, and these feature maps are fused, so that the target geometric figure in the image to be processed can be identified more accurately and the accuracy of its recognition and segmentation can be improved.
In one embodiment, the apparatus further comprises: and a service processing module. The service processing module is further configured to:
determining the intersection-over-union between the region of the target geometric figure and a preset region; when the intersection-over-union is greater than or equal to a threshold, segmenting the target geometric figure from the image to be processed; and performing corresponding business processing based on the content included in the target geometric figure.
In this embodiment, when the intersection-over-union between the region of the target geometric figure and the preset region is greater than or equal to the threshold, the target geometric figure is considered successfully verified, so that business processing is performed based on the information in the target geometric figure only after verification succeeds, which improves the security of business processing.
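A minimal sketch of this verification step, assuming binary masks and an assumed threshold value: the intersection-over-union between the segmented target region and a preset region is compared with the threshold before any business processing is performed on the region content.

```python
import numpy as np

def region_verified(target_mask, preset_mask, threshold=0.8):
    # target_mask, preset_mask: (H, W) boolean masks; threshold is an assumed value
    intersection = np.logical_and(target_mask, preset_mask).sum()
    union = np.logical_or(target_mask, preset_mask).sum()
    iou = intersection / union if union > 0 else 0.0
    return iou >= threshold
```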
In one embodiment, the apparatus further comprises a vertex prediction module. The vertex prediction module is further configured to: acquiring an image to be processed, and performing feature extraction on the image to be processed through the trained target segmentation network; and performing convolution and fully connected processing on the extracted image features to output the target vertex information corresponding to the target geometric figure in the image to be processed.
In this embodiment, feature extraction is performed on the image to be processed through the target segmentation network, and convolution and fully connected processing are performed on the extracted image features, so that the target vertex information corresponding to the target geometric figure in the image to be processed can be accurately identified. The target geometric figure can then be segmented from the image to be processed based on the target vertex information, improving the accuracy and precision of recognition and segmentation.
In one embodiment, the target vertex information includes target vertex coordinates, and the apparatus further comprises a service processing module. The service processing module is further configured to: determining a target polygon region formed by the target vertex coordinates; determining the intersection-over-union between the target polygon region and a preset region; when the intersection-over-union is greater than or equal to a second threshold, segmenting the target polygon region formed by the target vertex coordinates from the image to be processed; and performing corresponding business processing based on the target polygon region.
In this embodiment, when the intersection-over-union between the target polygon region and the preset region is greater than or equal to the second threshold, the target polygon region is considered successfully verified, so that business processing is performed based on the information in the target polygon region only after verification succeeds, which improves the security of business processing.
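A hypothetical sketch of this polygon-based verification path: the target polygon formed by the predicted vertex coordinates is rasterized onto the image grid and its intersection-over-union with a preset region mask is checked against a second threshold. matplotlib's Path is used here only as a convenient point-in-polygon test; the threshold value is an assumption.

```python
import numpy as np
from matplotlib.path import Path

def polygon_verified(vertices, preset_mask, threshold=0.8):
    # vertices: (N, 2) target vertex coordinates; preset_mask: (H, W) boolean mask
    h, w = preset_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    points = np.stack([xs.ravel(), ys.ravel()], axis=1)          # pixel centers as (x, y)
    poly_mask = Path(np.asarray(vertices, float)).contains_points(points).reshape(h, w)
    intersection = np.logical_and(poly_mask, preset_mask).sum()
    union = np.logical_or(poly_mask, preset_mask).sum()
    iou = intersection / union if union > 0 else 0.0
    return iou >= threshold
```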
In one embodiment, the sample image is an image containing a target certificate, the target geometric figure is a quadrilateral formed by the boundary of the target certificate, the predicted vertex information corresponding to the target geometric figure includes the vertex coordinates corresponding to each of the four vertices of the target certificate, and the target segmentation network is a certificate segmentation network.
In this embodiment, target segmentation processing is performed on the certificate image containing the target certificate through the certificate segmentation network to be trained, so that the predicted certificate region corresponding to the target certificate, as predicted by the certificate segmentation network, can be obtained. Based on the image features used in the target segmentation processing, the predicted vertex coordinates of the target certificate in the certificate image can also be obtained. A target loss function is constructed from the regional characteristic loss between the predicted certificate region and the labeled certificate region and the geometric feature loss between the predicted certificate region and the predicted polygon region determined by the predicted vertex coordinates, so that the target loss function covers loss characteristics in multiple aspects. Training the certificate segmentation network to be trained on these losses fully accounts for their influence on recognition and segmentation, so the recognition and segmentation accuracy and precision of the certificate segmentation network can be improved through training. The trained certificate segmentation network can accurately identify and segment the target certificate from an image, and can accurately output the vertex coordinates of the target certificate in the image, so that the target certificate can be accurately located.
For specific limitations of the training apparatus for the segmentation network, reference may be made to the above limitations of the training method for the segmentation network, which are not repeated here. Each module in the above training apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server; its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing training data of the segmentation network. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a training method for a segmentation network.
Those skilled in the art will appreciate that the structure shown in fig. 11 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, as long as such combinations are not contradictory, they should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, it should not be construed as limiting the scope of the invention patent. It should be noted that several variations and modifications may be made by those of ordinary skill in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for training a segmentation network, the method comprising:
obtaining a sample image comprising a target geometric figure, and determining an annotation geometric area corresponding to the target geometric figure based on the sample image;
performing target segmentation processing on the sample image through a segmentation network to be trained to obtain a predicted geometric area corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image characteristics in the target segmentation processing;
determining corresponding regional characteristic loss according to the predicted geometric region and the labeled geometric region;
determining corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information;
constructing a target loss function based on the regional characteristic loss and the geometric characteristic loss;
training the segmentation network to be trained through the target loss function until the training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
2. The method according to claim 1, wherein the performing a target segmentation process on the sample image through a segmentation network to be trained to obtain a predicted geometric region corresponding to the target geometric figure, and determining predicted vertex information corresponding to the target geometric figure based on image features in the target segmentation process comprises:
sampling the sample image to obtain a corresponding sampled sample image;
respectively performing feature extraction on the sample image and the sampled sample image through the segmentation network to be trained to obtain a first feature map corresponding to the sample image and a second feature map corresponding to the sampled sample image;
fusing the first feature map and the second feature map, and obtaining a predicted geometric area corresponding to the target geometric figure based on a sample fused feature map after the fusion processing;
and performing convolution and fully connected processing on the second feature map to obtain prediction vertex information corresponding to the target geometric figure.
3. The method of claim 1, wherein determining a corresponding region feature loss based on the predicted geometric region and the labeled geometric region comprises:
determining a first color value corresponding to the predicted geometric area and a second color value corresponding to the labeling geometric area;
determining a region color loss between the predicted geometric region and the annotated geometric region based on a difference of the first color value and the second color value;
determining a first region area corresponding to the prediction geometric region and a second region area corresponding to the labeling geometric region;
determining a region segmentation loss between the prediction geometric region and the annotation geometric region based on the first region area and the second region area;
and determining corresponding regional characteristic loss according to the regional color loss and the regional segmentation loss.
4. The method of claim 3, wherein the determining the first color value corresponding to the predicted geometric region and the second color value corresponding to the labeled geometric region comprises:
obtaining the values of the color channels corresponding to the pixel points in the prediction geometric region respectively, and obtaining the values of the color channels corresponding to the pixel points in the labeling geometric region respectively;
determining a first color value corresponding to the predicted geometric region according to the value of the color channel corresponding to each pixel point in the predicted geometric region, the first gray value corresponding to the corresponding pixel point and the area of the first region;
and determining a second color value corresponding to the labeling geometric area according to the value of the color channel corresponding to each pixel point in the labeling geometric area, the second gray value corresponding to the corresponding pixel point and the second region area.
5. The method according to claim 3, wherein the determining the first region area corresponding to the predicted geometric region and the second region area corresponding to the labeled geometric region comprises:
acquiring first gray values respectively corresponding to all pixel points in the prediction geometric region, and acquiring second gray values respectively corresponding to all pixel points in the labeling geometric region;
taking the sum of the first gray values of all the pixel points in the prediction geometric region as the first region area of the prediction geometric region;
and taking the sum of the second gray values of the pixel points in the labeling geometric region as the second region area of the labeling geometric region.
6. The method of claim 1, wherein determining a corresponding geometric feature penalty based on the predicted geometric region and a predicted polygon region determined from the predicted vertex information comprises:
determining a first region area of the predicted geometric region and calculating a third region area of the predicted polygonal region determined by the predicted vertex information;
determining a geometric area loss according to the difference between the area of the first region and the area of the third region;
determining a first barycentric position corresponding to the predicted geometric region and a second barycentric position corresponding to the predicted polygonal region;
determining a geometric gravity center loss according to a distance between the first gravity center position and the second gravity center position;
determining a geometric feature loss based on the geometric area loss and the geometric centroid loss.
7. The method of claim 6, wherein the first barycentric position comprises a first barycentric coordinate and the second barycentric position comprises a second barycentric coordinate; the determining a first barycentric position corresponding to the predicted geometric region and a second barycentric position corresponding to the predicted polygonal region comprises:
determining a first barycentric coordinate of the predicted geometric region according to a first gray value corresponding to each pixel point in the predicted geometric region;
determining a second center of gravity coordinate of a prediction polygon region formed by the prediction vertex coordinates based on the prediction vertex coordinates in the prediction vertex information;
the determining a geometric gravity center loss according to a distance between the first barycentric position and the second barycentric position comprises:
determining a geometric barycentric loss between the predicted geometric region and the predicted polygonal region based on a distance between the first barycentric coordinate and the second barycentric coordinate.
8. The method according to claim 7, wherein determining the first barycentric coordinate of the predicted geometric region according to the first gray scale values corresponding to the respective pixel points in the predicted geometric region comprises:
constructing a gray value matrix based on first gray values respectively corresponding to all pixel points in the prediction geometric region;
taking the ratio of the sum of the first gray values of each row in the gray value matrix to the first region area of the predicted geometric region as an abscissa in the first barycentric coordinate of the predicted geometric region;
and taking the ratio of the sum of the first gray values of each column in the gray value matrix to the first region area of the predicted geometric region as an ordinate in the first barycentric coordinate.
9. The method of claim 1, further comprising:
acquiring an image to be processed, and sampling the image to be processed to obtain a corresponding sampled image to be processed;
respectively extracting the features of the image to be processed and the sampled image to be processed through the trained target segmentation network to obtain a third feature map corresponding to the image to be processed and a fourth feature map corresponding to the sampled image to be processed;
and performing fusion processing on the third feature map and the fourth feature map, and determining a target geometric figure in the image to be processed based on the target fusion feature map after the fusion processing.
10. The method of claim 9, further comprising:
determining the intersection ratio between the area of the target geometric figure and the area of a preset area;
when the intersection ratio is larger than or equal to a threshold value, segmenting the target geometric figure from the image to be processed;
and performing corresponding business processing based on the content included by the target geometric figure.
11. The method of claim 1, further comprising:
acquiring an image to be processed, and extracting the characteristics of the image to be processed through the trained target segmentation network;
and performing convolution and full-connection processing on the extracted image features to output target vertex information corresponding to the target geometric figure in the image to be processed.
12. The method according to any one of claims 1 to 11, wherein the sample image is an image containing a target document, the target geometric figure is a quadrilateral formed based on a boundary of the target document, the predicted vertex information corresponding to the target geometric figure includes vertex coordinates corresponding to each of four vertices of the target document, and the target segmentation network is a document segmentation network.
13. An apparatus for training a segmentation network, the apparatus comprising:
an acquisition module, used for acquiring a sample image comprising a target geometric figure and determining an annotation geometric area corresponding to the target geometric figure based on the sample image;
the prediction module is used for carrying out target segmentation processing on the sample image through a segmentation network to be trained to obtain a prediction geometric area corresponding to the target geometric figure, and determining prediction vertex information corresponding to the target geometric figure based on image features in the target segmentation processing;
a region characteristic loss determining module, configured to determine a corresponding region characteristic loss according to the predicted geometric region and the labeled geometric region;
a geometric feature loss determining module, configured to determine a corresponding geometric feature loss according to the predicted geometric region and the predicted polygon region determined by the predicted vertex information;
a construction module for constructing a target loss function based on the regional characteristic loss and the geometric characteristic loss;
the training module is used for training the segmentation network to be trained through the target loss function until a training stopping condition is reached, and obtaining a trained target segmentation network; the target segmentation network is used for segmenting a target geometric figure from an image to be processed.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202110049449.7A 2021-01-14 2021-01-14 Training method and device for split network, computer equipment and storage medium Pending CN113592876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110049449.7A CN113592876A (en) 2021-01-14 2021-01-14 Training method and device for split network, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110049449.7A CN113592876A (en) 2021-01-14 2021-01-14 Training method and device for split network, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113592876A true CN113592876A (en) 2021-11-02

Family

ID=78238098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110049449.7A Pending CN113592876A (en) 2021-01-14 2021-01-14 Training method and device for split network, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113592876A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114554086A (en) * 2022-02-10 2022-05-27 支付宝(杭州)信息技术有限公司 Auxiliary shooting method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN110517278B (en) Image segmentation and training method and device of image segmentation network and computer equipment
CN109508681B (en) Method and device for generating human body key point detection model
CN111311578B (en) Object classification method and device based on artificial intelligence and medical image equipment
US11244435B2 (en) Method and apparatus for generating vehicle damage information
CN111738231B (en) Target object detection method and device, computer equipment and storage medium
CN108764325B (en) Image recognition method and device, computer equipment and storage medium
CN111079632A (en) Training method and device of text detection model, computer equipment and storage medium
CN111402294B (en) Target tracking method, target tracking device, computer-readable storage medium and computer equipment
CN109858333B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN110852285A (en) Object detection method and device, computer equipment and storage medium
EP3905194A1 (en) Pose estimation method and apparatus
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN111783779B (en) Image processing method, apparatus and computer readable storage medium
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
CN109977832B (en) Image processing method, device and storage medium
CN110838108A (en) Medical image-based prediction model construction method, prediction method and device
CN109165654B (en) Training method of target positioning model and target positioning method and device
CN112241646A (en) Lane line recognition method and device, computer equipment and storage medium
CN110766027A (en) Image area positioning method and training method of target area positioning model
CN113344000A (en) Certificate copying and recognizing method and device, computer equipment and storage medium
CN112101195A (en) Crowd density estimation method and device, computer equipment and storage medium
KR102330263B1 (en) Method and apparatus for detecting nuclear region using artificial neural network
CN114549462A (en) Focus detection method, device, equipment and medium based on visual angle decoupling Transformer model
CN110309855B (en) Training method for image segmentation, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055732

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination