CN112052839B - Image data processing method, apparatus, device and medium


Info

Publication number
CN112052839B
Authority
CN
China
Prior art keywords
image
sample
initial
candidate
contour
Prior art date
Legal status
Active
Application number
CN202011078009.6A
Other languages
Chinese (zh)
Other versions
CN112052839A (en)
Inventor
胡益清
王元斐
杨芮
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011078009.6A
Publication of CN112052839A
Application granted
Publication of CN112052839B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The embodiment of the application provides an image data processing method, an image data processing device and an image data processing medium, wherein the method relates to a target detection technology in artificial intelligence, and comprises the following steps: acquiring object contour characteristics corresponding to a source image, and generating a contour mask image containing edge pixel points according to the object contour characteristics, wherein the source image comprises a target object, and the contour formed by the edge pixel points in the contour mask image is associated with the contour of the target object; carrying out transformation processing on edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by the M initial line segments is associated with the contour of the target object; and determining a target vertex associated with the target object according to the line segment intersection points among the M initial line segments, and determining an object edge shape for representing the outline of the target object in the source image according to the target vertex. By adopting the embodiment of the application, the detection accuracy of the target object in the image can be improved.

Description

Image data processing method, apparatus, device and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for processing image data.
Background
Target detection is an active research direction in computer vision and digital image processing, and is widely applied in fields such as robot navigation, intelligent video surveillance, and face recognition.
In existing target detection technology, a target object in an image may be located with borders of a preset shape (e.g., rectangular borders), and the preset-shape border enclosing the target object is taken as the detection result for the target object in the image. When the edge distribution of the actual target object in the image differs greatly from the preset-shape border, the border does not match the shape of the actual target object, so the accuracy of the image detection result is low.
Disclosure of Invention
The embodiment of the application provides an image data processing method, device, equipment and medium, which can improve the detection accuracy of a target object in an image.
An embodiment of the present application provides an image data processing method, including:
acquiring object contour characteristics corresponding to a source image, and generating a contour mask image containing edge pixel points according to the object contour characteristics; the source image comprises a target object, and the contour formed by edge pixel points in the contour mask image is associated with the contour of the target object;
carrying out transformation processing on edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by the M initial line segments is associated with the contour of the target object, and M is a positive integer;
and determining a target vertex associated with the target object according to the line segment intersection points among the M initial line segments, and determining an object edge shape for representing the outline of the target object in the source image according to the target vertex.
The above-mentioned obtaining the object contour feature corresponding to the source image, generating a contour mask image containing edge pixel points according to the object contour feature, including:
acquiring spatial position characteristics corresponding to a source image, and determining the spatial position characteristics and the source image as image input information;
inputting image input information into an image segmentation model, and acquiring at least two image content characteristics corresponding to the image input information in the image segmentation model;
the method comprises the steps of obtaining feature weights corresponding to at least two image content features respectively, conducting feature fusion on the at least two image content features according to the feature weights to obtain object contour features corresponding to a source image, and generating a contour mask image corresponding to the source image according to the object contour features.
The acquiring of the spatial position features corresponding to the source image and the determining of the spatial position features and the source image as image input information include:
acquiring a source image, standardizing the source image to obtain a standardized image, and acquiring space coordinate information corresponding to pixel points in the standardized image;
generating a first position characteristic corresponding to a pixel point in a source image according to an abscissa value in the space coordinate information;
generating a second position characteristic corresponding to the pixel point in the source image according to the longitudinal coordinate value in the space coordinate information;
and determining the first position characteristic and the second position characteristic as space position characteristics, and splicing the space position characteristics and the source image to obtain image input information.
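As an illustration of the steps above, the following Python sketch builds the image input information by stacking normalized abscissa and ordinate maps under the source image; the (C, H, W) layout, the function name, and the use of NumPy are assumptions made for illustration and are not prescribed by this application.

import numpy as np

def build_image_input(source_image):
    # source_image: float array of shape (C, H, W); hypothetical helper name.
    _, height, width = source_image.shape
    # Normalize column indices (abscissa) and row indices (ordinate) to [-1, 1].
    xs = np.linspace(-1.0, 1.0, width, dtype=np.float32)
    ys = np.linspace(-1.0, 1.0, height, dtype=np.float32)
    x_map = np.tile(xs, (height, 1))            # first position feature, H x W
    y_map = np.tile(ys[:, None], (1, width))    # second position feature, H x W
    # Splice the two coordinate channels under the image: (C + 2) x H x W.
    return np.concatenate([source_image, x_map[None], y_map[None]], axis=0)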
The above inputting the image input information to the image segmentation model, and acquiring at least two image content features corresponding to the image input information in the image segmentation model, includes:
acquiring an image segmentation model, and inputting image input information into the image segmentation model;
performing convolution processing on image input information according to the N convolution layers in the image segmentation model to obtain image characteristics corresponding to the N convolution layers respectively; the image characteristics corresponding to the N convolutional layers respectively have different size information, and N is a positive integer;
and selecting at least two image characteristics from the image characteristics respectively corresponding to the N convolutional layers as at least two image content characteristics corresponding to the source image.
The obtaining of the feature weights corresponding to the at least two image content features respectively, and performing feature fusion on the at least two image content features according to the feature weights to obtain object contour features corresponding to the source image includes:
adding at least two image content characteristics to obtain candidate image content characteristics corresponding to the source image;
performing global pooling on the candidate image content features to obtain global description vectors corresponding to the candidate image content features;
transforming the global description vector into a feature weight corresponding to the candidate image content feature according to the full connection layer and the activation layer;
and performing product processing on the feature weight and the candidate image content feature to generate a contour mask image corresponding to the source image.
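The fusion steps above follow a squeeze-and-excitation style weighting. The PyTorch sketch below assumes the at least two image content features have already been resized to a common shape; the layer sizes and class name are illustrative assumptions rather than the specific model of this application.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    # channels: common channel count of the already-aligned content features (assumed).
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, features):
        # features: list of tensors of identical shape (B, C, H, W).
        candidate = torch.stack(features, dim=0).sum(dim=0)        # element-wise addition
        desc = F.adaptive_avg_pool2d(candidate, 1).flatten(1)      # global description vector
        weight = torch.sigmoid(self.fc2(F.relu(self.fc1(desc))))   # feature weights
        return candidate * weight[:, :, None, None]                # weighted features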
Determining a target vertex associated with the target object according to the line segment intersection points among the M initial line segments, and determining an object edge shape for characterizing the contour of the target object in the source image according to the target vertex, wherein the method comprises the following steps:
acquiring line segment distances among the M initial line segments, and merging the M initial line segments according to the line segment distances to obtain K candidate line segments; k is a positive integer less than M;
acquiring line segment intersection points between any two candidate line segments in the K candidate line segments, combining the line segment intersection points to obtain at least two intersection point groups, and determining the intersection point groups meeting the target sorting sequence as S candidate vertex sets; the number of the line segment intersection points contained in each intersection point group is the same, and S is a positive integer less than or equal to the number of at least two intersection point groups;
acquiring intersection ratio indexes corresponding to the S candidate vertex sets respectively, determining line segment intersection points in the candidate vertex set corresponding to the maximum intersection ratio index as target vertices associated with the target object, and constructing an object edge shape in the source image according to the target vertices.
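The intersection ratio index above can be read as an intersection-over-union score between the quadrilateral formed by a candidate vertex set and the region indicated by the contour mask image; the sketch below makes that reading explicit. Using the filled mask as the reference region, and the OpenCV calls, are assumptions for illustration.

import numpy as np
import cv2

def select_target_vertices(candidate_sets, contour_mask):
    # candidate_sets: list of (4, 2) point arrays; contour_mask: binary H x W array.
    mask_region = contour_mask.astype(bool)
    best_pts, best_iou = None, -1.0
    for pts in candidate_sets:
        quad = np.zeros(contour_mask.shape, dtype=np.uint8)
        cv2.fillPoly(quad, [np.asarray(pts, dtype=np.int32)], 1)
        quad_region = quad.astype(bool)
        inter = np.logical_and(quad_region, mask_region).sum()
        union = np.logical_or(quad_region, mask_region).sum()
        iou = inter / union if union else 0.0                      # intersection ratio index
        if iou > best_iou:
            best_pts, best_iou = pts, iou
    return best_pts, best_iou                                      # target vertices and score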
The acquiring of the line segment distances between the M initial line segments and the merging of the M initial line segments according to the line segment distances to obtain K candidate line segments includes:
acquiring an initial line segment L_i and an initial line segment L_j from the M initial line segments, and acquiring endpoint coordinate information respectively corresponding to the initial line segment L_i and the initial line segment L_j; i and j are positive integers less than or equal to M, and i and j are not equal;
determining a line segment distance S_ij between the initial line segment L_i and the initial line segment L_j according to the endpoint coordinate information;
and when the line segment distance S_ij is less than a distance threshold, merging the initial line segment L_i and the initial line segment L_j into a candidate line segment.
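One possible reading of the merging step above is a greedy merge of segments whose nearest endpoints are closer than the distance threshold; the sketch below adopts that reading. The rule of keeping the two farthest endpoints as the merged segment is an assumption for illustration.

import numpy as np

def merge_close_segments(segments, distance_threshold=10.0):
    # segments: list of ((x1, y1), (x2, y2)) endpoint pairs for the initial line segments.
    merged = [np.asarray(s, dtype=np.float32) for s in segments]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                # Smallest endpoint-to-endpoint distance between segments i and j (S_ij).
                dists = [np.linalg.norm(p - q) for p in merged[i] for q in merged[j]]
                if min(dists) < distance_threshold:
                    points = np.vstack([merged[i], merged[j]])
                    # Keep the two farthest endpoints as the merged candidate line segment.
                    pairs = [(np.linalg.norm(points[a] - points[b]), a, b)
                             for a in range(4) for b in range(a + 1, 4)]
                    _, a, b = max(pairs)
                    merged[i] = np.stack([points[a], points[b]])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged    # K candidate line segments, K < M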
Wherein, the determining the line segment intersection point in the candidate vertex set corresponding to the maximum intersection ratio index as the target vertex associated with the target object includes:
determining the candidate vertex set corresponding to the maximum intersection ratio index as a target vertex set, and constructing a candidate frame according to the target vertex set; the vertices of the candidate frame are the line segment intersection points contained in the target vertex set;
acquiring candidate edge straight lines corresponding to the candidate frames, screening noise points in edge pixel points covered by the candidate edge straight lines, and updating the candidate edge straight lines according to the screened edge pixel points to obtain updated candidate edge straight lines;
and determining the straight line intersection points between the updated candidate edge straight lines as target vertexes associated with the target object.
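One way to realize the updating step above is to drop edge pixels that lie too far from a candidate edge straight line, refit the line to the remaining pixels, and intersect the refitted lines to obtain the target vertices. The point-plus-unit-direction line representation and the inlier threshold in the sketch below are assumptions for illustration.

import numpy as np

def refit_edge_line(edge_points, line_point, line_dir, inlier_distance=3.0):
    # line_dir is assumed to be a unit direction vector of the candidate edge line.
    pts = np.asarray(edge_points, dtype=np.float32)
    normal = np.array([-line_dir[1], line_dir[0]], dtype=np.float32)
    dist = np.abs((pts - line_point) @ normal)     # point-to-line distances
    inliers = pts[dist < inlier_distance]          # screened edge pixel points
    centroid = inliers.mean(axis=0)
    # Principal direction of the inliers gives the updated candidate edge line.
    _, _, vt = np.linalg.svd(inliers - centroid)
    return centroid, vt[0]

def line_intersection(p1, d1, p2, d2):
    # Intersection of two point-plus-direction lines, i.e. a target vertex.
    a = np.array([[d1[0], -d2[0]], [d1[1], -d2[1]]], dtype=np.float32)
    rhs = np.asarray(p2, dtype=np.float32) - np.asarray(p1, dtype=np.float32)
    t = np.linalg.solve(a, rhs)
    return np.asarray(p1, dtype=np.float32) + t[0] * np.asarray(d1, dtype=np.float32)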
Wherein, the method also comprises:
segmenting a source image according to the edge shape of the object, and determining pixel points covered by the edge shape of the object as a target image containing a target object;
and carrying out image recognition processing on the target object image to obtain an image recognition result aiming at the target object.
Wherein, the method also comprises:
acquiring a sample data set, acquiring sample position characteristics of sample images in the sample data set, and determining the sample position characteristics and the sample images as sample input information; the sample data set comprises sample images carrying labeling edge outlines;
inputting sample input information into an initial image segmentation model, and acquiring at least two sample content characteristics corresponding to a sample image in the initial image segmentation model;
acquiring sample weights corresponding to at least two sample content features respectively, performing feature fusion on the at least two sample content features according to the sample weights to obtain sample contour features corresponding to the sample images, and generating sample contour mask images corresponding to the sample images according to the sample contour features;
and correcting the network parameters of the initial image segmentation model according to the labeled edge contour corresponding to the sample contour mask image and the sample image, and determining the initial image segmentation model containing the corrected network parameters as an image segmentation model.
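A minimal training sketch for the correction step above: the sample contour mask predicted by the initial image segmentation model is compared against the labeled edge contour and the network parameters are corrected by backpropagation. The pixel-wise binary cross-entropy loss, the Adam optimizer, and the hyper-parameters are assumptions for illustration and are not specified by this application.

import torch
import torch.nn as nn

def train_segmentation_model(model, loader, epochs=10, lr=1e-3, device="cpu"):
    model.to(device).train()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for sample_input, labeled_contour in loader:
            sample_input = sample_input.to(device)
            labeled_contour = labeled_contour.to(device)
            predicted_mask = model(sample_input)        # sample contour mask image
            loss = criterion(predicted_mask, labeled_contour)
            optimizer.zero_grad()
            loss.backward()                             # backpropagation
            optimizer.step()                            # correct network parameters
    return model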
Wherein, the acquiring the sample data set includes:
acquiring an initial sample data set, and acquiring an initial sample image X_p and an initial sample image X_q from the initial sample data set; p and q are positive integers less than or equal to the number of initial sample images contained in the initial sample data set, and p and q are not equal;
acquiring a background image of the initial sample image X_p and a foreground object image of the initial sample image X_q, and performing deformation processing on the foreground object image to obtain a deformed foreground object image;
and combining the deformed foreground object image and the background image to obtain an extended sample image, and adding the extended sample image to the initial sample data set to obtain a sample data set.
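The extension step above can be sketched as warping the foreground object image with a random perspective transform and pasting it onto the background image. The warp parameters, three-channel inputs, and OpenCV usage below are illustrative assumptions.

import numpy as np
import cv2

def compose_extended_sample(foreground, foreground_mask, background):
    # foreground/background: H x W x 3 images; foreground_mask: H x W binary mask (assumed).
    h, w = foreground.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    jitter = (np.random.uniform(-0.1, 0.1, size=(4, 2)) * [w, h]).astype(np.float32)
    dst = src + jitter
    matrix = cv2.getPerspectiveTransform(src, dst)          # random deformation
    warped = cv2.warpPerspective(foreground, matrix, (w, h))
    warped_mask = cv2.warpPerspective(foreground_mask, matrix, (w, h))
    bg = cv2.resize(background, (w, h))
    # Keep background pixels where the warped mask is empty, foreground elsewhere.
    extended = np.where(warped_mask[..., None] > 0, warped, bg)
    return extended, warped_mask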
Wherein, the acquiring the sample data set includes:
acquiring an initial sample data set, and acquiring an initial sample image X_p from the initial sample data set; p is a positive integer less than or equal to the number of initial sample images contained in the initial sample data set;
acquiring labeling point coordinate information of the labeled edge contour corresponding to the initial sample image X_p, and determining a background edge of the initial sample image X_p according to the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value in the labeling point coordinate information;
and filling the background edge to obtain an extended sample image corresponding to the initial sample image X_p, and adding the extended sample image to the initial sample data set to obtain a sample data set.
An embodiment of the present application provides an image data processing apparatus, including:
the acquisition module is used for acquiring the object contour characteristics corresponding to the source image and generating a contour mask image containing edge pixel points according to the object contour characteristics; the source image comprises a target object, and the contour formed by edge pixel points in the contour mask image is associated with the contour of the target object;
the transformation processing module is used for carrying out transformation processing on the edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by the M initial line segments is associated with the contour of the target object, and M is a positive integer;
and the edge shape determining module is used for determining a target vertex associated with the target object according to the line segment intersection points among the M initial line segments, and determining an object edge shape for representing the outline of the target object in the source image according to the target vertex.
Wherein, the acquisition module includes:
the position characteristic acquisition unit is used for acquiring the space position characteristics corresponding to the source image and determining the space position characteristics and the source image as image input information;
the content characteristic acquisition unit is used for inputting the image input information into the image segmentation model and acquiring at least two image content characteristics corresponding to the image input information in the image segmentation model;
the feature fusion unit is used for acquiring feature weights corresponding to the at least two image content features respectively, performing feature fusion on the at least two image content features according to the feature weights to obtain object contour features corresponding to the source image, and generating a contour mask image corresponding to the source image according to the object contour features.
Wherein, the position feature acquisition unit includes:
the standardization processing subunit is used for acquiring a source image, standardizing the source image to obtain a standardized image, and acquiring space coordinate information corresponding to pixel points in the standardized image;
the first feature generation subunit is used for generating a first position feature corresponding to the pixel point in the source image according to the abscissa value in the space coordinate information;
the second feature generation subunit is used for generating a second position feature corresponding to the pixel point in the source image according to the longitudinal coordinate value in the space coordinate information;
and the feature splicing subunit is used for determining the first position feature and the second position feature as spatial position features, and splicing the spatial position features with the source image to obtain image input information.
Wherein the content feature acquisition unit includes:
the input subunit is used for acquiring an image segmentation model and inputting image input information into the image segmentation model;
the convolution subunit is used for performing convolution processing on the image input information according to the N convolution layers in the image segmentation model to obtain image characteristics corresponding to the N convolution layers respectively; the image characteristics corresponding to the N convolutional layers respectively have different size information, and N is a positive integer;
and the characteristic selecting subunit is used for selecting at least two image characteristics from the image characteristics respectively corresponding to the N convolutional layers as at least two image content characteristics corresponding to the source image.
Wherein, the feature fusion unit includes:
the characteristic adding subunit is used for adding the at least two image content characteristics to obtain candidate image content characteristics corresponding to the source image;
the pooling subunit is used for performing global pooling on the candidate image content features to obtain global description vectors corresponding to the candidate image content features;
the weight obtaining subunit is used for converting the global description vector into a feature weight corresponding to the candidate image content feature according to the full connection layer and the activation layer;
and the product operation subunit is used for performing product processing on the feature weight and the content feature of the candidate image to generate a contour mask image corresponding to the source image.
Wherein the edge shape determining module includes:
the line segment merging unit is used for acquiring line segment distances among the M initial line segments, and merging the M initial line segments according to the line segment distances to obtain K candidate line segments; k is a positive integer less than M;
the intersection point group unit is used for acquiring line segment intersection points between any two candidate line segments in the K candidate line segments, combining the line segment intersection points to obtain at least two intersection point groups, and determining the intersection point groups meeting the target sorting sequence as S candidate vertex sets; the number of the line segment intersection points contained in each intersection point group is the same, and S is a positive integer less than or equal to the number of at least two intersection point groups;
and the object edge shape construction unit is used for acquiring intersection ratio indexes corresponding to the S candidate vertex sets respectively, determining line segment intersection points in the candidate vertex set corresponding to the maximum intersection ratio index as target vertices associated with the target object, and constructing an object edge shape in the source image according to the target vertices.
Wherein, the line segment merging unit includes:
an initial line segment acquisition subunit, configured to acquire an initial line segment L_i and an initial line segment L_j from the M initial line segments, and acquire endpoint coordinate information respectively corresponding to the initial line segment L_i and the initial line segment L_j; i and j are positive integers less than or equal to M, and i and j are not equal;
a line segment distance determination subunit, configured to determine a line segment distance S_ij between the initial line segment L_i and the initial line segment L_j according to the endpoint coordinate information;
and a line segment merging subunit, configured to merge the initial line segment L_i and the initial line segment L_j into a candidate line segment when the line segment distance S_ij is less than the distance threshold.
Wherein the object edge shape construction unit includes:
the candidate frame construction subunit is used for determining the candidate vertex set corresponding to the maximum intersection ratio index as a target vertex set and constructing a candidate frame according to the target vertex set; the vertices of the candidate frame are the line segment intersection points contained in the target vertex set;
the noise point screening subunit is used for acquiring a candidate edge straight line corresponding to the candidate frame, screening noise points in edge pixel points covered by the candidate edge straight line, and updating the candidate edge straight line according to the screened edge pixel points to obtain an updated candidate edge straight line;
and the target vertex determining subunit is used for determining the straight line intersection points between the updated candidate edge straight lines as the target vertices associated with the target object.
Wherein, the device still includes:
the image segmentation module is used for segmenting the source image according to the edge shape of the object and determining pixel points covered by the edge shape of the object as a target image containing a target object;
and the image recognition module is used for carrying out image recognition processing on the target object image to obtain an image recognition result aiming at the target object.
Wherein, the device still includes:
the system comprises a sample acquisition module, a sample processing module and a data processing module, wherein the sample acquisition module is used for acquiring a sample data set, acquiring sample position characteristics of a sample image in the sample data set, and determining the sample position characteristics and the sample image as sample input information; the sample data set comprises sample images carrying labeling edge outlines;
the sample characteristic acquisition module is used for inputting sample input information into the initial image segmentation model and acquiring at least two sample content characteristics corresponding to the sample image in the initial image segmentation model;
the sample feature fusion module is used for acquiring sample weights corresponding to at least two sample content features respectively, performing feature fusion on the at least two sample content features according to the sample weights to obtain sample contour features corresponding to the sample image, and generating a sample contour mask image corresponding to the sample image according to the sample contour features;
and the network parameter correction module is used for correcting the network parameters of the initial image segmentation model according to the sample contour mask image and the labeled edge contour corresponding to the sample image, and determining the initial image segmentation model containing the corrected network parameters as the image segmentation model.
Wherein, the sample acquisition module includes:
a first sample acquisition unit, configured to acquire an initial sample data set, and acquire an initial sample image X_p and an initial sample image X_q from the initial sample data set; p and q are positive integers less than or equal to the number of initial sample images contained in the initial sample data set, and p and q are not equal;
a deformation processing unit, configured to acquire a background image of the initial sample image X_p and a foreground object image of the initial sample image X_q, and perform deformation processing on the foreground object image to obtain a deformed foreground object image;
and the image combination unit is used for combining the deformed foreground object image and the background image to obtain an extended sample image, and adding the extended sample image to the initial sample data set to obtain a sample data set.
Wherein, the sample acquisition module includes:
a second sample acquisition unit, configured to acquire an initial sample data set, and acquire an initial sample image X_p from the initial sample data set; p is a positive integer less than or equal to the number of initial sample images contained in the initial sample data set;
a background edge determination unit, configured to acquire labeling point coordinate information of the labeled edge contour corresponding to the initial sample image X_p, and determine a background edge of the initial sample image X_p according to the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value in the labeling point coordinate information;
and a background filling unit, configured to fill the background edge to obtain an extended sample image corresponding to the initial sample image X_p, and add the extended sample image to the initial sample data set to obtain a sample data set.
An aspect of the embodiments of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the steps of the method in the aspect of the embodiments of the present application.
An aspect of the embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions that, when executed by a processor, perform the steps of the method as in an aspect of the embodiments of the present application.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the method provided in the various alternatives of the above aspect.
The method and the device can obtain the object contour characteristics corresponding to the source image, and generate the contour mask image containing the edge pixel points according to the object contour characteristics, wherein the source image comprises the target object, and the contour formed by the edge pixel points in the contour mask image is associated with the contour of the target object; the method comprises the steps of carrying out transformation processing on edge pixel points contained in an outline mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points, enabling an outline formed by the M initial line segments to be associated with an outline of a target object, further determining a target vertex associated with the target object according to line segment intersection points among the M initial line segments, and determining an object edge shape used for representing the outline of the target object in a source image according to the target vertex. Therefore, according to the object contour features extracted from the source image, a contour mask image containing edge pixel points is generated, a target vertex associated with the target object is obtained from the source image according to the contour mask image, and an object edge shape is constructed according to the target vertex, so that the object edge shape is matched with the actual shape of the target object, and the detection accuracy of the target object in the source image is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a target object detection scene in an image according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of an image data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for obtaining spatial location features according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of acquiring a contour mask image according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a method for determining a target vertex associated with a target object according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating an image segmentation model training process according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating an effect of data expansion according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating an effect of data expansion according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The scheme provided by the embodiment of the application relates to a Computer Vision technology (CV) belonging to the field of artificial intelligence. Computer vision is a science for researching how to make a machine "see", and further, it means that a camera and a computer are used to replace human eyes to perform machine vision such as identification, tracking and measurement on a target, and further image processing is performed, so that the computer processing becomes an image more suitable for human eyes to observe or transmitted to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, target detection, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, synchronous positioning, map construction, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition. The embodiment of the application particularly relates to target detection in the computer vision technology.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present disclosure. As shown in fig. 1, the network architecture may include a server 10d and a user terminal cluster, which may include one or more user terminals, where the number of user terminals is not limited. As shown in fig. 1, the user terminal cluster may specifically include a user terminal 10a, a user terminal 10b, a user terminal 10c, and the like. The server 10d may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The user terminal 10a, the user terminal 10b, and the user terminal 10c may each include an intelligent terminal with an image display function, such as a smart phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a wearable device (such as a smart watch or a smart bracelet), or a smart television. As shown in fig. 1, the user terminal 10a, the user terminal 10b, the user terminal 10c, etc. may be respectively connected to the server 10d via a network, so that each user terminal may exchange data with the server 10d via the network.
Taking the user terminal 10a shown in fig. 1 as an example, the user terminal 10a may acquire a source image (which may be understood as an image that needs to be subjected to target object detection), and transmit the source image to the server 10 d. The server 10d may obtain a source image transmitted by the user terminal 10a, extract an object contour feature in the source image through an image segmentation model, and generate a contour mask image including edge pixels according to the object image feature, where a contour formed by the edge pixels in the contour mask image is associated with a contour of a target object, a pixel value of the edge pixels in the contour mask image is 1 (a display color is white), and other pixels except the edge pixels in the contour mask image are all 0 (a display color is black); after the contour mask image is obtained according to the image segmentation model, optimization processing can be performed on a contour formed by edge pixel points in the contour mask image, for example, Hough (Hough) transformation can be performed on the contour mask image to obtain M (M is a positive integer) discontinuous initial line segments corresponding to the edge pixel points, a target vertex associated with a target object in a source image can be found through the M initial line segments, an object edge shape formed by the target vertex can be represented as a shape of the target object in the source image, the object edge shape is marked in the source image, the source image marked with the object edge shape is returned to the user terminal 10a, and the user terminal 10a can display the source image marked with the object edge shape to a user; among them, the target object may include but is not limited to: target pages, documents, certificates, faces, animals. By means of determining the target vertex, information such as the position and the shape of the target object contained in the source image is detected, the actual edge of the target object can be attached to the maximum extent, and therefore detection accuracy of the target object in the source image can be improved.
Referring to fig. 2, fig. 2 is a schematic view of a target object detection scene in an image according to an embodiment of the present disclosure. The process of detecting a target object in an image will be described by taking the server 10d in the embodiment corresponding to fig. 1 as an example. As shown in fig. 2, the server 10d obtains an image 20a on which target object detection needs to be performed, where the image 20a may be a picture downloaded by the server 10d from a network platform (e.g., a browser, social software, etc.), or a photo (or a video frame in a video) transmitted to the server 10d by other devices (such as any one of the user terminals in the user terminal cluster shown in fig. 1 above), and the source of the image 20a is not limited in this application. If the server 10d needs to perform image recognition on the image 20a, it is first necessary to detect the target object included in the image 20a to obtain the edge contour shape of the target object and the position information of the target object in the image 20a, and then to segment the image 20a according to the edge contour shape and the position information to obtain the target image including the target object. For example, in an image text recognition scenario, a target object (e.g., a document, a page, etc.) including characters needs to be detected from an image; after the target object is detected, the target object may be segmented from the image, and then a subsequent image recognition process (e.g., image text recognition, etc.) may be performed on the segmented image including the target object.
As shown in fig. 2, the server 10d may obtain an image segmentation model 20c, where the image segmentation model 20c has been trained by a plurality of images carrying labeled edge contours, and the image segmentation model 20c may be used to extract image features in the image 20a and segment the edge contour shape of the target object from the image 20a. The server 10d may input the image 20a into the image segmentation model 20c, and may perform convolution operations on the image 20a according to a plurality of convolution layers in the image segmentation model 20c (the plurality of convolution layers may be in a serial structure in the image segmentation model 20c, that is, the plurality of convolution layers have a sequential connection order, for example, the convolution layer 2 is connected behind the convolution layer 1, and the output feature of the convolution layer 1 may be used as the input information of the convolution layer 2, and so on), and each convolution layer may output an image feature associated with the image 20a. In the image segmentation model 20c, the image information expressed by the image features output by each convolutional layer is different; for example, the image features output by the first convolutional layer (which may be referred to as low-layer features) in the image segmentation model 20c may capture detail information in the image 20a, and the features output by the last convolutional layer (which may be referred to as high-layer features) may capture global information in the image 20a.
In order to obtain more comprehensive feature information for the image 20a, the server 10d may select, in the image segmentation model 20c, the image features output by a convolutional layers as the image content features extracted from the image 20a; when a is 4, this means that the server 10d may select the image features output by 4 convolutional layers in the image segmentation model 20c as the multi-scale feature 20d. It should be noted that, since the dimension of the information input to each convolution layer in the image segmentation model 20c and the receptive field (which may be referred to as a convolution kernel) of each convolution layer are different, the image features output by each convolution layer may have different scale information. In other words, the image features output by the 4 convolutional layers selected by the server 10d have different size information.
Further, the server 10d may perform feature fusion on the multi-scale features 20d to obtain the object contour features 20e corresponding to the image 20 a. Wherein, the feature fusion process of the multi-scale features 20d may refer to: the image features of 4 different scales included in the multi-scale feature 20d are added (for example, feature fusion is performed using add), the feature after the addition and fusion is used as a candidate image content feature, and further, according to an attention mechanism, a feature weight corresponding to the candidate image content feature is obtained, where the feature weight is used to characterize the importance of the image features of different scales in the image detection process, and the greater the feature weight is, the more important the image feature corresponding to the feature weight is. The server 10d may obtain the object contour feature 20e by multiplying the feature weight by the candidate image content feature. A contour mask image 20f corresponding to the image 20a can be generated according to the object contour feature 20 e. It is understood that, in the embodiment of the present application, in order to detect the target object included in the image 20a, theoretically, the feature weight corresponding to the edge feature of the target object in the image 20a should be greater than the feature weight corresponding to the remaining features (e.g., the background feature, the color feature, etc. in the image 20a) in the image 20 a. In the contour mask image 20f, the pixel value of the edge pixel point may be represented as 1, the display color in the contour mask image 20f is white, the pixel value of the non-edge pixel point (the remaining pixel points except the edge pixel point, the edge pixel point referred to in this application refers to the edge pixel point of the target object in the image 20a, but not to the edge pixel point of the image 20a itself) may be represented as 0, and the display color in the contour mask image 20f is black.
If the contour formed by the edge pixel points in the contour mask image 20f is directly determined as the edge contour of the target object in the image 20a, a large difference may exist between the edge contour determined by the contour mask image 20f and the actual contour of the target object in the image 20 a. Accordingly, the server 10d may perform subsequent processing based on the contour mask image 20f to obtain an object edge shape that more closely matches the target object in the image 20 a.
The server 10d may perform Hough transformation on the outline mask image 20f to obtain M discontinuous initial line segments (M is a positive integer) corresponding to the edge pixel points, where the M initial line segments are located in the edge pixel point region in the outline mask image 20f, that is, the M initial line segments may be located in a white region in the outline mask image 20f, such as the initial line segment 20 g. The Hough transformation is a method for detecting the boundary shape of a discontinuous point, and the fitting of a straight line and a curve is realized by transforming an image coordinate space to a parameter space, namely the Hough transformation can be used for detecting geometric figures such as a straight line or a circle in an image. The server 10d may further refine based on Hough transformation, combine the line segments with very close distances into one, and finally merge the discontinuous line segments into longer line segments to the maximum extent. In other words, the line segment distances between the M initial line segments may be calculated, and two initial line segments having line segment distances smaller than the distance threshold may be merged into one candidate line segment. For example, when the line segment distance between the initial line segment 1 and the initial line segment 2 is smaller than the distance threshold, the initial line segment 1 and the initial line segment 2 may be merged into a candidate line segment, and then the target vertices associated with the target object in the image 20a, such as vertex a, vertex B, vertex C, and vertex D in the contour mask image 20f, may be determined according to the line segment intersection points between the candidate line segments; the server 10D may construct an object edge shape with the vertex a, the vertex B, the vertex C, and the vertex D as 4 vertices, and determine an object edge shape included in the image 20a, such as the object edge shape 20B in the image 20a shown in fig. 2, from the object edge shape.
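For the Hough transformation step described above, a probabilistic Hough transform over the binary contour mask is one common realization; the OpenCV call and threshold values in the sketch below are illustrative assumptions rather than the specific parameters of this application.

import numpy as np
import cv2

def extract_initial_segments(contour_mask):
    # contour_mask: binary contour mask image (e.g., the mask 20f above).
    mask_u8 = (contour_mask > 0).astype(np.uint8) * 255
    lines = cv2.HoughLinesP(mask_u8, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return []
    # Each entry is (x1, y1, x2, y2) for one discontinuous initial line segment.
    return [tuple(line[0]) for line in lines]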
In this embodiment, the initial edge shape of the target object in the image 20a, that is, the contour mask image 20f, may be obtained through the image segmentation model 20c, and then the edge pixel points included in the contour mask image 20f are transformed, so as to obtain the optimized object edge shape 20b, which may improve the detection accuracy of the target object in the image 20a.
Referring to fig. 3, fig. 3 is a schematic flowchart of an image data processing method according to an embodiment of the present disclosure. It is understood that the image data processing method can be executed by a computer device, which can be a user terminal, or a server, or a system composed of the user terminal and the server, or a computer program application (including program code) in the computer device, and is not limited in detail herein. As shown in fig. 3, the image data processing method may include the steps of:
step S101, acquiring object contour characteristics corresponding to a source image, and generating a contour mask image containing edge pixel points according to the object contour characteristics; the source image comprises a target object, and the outline formed by edge pixel points in the outline mask image is associated with the outline of the target object.
Specifically, a computer device (such as the server 10d in the embodiment corresponding to fig. 2) may obtain a source image (such as the image 20a in the embodiment corresponding to fig. 2), where the source image may be a photograph including a target object captured by an image capturing device, where the image capturing device may be an independent video camera, a camera, and a user terminal (e.g., a smart phone, a tablet computer, etc.) carrying the camera.
In order to obtain the spatial position relationship of each pixel point in the source image, the computer device may obtain a spatial position feature corresponding to the source image, and determine the spatial position feature and the source image as image input information, where the spatial position feature may refer to an abscissa value and an ordinate value of each pixel point in the source image in an image coordinate system. Further, the computer device may perform a standardization process on the source image, and the formula of the standardization process may be expressed as: X' = (x - u) / a, where a = max(c, 1/√N),
wherein x may be the RGB three-channel pixel values of the source image (RGB is a color standard in which R represents red, G represents green, and B represents blue), u may be the average value of the three-channel pixels of the source image, a may be the adjusted deviation of the three-channel pixels of the source image, c may be the standard deviation of the three-channel pixels of the source image, and N may be the total number of the three-channel pixels of the source image. The computer device can obtain the spatial coordinate information, in the image coordinate system, of the pixel points in the standardized image, and the coordinate value of each pixel point in the standardized image is normalized to the range [-1, 1]; then, according to the abscissa value in the spatial coordinate information corresponding to each pixel point, a first position feature (also referred to as abscissa information) corresponding to the pixel points in the source image is generated, and according to the ordinate value in the spatial coordinate information corresponding to each pixel point, a second position feature (also referred to as ordinate information) corresponding to the pixel points in the source image is generated. The computer device may determine the first position feature and the second position feature as the spatial position features corresponding to the source image, and further splice the spatial position features and the source image to obtain the image input information. When the size of the three-channel pixels of the source image is represented as C × H × W, the image input information may be represented as (C + 2) × H × W, where C may represent the number of channels of the source image (when the source image is represented by the three RGB channels, C is equal to 3), H may represent the height of the source image, and W may represent the width of the source image.
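A small sketch of the standardization formula above; reading the adjusted deviation a as max(c, 1/√N) is an assumption consistent with the definitions given here.

import numpy as np

def standardize_image(x):
    # x: source image as an array of RGB three-channel pixel values.
    x = x.astype(np.float32)
    n = x.size                       # total number of three-channel pixels N
    u = x.mean()                     # average value u
    c = x.std()                      # standard deviation c
    a = max(c, 1.0 / np.sqrt(n))     # adjusted deviation a, avoids division by ~0
    return (x - u) / a               # standardized image X'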
Referring to fig. 4, fig. 4 is a schematic diagram of obtaining spatial location characteristics according to an embodiment of the present disclosure. As shown in fig. 4, the computer device obtains a source image 30a, the size of the source image 30a is 8 × 8, standardizes the source image 30a to obtain a standardized image, the size of the standardized image is the same as that of the source image 30a, and the space coordinates corresponding to each pixel point in the obtained standardized image are shown as space coordinate information 30 b. The computer device can obtain abscissa information 30c (i.e., a first position characteristic) corresponding to the source image 30a according to the abscissa value corresponding to each pixel point in the spatial coordinate information 30 b; and obtaining the ordinate information 30d (i.e. the second position characteristic) corresponding to the source image 30a according to the ordinate value corresponding to each pixel point in the spatial coordinate information 30 b. The abscissa information 30c and the ordinate information 30d both have a size of 8 × 8, and the coordinate values included in the abscissa information 30c and the ordinate information 30d are between [ -1,1 ]. The computer device may stitch the abscissa information 30c, the ordinate information 30d, and the source image 30a (e.g., feature connection is performed using concat, and the number of channels of the features may be increased), so as to obtain the image input information.
Further, the computer device may obtain an image segmentation model (such as the image segmentation model 20c in the embodiment corresponding to fig. 2 described above) and input the image input information into the image segmentation model. The image segmentation model can be used for extracting image features and segmenting the target object contained in the image according to the image features, and the network used for extracting the features can include but is not limited to: FCN (Fully Convolutional Networks), SegNet (a deep convolutional network for image segmentation), VGGNet (a deep convolutional network), ResNet (a deep convolutional network), Mask R-CNN (a network model for image segmentation), and mobilenet_v3_small (a lightweight network model for image segmentation). For convenience of description, the embodiments of the present application all use a mobilenet_v3_small network as an example of the infrastructure to describe the feature extraction and image segmentation process for a source image. In other words, the image segmentation model in the embodiment of the present application may be a network model using a mobilenet_v3_small network as the infrastructure (that is, the image segmentation model may include a mobilenet_v3_small network), and the mobilenet_v3_small network may include N convolutional layers (N is a positive integer, for example, N is equal to 15) and a pooling layer; its network structure may be obtained by a network architecture search technique, it has few model parameters and a low requirement on computing resources, and the mobilenet_v3_small network has strong representation capability and fast speed.
The computer device may perform convolution processing on the image input information according to the N convolutional layers in the image segmentation model to obtain image features corresponding to the N convolutional layers, respectively, where the image features corresponding to the N convolutional layers in the embodiment of the present application have different size information. Each convolutional layer may correspond to one or more convolution kernels (kernel, which may also be referred to as a filter or a receptive field), and the convolution processing may refer to matrix multiplication of the convolution kernels with an input matrix corresponding to the image input information. The number of rows H_out and the number of columns W_out of the image feature output after the convolution processing are determined by the size of the input matrix, the size of the convolution kernel, the step size (stride), and the boundary padding (padding), that is, H_out = (H_in − H_kernel + 2·padding)/stride + 1 and W_out = (W_in − W_kernel + 2·padding)/stride + 1, where H_in and H_kernel respectively represent the number of rows of the input image feature and the number of rows of the convolution kernel, and W_in and W_kernel respectively represent the number of columns of the input matrix and the number of columns of the convolution kernel. A pooling layer can be embedded among the N convolutional layers, where the pooling layer can be an average pooling layer; average pooling calculates an average value in each row (or column) of the image feature output by the convolutional layer preceding the pooling layer to represent that row (or column). The computer device may select at least two image features from the image features corresponding to the N convolutional layers, respectively, as at least two image content features corresponding to the source image. For example, 4 layers of image features with different scales may be selected from the image features corresponding to the N convolutional layers, and the sizes of the 4 layers of image features may be represented as: 16*64*64, 24*32*32, 48*16*16, 576*8*8.
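A small helper, under illustrative names, can be used to check the output-size formulas above; floor division mirrors how deep learning frameworks typically round the result.

    def conv_output_size(h_in, w_in, h_kernel, w_kernel, stride=1, padding=0):
        """Compute H_out and W_out for a convolution, per the formulas above."""
        h_out = (h_in - h_kernel + 2 * padding) // stride + 1
        w_out = (w_in - w_kernel + 2 * padding) // stride + 1
        return h_out, w_out

    # Example: a 3x3 kernel with stride 2 and padding 1 halves a 64x64 feature map.
    print(conv_output_size(64, 64, 3, 3, stride=2, padding=1))  # (32, 32)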
It is understood that before feature extraction of the image input information using the image segmentation model, at least two convolutional layers may be pre-selected from the N convolutional layers as the convolutional layers for outputting the image content features. For example, the image segmentation model includes 15 convolutional layers, which may be respectively represented as convolutional layer C1, convolutional layer C2, convolutional layer C3, ..., and convolutional layer C15, where convolutional layer C1 may be the first convolutional layer in the image segmentation model and convolutional layer C15 may be the last convolutional layer in the image segmentation model. The computer device may pre-select convolutional layer C7, convolutional layer C9, convolutional layer C12, and convolutional layer C15 as the convolutional layers for outputting image content features. The computer device inputs the image input information into the image segmentation model; the image input information is first input into convolutional layer C1 of the image segmentation model, and after the convolution operation is performed on the image input information by convolutional layer C1, the image features corresponding to convolutional layer C1 can be output and used as the input information of convolutional layer C2; by analogy, the image features corresponding to convolutional layer C2 can be used as the input information of convolutional layer C3, that is, the image features output by the former convolutional layer are used as the input information of the latter convolutional layer. After convolutional layer C7 outputs its image features, the computer device can obtain the image features output by convolutional layer C7 and, at the same time, input them to convolutional layer C8; similarly, after the 15 convolutional layers in the image segmentation model have performed convolution operations on the image input information, the image features corresponding to each of convolutional layer C7, convolutional layer C9, convolutional layer C12, and convolutional layer C15 can be acquired as the image content features.
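One way to tap intermediate convolutional layers is with forward hooks on a torchvision mobilenet_v3_small backbone; this is a hedged sketch, the block indices chosen here are assumptions for illustration, and a real model would also adapt the first convolution to accept the (C+2)-channel image input information rather than the 3-channel input used below.

    import torch
    from torchvision import models

    backbone = models.mobilenet_v3_small(weights=None).features

    # Indices of the blocks whose outputs are kept as multi-scale content features
    # (illustrative choices; the description above taps four of the N layers).
    tap_indices = [1, 3, 8, 12]
    captured = {}

    def make_hook(idx):
        def hook(module, inputs, output):
            captured[idx] = output
        return hook

    for idx in tap_indices:
        backbone[idx].register_forward_hook(make_hook(idx))

    image_input = torch.randn(1, 3, 256, 256)  # batched stand-in for image input information
    with torch.no_grad():
        backbone(image_input)

    for idx in tap_indices:
        print(idx, tuple(captured[idx].shape))  # four feature maps at different scales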
Further, the computer device can perform feature fusion on the at least two image content features to obtain object contour features for representing the source image. The feature fusion process of the at least two image content features may include: the computer device can add the at least two image content features to obtain candidate image content features corresponding to the source image, and can further perform global pooling on the candidate image content features to obtain a global description vector corresponding to the candidate image content features; the computer device can transform the global description vector into the feature weight corresponding to the candidate image content features according to the fully connected layers and the activation layers, and can further perform product processing on the candidate image content features and the feature weight to generate the contour mask image corresponding to the source image. In other words, the computer device may learn, based on the attention mechanism, the correlation between the at least two image content features, process the candidate image content features to obtain a one-dimensional vector (i.e., the feature weight) with the same number of channels as the candidate image content features, use the one-dimensional vector as the weight of each channel in the candidate image content features, and multiply each channel in the candidate image content features by its corresponding weight to obtain the object contour features of the source image.
The SE module may include a global pooling layer, a first fully connected layer, a ReLU activation function, a second fully connected layer, and a Sigmoid activation function. The computer device may perform a global average pooling operation on the candidate image content features according to the global pooling layer to obtain the global description vector corresponding to the candidate image content features, where the global description vector may be understood as having a global receptive field. If the size of the candidate image content features is represented as C1 × H1 × W1, after global average pooling is performed on the candidate image content features, the obtained global description vector may be represented as 1 × 1 × C1, that is, each value in the global description vector may correspond to one channel feature in the candidate image content features; then the global description vector may be non-linearly transformed through the two fully connected layers, the ReLU activation function, and the Sigmoid activation function to obtain the feature weight corresponding to the candidate image content features. The fully connected layers (which may include the first fully connected layer and the second fully connected layer) can well fuse the at least two image content features (i.e., the candidate image content features), and the Sigmoid activation function can map the input information to the interval 0-1. The way of calculating the feature weight can be expressed as: s = σ(W2·δ(W1·z)), where s represents the feature weight, σ represents the Sigmoid activation function, W1 denotes the first fully connected layer, W2 denotes the second fully connected layer, δ represents the ReLU activation function, and z represents the global description vector. The computer device can multiply the feature weight with the candidate image content features to obtain the contour mask image corresponding to the source image.
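A minimal PyTorch sketch of an SE-style channel reweighting block following the structure described above; the reduction ratio and class name are assumptions, not values taken from the patent.

    import torch
    import torch.nn as nn

    class SqueezeExcite(nn.Module):
        """Global average pooling, two fully connected layers with ReLU and
        Sigmoid, then channel-wise reweighting of the candidate features."""
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            self.fc1 = nn.Linear(channels, channels // reduction)  # W1
            self.fc2 = nn.Linear(channels // reduction, channels)  # W2

        def forward(self, candidate: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = candidate.shape
            z = candidate.mean(dim=(2, 3))                          # global description vector
            s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))   # s = sigma(W2 * delta(W1 * z))
            return candidate * s.view(b, c, 1, 1)                   # weight each channel

    # Example: reweight a candidate content feature of size 64*32*32.
    feat = torch.randn(1, 64, 32, 32)
    print(SqueezeExcite(64)(feat).shape)  # torch.Size([1, 64, 32, 32])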
Referring to fig. 5, fig. 5 is a schematic diagram of obtaining an outline mask image according to an embodiment of the present disclosure. As shown in fig. 5, the computer device may obtain a source image 40a, obtain the spatial coordinate information corresponding to each pixel point after performing standardization processing on the source image 40a, form the abscissa values in the spatial coordinate information corresponding to each pixel point into a first position feature, form the ordinate values in the spatial coordinate information corresponding to each pixel point into a second position feature, and splice the first position feature, the second position feature, and the source image 40a to obtain the image input information. Subsequently, the computer device may acquire an image segmentation model 40b, which may include N convolutional layers, one pooling layer, and an attention mechanism. The image input information is input into the image segmentation model 40b, and the image features corresponding to the N convolutional layers are obtained through the N convolutional layers in the image segmentation model 40b, such as the image feature 40c output by the first convolutional layer, the image feature 40d output by the second convolutional layer, the image feature 40e output by the third convolutional layer, ..., the image feature 40f output by the (N-1)-th convolutional layer, and the image feature 40g output by the N-th convolutional layer; the computer device may select the image features output by four of the convolutional layers as four image features 40h of different scales of the source image 40a, and add the four image features 40h of different scales to obtain the candidate image content features 40i.
In order to improve the detection accuracy of the target object in the source image 40a, global average pooling may be performed on the candidate image content features 40i to obtain the global description vector 40j corresponding to the candidate image content features 40i. The global description vector 40j may then be non-linearly transformed according to the two fully connected layers, the ReLU activation function, and the Sigmoid activation function to obtain the feature weight 40k corresponding to the candidate image content features 40i. Product processing may be performed on the candidate image content features 40i and the feature weight 40k to obtain the object contour features 40m for the target object in the source image 40a, and a contour mask image 40n may be generated according to the object contour features 40m, where the contour formed by the edge pixel points in the contour mask image 40n is associated with the target object in the source image 40a.
Step S102, carrying out transformation processing on edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by the M initial line segments is associated with the contour of the target object, and M is a positive integer.
Specifically, because noise points (which may also be referred to as noise pixel points) may exist in the source image, the edge shape of the target object determined directly from the edge pixel points in the contour mask image may differ greatly from the actual edge of the target object in the source image. Therefore, the computer device can transform the edge pixel points included in the contour mask image to obtain M (M is a positive integer) discontinuous initial line segments corresponding to the edge pixel points, where the contour formed by the M initial line segments is associated with the contour of the target object.
Taking the Hough transform as an example, the process of transforming the contour mask image may include: the computer device may be provided with two coordinate systems (which may include a rectangular coordinate system and a polar coordinate system); the rectangular coordinate system may refer to a coordinate system formed by an origin, an abscissa X, and an ordinate Y in a plane, and the polar coordinate system may refer to a coordinate system formed by a pole O, a polar axis L, and a polar diameter r in a plane. It is to be understood that a straight line may be represented as y = mx + b, and in the contour mask image there may be a plurality of pixel points on the straight line y = mx + b. The rectangular coordinate system may represent the (x, y) values on the straight line y = mx + b, and the polar coordinate system may represent the (m, b) values of the straight line y = mx + b, i.e., the parameter values of the straight line; in other words, one point in the polar coordinate system may be represented as a straight line in the rectangular coordinate system.
For any edge pixel m in the contour mask image (the original image space of the contour mask image corresponds to the rectangular coordinate system), all straight lines passing through the edge pixel m can be found, the straight lines can be represented as one point in the polar coordinate system, a plurality of straight lines corresponding to the edge pixel m correspond to the points in the polar coordinate system, and a straight line in the polar coordinate system can be formed. Repeating the above process can determine the straight line corresponding to each edge pixel point in the contour mask image, and based on the straight line corresponding to each edge pixel point, M discontinuous initial line segments can be obtained, where M at this time can be expressed as a positive integer less than or equal to the number of edge pixel points included in the contour mask image.
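As a hedged illustration, OpenCV's probabilistic Hough transform is one common way to obtain discontinuous line segments from the edge pixel points of a binary mask; the mask construction and the parameter values below are assumptions for demonstration only.

    import cv2
    import numpy as np

    # contour_mask is assumed to be a binary (0/255) single-channel image in
    # which non-zero pixels are the edge pixel points of the contour mask image.
    contour_mask = np.zeros((256, 256), dtype=np.uint8)
    cv2.rectangle(contour_mask, (40, 60), (200, 180), color=255, thickness=1)

    # Probabilistic Hough transform: returns discontinuous segments (x1, y1, x2, y2)
    # whose union approximates the object contour.
    segments = cv2.HoughLinesP(contour_mask, rho=1, theta=np.pi / 180,
                               threshold=50, minLineLength=30, maxLineGap=5)
    initial_segments = [tuple(s[0]) for s in segments] if segments is not None else []
    print(len(initial_segments), "initial line segments")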
Step S103, determining a target vertex associated with the target object according to the line segment intersection points among the M initial line segments, and determining an object edge shape for representing the outline of the target object in the source image according to the target vertex.
Specifically, the computer device may obtain a line segment intersection point between any two of the M initial line segments, and determine a target vertex associated with the target object in the source image from the line segment intersection points between the M initial line segments, where the target vertex may be regarded as a vertex of an edge shape of the target object in the source image, that is, an object edge shape used for representing an outline of the target object may be determined in the source image according to the target vertex.
Optionally, the computer device may further refine the M initial line segments, that is, obtain the line segment distances between the M initial line segments, and merge the M initial line segments according to the line segment distances to obtain K candidate line segments, where K is a positive integer smaller than M. The merging process of the initial line segments can be represented as follows: the computer device can arbitrarily acquire an initial line segment L_i and an initial line segment L_j from the M initial line segments, and can further obtain the endpoint coordinate information respectively corresponding to the initial line segment L_i and the initial line segment L_j (the endpoint coordinate information here can be understood as coordinate values in the rectangular coordinate system), where i and j are positive integers smaller than or equal to M, and i and j are unequal; the computer device may determine the line segment distance S_ij between the initial line segment L_i and the initial line segment L_j from the endpoint coordinate information, and when the line segment distance S_ij is less than a distance threshold (the distance threshold may be a preset threshold), the initial line segment L_i and the initial line segment L_j may be merged into a candidate line segment. For example, the two endpoints of the initial line segment L_i can be represented as a left endpoint E and a right endpoint F, and the two endpoints of the initial line segment L_j can be represented as a left endpoint G and a right endpoint H; the coordinate information of the left endpoint E is (x1, y1), the coordinate information of the right endpoint F is (x2, y2), the coordinate information of the left endpoint G is (x3, y3), and the coordinate information of the right endpoint H is (x4, y4). The line segment distance S_ij between the initial line segment L_i and the initial line segment L_j can then be calculated as: S_ij = (x1 − x3)² + (y1 − y3)² + (x2 − x4)² + (y2 − y4)²; when S_ij is less than the distance threshold, the initial line segment L_i and the initial line segment L_j may be merged to form a candidate line segment. The computer device may perform merging processing on the M initial line segments in the above manner to obtain the K candidate line segments. It should be noted that, in the process of merging the initial line segments, the candidate line segment obtained by merging two initial line segments may also be merged with other initial line segments, until the line segment distance between any two candidate line segments in the K candidate line segments is greater than or equal to the distance threshold.
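A small sketch of this merging step is given below. The squared-endpoint distance follows the formula above; the rule used to form the merged segment (keeping the two farthest endpoints) is an assumption, since the patent does not specify how the merged segment's endpoints are chosen.

    def segment_distance(seg_a, seg_b):
        """Squared-endpoint distance S_ij between segments (x1, y1, x2, y2)."""
        x1, y1, x2, y2 = seg_a
        x3, y3, x4, y4 = seg_b
        return ((x1 - x3) ** 2 + (y1 - y3) ** 2 +
                (x2 - x4) ** 2 + (y2 - y4) ** 2)

    def merge_segments(segments, distance_threshold):
        """Greedy merging: while two segments are closer than the threshold,
        replace them with one segment spanning their farthest endpoints."""
        segments = [tuple(s) for s in segments]
        merged = True
        while merged:
            merged = False
            for i in range(len(segments)):
                for j in range(i + 1, len(segments)):
                    if segment_distance(segments[i], segments[j]) < distance_threshold:
                        a, b = segments[i], segments[j]
                        points = [(a[0], a[1]), (a[2], a[3]), (b[0], b[1]), (b[2], b[3])]
                        # Assumed merge rule: keep the two endpoints farthest apart.
                        p, q = max(((p, q) for p in points for q in points),
                                   key=lambda pq: (pq[0][0] - pq[1][0]) ** 2 +
                                                  (pq[0][1] - pq[1][1]) ** 2)
                        segments[i] = (p[0], p[1], q[0], q[1])
                        segments.pop(j)
                        merged = True
                        break
                if merged:
                    break
        return segments

    candidates = merge_segments([(0, 0, 10, 0), (1, 1, 11, 1), (0, 50, 10, 50)], 25)
    print(candidates)  # two candidate line segments remain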
Further, the computer device may obtain the line segment intersections between any two candidate line segments of the K candidate line segments, combine the line segment intersections to obtain at least two intersection groups, and determine the intersection groups satisfying the target sorting order as S candidate vertex sets, where the number of line segment intersections included in each intersection group is the same, and S is a positive integer less than or equal to the number of the at least two intersection groups. In other words, the computer device may traverse the K candidate line segments, find the line segment intersection between every two candidate line segments, and combine the line segment intersections between the K candidate line segments according to a preset number to obtain at least two intersection groups, where the number of line segment intersections included in each intersection group is the preset number (for example, the preset number may be set to 4, 8, or another value); the computer device may then find the intersection groups satisfying the target sorting order from the at least two intersection groups, and thereby obtain the S candidate vertex sets. The target sorting order may refer to a clockwise ordering, that is, an intersection group satisfying the clockwise ordering may be referred to as a candidate vertex set.
For example, assuming that K is 5, the K candidate line segments may be represented as candidate line segment 1, candidate line segment 2, candidate line segment 3, candidate line segment 4, and candidate line segment 5. The computer device may obtain a line segment intersection 1 between candidate line segment 1 and candidate line segment 2, a line segment intersection 2 between candidate line segment 1 and candidate line segment 3, a line segment intersection 3 between candidate line segment 1 and candidate line segment 4, a line segment intersection 4 between candidate line segment 1 and candidate line segment 5, a line segment intersection 5 between candidate line segment 2 and candidate line segment 3, a line segment intersection 6 between candidate line segment 2 and candidate line segment 4, a line segment intersection 7 between candidate line segment 2 and candidate line segment 5, a line segment intersection 8 between candidate line segment 3 and candidate line segment 4, a line segment intersection 9 between candidate line segment 3 and candidate line segment 5, and a line segment intersection 10 between candidate line segment 4 and candidate line segment 5. Of course, if two candidate line segments are parallel, there is no line segment intersection between them. The computer device may combine the 10 line segment intersections (i.e., line segment intersection 1 to line segment intersection 10) to obtain a plurality of intersection groups each including 4 line segment intersections (the number of line segment intersections included in each intersection group is 4 by default), and the same line segment intersection may exist in different intersection groups. When the line segment intersections included in an intersection group satisfy the clockwise ordering, the intersection group may be determined as a candidate vertex set. For example, the coordinate information of the 4 line segment intersections included in intersection group A is: (x1, y1), (x1, y2), (x1, y3), and (x2, y4); it may be determined that the 4 line segment intersections contained in intersection group A do not satisfy the clockwise ordering, so intersection group A cannot be a candidate vertex set for the target object edge contour. The coordinate information of the 4 line segment intersections included in intersection group B is: (x1, y1), (x1, y2), (x2, y2), and (x2, y1); it may be determined that the 4 line segment intersections included in intersection group B satisfy the clockwise ordering, so intersection group B may be a candidate vertex set for the target object edge shape.
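The sketch below computes intersections of candidate line segments and checks one ordering of four points for clockwise order with a signed-area test; the signed-area criterion is one plausible realization (assuming image coordinates with the y axis pointing downward), not the patent's stated procedure.

    from itertools import combinations

    def line_params(seg):
        """Line through segment (x1, y1, x2, y2) written as a*x + b*y = c."""
        x1, y1, x2, y2 = seg
        return (y2 - y1), (x1 - x2), (y2 - y1) * x1 + (x1 - x2) * y1

    def intersection(seg_a, seg_b):
        """Intersection of the two supporting lines, or None if parallel."""
        a1, b1, c1 = line_params(seg_a)
        a2, b2, c2 = line_params(seg_b)
        det = a1 * b2 - a2 * b1
        if det == 0:
            return None
        return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

    def is_clockwise(points):
        """Signed-area (shoelace) test; positive area means clockwise order
        when the y axis points downward, as in image coordinates."""
        area = 0.0
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
            area += x1 * y2 - x2 * y1
        return area > 0

    segments = [(0, 0, 10, 0), (10, 0, 10, 10), (10, 10, 0, 10), (0, 10, 0, 0)]
    crossings = [p for a, b in combinations(segments, 2)
                 if (p := intersection(a, b)) is not None]
    print(sorted(crossings))                                     # the four corners
    print(is_clockwise([(0, 0), (10, 0), (10, 10), (0, 10)]))    # True in image coordinates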
Optionally, when S is 1, the number of candidate vertex sets is 1, and a candidate box may be constructed according to the line segment intersections included in that candidate vertex set; the candidate box may be regarded as the edge shape of the target object in the source image. When S is a value larger than 1, there is more than one candidate vertex set, and S candidate boxes can be constructed according to the S candidate vertex sets, so the computer device can select the optimal candidate box from the S candidate boxes as the most probable edge shape of the target object in the source image. Specifically, the computer device may obtain the Intersection-over-Union (IoU) indexes respectively corresponding to the S candidate vertex sets, determine the line segment intersections in the candidate vertex set corresponding to the largest Intersection-over-Union index as the target vertices associated with the target object, and construct the object edge shape in the source image according to the target vertices; the object edge shape at this time may be the edge shape of the target object in the source image. The IoU index may refer to the overlapping rate (also referred to as the overlapping degree) of the candidate box and the marked box, that is, the intersection of the candidate box and the marked box and the union of the candidate box and the marked box may be obtained, and the ratio of the intersection to the union may be referred to as the IoU index; a higher IoU index indicates a better candidate box. The computer device may sort the S candidate vertex sets according to their IoU indexes, select the line segment intersections in the candidate vertex set corresponding to the largest IoU index, determine them as the target vertices associated with the target object, and the object edge shape formed by the target vertices may be determined as the contour edge of the target object in the source image.
For example, the S candidate vertex sets include candidate vertex set 1, candidate vertex set 2, candidate vertex set 3, and candidate vertex set 4, and the IoU index corresponding to candidate vertex set 1 is calculated to be 0.4, the IoU index corresponding to candidate vertex set 2 is calculated to be 0.5, the IoU index corresponding to candidate vertex set 3 is calculated to be 0.7, and the IoU index corresponding to candidate vertex set 4 is calculated to be 0.92; the computer device may determine line segment intersections contained in the set of candidate vertices 4 as target vertices associated with the target object, and construct an object silhouette shape from the line segment intersections contained in the set of candidate vertices 4, which may be understood as an edge shape of the target object in the source image.
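A hedged sketch of the IoU calculation for quadrilateral boxes, assuming the shapely geometry library is available; the reference box used for comparison and the example coordinates are placeholders, not values from the patent.

    from shapely.geometry import Polygon

    def quad_iou(vertices_a, vertices_b):
        """Intersection-over-Union of two quadrilaterals given as ordered
        (x, y) vertex lists: ratio of overlap area to union area."""
        poly_a, poly_b = Polygon(vertices_a), Polygon(vertices_b)
        inter = poly_a.intersection(poly_b).area
        union = poly_a.union(poly_b).area
        return inter / union if union > 0 else 0.0

    reference = [(0, 0), (100, 0), (100, 100), (0, 100)]    # assumed reference box
    candidate = [(10, 5), (105, 0), (102, 98), (8, 103)]    # one candidate vertex set
    print(round(quad_iou(reference, candidate), 2))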
Optionally, after the computer device calculates the IoU indexes respectively corresponding to the S candidate vertex sets, the computer device may use the candidate vertex set corresponding to the largest IoU index as the target vertex set and obtain the candidate box constructed from the target vertex set, where the vertices of the candidate box are the line segment intersections included in the target vertex set; the computer device can obtain the candidate edge straight lines corresponding to the candidate box, filter out the noise points among the edge pixel points covered by the candidate edge straight lines, and update the candidate edge straight lines according to the filtered edge pixel points to obtain updated candidate edge straight lines; the line intersections among the updated candidate edge straight lines are determined as the target vertices associated with the target object, and the object edge shape of the target object is then constructed in the source image according to the target vertices. In other words, to improve the quality of the candidate edge shape, after constructing the candidate box based on the line segment intersections in the target vertex set, the computer device may further optimize each edge straight line of the candidate box to obtain an object edge shape that is closer to the target object.
The following describes the optimization process of the candidate box constructed from the target vertex set, taking Random Sample Consensus (RANSAC) as an example. The inputs to the RANSAC algorithm are a set of pixel points (often containing many noise pixel points), a parameterized model for interpreting the pixel points, and some trusted parameters, and the RANSAC algorithm can achieve its goal by repeatedly selecting random subsets of the pixel points. The edge pixel points corresponding to each candidate edge straight line in the candidate box are used as the input data of the RANSAC algorithm. The computer device can randomly select a subset from the edge pixel points corresponding to a candidate edge straight line, assume that the selected subset consists of inliers, and verify the selected subset: assuming that a model A fits the selected subset, the other edge pixel points in the input data besides the selected subset are tested with model A, and if another edge pixel point t fits model A, it is also classified as an inlier. If a number of edge pixel points satisfying a quantity threshold are classified as inliers, model A is re-estimated with the new inliers, and the error rate of the inliers with respect to model A is estimated to evaluate the model. The above process is repeated, and the model A whose error rate satisfies an error threshold is taken as the updated candidate edge straight line. The computer device can then use the intersections among the updated candidate edge straight lines as the target vertices, and construct the object edge shape corresponding to the target object in the source image according to the target vertices. Using the RANSAC algorithm can eliminate the interference of noise pixel points in the source image, and thus can improve the detection accuracy of the target object in the source image.
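A generic RANSAC line-fit sketch is shown below; the iteration count, inlier threshold, and quantity threshold are assumptions, the least-squares refit uses slope-intercept form (so near-vertical edges would need a different parameterization), and the variable names do not come from the patent.

    import numpy as np

    def ransac_line(points, iterations=100, inlier_threshold=2.0, min_inliers=10, seed=0):
        """Repeatedly fit a line to two random edge pixels, count the points
        within inlier_threshold of that line, and keep the line (refit to its
        inliers) with the largest consensus set."""
        rng = np.random.default_rng(seed)
        pts = np.asarray(points, dtype=float)
        best_line, best_count = None, 0
        for _ in range(iterations):
            p, q = pts[rng.choice(len(pts), size=2, replace=False)]
            direction = q - p
            norm = np.hypot(*direction)
            if norm == 0:
                continue
            normal = np.array([-direction[1], direction[0]]) / norm
            distances = np.abs((pts - p) @ normal)      # point-to-line distances
            inliers = pts[distances < inlier_threshold]
            if len(inliers) >= min_inliers and len(inliers) > best_count:
                # Refit with the consensus set (least squares on the inliers).
                best_line = np.polyfit(inliers[:, 0], inliers[:, 1], deg=1)
                best_count = len(inliers)
        return best_line  # (slope, intercept) or None

    # Noisy samples along y = 0.5*x + 3 plus two outliers.
    xs = np.arange(0, 50)
    edge_pixels = np.c_[xs, 0.5 * xs + 3 + np.random.default_rng(1).normal(0, 0.5, 50)]
    edge_pixels = np.vstack([edge_pixels, [[5, 40], [30, -20]]])
    print(ransac_line(edge_pixels))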
Referring to fig. 6, fig. 6 is a schematic diagram illustrating a method for determining a target vertex associated with a target object according to an embodiment of the present disclosure. As shown in fig. 6, after the computer device obtains the outline mask image 50a through the image segmentation model, Hough transformation may be performed on the edge pixel points in the outline mask image 50a, and, by setting a rectangular coordinate system and a polar coordinate system, M initial line segments corresponding to the edge pixel points in the outline mask image 50a are obtained, such as the initial line segment 50b, the initial line segment 50c, the initial line segment 50d, and the initial line segment 50e shown in fig. 6. Subsequently, the computer device may calculate the line segment distances between the M initial line segments, and when the line segment distance between two initial line segments is smaller than the distance threshold, the two initial line segments may be merged to obtain a candidate line segment. For example, if the distance 1 between the initial line segment 50b and the initial line segment 50c calculated by the computer device is smaller than the distance threshold, the initial line segment 50b and the initial line segment 50c may be merged to obtain candidate line segment 1; if the distance 2 between the initial line segment 50d and candidate line segment 1 is smaller than the distance threshold, the initial line segment 50d and candidate line segment 1 can be merged to obtain candidate line segment 2; if the distance 3 between the initial line segment 50e and candidate line segment 2 is smaller than the distance threshold, the initial line segment 50e and candidate line segment 2 may be merged to obtain the candidate line segment 50f. K candidate line segments may be obtained according to the above calculation method.
The computer equipment can obtain segment intersections between any two candidate segments in the K candidate segments, and combines the segment intersections between the K candidate segments to obtain four segment intersections meeting the clockwise ordering as a candidate vertex set, wherein the number of the candidate vertex set can be S; the computer device may obtain IoU indexes corresponding to the S candidate vertex sets, and use the candidate vertex set corresponding to the largest IoU index as a target vertex set, where the target vertex set may include a line segment intersection a, a line segment intersection B, a line segment intersection C, and a line segment intersection D.
Further, the computer device may construct a candidate box from the line segment intersection A, the line segment intersection B, the line segment intersection C, and the line segment intersection D, where the candidate box may be composed of the straight line AB, the straight line AC, the straight line BD, and the straight line CD. The four straight lines constituting the candidate box are then optimized, for example, the straight line CD is updated to a straight line CE; the intersections among the straight line AB, the straight line AC, the straight line BD, and the straight line CE may then be determined as the target vertices associated with the target object in the source image (i.e., the line segment intersection A, the line segment intersection B, the line segment intersection C, and the intersection E), and the quadrangle composed of the line segment intersection A, the line segment intersection B, the line segment intersection C, and the intersection E may be determined as the object edge shape used for characterizing the contour of the target object in the source image.
Optionally, the computer device may perform segmentation processing on the source image according to the edge shape of the object, determine pixel points covered by the edge shape of the object as a target image containing the target object, and then perform image recognition processing on the target image to obtain an image recognition result for the target object. For example, when the target object is a face, the target image at this time may also be referred to as a face image, and the computer device may use a face recognition technology to recognize the face image, so as to obtain a face recognition result corresponding to the face image (at this time, the object edge shape for the face may be an image formed by a curve). When the target object is a page in the image, the computer device may perform image text recognition on the target image by using a text recognition technology, and obtain a text recognition result corresponding to the target image (for example, the text recognition result is "i am a chinese person"). In the embodiment of the application, a target image containing a target object is obtained by dividing the source image according to the edge shape of the object (the target image can be understood as an image obtained by background cutting of the source image), and the target image is identified to obtain an identification result corresponding to the target image, so that the identification accuracy of the image can be improved.
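One common way to realize this segmentation step is a perspective warp of the region enclosed by the four target vertices into a rectangular target image that can then be passed to text or face recognition; the output size, vertex ordering, and function names below are assumptions for illustration.

    import cv2
    import numpy as np

    def crop_object(source_image, object_vertices, out_w=640, out_h=480):
        """Warp the region enclosed by the object edge shape (four target
        vertices ordered clockwise from the top-left) into a rectangle."""
        src = np.float32(object_vertices)
        dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
        matrix = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(source_image, matrix, (out_w, out_h))

    image = np.zeros((480, 640, 3), dtype=np.uint8)
    vertices = [(50, 40), (600, 60), (590, 430), (40, 420)]   # assumed target vertices
    target_image = crop_object(image, vertices)
    print(target_image.shape)  # (480, 640, 3)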
Optionally, the computer device may further use different image segmentation models to detect the source image, and further improve the detection effect of the target object in the source image in a majority voting manner.
In the embodiment of the application, image content features of different scales in a source image can be extracted through an image segmentation model, feature fusion is carried out on the image content features of different scales through an attention mechanism, object contour features corresponding to the source image are obtained, a contour mask image containing edge pixel points can be generated based on the object contour features, then the contour mask image can be transformed, M initial line segments are obtained, a target vertex associated with a target object is determined in the source image according to line segment intersection points among the M initial line segments, an object edge shape is constructed according to the target vertex, the object edge shape is matched with the actual shape of the target object, and then the detection accuracy of the target object in the source image is improved; and carrying out image segmentation on the source image according to the edge shape of the object to obtain a target image containing the target object, and carrying out image recognition processing on the target image, so that the recognition accuracy of the target image can be improved.
Referring to fig. 7, fig. 7 is a flowchart illustrating an image segmentation model training process according to an embodiment of the present disclosure. It is to be understood that the image segmentation model training process may be executed by a computer device, which may be a user terminal, a server, or a system composed of the user terminal and the server, and is not limited herein. As shown in fig. 7, the image segmentation model training process may include the following steps:
step S201, acquiring a sample data set, acquiring sample position characteristics of a sample image in the sample data set, and determining the sample position characteristics and the sample image as sample input information; the sample data set comprises sample images carrying labeled edge contours.
Before feature extraction is performed on a source image by using an image segmentation model and a contour mask image is generated, the image segmentation model needs to be trained, and a training process of the image segmentation model will be specifically described below.
The computer device may obtain an initial sample data set, where the initial sample images in the initial sample data set are all acquired images containing a target object, and each initial sample image is labeled with the edge contour of the target object, where the edge contour may be labeled with vertices; for example, when the initial sample image is an image containing a page, a document, a certificate, or another such object, the labeled edge contour of the sample image may consist of four vertices. Because the initial sample data acquired by the computer device is very limited, an overly small amount of initial sample data may cause the trained image segmentation model to overfit and its generalization to be too low, so the computer device may perform a data expansion operation on the initial sample images contained in the initial sample data set to enrich the amount of sample data.
The computer device may use the initial sample images and the extended sample images as the training data of the image segmentation model. Before training, the computer device may perform initialization processing on the image segmentation model, where the image segmentation model at this stage may also be referred to as the initial image segmentation model. During the training of the initial image segmentation model, batch training may be performed, that is, the training data are input into the initial image segmentation model in batches to train the network parameters of the initial image segmentation model. The computer device can preset the batch size (Batch_Size), the number of iterations (iteration), and the number of training passes (epoch) used in the training process. For example, when the batch size Batch_Size is 50, the number of iterations iteration is 500, and the number of training passes epoch is 10, this indicates that, in the training process of the initial image segmentation model, 50 training samples may be input each time to adjust the network parameters of the initial image segmentation model, the network parameters may be updated once per iteration, and the training number epoch may refer to the number of complete passes over all the training data, where one iteration may refer to performing one forward calculation and one backward calculation on the initial image segmentation model.
The data expansion operation on the initial sample images may include a foreground and background combination operation, a smooth filling operation, and the like. The specific process of performing the foreground and background combination operation on the initial sample images may include: the computer device may obtain an initial sample image X_p and an initial sample image X_q in the initial sample data set, where p and q are both positive integers less than or equal to the number of initial sample images contained in the initial sample data set, and p and q are not equal; further, the foreground object image of the initial sample image X_p and the background image of the initial sample image X_q can be obtained, and the foreground object image may be subjected to deformation processing (e.g., perspective transformation, rotation operation, etc.) to obtain a deformed foreground object image; the computer device can combine the deformed foreground object image and the background image to obtain an extended sample image, and add the extended sample image to the initial sample data set to obtain the sample data set. In other words, the computer device may combine the foreground images and the background images of different initial sample images in the initial sample data set, may further perform deformation processing (including perspective transformation, rotation operation, and the like) on the foreground images of the initial sample images to obtain foreground images at different angles, and may combine the foreground images at different angles with the background images to obtain a plurality of extended sample images.
Please refer to fig. 8, fig. 8 is a schematic diagram illustrating a data expansion effect according to an embodiment of the present application. As shown in fig. 8, the computer device may obtain an initial sample image 60a and an initial sample image 60b from the initial sample data set, split the initial sample image 60a into a foreground object image 60c and a background image 60d, split the initial sample image 60b into a foreground object image 60e and a background image 60f, further combine the foreground object image 60c and the background image 60f into an extended sample image 60h, and combine the foreground object image 60e and the background image 60d into an extended sample image 60g, where the initial sample image 60a, the initial sample image 60b, the extended sample image 60h, and the extended sample image 60g may all be used as training data of the initial image segmentation model.
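A minimal sketch of the foreground and background combination step follows; it assumes a binary foreground mask is available (in practice it would be derived from the labeled edge contour, and the foreground would typically be perspective-transformed or rotated before compositing), and the names are illustrative.

    import numpy as np

    def recombine(sample_a, mask_a, sample_b):
        """Paste the foreground object of sample_a (selected by its binary mask)
        onto the background of sample_b; both images share the same H*W*3 size."""
        mask = (mask_a > 0)[..., None]          # H*W*1 boolean foreground mask
        return np.where(mask, sample_a, sample_b)

    h, w = 240, 320
    sample_a = np.full((h, w, 3), 200, dtype=np.uint8)   # stand-in initial sample X_p
    sample_b = np.full((h, w, 3), 30, dtype=np.uint8)    # stand-in initial sample X_q
    mask_a = np.zeros((h, w), dtype=np.uint8)
    mask_a[60:180, 80:240] = 1                            # foreground object region
    extended = recombine(sample_a, mask_a, sample_b)
    print(extended.shape, extended.max(), extended.min())  # (240, 320, 3) 200 30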
Optionally, the specific process of performing the smooth filling operation on an initial sample image may include: the computer device may obtain an initial sample image X_p in the initial sample data set, where p is a positive integer less than or equal to the number of initial sample images contained in the initial sample data set; further, the coordinate information of the labeling points of the labeled edge contour corresponding to the initial sample image X_p can be obtained, and the background edge of the initial sample image X_p can be determined according to the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value in the coordinate information of the labeling points; the computer device can fill the background edge to obtain the extended sample image corresponding to the initial sample image X_p, and add the corresponding extended sample image to the initial sample data set to obtain the sample data set. In other words, the computer device may fill the initial sample image in a background iterative flipping manner to achieve smooth filling, so that no additional strong edge is introduced. For any initial sample image X_p in the initial sample data set, assuming that the dimension of the initial sample image X_p is W × H, the labeled edge contour it carries may include the coordinate information of 4 labeling points, which may be expressed as: (x1, y1), (x2, y2), (x3, y3), (x4, y4).
The computer device can obtain the minimum and maximum values of the horizontal and vertical coordinates in the coordinate information of the 4 labeling points, which can be denoted as (x_min, y_min, x_max, y_max), where x_min and x_max can respectively be the minimum abscissa value and the maximum abscissa value in the coordinate information of the 4 labeling points, and y_min and y_max can respectively be the minimum ordinate value and the maximum ordinate value in the coordinate information of the 4 labeling points. The computer device can find the edge regions of the initial sample image X_p according to the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value. For example, for the right side edge region of the initial sample image X_p, the coordinates (x, y) of each pixel point in the right side edge region should satisfy: x_max < x ≤ W; for the left side edge region of the initial sample image X_p, the coordinates (x, y) of each pixel point in the left side edge region should satisfy: 0 < x ≤ x_min; for the upper side edge region of the initial sample image X_p, the coordinates (x, y) of each pixel point in the upper side edge region should satisfy: y_max < y ≤ H; for the lower side edge region of the initial sample image X_p, the coordinates (x, y) of each pixel point in the lower side edge region should satisfy: 0 < y ≤ y_min. The computer device can respectively take the four borders of the initial sample image X_p as symmetry axes, symmetrically flip the peripheral background region determined by the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value, and fill the four edge regions; for example, any pixel (x, y) in the right side edge region can be filled by the formula I(x, y) = I(W + (W − x), y). The computer device can then process the four blank corner regions left after filling, taking the upper side boundary or the lower side boundary of each corner region as a symmetry axis, and flip and fill the corner regions. The above process can be repeated until the target object proportion (i.e., the ratio of the target object to the whole image) reaches a preset proportion, so as to obtain the extended sample image corresponding to the initial sample image X_p, and the filling effect can be enhanced through this filling operation.
Please refer to fig. 9, fig. 9 is a schematic diagram illustrating a data expansion effect according to an embodiment of the present application. As shown in fig. 9, the computer device may obtain an initial sample image 70a from the initial set of sample data, the labeled edge profile carried in the initial sample image 70a may include coordinate information of four labeled points a, b, c, and d, and obtain a minimum abscissa value, a maximum abscissa value, a minimum ordinate value, and a maximum ordinate value of the coordinate information of the four labeled points a, b, c, and d, the right side edge region 70b, the lower side edge region 70c, the left side edge region 70d, and the upper side edge region 70e in the initial sample image 70a may be acquired based on the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value, edge filling is performed through the four edge regions, the filled image 70f can be obtained, and both the initial sample image 70a and the filled image 70f can be used as training data of the initial image segmentation model.
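As a hedged illustration, one simple way to realize flip-based smooth filling without introducing strong artificial edges is symmetric (mirror) padding of the image content; the patent's procedure fills edge regions derived from the labeled coordinates within the original canvas, whereas the sketch below simply grows the canvas with mirrored copies, so it should be read as an analogy rather than the exact method.

    import numpy as np

    def mirror_expand(image, pad_h, pad_w):
        """Grow the canvas by symmetrically flipping the existing content
        across each border, so no additional strong edge is introduced."""
        return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode="symmetric")

    sample = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
    expanded = mirror_expand(sample, pad_h=30, pad_w=40)
    print(expanded.shape)  # (180, 240, 3)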
Further, the computer device may obtain a batch of sample images from the sample data set (both the initial sample data and the extended sample data included in the sample data set may be used as sample images), obtain the sample position features of the sample images (the process of obtaining the sample position features is similar to the process of obtaining the spatial position features, and reference may be made to the description of step S101 in the embodiment corresponding to fig. 3 above), and determine the sample position features and the sample images as the sample input information, where the sample input information may be represented as N × C × H × W, N being the batch size (Batch_Size) set in the training process, C being the number of channels of the sample input information, H being the height of the sample input information, and W being the width of the sample input information.
Step S202, inputting the sample input information into an initial image segmentation model, and acquiring at least two sample content characteristics corresponding to the sample image in the initial image segmentation model.
Step S203, obtaining sample weights corresponding to the at least two sample content features respectively, performing feature fusion on the at least two sample content features according to the sample weights to obtain sample contour features corresponding to the sample image, and generating a sample contour mask image corresponding to the sample image according to the sample contour features.
Specifically, the computer device may input the sample input information into the initial image segmentation model, extract, through the N convolutional layers included in the initial image segmentation model, the sample features corresponding to the N convolutional layers from the sample input information, and use the sample features corresponding to at least two pre-selected convolutional layers as the at least two sample content features; the computer device may perform feature fusion on the at least two sample content features based on an attention mechanism (e.g., an SE module), that is, add the at least two sample content features, input the added sample content features into the global pooling layer of the attention mechanism to obtain a sample global description feature, obtain, through the fully connected layers and the activation functions of the attention mechanism, the sample weights respectively corresponding to the at least two sample content features, multiply the sample weights with the added sample content features to obtain the sample contour features, and generate the sample contour mask image corresponding to the sample image according to the sample contour features. The process of generating the sample contour mask image is similar to the process of generating the contour mask image, and reference may be made to the description of step S101 in the embodiment corresponding to fig. 3, which is not repeated here.
And step S204, correcting the network parameters of the initial image segmentation model according to the labeled edge contour corresponding to the sample contour mask image and the sample image, and determining the initial image segmentation model containing the corrected network parameters as an image segmentation model.
Specifically, the computer device may obtain the labeling point coordinate information in the labeled edge contour corresponding to the sample image (which may be understood as the expected result corresponding to the sample image), calculate the error between the sample contour mask image (which may be understood as the actual result obtained by performing one forward calculation on the sample image in the initial image segmentation model) and the labeling point coordinate information, perform backward calculation on the initial image segmentation model according to the error, and correct the network parameters of the initial image segmentation model; when the number of iterations (iteration) and the number of training passes (epoch) in the training process reach the set values, the computer device may store the network parameters at that time, and determine the initial image segmentation model containing the corrected network parameters as the image segmentation model.
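A minimal training-loop sketch under assumed names is given below: `model` stands for the initial image segmentation model, `loader` yields batches of sample input information and labeled contour masks, and the optimizer and loss choices are illustrative rather than specified by the patent.

    import torch
    import torch.nn as nn

    def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
        model.to(device).train()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.BCEWithLogitsLoss()           # error between predicted and labeled masks
        for epoch in range(epochs):                  # number of training passes (epoch)
            for sample_input, mask_label in loader:  # one iteration per batch
                sample_input = sample_input.to(device)
                mask_label = mask_label.to(device)
                prediction = model(sample_input)     # forward calculation: sample contour mask
                loss = criterion(prediction, mask_label)
                optimizer.zero_grad()
                loss.backward()                      # backward calculation
                optimizer.step()                     # correct the network parameters
        return model

    # Tiny runnable example with stand-in model and data.
    dummy_model = nn.Conv2d(5, 1, kernel_size=3, padding=1)
    dummy_loader = [(torch.randn(2, 5, 32, 32),
                     torch.randint(0, 2, (2, 1, 32, 32)).float())]
    train(dummy_model, dummy_loader, epochs=1)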
Optionally, the computer device may further generate, by means of a generative adversarial network, sample data that is more realistic with respect to the initial sample images, and train the initial image segmentation model according to the initial sample data and the data generated by the generative adversarial network, so that the generalization and robustness of the data construction may be further improved.
In the embodiment of the application, the initial sample image is subjected to smooth filling operation and foreground and background combination operation, and data expansion is performed on the initial sample image, so that the data volume of training data is enhanced, and the generalization performance of the image segmentation model can be improved.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application. As shown in fig. 10, the image data processing apparatus 1 may include: the device comprises an acquisition module 11, a transformation processing module 12 and an edge shape determining module 13;
the acquisition module 11 is configured to acquire an object contour feature corresponding to a source image, and generate a contour mask image including edge pixel points according to the object contour feature; the source image comprises a target object, and the contour formed by edge pixel points in the contour mask image is associated with the contour of the target object;
a transformation processing module 12, configured to perform transformation processing on edge pixel points included in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by the M initial line segments is associated with the contour of the target object, and M is a positive integer;
and an edge shape determining module 13, configured to determine, according to the line segment intersections between the M initial line segments, a target vertex associated with the target object, and determine, according to the target vertex, an object edge shape used for characterizing a contour of the target object in the source image.
The specific functional implementation manners of the obtaining module 11, the transformation processing module 12, and the edge shape determining module 13 may refer to steps S101 to S103 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 10, the obtaining module 11 may include: a position feature acquisition unit 111, a content feature acquisition unit 112, a feature fusion unit 113;
the position feature acquiring unit 111 is configured to acquire a spatial position feature corresponding to a source image, and determine the spatial position feature and the source image as image input information;
a content feature acquiring unit 112, configured to input the image input information into an image segmentation model, and acquire at least two image content features corresponding to the image input information in the image segmentation model;
the feature fusion unit 113 is configured to obtain feature weights corresponding to the at least two image content features, perform feature fusion on the at least two image content features according to the feature weights, obtain object contour features corresponding to the source image, and generate a contour mask image corresponding to the source image according to the object contour features.
For specific functional implementation manners of the position feature obtaining unit 111, the content feature obtaining unit 112, and the feature fusion unit 113, reference may be made to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 10 together, the location characteristic acquiring unit 111 may include: a normalization processing sub-unit 1111, a first feature generation sub-unit 1112, a second feature generation sub-unit 1113, a feature concatenation sub-unit 1114;
the standardization processing subunit 1111 is configured to acquire a source image, perform standardization processing on the source image to obtain a standardized image, and acquire space coordinate information corresponding to a pixel point in the standardized image;
a first feature generation subunit 1112, configured to generate, according to the abscissa value in the spatial coordinate information, a first position feature corresponding to a pixel point in the source image;
a second feature generating subunit 1113, configured to generate, according to the vertical coordinate value in the spatial coordinate information, a second position feature corresponding to the pixel point in the source image;
the feature splicing subunit 1114 is configured to determine the first location feature and the second location feature as spatial location features, and splice the spatial location features with the source image to obtain image input information.
The specific functional implementation manners of the normalization processing subunit 1111, the first feature generation subunit 1112, the second feature generation subunit 1113, and the feature splicing subunit 1114 may refer to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 10, the content characteristic obtaining unit 112 may include: an input subunit 1121, a convolution subunit 1122, a feature selection subunit 1123;
an input subunit 1121, configured to acquire an image segmentation model, and input image input information to the image segmentation model;
a convolution subunit 1122, configured to perform convolution processing on the image input information according to the N convolutional layers in the image segmentation model to obtain image features corresponding to the N convolutional layers, respectively; the image characteristics corresponding to the N convolutional layers respectively have different size information, and N is a positive integer;
a feature selection subunit 1123, configured to select at least two image features from the image features respectively corresponding to the N convolutional layers as at least two image content features corresponding to the source image.
The specific function implementation manner of the input subunit 1121, the convolution subunit 1122, and the feature selection subunit 1123 may refer to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 10 together, the feature fusion unit 113 may include: a feature addition subunit 1131, a pooling subunit 1132, a weight obtaining subunit 1133, and a product operation subunit 1134;
a feature adding subunit 1131, configured to add at least two image content features to obtain candidate image content features corresponding to the source image;
the pooling sub-unit 1132 is configured to perform global pooling on the candidate image content features to obtain global description vectors corresponding to the candidate image content features;
a weight obtaining subunit 1133, configured to transform the global description vector into a feature weight corresponding to a candidate image content feature according to the full connection layer and the activation layer;
and the product operation subunit 1134 is configured to perform product processing on the feature weight and the candidate image content feature to generate a contour mask image corresponding to the source image.
The specific functional implementation manners of the feature adding subunit 1131, the pooling subunit 1132, the weight obtaining subunit 1133, and the product calculating subunit 1134 may refer to step S101 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 10 together, the edge shape determining module 13 may include: a line segment merging unit 131, an intersection group unit 132, an object edge shape construction unit 133;
the line segment merging unit 131 is configured to obtain line segment distances between the M initial line segments, and merge the M initial line segments according to the line segment distances to obtain K candidate line segments; k is a positive integer less than M;
an intersection group unit 132, configured to obtain segment intersections between any two candidate segments of the K candidate segments, combine the segment intersections to obtain at least two intersection groups, and determine the intersection groups that satisfy the target sorting order as S candidate vertex sets; the number of the line segment intersection points contained in each intersection point group is the same, and S is a positive integer less than or equal to the number of at least two intersection point groups;
the object edge shape constructing unit 133 is configured to obtain intersection ratio indexes corresponding to the S candidate vertex sets, determine a line segment intersection point in the candidate vertex set corresponding to the largest intersection ratio index as a target vertex associated with the target object, and construct an object edge shape in the source image according to the target vertex.
The specific functional implementation manners of the line segment merging unit 131, the intersection group unit 132, and the object edge shape constructing unit 133 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.
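One plausible reading of the intersection ratio (intersection-over-union) selection, sketched with OpenCV and NumPy as assumed dependencies: fill each candidate vertex set as a polygon, score it against the (filled) contour mask, and keep the set with the largest index. The function name and the use of a filled mask are assumptions:

```python
# Sketch: score candidate vertex sets by IoU against the contour mask.
import cv2
import numpy as np

def best_vertex_set(candidate_vertex_sets, contour_mask):
    """candidate_vertex_sets: iterable of (4, 2) arrays of line segment intersections."""
    mask = (contour_mask > 0).astype(np.uint8)
    best_iou, best_set = -1.0, None
    for vertices in candidate_vertex_sets:
        poly = np.zeros_like(mask)
        cv2.fillPoly(poly, [np.asarray(vertices, dtype=np.int32)], 1)
        inter = np.logical_and(poly, mask).sum()
        union = np.logical_or(poly, mask).sum()
        iou = inter / union if union > 0 else 0.0
        if iou > best_iou:
            best_iou, best_set = iou, vertices
    # The intersections in the best-scoring set serve as the target vertices.
    return best_set, best_iou
```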
Referring to fig. 10, the line segment merging unit 131 may include: an initial line segment obtaining subunit 1311, a line segment distance determining subunit 1312, and a line segment distance judging subunit 1313;
an initial line segment obtaining subunit 1311, configured to obtain an initial line segment L_i and an initial line segment L_j from the M initial line segments, and obtain endpoint coordinate information respectively corresponding to the initial line segment L_i and the initial line segment L_j; i and j are positive integers less than or equal to M, and i and j are not equal;
a line segment distance determining subunit 1312, configured to determine a line segment distance S_ij between the initial line segment L_i and the initial line segment L_j according to the endpoint coordinate information;
a line segment distance judging subunit 1313, configured to merge the initial line segment L_i and the initial line segment L_j into a candidate line segment when the line segment distance S_ij is less than a distance threshold.
The specific functional implementation manners of the initial line segment obtaining subunit 1311, the line segment distance determining subunit 1312, and the line segment distance judging subunit 1313 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.
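A minimal sketch of merging initial line segments by endpoint distance, assuming NumPy; the distance threshold and the rule of spanning the two farthest endpoints of a merged pair are assumptions:

```python
# Sketch: merge segments whose closest endpoints are within a threshold.
import numpy as np

def merge_segments(segments, dist_threshold=10.0):
    """segments: list of ((x1, y1), (x2, y2)); returns merged candidate segments."""
    segments = [np.asarray(s, dtype=np.float32) for s in segments]
    merged = True
    while merged:
        merged = False
        for i in range(len(segments)):
            for j in range(i + 1, len(segments)):
                a, b = segments[i], segments[j]
                # Line segment distance S_ij: smallest distance between endpoints.
                d = min(np.linalg.norm(p - q) for p in a for q in b)
                if d < dist_threshold:
                    # Merge into one candidate segment spanning the two farthest endpoints.
                    pts = np.vstack([a, b])
                    dists = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
                    u, v = np.unravel_index(np.argmax(dists), dists.shape)
                    segments[i] = np.stack([pts[u], pts[v]])
                    segments.pop(j)
                    merged = True
                    break
            if merged:
                break
    return segments
```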
Referring to fig. 10 together, the object edge shape constructing unit 133 may include: a candidate frame construction subunit 1331, a noise point screening subunit 1332, and a target vertex determination subunit 1333;
a candidate frame construction subunit 1331, configured to determine the candidate vertex set corresponding to the maximum intersection ratio index as a target vertex set, and construct a candidate frame according to the target vertex set; the vertices of the candidate frame are the line segment intersection points contained in the target vertex set;
a noise point screening subunit 1332, configured to obtain a candidate edge straight line corresponding to the candidate frame, screen a noise point in an edge pixel point covered by the candidate edge straight line, and update the candidate edge straight line according to the screened edge pixel point to obtain an updated candidate edge straight line;
a target vertex determining subunit 1333, configured to determine a straight line intersection point between the updated candidate edge straight lines as a target vertex associated with the target object.
For specific functional implementation manners of the candidate frame constructing subunit 1331, the noise point screening subunit 1332, and the target vertex determining subunit 1333, reference may be made to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.
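A minimal sketch of the noise screening and edge-line update, assuming NumPy, near-horizontal edge lines expressible as y = kx + b, and an assumed residual threshold:

```python
# Sketch: fit, screen out noisy edge pixels, refit the candidate edge line.
import numpy as np

def refit_edge_line(edge_points, residual_threshold=2.0):
    """edge_points: (N, 2) array of (x, y) pixels; returns (k, b) of y = k*x + b."""
    x, y = edge_points[:, 0], edge_points[:, 1]
    k, b = np.polyfit(x, y, 1)                 # initial least-squares fit
    residuals = np.abs(y - (k * x + b))
    keep = residuals < residual_threshold      # screen out noise points
    if keep.sum() >= 2:
        k, b = np.polyfit(x[keep], y[keep], 1)  # updated candidate edge line
    return k, b
```

Intersections of the updated edge lines would then give the target vertices, in line with the target vertex determining subunit 1333 described above.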
Referring to fig. 10 together, the image data processing apparatus 1 may further include: an image segmentation module 14, an image recognition module 15;
the image segmentation module 14 is configured to segment the source image according to the object edge shape, and determine the pixel points covered by the object edge shape as a target image containing the target object;
and the image recognition module 15 is configured to perform image recognition processing on the target image to obtain an image recognition result for the target object.
The specific functional implementation manners of the image segmentation module 14 and the image recognition module 15 may refer to step S103 in the embodiment corresponding to fig. 3, which is not described herein again.
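A minimal sketch of the segmentation step, assuming OpenCV as a dependency: pixels covered by the object edge shape are kept and the bounding rectangle is cropped as the target image handed on to recognition. The function name is an assumption:

```python
# Sketch: keep only pixels inside the object edge shape, crop the target image.
import cv2
import numpy as np

def extract_target_image(source_image, object_vertices):
    mask = np.zeros(source_image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [object_vertices.astype(np.int32)], 255)
    # Pixels covered by the object edge shape are retained; the rest are zeroed.
    segmented = cv2.bitwise_and(source_image, source_image, mask=mask)
    x, y, w, h = cv2.boundingRect(object_vertices.astype(np.int32))
    return segmented[y:y + h, x:x + w]
```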
Referring to fig. 10 together, the image data processing apparatus 1 may further include: a sample acquisition module 16, a sample characteristic acquisition module 17, a sample characteristic fusion module 18 and a network parameter correction module 19;
the sample acquisition module 16 is configured to acquire a sample data set, acquire a sample position feature of a sample image in the sample data set, and determine the sample position feature and the sample image as sample input information; the sample data set comprises sample images carrying labeled edge contours;
the sample characteristic obtaining module 17 is configured to input sample input information to the initial image segmentation model, and obtain at least two sample content characteristics corresponding to the sample image in the initial image segmentation model;
the sample feature fusion module 18 is configured to obtain sample weights corresponding to at least two sample content features, perform feature fusion on the at least two sample content features according to the sample weights to obtain sample contour features corresponding to the sample image, and generate a sample contour mask image corresponding to the sample image according to the sample contour features;
and the network parameter correction module 19 is configured to correct the network parameters of the initial image segmentation model according to the sample contour mask image and the labeled edge contour corresponding to the sample image, and determine the initial image segmentation model including the corrected network parameters as an image segmentation model.
The specific functional implementation manners of the sample obtaining module 16, the sample characteristic obtaining module 17, the sample characteristic fusing module 18, and the network parameter modifying module 19 may refer to steps S201 to S204 in the embodiment corresponding to fig. 7, which is not described herein again.
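A minimal sketch of one training iteration, assuming a PyTorch segmentation model and a binary mask rendered from the labeled edge contour; the binary cross-entropy loss and the optimizer settings are assumptions, not the patented choice:

```python
# Sketch: one parameter-correction step for the initial image segmentation model.
import torch
import torch.nn as nn

def train_step(model, optimizer, sample_input, labeled_contour_mask):
    model.train()
    optimizer.zero_grad()
    predicted_mask = model(sample_input)              # sample contour mask image
    loss = nn.functional.binary_cross_entropy_with_logits(
        predicted_mask, labeled_contour_mask)
    loss.backward()                                   # back-propagate the error
    optimizer.step()                                  # correct the network parameters
    return loss.item()
```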
Referring also to fig. 10, the sample acquisition module 16 may include: a first sample acquisition unit 161, a deformation processing unit 162, an image combining unit 163;
a first sample acquiring unit 161, configured to acquire an initial sample data set, and acquire an initial sample image X_p and an initial sample image X_q in the initial sample data set; p and q are positive integers less than or equal to the number of initial sample images contained in the initial sample data set, and p and q are not equal;
a deformation processing unit 162, configured to acquire a background image of the initial sample image X_p, acquire a foreground object image of the initial sample image X_q, and perform deformation processing on the foreground object image to obtain a deformed foreground object image;
the image combining unit 163 is configured to combine the deformed foreground object image and the background image to obtain an extended sample image, and add the extended sample image to the initial sample data set to obtain the sample data set.
For specific functional implementation manners of the first sample acquiring unit 161, the deformation processing unit 162, and the image combining unit 163, reference may be made to step S201 in the embodiment corresponding to fig. 7, which is not described herein again.
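A minimal sketch of the foreground/background combination, assuming OpenCV, a binary annotation mask for the foreground object, and rotation plus scaling as the deformation; all parameter values and the pasting rule are assumptions:

```python
# Sketch: deform a foreground object and paste it onto another sample's background.
import cv2
import numpy as np

def combine_fg_bg(background, foreground, fg_mask, scale=0.8, angle=15.0):
    h, w = foreground.shape[:2]
    # Deform the foreground object image (rotation + scaling).
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    warped_fg = cv2.warpAffine(foreground, rot, (w, h))
    warped_mask = cv2.warpAffine(fg_mask, rot, (w, h))

    extended = background.copy()
    bg_h, bg_w = extended.shape[:2]
    fg_crop = warped_fg[:bg_h, :bg_w]
    mask_crop = warped_mask[:bg_h, :bg_w] > 0
    # Combine: foreground pixels replace the background where the mask is set.
    extended[:fg_crop.shape[0], :fg_crop.shape[1]][mask_crop] = fg_crop[mask_crop]
    return extended  # extended sample image to add to the sample data set
```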
Referring also to fig. 10, the sample acquisition module 16 may include: a first sample acquisition unit 164, a background edge determination unit 165, a background filling unit 166;
a first sample acquiring unit 164, configured to acquire an initial sample data set, and acquire an initial sample image X_p in the initial sample data set; p is a positive integer less than or equal to the number of initial sample images contained in the initial sample data set;
a background edge determination unit 165, configured to obtain annotation point coordinate information of the labeled edge contour corresponding to the initial sample image X_p, and determine a background edge of the initial sample image X_p according to the minimum abscissa value, the maximum abscissa value, the minimum ordinate value, and the maximum ordinate value in the annotation point coordinate information;
a background filling unit 166, configured to fill the background edge to obtain an extended sample image corresponding to the initial sample image X_p, and add the extended sample image to the initial sample data set to obtain the sample data set.
The specific functional implementation manners of the first sample obtaining unit 164, the background edge determining unit 165, and the background filling unit 166 may refer to step S201 in the embodiment corresponding to fig. 7, which is not described herein again.
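A minimal sketch of the background-edge filling, assuming OpenCV; interpreting "filling" as replicate padding outward from the rectangle spanned by the extreme annotation coordinates is an assumption, as are the function name and padding width:

```python
# Sketch: derive the background edge from the labeled contour extremes and pad outward.
import cv2

def fill_background(sample_image, contour_points, pad=32):
    xs, ys = contour_points[:, 0], contour_points[:, 1]
    x_min, x_max = int(xs.min()), int(xs.max())
    y_min, y_max = int(ys.min()), int(ys.max())
    # Background edge: the rectangle spanned by the extreme annotation points.
    region = sample_image[y_min:y_max + 1, x_min:x_max + 1]
    # Fill outward from the background edge to build the extended sample image.
    return cv2.copyMakeBorder(region, pad, pad, pad, pad,
                              borderType=cv2.BORDER_REPLICATE)
```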
In the embodiment of the application, image content features of different scales in a source image can be extracted through an image segmentation model, and feature fusion is performed on these features through an attention mechanism to obtain object contour features corresponding to the source image; a contour mask image containing edge pixel points can then be generated based on the object contour features. The contour mask image can be transformed to obtain M initial line segments, a target vertex associated with the target object is determined in the source image according to the line segment intersection points among the M initial line segments, and an object edge shape is constructed according to the target vertex. Because the object edge shape matches the actual shape of the target object, the detection accuracy of the target object in the source image is improved. Furthermore, the source image can be segmented according to the object edge shape to obtain a target image containing the target object, and image recognition processing can be performed on the target image, so that the recognition accuracy of the target image is improved. By performing the smooth filling operation and the foreground/background combination operation on the initial sample images, data expansion is performed on the initial sample data, which enlarges the amount of training data and improves the generalization performance of the image segmentation model.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 11, the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and optionally may further include a standard wired interface and a standard wireless interface. Optionally, the network interface 1004 may include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 11, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 11, the network interface 1004 may provide a network communication function, the user interface 1003 mainly provides an input interface for a user, and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 to implement:
acquiring object contour characteristics corresponding to a source image, and generating a contour mask image containing edge pixel points according to the object contour characteristics; the source image comprises a target object, and the contour formed by edge pixel points in the contour mask image is associated with the contour of the target object;
carrying out transformation processing on edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by the M initial line segments is associated with the contour of the target object, and M is a positive integer;
and determining a target vertex associated with the target object according to the line segment intersection points among the M initial line segments, and determining an object edge shape for representing the outline of the target object in the source image according to the target vertex.
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the description of the image data processing method in the embodiment corresponding to fig. 3, and may also perform the description of the image data processing apparatus 1 in the embodiment corresponding to fig. 10, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, in which the computer program executed by the aforementioned image data processing apparatus 1 is stored. The computer program includes program instructions, and when the processor executes the program instructions, the description of the image data processing method in the embodiment corresponding to fig. 3 or fig. 7 can be performed, which is therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network; the multiple computing devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
Further, it should be noted that: embodiments of the present application also provide a computer program product or computer program, which may include computer instructions, which may be stored in a computer-readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor can execute the computer instruction, so that the computer device executes the description of the image data processing method in the embodiment corresponding to any one of fig. 3 and fig. 7, which will not be described herein again. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer program product or the computer program referred to in the present application, reference is made to the description of the embodiments of the method of the present application.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combinations of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device can be merged, divided and deleted according to actual needs.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program; the computer program may be stored in a computer-readable storage medium, and when it is executed, the processes of the embodiments of the methods described above may be included. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made in accordance with the claims of the present application still fall within the scope of the present application.

Claims (14)

1. An image data processing method characterized by comprising:
acquiring object contour features corresponding to a source image, and generating a contour mask image containing edge pixel points according to the object contour features; the source image comprises a target object, and the contour formed by the edge pixel points in the contour mask image is associated with the contour of the target object;
carrying out transformation processing on the edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by M initial line segments is associated with the contour of the target object, and M is a positive integer;
acquiring line segment distances among the M initial line segments, and merging the M initial line segments according to the line segment distances to obtain K candidate line segments; k is a positive integer less than M;
acquiring line segment intersection points between any two candidate line segments in the K candidate line segments, combining the line segment intersection points to obtain at least two intersection point groups, and determining the intersection point groups meeting the target sorting sequence as S candidate vertex sets; the line segment intersection points contained in each intersection point group are the same in number, and S is a positive integer less than or equal to the number of the at least two intersection point groups;
acquiring intersection ratio indexes corresponding to the S candidate vertex sets respectively, determining line segment intersection points in the candidate vertex set corresponding to the maximum intersection ratio index as target vertices associated with the target object, and determining an object edge shape used for representing the outline of the target object in the source image according to the target vertices.
2. The method according to claim 1, wherein the obtaining of the object contour feature corresponding to the source image and the generating of the contour mask image including the edge pixel points according to the object contour feature comprises:
acquiring spatial position characteristics corresponding to a source image, and determining the spatial position characteristics and the source image as image input information;
inputting the image input information into an image segmentation model, and acquiring at least two image content characteristics corresponding to the image input information in the image segmentation model;
acquiring feature weights corresponding to the at least two image content features respectively, performing feature fusion on the at least two image content features according to the feature weights to obtain object contour features corresponding to the source image, and generating a contour mask image corresponding to the source image according to the object contour features.
3. The method according to claim 2, wherein the obtaining of the spatial position feature corresponding to the source image, and the determining of the spatial position feature and the source image as the image input information comprises:
acquiring the source image, standardizing the source image to obtain a standardized image, and acquiring space coordinate information corresponding to pixel points in the standardized image;
generating a first position characteristic corresponding to a pixel point in the source image according to an abscissa value in the space coordinate information;
generating a second position characteristic corresponding to the pixel point in the source image according to the longitudinal coordinate value in the space coordinate information;
and determining the first position characteristic and the second position characteristic as the spatial position characteristic, and splicing the spatial position characteristic and the source image to obtain the image input information.
4. The method of claim 2, wherein the inputting the image input information into an image segmentation model, and obtaining at least two image content features corresponding to the image input information in the image segmentation model, comprises:
acquiring the image segmentation model, and inputting the image input information into the image segmentation model;
performing convolution processing on the image input information according to the N convolutional layers in the image segmentation model to obtain image characteristics corresponding to the N convolutional layers respectively; the image characteristics corresponding to the N convolutional layers respectively have different size information, and N is a positive integer;
and selecting at least two image characteristics from the image characteristics respectively corresponding to the N convolutional layers as at least two image content characteristics corresponding to the source image.
5. The method according to claim 2, wherein the obtaining of the feature weights corresponding to the at least two image content features respectively, and performing feature fusion on the at least two image content features according to the feature weights to obtain the object contour features corresponding to the source image comprises:
adding the at least two image content characteristics to obtain candidate image content characteristics corresponding to the source image;
performing global pooling on the candidate image content features to obtain global description vectors corresponding to the candidate image content features;
transforming the global description vector into a feature weight corresponding to the candidate image content feature according to a full connection layer and an activation layer;
and performing product processing on the feature weight and the candidate image content feature to generate a contour mask image corresponding to the source image.
6. The method of claim 1, wherein the obtaining segment distances between the M initial segments, and merging the M initial segments according to the segment distances to obtain K candidate segments comprises:
obtaining an initial line segment L_i and an initial line segment L_j from the M initial line segments, and obtaining endpoint coordinate information respectively corresponding to the initial line segment L_i and the initial line segment L_j; i and j are positive integers less than or equal to M, and i and j are not equal;
determining a line segment distance S_ij between the initial line segment L_i and the initial line segment L_j according to the endpoint coordinate information;
and when the line segment distance S_ij is less than a distance threshold, merging the initial line segment L_i and the initial line segment L_j into a candidate line segment.
7. The method of claim 1, wherein determining the line segment intersection point in the candidate vertex set corresponding to the maximum intersection ratio index as the target vertex associated with the target object comprises:
determining a candidate vertex set corresponding to the maximum intersection ratio index as a target vertex set, and constructing a candidate frame according to the target vertex set; the vertex of the candidate box is a line segment intersection point contained in the target vertex set;
acquiring a candidate edge straight line corresponding to the candidate frame, screening noise points in edge pixel points covered by the candidate edge straight line, and updating the candidate edge straight line according to the screened edge pixel points to obtain an updated candidate edge straight line;
determining a line intersection between the updated candidate edge lines as a target vertex associated with the target object.
8. The method of claim 1, further comprising:
segmenting the source image according to the object edge shape, and determining pixel points covered by the object edge shape as a target image containing the target object;
and carrying out image recognition processing on the target image to obtain an image recognition result for the target object.
9. The method of claim 2, further comprising:
acquiring a sample data set, acquiring sample position characteristics of a sample image in the sample data set, and determining the sample position characteristics and the sample image as sample input information; the sample data set comprises sample images carrying labeling edge contours;
inputting the sample input information into an initial image segmentation model, and acquiring at least two sample content characteristics corresponding to the sample image in the initial image segmentation model;
acquiring sample weights respectively corresponding to the at least two sample content features, performing feature fusion on the at least two sample content features according to the sample weights to obtain sample contour features corresponding to the sample images, and generating sample contour mask images corresponding to the sample images according to the sample contour features;
and correcting the network parameters of the initial image segmentation model according to the sample contour mask image and the labeled edge contour corresponding to the sample image, and determining the initial image segmentation model containing the corrected network parameters as the image segmentation model.
10. The method of claim 9, wherein said obtaining a set of sample data comprises:
acquiring an initial sample data set, and acquiring an initial sample image X_p and an initial sample image X_q in the initial sample data set; p and q are positive integers less than or equal to the number of initial sample images contained in the initial sample data set, and p and q are not equal;
obtaining a background image of the initial sample image X_p, obtaining a foreground object image of the initial sample image X_q, and performing deformation processing on the foreground object image to obtain a deformed foreground object image;
and combining the deformed foreground object image and the background image to obtain an extended sample image, and adding the extended sample image to the initial sample data set to obtain the sample data set.
11. The method of claim 9, wherein said obtaining a set of sample data comprises:
acquiring an initial sample data set, and acquiring an initial sample image X_p in the initial sample data set; p is a positive integer less than or equal to the number of initial sample images contained in the initial sample data set;
obtaining annotation point coordinate information of the labeled edge contour corresponding to the initial sample image X_p, and determining a background edge of the initial sample image X_p according to the minimum abscissa value, the maximum abscissa value, the minimum ordinate value and the maximum ordinate value in the annotation point coordinate information;
and filling the background edge to obtain an extended sample image corresponding to the initial sample image X_p, and adding the extended sample image to the initial sample data set to obtain the sample data set.
12. An image data processing apparatus characterized by comprising:
the acquisition module is used for acquiring the object contour characteristics corresponding to the source image and generating a contour mask image containing edge pixel points according to the object contour characteristics; the source image comprises a target object, and the contour formed by the edge pixel points in the contour mask image is associated with the contour of the target object;
the transformation processing module is used for carrying out transformation processing on the edge pixel points contained in the contour mask image to obtain M discontinuous initial line segments corresponding to the edge pixel points; the contour formed by M initial line segments is associated with the contour of the target object, and M is a positive integer;
the edge shape determining module is used for acquiring line segment distances among the M initial line segments, and merging the M initial line segments according to the line segment distances to obtain K candidate line segments; k is a positive integer less than M;
the edge shape determining module is further configured to obtain segment intersections between any two candidate segments of the K candidate segments, combine the segment intersections to obtain at least two intersection groups, and determine the intersection groups that satisfy a target sorting order as S candidate vertex sets; the line segment intersection points contained in each intersection point group are the same in number, and S is a positive integer less than or equal to the number of the at least two intersection point groups;
the edge shape determining module is further configured to obtain intersection ratio indexes corresponding to the S candidate vertex sets, determine a line segment intersection point in the candidate vertex set corresponding to the largest intersection ratio index as a target vertex associated with the target object, and determine an object edge shape used for representing a contour of the target object in the source image according to the target vertex.
13. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the steps of the method of any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 11.
CN202011078009.6A 2020-10-10 2020-10-10 Image data processing method, apparatus, device and medium Active CN112052839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011078009.6A CN112052839B (en) 2020-10-10 2020-10-10 Image data processing method, apparatus, device and medium

Publications (2)

Publication Number Publication Date
CN112052839A CN112052839A (en) 2020-12-08
CN112052839B true CN112052839B (en) 2021-06-15

Family

ID=73606586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011078009.6A Active CN112052839B (en) 2020-10-10 2020-10-10 Image data processing method, apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN112052839B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580738B (en) * 2020-12-25 2021-07-23 特赞(上海)信息科技有限公司 AttentionOCR text recognition method and device based on improvement
CN112700464B (en) * 2021-01-15 2022-03-29 腾讯科技(深圳)有限公司 Map information processing method and device, electronic equipment and storage medium
WO2022160202A1 (en) * 2021-01-28 2022-08-04 深圳市锐明技术股份有限公司 Method and apparatus for inspecting mask wearing, terminal device and readable storage medium
CN112975957A (en) * 2021-02-07 2021-06-18 深圳市广宁股份有限公司 Target extraction method, system, robot and storage medium
CN113378507B (en) * 2021-06-01 2023-12-05 中科晶源微电子技术(北京)有限公司 Mask data cutting method and device, equipment and storage medium
CN113822314A (en) * 2021-06-10 2021-12-21 腾讯云计算(北京)有限责任公司 Image data processing method, apparatus, device and medium
CN113538490B (en) * 2021-07-20 2022-10-28 刘斌 Video stream processing method and device
CN113469167A (en) * 2021-07-21 2021-10-01 浙江大华技术股份有限公司 Method, device, equipment and storage medium for recognizing meter reading
CN113460431B (en) * 2021-07-22 2022-03-08 湖南炬神电子有限公司 Novel automatic labeling machine for limiting strips of sweeper
CN113763235A (en) * 2021-09-08 2021-12-07 北京琥珀创想科技有限公司 Method for converting picture into scanning piece and intelligent mobile terminal
CN113487523B (en) * 2021-09-08 2021-12-07 腾讯科技(深圳)有限公司 Method and device for optimizing graph contour, computer equipment and storage medium
CN114092495B (en) * 2021-11-29 2023-01-31 阿里巴巴(中国)有限公司 Image display method, electronic device and storage medium
CN114241536B (en) * 2021-12-01 2022-07-29 佛山市红狐物联网科技有限公司 Palm vein image identification method and system
CN114998614B (en) * 2022-08-08 2023-01-24 浪潮电子信息产业股份有限公司 Image processing method, device and equipment and readable storage medium
CN115049792B (en) * 2022-08-15 2022-11-11 广东新禾道信息科技有限公司 High-precision map construction processing method and system
CN117274366B (en) * 2023-11-22 2024-02-20 合肥晶合集成电路股份有限公司 Line edge distance determining method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7809192B2 (en) * 2005-05-09 2010-10-05 Like.Com System and method for recognizing objects from images and identifying relevancy amongst images and information
CN110751215B (en) * 2019-10-21 2020-10-27 腾讯科技(深圳)有限公司 Image identification method, device, equipment, system and medium
CN111325764B (en) * 2020-02-11 2022-05-31 广西师范大学 Fruit image contour recognition method

Also Published As

Publication number Publication date
CN112052839A (en) 2020-12-08

Similar Documents

Publication Publication Date Title
CN112052839B (en) Image data processing method, apparatus, device and medium
CN109960742B (en) Local information searching method and device
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN111553267B (en) Image processing method, image processing model training method and device
AU2019268184B2 (en) Precise and robust camera calibration
CN111104538A (en) Fine-grained vehicle image retrieval method and device based on multi-scale constraint
CN112085840B (en) Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
EP3905194A1 (en) Pose estimation method and apparatus
CN113822314A (en) Image data processing method, apparatus, device and medium
CN111738280A (en) Image identification method, device, equipment and readable storage medium
CN113570684A (en) Image processing method, image processing device, computer equipment and storage medium
CN112101344B (en) Video text tracking method and device
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN112085835A (en) Three-dimensional cartoon face generation method and device, electronic equipment and storage medium
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN114511877A (en) Behavior recognition method and device, storage medium and terminal
CN115760888A (en) Image processing method, image processing device, computer and readable storage medium
CN111680722B (en) Content identification method, device, equipment and readable storage medium
CN114332267A (en) Generation method and device of ink-wash painting image, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK; Ref legal event code: DE; Ref document number: 40036296; Country of ref document: HK

GR01 Patent grant
GR01 Patent grant