CN113808040A - Live image contour correction method, device, equipment and storage medium - Google Patents

Live image contour correction method, device, equipment and storage medium

Info

Publication number
CN113808040A
Authority
CN
China
Prior art keywords
contour
image
live
acquiring
live broadcast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111057866.2A
Other languages
Chinese (zh)
Inventor
陈广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Cubesili Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cubesili Information Technology Co Ltd filed Critical Guangzhou Cubesili Information Technology Co Ltd
Priority to CN202111057866.2A priority Critical patent/CN113808040A/en
Publication of CN113808040A publication Critical patent/CN113808040A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/80 - Geometric correction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/136 - Segmentation; Edge detection involving thresholding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a live image contour correction method, device, equipment and storage medium. The method extracts straight line segments from the contour map of the live image, converts the straight line segments into straight lines, acquires the intersection points between every two straight lines, and merges the intersection points that satisfy a preset merging condition, thereby reducing the amount of data computation required for contour correction of the live image. Based on preset target image position information and the position information of the four intersection points forming the rectangle with the largest area, an affine transformation matrix is obtained, and the contour map of the live broadcast image is corrected with the affine transformation matrix. The contour correction method can quickly correct the extracted contour map of the live image, makes the extracted contour map easier to recognize, and improves the efficiency of contour detection for live images.

Description

Live image contour correction method, device, equipment and storage medium
Technical Field
The present application relates to the field of live webcasting technologies, and in particular, to a method, an apparatus, a device, and a storage medium for contour correction of live webcasting images.
Background
With the progress of network communication technology, live webcasting has become a new mode of online interaction, and its real-time and interactive nature has made it popular with a growing audience.
During a live webcast, the anchor often needs to interact with the audience. In some live scenes, when the anchor shows an object to the audience, contour detection needs to be performed on the object; once the object has been detected through contour detection, it can be processed further, for example by enlarging the picture, adding special effects, or displaying it separately.
However, objects in captured live images are prone to shifting or distortion, and the object contour map extracted through contour detection may therefore be distorted, which affects recognition of the object.
Disclosure of Invention
Based on this, an object of the present application is to provide a method, an apparatus, a device, and a storage medium for correcting the contour of a live broadcast image, which can quickly correct the contour map of a live image, make the extracted contour map of the live image easier to recognize, and improve detection efficiency.
According to a first aspect of an embodiment of the present application, there is provided a method for correcting a contour of a live image, the method including:
acquiring a live broadcast image to be detected;
carrying out contour extraction on the live broadcast image to obtain a contour map of the live broadcast image;
extracting straight line segments in a contour map of the live image;
converting the straight line segments into straight lines, acquiring the position information of the intersection point between every two straight lines, and merging the intersection points that satisfy a preset merging condition;
acquiring the area of the rectangle formed by every four intersection points, and acquiring the position information of the four intersection points forming the rectangle with the largest area;
and acquiring an affine transformation matrix based on preset target image position information and the position information of the four intersection points, and correcting the contour map of the live broadcast image with the affine transformation matrix to obtain the corrected contour map of the live broadcast image.
According to a second aspect of embodiments of the present application, there is provided a live image contour correction device including:
the acquisition module is used for acquiring a live image to be detected;
the contour map acquisition module is used for extracting contours of the live broadcast images and acquiring contour maps of the live broadcast images;
the straight line segment extraction module is used for extracting straight line segments in a contour map of the live broadcast image;
the intersection acquisition module is used for converting the straight line segments into straight lines, acquiring the position information of the intersection point between every two straight lines, and merging the intersection points that satisfy a preset merging condition;
the position information acquisition module is used for acquiring the area of the rectangle formed by every four intersection points and acquiring the position information of the four intersection points forming the rectangle with the largest area;
and the correction module is used for acquiring an affine transformation matrix based on preset target image position information and the position information of the four intersection points, correcting the contour map of the live broadcast image with the affine transformation matrix, and obtaining the corrected contour map of the live broadcast image.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to execute any one of the methods of contour rectification of live images.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing any one of the live broadcast image contour correction methods.
The method acquires a live image to be detected, extracts straight line segments from the contour map of the live image, converts the straight line segments into straight lines, acquires the intersection points between every two straight lines, and merges the intersection points that satisfy a preset merging condition, thereby reducing the amount of data computation required for contour correction of the live image. Based on preset target image position information and the position information of the four intersection points forming the rectangle with the largest area, an affine transformation matrix is obtained, and the contour map of the live broadcast image is corrected with the affine transformation matrix. The contour correction method can therefore quickly correct the extracted contour map of the live image, makes the extracted contour map easier to recognize, and improves the efficiency of contour detection for live images.
For a better understanding and practice, the present application is described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic view of an application scenario of a live image contour correction method according to an embodiment of the present application;
fig. 2 is a flowchart of a method for correcting contours of live images according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of an outline of a live image extracted by a method described herein according to an embodiment of the present application;
FIG. 4 is an exemplary diagram of an outline of a live image extracted by a method described herein according to another embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a device for correcting contours of live images according to an embodiment of the present application;
fig. 6 is a block diagram illustrating a structure of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that the embodiments described are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one element from another; they do not describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining". Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Please refer to fig. 1, which is a schematic view of an application scenario of the live video contour correction method provided in the present application, where the application scenario includes a live client 10 and a server 20, and the live client 10 interacts with the server 20.
The hardware to which the live client 10 is directed is essentially a computer device, and in particular, it may be a computer device of the type of a smartphone, smart interactive tablet, personal computer, or the like. The live client 10 may access the internet via a known network access method to establish a data communication link with the server 20.
The server 20 is a business server, and may be responsible for further connecting to related audio data servers, video streaming servers, and other servers providing related support services, so as to form a logically associated server cluster for providing services to related terminal devices, such as the live client 10 shown in fig. 1.
The live image contour correction method may be executed in the live client 10 and/or the server 20. When the live broadcast image contour correction method is operated on the live broadcast client 10, the live broadcast client 10 executes the live broadcast image contour correction method on a locally acquired live broadcast picture to acquire a corrected live broadcast image contour map. When the live broadcast image contour correction method is executed in the server, the server 20 obtains a live broadcast picture from the live broadcast client, executes the live broadcast image contour correction method, obtains a corrected live broadcast image contour map, and returns the live broadcast image contour map to the live broadcast client 10.
Example one:
the embodiment of the application discloses a contour correction method for live images.
A method for correcting the contour of a live image according to an embodiment of the present application will be described in detail below with reference to fig. 2.
The method for correcting the contour of the live broadcast image comprises the following steps:
s101: and acquiring a live image to be detected.
The live image may be a live image obtained by a live client, or a part of the live image, such as a local screenshot of the live image.
S102: carrying out contour extraction on the live broadcast image to obtain a contour map of the live broadcast image;
in one embodiment, the live image may be contour extracted using an existing contour extraction model. In another embodiment, a pre-trained weak semantic contour detection model can be adopted to obtain a contour map of a live broadcast image, and the weak semantic contour detection model only focuses on complete and obvious target objects, so that the computation amount in contour detection can be effectively reduced, and the detection efficiency is improved; and for the live broadcast image without a complete object, the contour detection is stopped in time, so that the false detection rate is reduced.
The weak semantic contour detection model performs contour detection on the complete and obvious objects in the live broadcast image. It does not care about the type of the object in the live broadcast image and can detect the contour of any obvious complete object present in the live broadcast image. The weak semantic contour detection model is designed based on an encoding-decoding (Encoder-Decoder) framework, a model framework in which different algorithms can be used to solve different tasks: the encoder (Encoder) converts the input sequence into dense vectors of fixed dimension, and the decoder (Decoder) converts the dense vectors obtained by encoding into the target data. The weak semantic contour detection model includes an encoder, a classification module, and a decoder.
Specifically, the step of extracting the contour of the live image includes:
acquiring contour features in the live broadcast image through an encoder in a pre-trained weak semantic contour detection model; wherein the weak semantic contour detection model comprises an encoder, a classification module and a decoder;
determining, by the classification module, whether the live image contains at least one complete object according to the contour features;
and if it is detected that the live image contains at least one complete object, extracting the object contour in the live image through the decoder (see the sketch below).
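For illustration, the three-stage flow just described can be sketched as follows; the module objects and the 0.5 decision threshold are illustrative assumptions and do not appear in the patent text.

```python
import torch

def detect_contour(live_image: torch.Tensor, encoder, classifier, decoder):
    """live_image: a (1, 3, H, W) tensor. Returns a contour map, or None if no complete object."""
    features = encoder(live_image)     # contour features from the encoder
    score = classifier(features)       # binary score: does a complete object exist?
    if score.item() < 0.5:             # assumed decision threshold
        return None                    # terminate contour detection early
    return decoder(features)           # extract the object contour map
```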
The encoder comprises an input layer and a plurality of encoding layers which are sequentially connected;
the step of obtaining the contour features in the live broadcast image through an encoder in a pre-trained weak semantic contour detection model comprises the following steps:
the live broadcast image is convoluted through the input layer, and is output to the plurality of coding layers after being downsampled to a first preset resolution;
respectively performing separation convolution on the live broadcast image through the plurality of coding layers to obtain a contour characteristic diagram in the live broadcast image;
the input layer down-samples the live broadcast image to a first preset resolution ratio so as to reduce the characteristic operation amount output to the coding layer and improve the efficiency of extracting the object contour by the weak semantic contour detection model. The first preset resolution can be set according to the image size of the input live broadcast image.
The encoding layer is used for performing convolution operation on the live broadcast image to acquire the contour characteristic image in the live broadcast image, preferably, the convolution mode of the encoding layer is depth separable convolution, the depth separable convolution can reduce parameters compared with conventional convolution, and the operation speed of the network can be improved to a certain extent.
The classification module is a binary classification module: each pixel of the input image is classified as a contour pixel or a non-contour pixel, and the connectivity of the contour pixels is examined to determine whether the live image contains at least one complete object. In the embodiment of the present application, whether the live image is salient is determined by judging whether it contains at least one complete object: if the live image contains at least one complete object, the live image is salient; if the live image does not contain a complete object, the live image is not salient. If the live image is judged to be salient, contour detection continues; if it is not salient, contour detection is terminated. By concluding in advance that a live image contains no complete object, contour detection of invalid images is avoided, which reduces the false detection rate.
The classification module comprises an average pooling layer, a vector conversion layer and a plurality of fully connected layers connected in sequence;
down-sampling the contour feature map output by the encoder to a second preset resolution through the average pooling layer;
and converting the contour feature map at the second preset resolution into a one-dimensional vector of preset length through the vector conversion layer, and obtaining, through the fully connected layers, a binary classification value indicating whether the live broadcast image contains at least one complete object.
The vector conversion layer converts the contour feature map at the second preset resolution into a one-dimensional vector of length 512. The classification module comprises three fully connected layers with 64, 16 and 1 nodes respectively; these three fully connected layers operate on the one-dimensional vector to obtain the binary classification value indicating whether the live broadcast image contains at least one complete object.
And the decoder is used for decoding the contour characteristic diagram obtained by the encoding of the encoder to obtain the object contour in the live broadcast image. In one embodiment, the decoder comprises a plurality of decoding layers and output layers which are connected in sequence; each decoding layer corresponds to one coding layer; the step of extracting the object contour in the live image by the decoder comprises:
performing bilinear interpolation on the outputs of the plurality of coding layers corresponding to the decoding layers through the plurality of decoding layers respectively, and up-sampling to the first preset resolution;
and extracting the object outline in the live broadcast image through the output layer.
Bilinear interpolation is an up-sampling method that increases the resolution of an image by interpolating in two directions, using four existing pixels of the original image to compute each new pixel. Compared with other up-sampling methods, bilinear interpolation is computed from the pixels of the original image, avoids jagged (aliasing) artifacts, and yields smoother high-resolution images.
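A minimal PyTorch sketch of such an encoder / classification-module / decoder structure is given below. The channel widths, the number of layers shown (fewer than the five layers of Fig. 3), the 256 × 192 input size and the 512-dimensional flattened vector are illustrative assumptions consistent with the text, not the patent's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution used by the coding layers."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class WeakSemanticContourNet(nn.Module):
    """Assumed layout for a 256 x 192 x 3 input: two coding and two decoding layers."""
    def __init__(self):
        super().__init__()
        # Input layer: convolve and down-sample to the first preset resolution.
        self.in_conv = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                     nn.BatchNorm2d(16), nn.ReLU())
        # Coding layers: depthwise separable convolutions.
        self.enc1 = DepthwiseSeparableConv(16, 32, stride=2)
        self.enc2 = DepthwiseSeparableConv(32, 64, stride=2)
        # Classification module: average pooling -> 512-d vector -> FC layers with 64 / 16 / 1 nodes.
        self.pool = nn.AdaptiveAvgPool2d((4, 2))          # 64 * 4 * 2 = 512
        self.cls = nn.Sequential(nn.Flatten(),
                                 nn.Linear(512, 64), nn.ReLU(),
                                 nn.Linear(64, 16), nn.ReLU(),
                                 nn.Linear(16, 1))
        # Decoding layers: bilinear up-sampling back to the first preset resolution.
        self.dec2 = DepthwiseSeparableConv(64, 32)
        self.dec1 = DepthwiseSeparableConv(32, 16)
        self.out_conv = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        x = self.in_conv(x)
        f1 = self.enc1(x)
        f2 = self.enc2(f1)
        cls_out = torch.sigmoid(self.cls(self.pool(f2)))  # "contains a complete object?"
        d2 = self.dec2(F.interpolate(f2, scale_factor=2, mode="bilinear",
                                     align_corners=False)) + f1   # skip connection
        d1 = self.dec1(F.interpolate(d2, scale_factor=2, mode="bilinear",
                                     align_corners=False))
        contour = torch.sigmoid(self.out_conv(d1))        # contour map at the first preset resolution
        return contour, cls_out
```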
S103: extracting straight line segments in a contour map of the live image;
In an embodiment, the Line Segment Detector (LSD) algorithm may be used to extract straight line segments from the contour map of the live broadcast image. The LSD algorithm detects the gradient value and gradient direction of each pixel of an input gray-scale image and, based on these gradient values and directions, partitions the image into a plurality of straight line segments. Specifically, the step of extracting straight line segments from the contour map of the live image comprises the following steps:
based on a Gaussian down-sampling method, down-sampling the contour map of the live broadcast image to a preset image scale;
acquiring gradient values and gradient direction values of all pixel points of a contour map of the live broadcast image;
eliminating pixel points with gradient values smaller than a preset gradient threshold value, and selecting the pixel points with the maximum gradient values as seed points;
determining a direction value range based on the gradient direction value of the seed point and a preset range threshold, and acquiring the pixel points whose gradient direction values fall within the direction value range to obtain a plurality of aligned points;
generating a rectangle containing the plurality of aligned points based on the position information of the plurality of aligned points;
acquiring the length and width of the rectangle, and calculating the aligned-point density of the rectangle from the number of aligned points in the rectangle;
if the aligned-point density is greater than or equal to a set density threshold, obtaining an error value of the rectangle in the contour map based on a fitted-rectangle precision calculation function;
if the error value is less than or equal to a preset threshold, taking the straight line segment of the rectangle as a straight line segment in the contour map of the live image; and if the error value is greater than the preset threshold, adjusting the side length of the rectangle until the error value of the rectangle obtained from the fitted-rectangle precision calculation function is less than or equal to the preset threshold.
The contour map of the live broadcast image is down-sampled based on a Gaussian down-sampling method; shrinking the image effectively suppresses jagged (aliasing) artifacts and improves the detection precision of straight line segments. In the embodiment of the present application, the preset image scale may be 0.8.
Specifically, the gradient value can be calculated, according to the gradient calculation formulas below, from the gray values i(x, y), i(x+1, y), i(x, y+1) and i(x+1, y+1) of the pixel (x, y) and its neighboring pixels (x+1, y), (x, y+1) and (x+1, y+1), following the standard LSD gradient scheme over a 2 × 2 neighborhood:

g_x(x, y) = [ i(x+1, y) + i(x+1, y+1) - i(x, y) - i(x, y+1) ] / 2

g_y(x, y) = [ i(x, y+1) + i(x+1, y+1) - i(x, y) - i(x+1, y) ] / 2

G(x, y) = sqrt( g_x(x, y)^2 + g_y(x, y)^2 )

where G(x, y) is the gradient value, g_x(x, y) is the first (horizontal) gradient component and g_y(x, y) is the second (vertical) gradient component.
Similarly, the gradient direction value can be calculated from the gray values i(x, y), i(x+1, y), i(x, y+1) and i(x+1, y+1) of the pixel (x, y) and its neighboring pixels (x+1, y), (x, y+1) and (x+1, y+1) according to the gradient direction formula.
The gradient direction formula is:

θ(x, y) = arctan( g_x(x, y) / ( -g_y(x, y) ) )

where θ is the gradient direction value (the level-line angle used by LSD).
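A NumPy sketch of this 2 × 2 gradient computation (the scheme used by LSD) is shown below; boundary handling is simplified, so the output maps are one pixel smaller than the input.

```python
import numpy as np

def gradient_map(gray: np.ndarray):
    """gray: (H, W) float image. Returns gradient magnitude G and direction theta, each (H-1, W-1)."""
    i = gray.astype(np.float64)
    # g_x(x, y) = [i(x+1, y) + i(x+1, y+1) - i(x, y) - i(x, y+1)] / 2
    gx = (i[:-1, 1:] + i[1:, 1:] - i[:-1, :-1] - i[1:, :-1]) / 2.0
    # g_y(x, y) = [i(x, y+1) + i(x+1, y+1) - i(x, y) - i(x+1, y)] / 2
    gy = (i[1:, :-1] + i[1:, 1:] - i[:-1, :-1] - i[:-1, 1:]) / 2.0
    G = np.sqrt(gx ** 2 + gy ** 2)      # gradient value
    theta = np.arctan2(gx, -gy)         # gradient direction (level-line angle)
    return G, theta
```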
The gradient threshold value can be set according to actual requirements.
The direction value range is determined based on the gradient direction value of the seed point and a preset range threshold, specifically, the direction value range may be [ a-t, a + t ], where a is the gradient direction value of the seed point, t is the preset range threshold, and the range threshold may be set according to the actual demand of the user.
After the direction value range is determined, all pixel points of the contour map of the live broadcast image are searched starting from the seed point, the pixel points whose gradient direction values fall within the direction value range (i.e. the aligned points) are acquired, and a rectangle containing all the aligned points is generated.
The aligned-point density measures how many aligned points the rectangle contains and can be obtained by dividing the number of aligned points by the area of the rectangle; the density threshold can be set according to the size of the input image and the actual requirements of the user. When the aligned-point density of the rectangle is smaller than the set density threshold, the rectangle can be split into a plurality of smaller rectangles by truncating it, and the aligned-point density of each truncated rectangle is recalculated until the density of the resulting rectangle is greater than or equal to the set density threshold.
In the embodiment of the present application, when the error value is less than or equal to the preset threshold, the currently fitted rectangle is considered to meet the set requirement, and the straight line segment of the rectangle is taken as a straight line segment in the contour map of the live broadcast image. If the error value is greater than the preset threshold, the side length of the rectangle is adjusted so that the rectangle is cut into a plurality of rectangular frames, and the error value is recomputed with the fitted-rectangle precision calculation function until it is less than or equal to the preset threshold.
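In practice this step can also be illustrated with OpenCV's LSD implementation, which encapsulates the seed-point selection, region growing and rectangle fitting described above. The snippet below is only a stand-in under that assumption; cv2.createLineSegmentDetector is available in recent OpenCV releases but was absent from some 3.x/4.x builds, and the input file name is hypothetical.

```python
import cv2

contour_map = cv2.imread("contour_map.png", cv2.IMREAD_GRAYSCALE)   # hypothetical contour map image
lsd = cv2.createLineSegmentDetector()
segments, widths, precisions, nfa = lsd.detect(contour_map)          # segments: (N, 1, 4) of x1, y1, x2, y2
segments = [] if segments is None else segments.reshape(-1, 4)
```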
S104: converting the straight line segments into straight lines, acquiring the position information of the intersection point between every two straight lines, and merging the intersection points that satisfy a preset merging condition;
in one embodiment, the step of converting the straight line segment into a straight line specifically includes:
acquiring the slope and intercept of the corresponding straight line from the position information of the end points of the straight line segment;
and acquiring a straight line corresponding to the straight line segment according to the slope and the intercept.
The position information of the end points of the straight line segment comprises position information of two end points of the straight line segment, the slope and the intercept of the straight line corresponding to the end points of the two straight line segments are obtained according to the position information of the end points of the two straight line segments, and the straight line corresponding to the straight line segment is determined according to the slope and the intercept. In a preferred embodiment, the correction efficiency may be improved by merging the close straight lines to reduce the amount of data operations, and therefore, after the step of obtaining the straight line corresponding to the straight line segment, the method further includes:
according to the slope and the intercept, combining straight lines meeting preset combining conditions;
wherein the preset merging condition comprises: the slope difference value of at least two straight lines is within a preset slope difference range, and the intercept difference value of at least two straight lines is within an intercept difference range.
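A sketch of this conversion and line-merging step is given below; the slope tolerance of 0.1 and intercept tolerance of 10 pixels are illustrative assumptions, since the text does not specify the thresholds.

```python
def segment_to_line(x1, y1, x2, y2):
    """Return (slope, intercept) of the straight line through the segment's two end points."""
    if abs(x2 - x1) < 1e-9:
        return float("inf"), x1        # vertical segment: store the x-position in place of the intercept
    k = (y2 - y1) / (x2 - x1)
    return k, y1 - k * x1

def merge_lines(lines, slope_tol=0.1, intercept_tol=10.0):
    """Merge lines whose slope and intercept differences fall within the preset ranges."""
    merged = []
    for k, b in lines:
        for idx, (mk, mb) in enumerate(merged):
            if abs(k - mk) <= slope_tol and abs(b - mb) <= intercept_tol:
                merged[idx] = ((k + mk) / 2, (b + mb) / 2)   # keep a single averaged line
                break
        else:
            merged.append((k, b))
    return merged
```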
An intersection point is the point where two straight lines cross. For intersection points that lie close to one another, merging them reduces the amount of data computation and improves correction efficiency. Specifically, the step of merging the intersection points that satisfy a preset merging condition includes:
when the distance between at least two intersection points is smaller than a set threshold value, acquiring the mean value of the position information of the at least two intersection points;
and combining the at least two intersections, and generating a combined intersection according to the average value of the position information of the at least two intersections.
By merging adjacent intersection points and generating a new intersection point at their midpoint, computed from the mean of their position information, the number of intersection points is reduced and the correction efficiency is improved.
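The intersection computation and merging can be sketched as follows; the 10-pixel merging distance is an illustrative assumption, and vertical lines from the previous sketch are skipped for brevity.

```python
from itertools import combinations

def line_intersection(line_a, line_b):
    (k1, b1), (k2, b2) = line_a, line_b
    if k1 == float("inf") or k2 == float("inf") or abs(k1 - k2) < 1e-9:
        return None                       # parallel (or vertical) lines: no usable intersection here
    x = (b2 - b1) / (k1 - k2)
    return x, k1 * x + b1

def merge_points(points, dist_thresh=10.0):
    """Merge intersection points closer than dist_thresh into a single midpoint."""
    merged = []
    for p in points:
        for idx, q in enumerate(merged):
            if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 < dist_thresh ** 2:
                merged[idx] = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
                break
        else:
            merged.append(p)
    return merged

def all_intersections(lines):
    pts = (line_intersection(a, b) for a, b in combinations(lines, 2))
    return merge_points([p for p in pts if p is not None])
```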
S105: acquiring the area of the rectangle formed by every four intersection points, and acquiring the position information of the four intersection points forming the rectangle with the largest area;
The position information of the four intersection points forming the rectangle with the largest area determines a rectangle, which in turn determines the rectangular region to be corrected in the contour map of the live image.
In one embodiment, after the step of acquiring the area of the rectangle formed by every four intersection points, the method further comprises:
acquiring a rectangle formed by every four intersections, and acquiring a rectangle which meets a preset rectangle screening condition;
wherein, the preset rectangular screening conditions comprise: the included angle of adjacent sides of the rectangle is larger than a set included angle threshold value, the length proportion of the opposite sides of the rectangle is larger than a set proportion threshold value, at least one group of opposite sides of the rectangle are parallel, and the length-width ratio of the rectangle is larger than a set length-width ratio threshold value.
Requiring at least one pair of opposite sides of the rectangle to be parallel ensures that the region finally selected for correction is a trapezoid or a parallelogram, which makes the affine transformation more convenient to apply. The set included-angle threshold, the set ratio threshold and the set aspect-ratio threshold can be chosen according to the size of the input image and the size of the contour contained in the image. For example, the rectangle screening conditions may be set as follows: the included angle between adjacent sides of the rectangle is greater than 4 degrees, the length ratio of opposite sides of the rectangle is greater than 0.5, at least one pair of opposite sides of the rectangle is parallel, and the ratio of the shortest side to the longest side of the rectangle is greater than 0.15.
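Step S105 can be sketched as below: enumerate every group of four intersection points, keep only groups that form a proper quadrilateral (a simplified stand-in for the screening conditions above), and return the four points spanning the largest area. The convex-hull ordering and the use of cv2.contourArea are implementation choices, not requirements of the text.

```python
from itertools import combinations
import numpy as np
import cv2

def largest_quad(points):
    """points: iterable of (x, y). Returns a 4 x 2 array of the largest quadrilateral's corners, hull-ordered."""
    best, best_area = None, 0.0
    for quad in combinations(points, 4):
        hull = cv2.convexHull(np.array(quad, dtype=np.float32))
        if len(hull) != 4:                 # the four points must actually form a quadrilateral
            continue
        area = cv2.contourArea(hull)
        if area > best_area:
            best, best_area = hull.reshape(4, 2), area
    return best
```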
S106: acquiring an affine transformation matrix based on preset target image position information and the position information of the four intersection points, and correcting the contour map of the live broadcast image with the affine transformation matrix to obtain the corrected contour map of the live broadcast image.
Affine transformation is a linear mapping between two-dimensional coordinates (combined with translation); after an affine transformation, straight lines in the image remain straight and relative positional relationships are preserved. In the embodiment of the application, an affine transformation matrix can be constructed from the target image position information and the position information of the four intersection points, and the contour map of the live broadcast image is corrected with this matrix, so that the live contour map extracted through contour detection is clearer and easier to recognize, which improves the recognition efficiency of the contour image.
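A sketch of this correction step with OpenCV is shown below. The 640 × 480 target size and the corner ordering are illustrative assumptions; cv2.estimateAffine2D fits an affine matrix to the four point pairs in the least-squares sense, and a perspective transform (cv2.getPerspectiveTransform / cv2.warpPerspective) is a common alternative when exactly four correspondences are used.

```python
import numpy as np
import cv2

def rectify(contour_map, quad_points, target_w=640, target_h=480):
    """quad_points: 4 x 2 corner positions ordered consistently with the target corners below."""
    src = np.asarray(quad_points, dtype=np.float32)
    dst = np.array([[0, 0], [target_w - 1, 0],
                    [target_w - 1, target_h - 1], [0, target_h - 1]], dtype=np.float32)
    affine, _ = cv2.estimateAffine2D(src, dst)          # 2 x 3 affine transformation matrix
    return cv2.warpAffine(contour_map, affine, (target_w, target_h))
```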
Fig. 3 shows an example of extracting the contour map of a live image with the pre-trained weak semantic contour detection model. The input image has a size of 256 × 192 × 3, and the weak semantic contour detection model comprises an encoder, a classification module (cls_out), a decoder and 5 skip connections (skip-layer5, skip-layer4, skip-layer3, skip-layer2 and skip-layer1);
the encoder comprises an input layer (InConv) and 5 encoding layers (Encoder1, Encoder2, Encoder3, Encoder4 and Encoder5) connected in sequence; the input layer and the 5 encoding layers convolve and down-sample the live broadcast image to obtain the contour features of the live broadcast image.
The decoder comprises 5 decoding layers (Decoder1, Decoder2, Decoder3, Decoder4 and Decoder5) corresponding to the 5 encoding layers and an output layer (OutConv). Each encoding layer is connected with its corresponding decoding layer through a skip connection, and each decoding layer up-samples the output of the corresponding encoding layer together with the output of the preceding decoding layer to extract the contour of the object in the live broadcast image. Connecting the encoding and decoding layers in this way avoids information loss during encoding: each decoding layer can combine, during decoding, the information of the corresponding encoding layer as it was before compression, which makes the extracted object contour more accurate.
Fig. 4 shows a contour map corrected with the live image contour correction method of the present application; it can be seen that the corrected contour map is clearer and easier to recognize, which can effectively improve the detection efficiency of the target object.
Experiments show that, when the contour correction method for live images is applied to contour detection on most mid-to-high-end mobile phones (2000+ models), the computation amounts to about 174M floating-point operations (FLOPs), the model has 386.56k parameters, and the calculation takes about 30 ms. The live image contour correction method can therefore achieve super real-time contour detection.
In the embodiment of the application, straight line segments are extracted from the contour map of the live image, the straight line segments are converted into straight lines, the intersection points between every two straight lines are acquired, and the intersection points that satisfy a preset merging condition are merged, which reduces the amount of data computation required for contour correction of the live image. Based on preset target image position information and the position information of the four intersection points forming the rectangle with the largest area, an affine transformation matrix is obtained, and the contour map of the live broadcast image is corrected with the affine transformation matrix. The contour correction method can therefore quickly correct the extracted contour map of the live image, makes the extracted contour map easier to recognize, and improves the efficiency of contour detection for live images.
Optionally, before the step of obtaining a live image to be detected, the method for correcting the contour of the live image further includes:
acquiring a preset weak semantic training sample set; the weak semantic training sample set comprises a plurality of images with complete objects and corresponding contour maps thereof, and a plurality of images with incomplete objects and corresponding contour maps thereof;
constructing a weak semantic contour detection model with a contour extraction function based on the encoding-decoding framework;
and pre-training the weak semantic contour detection model by using the weak semantic training sample set until the loss value of the weak semantic contour detection model meets the target loss, and obtaining the pre-trained weak semantic contour detection model.
The images with complete objects and the images with incomplete objects can be a set of images of indoor scenes, street scenes or sceneries shot manually, and the corresponding contour map can be an image extracted by a contour detection algorithm. In the embodiment of the present application, the contour map corresponding to the image with the incomplete object is a black non-contour image.
In one embodiment, the weak semantic training sample set is an image captured from a live scene; or the weak semantic training sample set is an image acquired in a live broadcast room, and the images with the complete objects comprise combined images of various live broadcast scenes and various complete objects. The live scenes may include, but are not limited to, desktop-type scenes (e.g., solid desktop, wooden desktop, patterned desktop, table top with sundries), handheld-type scenes (e.g., hand-held), wall-type backgrounds (e.g., solid wall, patterned wall), and lighting-changing scenes (e.g., dim light, bright light, backlight, glistening), etc. The plurality of objects may include, but are not limited to, cards (e.g., work cards, membership cards, identification cards), tickets (e.g., tickets, tax receipts, registries), electronics (e.g., cell phones, computers, tablets), and other objects (e.g., billboards, books, signs, cartons), etc.
The images with the complete objects and the corresponding contour maps thereof are used as positive samples of the weak semantic contour detection model, and the images with the incomplete objects and the corresponding contour maps thereof are used as negative samples of the weak semantic contour detection model; and pre-training the weak semantic contour detection model by using the positive sample and the negative sample until the loss value of the weak semantic contour detection model meets the target loss, and obtaining the pre-trained weak semantic contour detection model.
Specifically, the step of pre-training the weak semantic contour detection model by using the weak semantic training sample set includes: and adjusting the model parameters of the weak semantic contour detection model based on an Adam optimization algorithm.
The model parameters may include learning rate, training period, etc.
The Adam optimization algorithm is a method that computes an adaptive learning rate for each parameter and iteratively updates the neural network weights based on the training data. In the embodiment of the application, the initial learning rate of the weak semantic contour detection model is set to 0.001 and the model is trained for 200 epochs, where one epoch is one pass over the whole weak semantic training sample set. The learning rate is then decayed to 0.0005, decayed again to 0.0001 after the 300th epoch, and after the 400th epoch fine-tuning (fine-tune) is performed: all Batch Normalization (BN) layers in the network are frozen and the learning rate is decayed once more, to 0.00005. Freezing all Batch Normalization layers means that the BN layers no longer participate in training, i.e. their parameters are no longer updated.
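A PyTorch sketch of this training schedule is given below; the total number of epochs (500), the omitted batch handling and the model object are illustrative assumptions, while the learning-rate milestones and the batch-normalization freezing follow the text.

```python
import torch
import torch.nn as nn

model = WeakSemanticContourNet()                       # the model sketch shown earlier (assumed)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def lr_for_epoch(epoch):
    if epoch < 200:
        return 0.001       # initial learning rate for the first 200 epochs
    if epoch < 300:
        return 0.0005
    if epoch < 400:
        return 0.0001
    return 0.00005         # fine-tuning stage

for epoch in range(500):                               # assumed total number of epochs
    for group in optimizer.param_groups:
        group["lr"] = lr_for_epoch(epoch)
    if epoch == 400:                                   # freeze all batch-normalization layers
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()
                for p in m.parameters():
                    p.requires_grad = False
    # ... one pass over the weak semantic training sample set goes here ...
```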
In the embodiment of the present application, the weak semantic contour detection model performs contour detection by classifying each pixel of the input image as a contour pixel or a non-contour pixel. Because the number of contour pixels is normally small, the class prediction is easily imbalanced. In a preferred embodiment, the loss function of the weak semantic contour detection model is therefore a class-balanced cross-entropy loss:

L = -β · Σ_{j ∈ Y+} log P(y_j = 1 | X) - Σ_{j ∈ Y-} log P(y_j = 0 | X)

where L denotes the loss value, β denotes the balancing coefficient with β = |Y-| / |Y+|, |Y-| denotes the number of non-contour pixels in the live image, |Y+| denotes the number of contour pixels in the live image, and y_j denotes the prediction of the weak semantic contour detection model at the j-th pixel point. P(y_j = 1 | X) = σ(a_j) ∈ [0, 1] and P(y_j = 0 | X) = 1 - σ(a_j) ∈ [0, 1], where a_j is the network activation at the j-th pixel and σ(·) denotes the sigmoid function, the activation function used in neural networks that maps a variable into the range 0 to 1. Introducing this class-balanced cross-entropy loss as the loss function of the weak semantic contour detection model avoids class prediction imbalance and improves the accuracy of contour pixel detection.
Example two:
the present embodiment provides a live video contour correction apparatus, which can be used to execute the live video contour correction method according to the first embodiment of the present application. For details not disclosed in the present embodiment, please refer to embodiment one of the present application.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a contour correction device for live images according to an embodiment of the present application. The live image contour correction device can operate in a server or a live client. The contour correction device for live images comprises:
an obtaining module 301, configured to obtain a to-be-detected live broadcast image;
an outline drawing obtaining module 302, configured to perform outline extraction on the live broadcast image, and obtain an outline drawing of the live broadcast image;
a straight line segment extraction module 303, configured to extract a straight line segment in a contour map of the live broadcast image;
an intersection obtaining module 304, configured to convert the straight line segments into straight lines, obtain position information of an intersection between each two straight lines, and merge intersections that meet the merge condition according to a preset merge condition;
a position information obtaining module 305, configured to obtain the area of the rectangle formed by every four intersection points, and obtain the position information of the four intersection points forming the rectangle with the largest area;
the correction module 306 is configured to obtain an affine transformation matrix based on preset position information of the target image and the position information of the four intersections, correct the contour map of the live broadcast image by using the affine transformation matrix, and obtain a corrected contour map of the live broadcast image.
In this embodiment, the live broadcast image to be detected is acquired, the classification module in the trained weak semantic contour detection model determines whether the live broadcast image contains at least one complete object, and when it is detected that the live broadcast image contains at least one complete object, the object contour in the live broadcast image is extracted by the decoder. Because the weak semantic contour detection model only focuses on complete and obvious target objects, the amount of computation in contour detection is effectively reduced and the detection efficiency is improved; for live broadcast images without a complete object, contour detection is terminated in time, which reduces the false detection rate.
Example three:
the present embodiment provides an electronic device, which can be used to execute all or part of the steps of the live image contour correction method according to the first embodiment of the present application. For details not disclosed in the present embodiment, please refer to embodiment one of the present application.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 400 may be, but is not limited to, a combination of one or more of various servers, personal computers, laptops, smartphones, tablets, and the like.
In the preferred embodiment of the present application, the electronic device 400 includes a memory 401, at least one processor 402, at least one communication bus 403, and a transceiver 404.
Those skilled in the art will appreciate that the structure of the electronic device shown in fig. 6 does not constitute a limitation of the embodiments of the present application; it may be a bus-type structure or a star-type structure, and the electronic device 400 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the electronic device 400 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and the hardware includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 400 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 400 is only an example, and other existing or future electronic products, such as those that may be adapted to the present application, are also included in the scope of the present application and are incorporated by reference herein.
In some embodiments, the memory 401 stores therein a computer program, which when executed by the at least one processor 402 implements all or part of the steps of the method for correcting contours of live images according to the first and second embodiments. The Memory 401 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an electronically Erasable rewritable Read-Only Memory (Electrically-Erasable Programmable Read-Only Memory (EEPROM)), an optical Read-Only Memory (CD-ROM) or other optical disk Memory, a magnetic disk Memory, a tape Memory, or any other medium readable by a computer that can be used to carry or store data.
In some embodiments, the at least one processor 402 is a Control Unit (Control Unit) of the electronic device 400, connects various components of the electronic device 400 by using various interfaces and lines, and executes various functions and processes data of the electronic device 400 by running or executing programs or modules stored in the memory 401 and calling data stored in the memory 401. For example, the at least one processor 402, when executing the computer program stored in the memory, implements all or part of the steps of the method for contour correction of live images described in the embodiments of the present application; or to implement all or part of the functions of a contour correction device for live images. The at least one processor 402 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 403 is arranged to enable connectivity communication between the memory 401 and the at least one processor 402, and the like.
The electronic device 400 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Example four:
the present embodiment provides a computer-readable storage medium, on which a computer program is stored, where the instructions are suitable for being loaded by a processor and executing the method for correcting the contour of a live broadcast image in the first embodiment of the present application, and a specific execution process may refer to a specific description of the first embodiment, which is not described herein again.
For the apparatus embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described device embodiments are merely illustrative, wherein the components described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A contour correction method for a live image is characterized by comprising the following steps:
acquiring a live broadcast image to be detected;
carrying out contour extraction on the live broadcast image to obtain a contour map of the live broadcast image;
extracting straight line segments in a contour map of the live image;
converting the straight line segments into straight lines, acquiring the position information of the intersection point between every two straight lines, and merging the intersection points that satisfy a preset merging condition;
acquiring the area of the rectangle formed by every four intersection points, and acquiring the position information of the four intersection points forming the rectangle with the largest area;
and acquiring an affine transformation matrix based on preset target image position information and the position information of the four intersection points, and correcting the contour map of the live broadcast image with the affine transformation matrix to obtain the corrected contour map of the live broadcast image.
2. The method for correcting the contour of a live image according to claim 1, comprising: the step of extracting the straight line segment in the contour map of the live image comprises the following steps:
based on a Gaussian down-sampling method, down-sampling the contour map of the live broadcast image to a preset image scale;
acquiring gradient values and gradient direction values of all pixel points of a contour map of the live broadcast image;
eliminating pixel points with gradient values smaller than a preset gradient threshold value, and selecting the pixel points with the maximum gradient values as seed points;
determining a direction value range based on the gradient direction value of the seed point and a preset range threshold, and obtaining pixel points of the gradient direction value in the direction value range to obtain a plurality of same-sex points;
generating a rectangle comprising the plurality of same-sex points based on the position information of the plurality of same-sex points;
acquiring the length and the width of the rectangle, and calculating the density of the same-polarity points of the rectangle according to the number of the same-polarity points in the rectangle;
if the density of the same-polarity points is greater than or equal to a set density threshold value, obtaining an error value of the rectangle in the contour map based on a fitted rectangle precision calculation function;
if the error value is smaller than or equal to a preset threshold value, taking a straight line segment of the rectangle as a straight line segment in a contour map of the live image; and if the error value is larger than the preset threshold, adjusting the side length of the rectangle until the error value of the rectangle obtained based on the fitted rectangle precision calculation function is smaller than or equal to the preset threshold.
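The steps of claim 2 closely mirror a line-segment-detector (LSD) style region-growing scheme. The fragment below illustrates only the down-sampling, gradient, seed-point and aligned-point steps; rectangle fitting and the error-refinement loop are omitted, and the threshold values and the helper name aligned_point_mask are invented for the example.

```python
# Sketch of the gradient / seed / aligned-point steps of claim 2 only.
import cv2
import numpy as np

def aligned_point_mask(contour_map, grad_thresh=10.0, angle_tol=np.deg2rad(22.5)):
    # Gaussian down-sampling to a preset scale (one pyramid level here).
    small = cv2.pyrDown(contour_map)

    # Gradient magnitude and direction at every pixel.
    gx = cv2.Sobel(small, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(small, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)

    # Discard pixels whose gradient is below the preset threshold.
    valid = mag >= grad_thresh

    # The pixel with the largest gradient acts as the seed point.
    seed = np.unravel_index(np.argmax(np.where(valid, mag, -1.0)), mag.shape)
    seed_ang = ang[seed]

    # Aligned points: pixels whose gradient direction falls inside the
    # direction range defined by the seed direction and the range threshold.
    diff = np.abs(np.angle(np.exp(1j * (ang - seed_ang))))  # wrap to [-pi, pi]
    aligned = valid & (diff <= angle_tol)
    return seed, aligned
```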
3. The contour correction method for a live image according to claim 1, wherein the step of converting the straight line segments into straight lines specifically comprises:
acquiring the slope and intercept of the corresponding straight line according to the position information of the end points of each straight line segment;
and acquiring the straight line corresponding to the straight line segment according to the slope and the intercept.
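Converting a segment into its supporting line is a short computation; the sketch below uses the slope/intercept form relied on by the later claims. The name segment_to_line is mine, and the near-vertical case is merely skipped here rather than handled, which a real implementation would need to address.

```python
# Segment (x1, y1, x2, y2) -> slope and intercept of the supporting line.
def segment_to_line(seg, eps=1e-6):
    x1, y1, x2, y2 = seg
    if abs(x2 - x1) < eps:
        return None                      # near-vertical: no finite slope
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept
```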
4. The contour correction method for a live image according to claim 3, wherein after the step of acquiring the straight line corresponding to the straight line segment, the method further comprises:
merging straight lines that satisfy a preset merging condition according to the slope and the intercept;
wherein the preset merging condition comprises: the slope difference between at least two straight lines is within a preset slope difference range, and the intercept difference between the at least two straight lines is within a preset intercept difference range.
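One simple way to realize the merging condition of claim 4 is a greedy pass that folds a line into the first existing group whose slope and intercept are both within tolerance. The tolerance values below and the choice of averaging the merged pair are assumptions, since the claim leaves both open.

```python
# Merge lines whose slopes and intercepts both differ by less than the
# preset tolerances; the merged line is the mean of the pair.
def merge_lines(lines, slope_tol=0.05, intercept_tol=10.0):
    merged = []
    for m, b in lines:
        for i, (mm, mb) in enumerate(merged):
            if abs(m - mm) <= slope_tol and abs(b - mb) <= intercept_tol:
                merged[i] = ((m + mm) / 2.0, (b + mb) / 2.0)
                break
        else:
            merged.append((m, b))
    return merged
```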
5. The contour correction method for a live image according to claim 1, wherein the step of merging the intersection points that satisfy a preset merging condition comprises:
when the distance between at least two intersection points is smaller than a set threshold, acquiring the mean value of the position information of the at least two intersection points;
and merging the at least two intersection points, and generating a merged intersection point according to the mean value of the position information of the at least two intersection points.
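The intersection handling of claims 1 and 5 can be sketched as follows; the treatment of parallel lines, the distance threshold and the greedy merge order are illustrative choices not fixed by the claims.

```python
# Pairwise intersections of slope/intercept lines, then merging of
# intersection points closer together than dist_thresh by replacing
# them with their mean position.
from itertools import combinations
import numpy as np

def all_intersections(lines):
    points = []
    for (m1, b1), (m2, b2) in combinations(lines, 2):
        if abs(m1 - m2) < 1e-6:          # parallel lines never cross
            continue
        x = (b2 - b1) / (m1 - m2)
        points.append((x, m1 * x + b1))
    return points

def merge_close_points(points, dist_thresh=10.0):
    merged = []
    for p in map(np.asarray, points):
        for i, q in enumerate(merged):
            if np.linalg.norm(p - q) < dist_thresh:
                merged[i] = (p + q) / 2.0  # mean of the merged intersections
                break
        else:
            merged.append(p)
    return merged
```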
6. The contour correction method for a live image according to claim 1, wherein after the step of acquiring the area of the rectangle formed by every four intersection points, the method further comprises:
acquiring the rectangle formed by every four intersection points, and retaining the rectangles that satisfy a preset rectangle screening condition;
wherein the preset rectangle screening condition comprises: the included angle between adjacent sides of the rectangle is larger than a set included-angle threshold, the length ratio of the opposite sides of the rectangle is larger than a set ratio threshold, at least one pair of opposite sides of the rectangle are parallel, and the length-width ratio of the rectangle is larger than a set length-width ratio threshold.
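One possible reading of the screening conditions of claim 6, combined with the largest-area selection of claim 1, is sketched below. Every threshold and the exact form of each geometric test are guesses, since the claim only names the conditions.

```python
# Candidate-quadrilateral screening roughly along the lines of claim 6.
from itertools import combinations
import numpy as np
import cv2

def passes_screen(quad, min_angle_deg=70.0, min_side_ratio=0.7,
                  parallel_tol_deg=15.0, max_aspect=4.0):
    quad = np.asarray(quad, dtype=np.float32)
    sides = [quad[(i + 1) % 4] - quad[i] for i in range(4)]
    lengths = [float(np.linalg.norm(s)) for s in sides]
    if min(lengths) < 1e-3:
        return False

    def angle_between(a, b):
        cos = abs(float(np.dot(a, b))) / (np.linalg.norm(a) * np.linalg.norm(b))
        return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

    # Included angle between every pair of adjacent sides.
    if any(angle_between(sides[i], sides[(i + 1) % 4]) < min_angle_deg
           for i in range(4)):
        return False
    # Opposite sides should have comparable lengths.
    for i in (0, 1):
        short, long_ = sorted((lengths[i], lengths[i + 2]))
        if short / long_ < min_side_ratio:
            return False
    # At least one pair of opposite sides close to parallel.
    if (angle_between(sides[0], sides[2]) > parallel_tol_deg and
            angle_between(sides[1], sides[3]) > parallel_tol_deg):
        return False
    # Overall aspect-ratio limit.
    w = (lengths[0] + lengths[2]) / 2.0
    h = (lengths[1] + lengths[3]) / 2.0
    return max(w, h) / min(w, h) <= max_aspect

def largest_quadrilateral(points):
    best, best_area = None, 0.0
    for four in combinations(points, 4):
        hull = cv2.convexHull(np.float32(four))   # orders the corners
        if len(hull) != 4:                        # the four points must be convex
            continue
        quad = hull.reshape(4, 2)
        if not passes_screen(quad):
            continue
        area = float(cv2.contourArea(hull))
        if area > best_area:
            best, best_area = quad, area
    return best
```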
7. The contour correction method for a live image according to any one of claims 1 to 6, wherein the step of performing contour extraction on the live image comprises:
acquiring contour features of the live image through an encoder in a pre-trained weak semantic contour detection model, wherein the weak semantic contour detection model comprises the encoder, a classification module and a decoder;
determining, by the classification module, whether the live image contains at least one complete object according to the contour features;
and if it is detected that the live image contains at least one complete object, extracting the object contour in the live image through the decoder.
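Claim 7 fixes only the encoder, classification-module and decoder layout, not the architecture; the related publication CN113808151A listed under similar documents appears to address the model itself. A bare-bones PyTorch skeleton of that layout, with every layer size invented as a placeholder, might look like this:

```python
# Skeleton of the encoder / classification-module / decoder layout of claim 7.
import torch
import torch.nn as nn

class WeakSemanticContourNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # contour-feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(         # "contains a complete object?"
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )
        self.decoder = nn.Sequential(            # per-pixel contour map
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        feats = self.encoder(x)
        has_object = torch.sigmoid(self.classifier(feats))  # (N, 1) score
        contour = torch.sigmoid(self.decoder(feats))         # (N, 1, H, W)
        return has_object, contour
```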
8. A contour correction device for a live image, the device comprising:
an acquisition module, configured to acquire a live image to be detected;
a contour map acquisition module, configured to perform contour extraction on the live image and acquire a contour map of the live image;
a straight line segment extraction module, configured to extract straight line segments from the contour map of the live image;
an intersection acquisition module, configured to convert the straight line segments into straight lines, acquire position information of the intersection point between every two straight lines, and merge the intersection points that satisfy a preset merging condition;
a position information acquisition module, configured to acquire the area of the rectangle formed by every four intersection points and acquire the position information of the four intersection points that form the rectangle with the largest area;
and a correction module, configured to acquire an affine transformation matrix based on preset target image position information and the position information of the four intersection points, correct the contour map of the live image with the affine transformation matrix, and acquire the corrected contour map of the live image.
9. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the contour correction method for a live image according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the contour correction method for a live image according to any one of claims 1 to 7.
CN202111057866.2A 2021-09-09 2021-09-09 Live image contour correction method, device, equipment and storage medium Pending CN113808040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111057866.2A CN113808040A (en) 2021-09-09 2021-09-09 Live image contour correction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111057866.2A CN113808040A (en) 2021-09-09 2021-09-09 Live image contour correction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113808040A (en) 2021-12-17

Family

ID=78940589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111057866.2A Pending CN113808040A (en) 2021-09-09 2021-09-09 Live image contour correction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113808040A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030080A (en) * 2023-02-03 2023-04-28 北京博睿恩智能科技有限公司 Remote sensing image instance segmentation method and device
CN116030080B (en) * 2023-02-03 2023-08-22 北京博睿恩智能科技有限公司 Remote sensing image instance segmentation method and device

Similar Documents

Publication Publication Date Title
CN112348815B (en) Image processing method, image processing apparatus, and non-transitory storage medium
CN108121986B (en) Object detection method and device, computer device and computer readable storage medium
WO2020221013A1 (en) Image processing method and apparaus, and electronic device and storage medium
US10846870B2 (en) Joint training technique for depth map generation
CN108229322B (en) Video-based face recognition method and device, electronic equipment and storage medium
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
CN110135424B (en) Inclined text detection model training method and ticket image text detection method
US10832069B2 (en) Living body detection method, electronic device and computer readable medium
US9454714B1 (en) Sequence transcription with deep neural networks
CN108875537B (en) Object detection method, device and system and storage medium
CN108876804B (en) Matting model training and image matting method, device and system and storage medium
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN110827320B (en) Target tracking method and device based on time sequence prediction
CN110427915B (en) Method and apparatus for outputting information
CN110399882A (en) A kind of character detecting method based on deformable convolutional neural networks
CN112101344B (en) Video text tracking method and device
CN113297956A (en) Gesture recognition method and system based on vision
CN114495006A (en) Detection method and device for left-behind object and storage medium
CN113808040A (en) Live image contour correction method, device, equipment and storage medium
CN113808151A (en) Method, device and equipment for detecting weak semantic contour of live image and storage medium
CN114820755B (en) Depth map estimation method and system
CN114419322A (en) Image instance segmentation method and device, electronic equipment and storage medium
CN114511877A (en) Behavior recognition method and device, storage medium and terminal
CN114429602A (en) Semantic segmentation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination