CN114998903A - Text occlusion area detection method and device, electronic equipment and storage medium

Publication number
CN114998903A
Authority
CN
China
Legal status
Pending
Application number
CN202210571467.6A
Other languages
Chinese (zh)
Inventor
常战国
吕一
邓天生
贠挺
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210571467.6A
Publication of CN114998903A

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a text occlusion area detection method and device, electronic equipment and a storage medium, relates to the technical field of image processing, in particular to the technical fields of computer vision, image recognition and text processing, and can be applied to scenes such as detection of text occlusion areas in pictures or videos. The specific implementation scheme comprises the following steps: acquiring a first image to be detected; merging pixel points in the first image whose pixel values are separated by less than a preset distance threshold, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met, to obtain a plurality of clustered pixel values; carrying out instance segmentation on the first image according to the clustered pixel values to obtain a plurality of color instance segmentation areas in one-to-one correspondence with the clustered pixel values; and determining a text occlusion area in the first image from the plurality of color instance segmentation areas according to the overlapping relation between the color instance segmentation areas and the text boxes in the first image. The method and the device can improve the speed of detecting text occlusion areas and reduce the false detection rate.

Description

Text occlusion area detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to the field of computer vision, image recognition, and text processing technologies, which can be applied to scenes such as text occlusion region detection in pictures or videos, and in particular, to a method and an apparatus for detecting a text occlusion region, an electronic device, and a storage medium.
Background
In some websites or interfaces of applications (e.g., an online forum), a user may post information such as pictures and videos. When text appearing in a picture or video involves private or sensitive information, the user usually uses some patterns (such as expressions or graffiti) to occlude it. The area in which such a pattern is located in the picture or video may be referred to as a text occlusion area.
When a provider of a website or an application program checks whether the pictures, videos and other information published by a user meet the publishing requirements, it first needs to detect whether a text occlusion area exists in the published information, so that further processing can be performed, such as removing the occlusion pattern from the text occlusion area.
The existing text occlusion area detection mode is low in detection speed and high in false detection rate.
Disclosure of Invention
The disclosure provides a method and a device for detecting a text occlusion area, an electronic device and a storage medium, which can improve the speed of detecting the text occlusion area and reduce the false detection rate.
According to a first aspect of the present disclosure, there is provided a text occlusion area detection method, the method including: acquiring a first image to be detected. And merging the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met to obtain a plurality of clustered pixel values. And carrying out instance segmentation on the first image according to each clustered pixel value to obtain a plurality of color instance segmentation areas which correspond to the clustered pixel values one by one. And determining a text occlusion area in the first image from the plurality of color instance segmentation areas according to the overlapping relation between each color instance segmentation area and the text box in the first image.
According to a second aspect of the present disclosure, there is provided a text occlusion region detection apparatus, the apparatus comprising: the device comprises an acquisition unit, a clustering unit, a segmentation unit and a detection unit.
The acquisition unit is used for acquiring a first image to be detected. And the clustering unit is used for merging the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met to obtain a plurality of clustered pixel values. And the segmentation unit is used for performing instance segmentation on the first image according to each clustered pixel value to obtain a plurality of color instance segmentation areas corresponding to the clustered pixel values one to one. And the detection unit is used for determining a text occlusion area in the first image from the multiple color instance segmentation areas according to the overlapping relation between each color instance segmentation area and the text box in the first image.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
The method comprises the steps of acquiring a first image to be detected; merging the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met to obtain a plurality of clustered pixel values; performing example segmentation on the first image according to each clustering pixel value to obtain a plurality of color example segmentation areas corresponding to the clustering pixel values one by one; according to the overlapping relation between each color example segmentation area and the text box in the first image, the text occlusion area in the first image is determined from the multiple color example segmentation areas, and therefore unified detection on different types of text occlusion areas can be achieved. When the method provided by the disclosure is used for detecting the text occlusion area of the first image, the method can have a lower false detection rate. Moreover, the method provided by the disclosure greatly improves the speed of detecting the text occlusion area by improving the clustering speed of the color clustering process.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of a text occlusion region detection method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of an implementation manner of S102 in fig. 1 according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of an implementation manner of S103 in fig. 1 according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a first image provided by an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a binarized image of the first image shown in fig. 4 according to an embodiment of the present disclosure;
fig. 6 is a schematic flowchart of another implementation manner of S103 in fig. 1 according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a text box in the first image shown in FIG. 4 provided by an embodiment of the present disclosure;
fig. 8 is a schematic diagram of a color example segmentation region extracted from the binarized image shown in fig. 5 according to an embodiment of the present disclosure;
fig. 9 is another schematic flow chart of a text occlusion region detection method according to an embodiment of the present disclosure;
fig. 10 is a schematic composition diagram of a text occlusion region detection apparatus according to an embodiment of the present disclosure;
fig. 11 is another schematic composition diagram of the text occlusion area detection apparatus according to an embodiment of the present disclosure;
fig. 12 is another schematic composition diagram of the text occlusion area detection apparatus according to an embodiment of the present disclosure;
fig. 13 is another schematic composition diagram of the text occlusion area detection apparatus according to an embodiment of the present disclosure;
fig. 14 shows a schematic block diagram of an example electronic device 1400 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In some websites or interfaces of applications (e.g., an online forum), a user may post information such as pictures, videos, and the like. Providers of websites or applications generally need to review the pictures, videos and other information published by users to determine whether they meet the publishing requirements. For example, providers of websites or applications may use Optical Character Recognition (OCR) algorithms to recognize whether private information or sensitive information (such as sensitive words or sensitive names) exists in the pictures, videos and other information published by users, so as to determine whether that information meets the publishing requirements.
When the text appearing in a picture or video involves private or sensitive information, the user usually uses some patterns (such as expressions, graffiti, etc.) to occlude it. The area in which these patterns are located in the picture or video may be referred to as a text occlusion area. For such pictures or videos containing text occlusion areas, OCR algorithms cannot achieve accurate recognition. Therefore, when the provider of the website or the application program audits the pictures, videos and other information published by the user, it is also necessary to determine whether a text occlusion area exists in that information, so as to perform further processing, such as removing the occlusion pattern of the text occlusion area.
In actual scenes, the occlusion patterns used by users are numerous and of different colors, and text occlusion areas of different types exist. Current text occlusion area detection approaches cannot uniformly identify different types of text occlusion areas, and are slow in detection speed with a high false detection rate.
Against this background, the embodiments of the present disclosure provide a text occlusion area detection method, which may be applied to scenes in which text occlusion areas in images are detected. When the method is used to detect text occlusion areas in an image, the false detection rate can be lower, and the method can greatly improve the speed of text occlusion area detection by improving the clustering speed of the color clustering process in the detection.
Optionally, an execution main body of the text occlusion area detection method provided by the embodiment of the present disclosure may be a terminal device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, or the like, or may be a server or other computing devices with image processing capability, for example, the server may be a backend server providing services for a website or an application program. The present disclosure does not limit the execution subject of the text occlusion region detection method.
In some embodiments, the server may be a single server, or may be a server cluster composed of a plurality of servers. In some embodiments, the server cluster may also be a distributed cluster. The present disclosure is also not limited to a specific implementation of the server.
It should be understood that in the embodiments of the present disclosure, the character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
Fig. 1 is a schematic flow chart of a text occlusion region detection method provided in the embodiment of the present disclosure. As shown in fig. 1, the method may include:
s101, acquiring a first image to be detected.
The first image may refer to a picture that a user wants to publish in an interface of some website or application program, or a certain frame of image in a video. The provider of the website or application may review the first image to determine whether the first image meets release requirements.
For example, when a user wants to publish a picture in an online forum, the picture may be submitted to a background server of the forum, and the background server may identify the picture according to a preset OCR algorithm to determine whether the picture meets the publishing requirements. This picture is the first image described in S101.
For another example, when a user wants to publish a video in a short video application, the user may submit the video to a background server of the short video application, and the background server of the short video application may identify each frame of image in the video according to a preset OCR algorithm to determine whether the video meets the publishing requirement. Each frame image in the video may be the first image described in S101.
The present disclosure does not limit the first image to be a picture or an image in a video.
S102, combining the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of combining the pixel points of the first image until a preset condition is met to obtain a plurality of clustering pixel values.
Alternatively, the distance between the pixel values described in the embodiments of the present disclosure may be a cosine distance, a euclidean distance, or the like, which is not limited by the present disclosure. In the following examples of the respective embodiments, the following description will be given taking the distance between pixel values as an euclidean distance as an example.
In some embodiments, the step of merging the pixels in the first image whose distance between the pixel values is smaller than the preset distance threshold in S102 may include: and calculating the distance between the pixel values of the 2 pixel points by taking the 2 pixel points as a calculation unit, and merging the 2 pixel points when the distance between the pixel values of the 2 pixel points is smaller than a preset distance threshold value until all the pixel points in the first image are traversed. After the 2 pixels are combined, the pixel values of the 2 pixels are the same.
The step of repeatedly executing the step of merging the pixel points of the first image until a preset condition is met may be: and after traversing the pixel points of the first image for the first time according to the merging mode of the pixel points, continuing traversing the pixel points of the first image for merging according to the merging mode of the pixel points, and stopping merging the pixel points until a preset condition is met.
After the step of merging the pixel points of the first image is repeatedly executed in S102 until the preset condition is met, some pixel points in the first image share the same pixel value, so that the pixel values of all pixel points in the first image include only a plurality of distinct values in total; these remaining pixel values may be referred to as the clustered pixel values.
For example, assume that the first image includes 224 × 224 (where × represents multiplication) pixel points, some of these pixel points share the same pixel value, and the first image contains 100 distinct pixel values in total. After S102 is performed on the first image, only 8 distinct pixel values remain, and these 8 pixel values may be referred to as the clustered pixel values.
In a possible implementation manner, merging pixel points of which the distance between pixel values in the first image is smaller than a preset distance threshold may include: for any two target pixel points in the first image, when the distance between the pixel values of the two target pixel points is smaller than a preset distance threshold, the pixel values of the two target pixel points are respectively assigned as the mean value of the pixel values of the two target pixel points.
For example, for the pixel 1 and the pixel 2, when the euclidean distance between the pixel values of the pixel 1 and the pixel 2 is smaller than the preset distance threshold, the pixel values of the pixel 1 and the pixel 2 may be assigned as the mean values of the pixel 1 and the pixel 2, respectively.
In other possible implementation manners, when the distance between the pixel values of the two target pixel points is smaller than the preset distance threshold, the pixel values of the two target pixel points may also be respectively assigned to a certain value calculated according to the pixel values of the two target pixel points.
For example, the value calculated according to the pixel values of the two target pixel points may be a maximum value, a minimum value, and the like of the pixel values of the two target pixel points, which is not limited herein.
Alternatively, the preset distance threshold may be determined according to a value of a hue-saturation-value (HSV) standard color space, and the size of the preset distance threshold is not limited herein.
In this embodiment, the preset condition may include: the distance between the pixel values of any two pixel points in the first image is smaller than the preset distance threshold (here, the preset distance threshold may also be replaced by a value larger or smaller than it), or the number of iterations of repeatedly executing the step of merging the pixel points of the first image reaches a preset number. For example, the preset number may be 10 times, 15 times, etc.; the size of the preset number is not limited here.
For example, the preset distance threshold may be determined according to the degree of color discrimination of the RGB color space, for example 300.
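As an illustration only, the following is a minimal NumPy sketch of the pixel-merging step described above, assuming Euclidean distance on RGB values, mean-value assignment when merging, and a stop condition of either no further merges or a preset iteration count; the function name, default threshold and iteration limit are illustrative and not taken from the disclosure.

import numpy as np

def merge_pixels(pixels, dist_threshold=300.0, max_iters=15):
    # Repeatedly merge pixel values whose Euclidean distance is below the
    # threshold, assigning both to their mean, until no pair is close enough
    # or the iteration limit (the preset condition) is reached.
    pixels = pixels.astype(np.float64)              # shape (N, 3), RGB values
    for _ in range(max_iters):
        merged_any = False
        for i in range(len(pixels)):
            for j in range(i + 1, len(pixels)):
                if np.linalg.norm(pixels[i] - pixels[j]) < dist_threshold:
                    mean = (pixels[i] + pixels[j]) / 2.0
                    pixels[i] = mean                # both points take the mean value
                    pixels[j] = mean
                    merged_any = True
        if not merged_any:                          # no pair is closer than the threshold
            break
    return np.unique(np.round(pixels), axis=0)      # the clustered pixel values
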
In some other embodiments, the preset distance threshold described in S102 includes a first distance threshold and a second distance threshold, and the preset condition includes a first preset condition and a second preset condition. Fig. 2 is a schematic flowchart of an implementation manner of S102 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 2, S102 may include:
s201, dividing the first image into a plurality of second images.
For example, assuming that the size of the first image is 224 × 224, the first image may be divided into 14 × 14 small images according to a size of 16 × 16, each small image being one second image.
Alternatively, after the first image is divided into the plurality of second images, size conversion (reshape) may be performed on each second image to convert it into image data of shape (img_w × img_h, 3), where img_w represents the width of the second image, img_h represents the height of the second image, and 3 represents that the number of color channels of the second image is 3, i.e., the R, G and B channels.
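A sketch of this division and size conversion, assuming a NumPy image of shape (H, W, 3) and the 16 × 16 example above, might look as follows (the helper name is illustrative):

import numpy as np

def split_into_tiles(image, tile=16):
    # Divide an (H, W, 3) image into tile x tile sub-images (the second images)
    # and reshape each one into (img_w * img_h, 3) pixel data.
    h, w, _ = image.shape
    tiles = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            sub = image[y:y + tile, x:x + tile]     # one second image
            tiles.append(sub.reshape(-1, 3))        # (img_w * img_h, 3)
    return tiles
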
S202, combining the pixel points of which the distance between the pixel values in each second image is smaller than the first distance threshold, and repeatedly executing the step of combining the pixel points of each second image until the first preset condition is met.
S202 is similar to the step of combining the pixels of which the distance between the pixel values in the first image is smaller than the preset distance threshold in the previous embodiment.
For example, the step of combining the pixel points in each second image, where the distance between the pixel values is smaller than the first distance threshold in S202, may include: and for each second image, calculating the distance between the pixel values of 2 pixels by taking 2 pixels as a calculation unit, and merging the 2 pixels when the distance between the pixel values of the 2 pixels is smaller than a first distance threshold value until all the pixels in the second image are traversed. After the 2 pixels are combined, the pixel values of the 2 pixels are the same.
The step of repeatedly executing the step of merging the pixel points of each second image until the first preset condition is met may be: and for each second image, after traversing the pixel points of the second image for the first time according to the merging mode of the pixel points, continuing traversing the pixel points of the second image according to the merging mode of the pixel points for merging, and stopping merging the pixel points until a first preset condition is met.
In S202, the step of combining the pixel points of each second image is repeatedly executed until the first preset condition is satisfied, the pixel values of some pixel points in the second image are the same, and the pixel values of all the pixel points in the second image may include a plurality of pixel values in total.
Optionally, similar to the preset conditions described in the previous embodiment, the first preset condition may include: and the distance between any two pixel points in each second image is smaller than a third distance threshold, or the iteration times of the step of repeatedly executing and combining the pixel points of each second image reach a first time.
For example, the first number may be determined according to the size of the first image and the number of the second images. For example, when the size of the first image is 224 × 224 and the first image is divided into 14 × 14 second images according to a size of 16 × 16, the first number may be 224 divided by 14 minus 1, that is, 16 - 1 = 15 times.
The third distance threshold may be equal to the first distance threshold, or the third distance threshold may be smaller than or greater than the first distance threshold. The first number may be a predetermined number of times as described in the previous embodiment. The determination method of the first distance threshold is similar to the determination method of the preset distance threshold, and is not repeated.
Optionally, the step of combining the pixel points in each second image, of which the distance between the pixel values is smaller than the first distance threshold value in S202, may include: for any two target pixel points in each second image, when the distance between the pixel values of the two target pixel points is smaller than the first distance threshold, the pixel values of the two target pixel points are respectively assigned as the mean value of the pixel values of the two target pixel points, or a certain value is obtained through calculation according to the pixel values of the two target pixel points.
After the step of combining the pixel points of each second image is repeatedly executed until the first preset condition is satisfied, S203 is executed.
And S203, merging the pixel points of which the distances between the pixel values in the plurality of second images are smaller than a second distance threshold, and repeatedly executing the step of merging the pixel points of the plurality of second images until a second preset condition is met.
S203 is similar to S202, except that S202 is executed for each second image, and S203 is executed for a plurality of second images. Specifically, the two pixels merged in S202 are necessarily from the same second image, and the two pixels merged in S203 may be from the same second image or from different second images.
For example, the step of combining the pixel points in the plurality of second images, of which the distances between the pixel values are smaller than the second distance threshold value in S203, may include: and for the plurality of second images, calculating the distance between the pixel values of 2 pixels by taking 2 pixels as a calculation unit, and merging the 2 pixels when the distance between the pixel values of the 2 pixels is smaller than a second distance threshold value until all the pixels in the plurality of second images (namely all the second images) are traversed. After the 2 pixels are combined, the pixel values of the 2 pixels are the same.
The step of repeatedly executing the step of merging the pixel points of the plurality of second images until a second preset condition is met may be: and for the plurality of second images, after traversing the pixel points of the plurality of second images for one time according to the merging mode of the pixel points, continuously traversing the pixel points of the plurality of second images according to the merging mode of the pixel points for merging, and stopping merging the pixel points until a second preset condition is met.
In S203, the step of combining the pixel points of the plurality of second images is repeatedly executed until a second preset condition is satisfied, pixel values of some pixel points in the plurality of second images are the same, and the pixel values of all the pixel points in the plurality of second images may include a plurality of pixel values, which may be referred to as a cluster pixel value.
Alternatively, similar to the first preset condition, the second preset condition may include: and the distance between any two pixel points in the second images is smaller than a fourth distance threshold value, or the iteration times of the step of repeatedly executing and combining the pixel points of the second images reach a second time.
The fourth distance threshold may be equal to the second distance threshold, or the fourth distance threshold may also be smaller than or greater than the second distance threshold. The second number may also be a predetermined number of times as described in the previous embodiment. The determination method of the second distance threshold is similar to that of the first distance threshold, and is not repeated.
Optionally, the step of combining the pixel points in the plurality of second images, of which the distances between the pixel values are smaller than the second distance threshold in S203, may include: and for any two target pixel points in the plurality of second images, when the distance between the pixel values of the two target pixel points is smaller than a second distance threshold value, assigning the pixel values of the two target pixel points to be the mean value of the pixel values of the two target pixel points, or calculating a certain value according to the pixel values of the two target pixel points.
In some possible implementations, the first distance threshold may be equal to the second distance threshold. That is, the conditions for merging the pixels in S202 and S203 may be the same. In the implementation manner, the first distance threshold and the second distance threshold are the same, so that parameters in the text occlusion region detection algorithm can be reduced.
In some possible implementations, the third distance threshold may be equal to the fourth distance threshold, and the first number of times is equal to the second number of times. In this implementation, the third distance threshold and the fourth distance threshold are the same, and the first number and the second number are the same, and parameters in the text occlusion region detection algorithm may also be reduced.
Compared with the implementation of S102 in the previous embodiment, in this embodiment, in combination with the implementation of S102 shown in fig. 2, by dividing the first image into a plurality of second images and then performing subsequent processing of merging the pixel points, the operation speed in S102 can be effectively increased, so that the speed of detecting the whole text-shielded area is increased.
It is understood that the implementation manner of S102 given in the above description of S102 is actually that colors in the first image are clustered, and a plurality of clustered pixel values obtained by S102 can be regarded as a plurality of clustering results.
Compared with the common k-means clustering algorithm, the implementation manner of S102 in the embodiment of the present disclosure can be regarded as a fast clustering algorithm for clustering colors; this fast clustering algorithm does not need the number of color categories to be set in advance, and the clustering speed is greatly improved.
After performing S102 to obtain a plurality of clustered pixel values, S103 may be performed.
S103, carrying out instance segmentation on the first image according to each clustered pixel value to obtain a plurality of color instance segmentation areas corresponding to the clustered pixel values one to one.
Fig. 3 is a schematic flowchart of an implementation manner of S103 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 3, in one implementation, S103 may include:
s301, according to each clustering pixel value, carrying out binarization processing on the first image to obtain a binarization image corresponding to each clustering pixel value.
Exemplarily, S301 may include: converting the clustering pixel values into HSV format to obtain HSV values corresponding to the clustering pixel values; mapping HSV values corresponding to the clustering pixel values based on an HSV color comparison table to determine upper color boundary values and lower color boundary values corresponding to the clustering pixel values; and carrying out binarization processing on the first image according to the color upper boundary value and the color lower boundary value to obtain a binarized image. The HSV value corresponding to the clustering pixel value is the HSV component of the color corresponding to the clustering pixel value, and the upper color boundary value and the lower color boundary value corresponding to the clustering pixel value can be determined through a preset strict range HSV color comparison table.
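A minimal OpenCV sketch of this binarization step is shown below; the fixed plus/minus margin used to derive the upper and lower color boundary values is only an assumption standing in for the preset strict-range HSV color comparison table, and the function name and margin are illustrative.

import cv2
import numpy as np

def binarize_by_cluster_color(image_rgb, cluster_rgb, hsv_margin=(10, 60, 60)):
    # Binarize the (RGB) first image for one clustered pixel value: convert the
    # value to HSV, derive upper and lower color boundary values around it, and
    # keep the pixels falling inside that range as white (255).
    hsv_img = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2HSV)
    hsv_val = cv2.cvtColor(np.uint8([[cluster_rgb]]), cv2.COLOR_RGB2HSV)[0, 0].astype(int)
    margin = np.array(hsv_margin)
    lower = np.clip(hsv_val - margin, 0, 255).astype(np.uint8)
    upper = np.clip(hsv_val + margin, 0, 255).astype(np.uint8)
    return cv2.inRange(hsv_img, lower, upper)       # the binarized image
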
S302, extracting a connected region from the binary image corresponding to each cluster pixel value to obtain a color instance segmentation region corresponding to each cluster pixel value.
For example, take the case where the first image is a screenshot of the chat interface of a certain chat application. Fig. 4 is a schematic diagram of a first image provided by an embodiment of the present disclosure. As shown in fig. 4, the first image may include the avatar of the chat user, chat content, two smiley face emoticons, and the like. Fig. 5 is a schematic diagram of a binarized image of the first image shown in fig. 4 according to an embodiment of the present disclosure. Assume that the first image shown in fig. 4 is binarized with a certain clustered pixel value to obtain the binarized image shown in fig. 5. The connected regions extracted in S302 from the binarized image shown in fig. 5 may include: the region where a smiley face emoticon is located (i.e., a white region in fig. 5 having the same contour as the smiley face emoticon), the region where the avatar of the chat user is located (i.e., a white region in fig. 5 having the same contour as the avatar), and the like; each such region is a color instance segmentation area.
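The connected-region extraction of S302 could be sketched with OpenCV connected-component analysis as follows; the minimum-area filter is an assumption added here to discard tiny noise regions and is not specified by the embodiment.

import cv2

def extract_color_instance_regions(binary_img, min_area=50):
    # Extract connected regions from one binarized image; each sufficiently
    # large region is taken as a color instance segmentation area and returned
    # as a bounding box (x, y, w, h).
    num, _, stats, _ = cv2.connectedComponentsWithStats(binary_img, connectivity=8)
    regions = []
    for label in range(1, num):                     # label 0 is the background
        x, y, w, h, area = stats[label]
        if area >= min_area:                        # assumed noise filter
            regions.append((int(x), int(y), int(w), int(h)))
    return regions
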
Fig. 6 is a schematic flowchart of another implementation manner of S103 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 6, in another implementation, S103 may include:
s601, performing color dimension reduction on the first image according to each cluster pixel value to obtain a first image after color dimension reduction.
Exemplarily, in S601 a new canvas may be generated, and the values of all pixel points in the first image are assigned onto the new canvas according to the clustered pixel values: each pixel point is assigned the clustered pixel value to which its own pixel value is closest. The image obtained after the assignment is finished is the first image after color dimensionality reduction. The first image after color dimensionality reduction contains only the RGB values of the several clustered pixel values.
For example, assume that the size of the first image is (224, 224, 3), where the first 224 represents the width of the first image, the second 224 represents the height of the first image, and 3 represents the channels of the first image. The colors of the first image (i.e., the pixel values of its pixel points) are represented as a set A, and the clustered pixel values obtained in S102 (also referred to as cluster center colors) are represented as a set B. The set A and the set B are respectively as follows:
A = [[(r1, g1, b1), (r2, g2, b2) … (r224, g224, b224)], [(rr1, gg1, bb1) … (rr224, gg224, bb224)], [ … ]], i.e. the first image includes 224 × 224 pixel points.
B = [(R1, G1, B1), (R2, G2, B2), (R3, G3, B3), (R4, G4, B4), (R5, G5, B5)], i.e. 5 clustered pixel values were obtained in S102.
Then, in S601, color dimensionality reduction is performed on the set a (i.e., the color of the first image) according to each clustered pixel value in the set B, and the color-dimensionality-reduced first image includes only 5 colors, i.e., (R1, G1, B1), (R2, G2, B2), (R3, G3, B3), (R4, G4, B4), and (R5, G5, B5).
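A minimal sketch of this color dimensionality reduction, assuming each pixel is assigned the clustered pixel value (set B) nearest to it in Euclidean distance, might look as follows:

import numpy as np

def reduce_colors(image, cluster_values):
    # Assign every pixel of the first image to its nearest clustered pixel
    # value, producing the color-dimensionality-reduced image that contains
    # only the clustered colors.
    h, w, _ = image.shape
    flat = image.reshape(-1, 3).astype(np.float64)          # set A as (H*W, 3)
    centers = np.asarray(cluster_values, dtype=np.float64)  # set B, e.g. 5 RGB triples
    dists = np.linalg.norm(flat[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    return centers[nearest].reshape(h, w, 3).astype(np.uint8)
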
And S602, performing binarization processing on the first image subjected to color dimensionality reduction according to each cluster pixel value to obtain a binarization image corresponding to each cluster pixel value.
S603, extracting a connected region from the binary image corresponding to each cluster pixel value to obtain a color example segmentation region corresponding to each cluster pixel value.
S602 and S603 may refer to S301 and S302 specifically, and are not described again.
Compared with the implementation of S103 shown in fig. 3, in the implementation of S103 shown in fig. 6, performing color dimensionality reduction on the first image according to each clustered pixel value can reduce the difficulty of performing subsequent color instance segmentation on the first image, and improve the processing speed of the subsequent color instance segmentation process. That is, when the binarization processing is performed on the first image after the color dimensionality reduction according to each cluster pixel value, the processing speed can be higher. For example, for some first images with many colors, it is generally difficult to perform color instance segmentation by color, and the implementation manner of S103 shown in fig. 6 can effectively reduce the difficulty of color instance segmentation and increase the processing speed of color instance segmentation.
In addition, after the dimension reduction is performed on the first image according to the implementation manner of S103 shown in fig. 6, the color instance segmentation is performed, so that the false detection rate of the whole text occlusion area detection can be effectively reduced.
After obtaining a plurality of color instance division areas in one-to-one correspondence with a plurality of clustered pixel values, S104 may be performed.
And S104, determining a text occlusion area in the first image from the multiple color instance segmentation areas according to the overlapping relation between each color instance segmentation area and the text box in the first image.
Exemplarily, taking the first image shown in fig. 4 as an example, fig. 7 is a schematic diagram of a text box in the first image shown in fig. 4 provided by the embodiment of the present disclosure. As shown in fig. 7, the black dashed box shown in fig. 7 is a text box in the first image shown in fig. 4.
Taking the binarized image shown in fig. 5 as an example, fig. 8 is a schematic diagram of a color example segmentation region extracted from the binarized image shown in fig. 5 according to an embodiment of the present disclosure. As shown in fig. 8, the area where the white dashed line box shown in fig. 8 is located is the color instance segmentation area extracted from the binarized image shown in fig. 5.
For each text box shown in fig. 7, it may be determined whether the color instance division area shown in fig. 8 is a text occlusion area according to the overlapping relationship of the color instance division area shown in fig. 8 and the text box shown in fig. 7.
Optionally, S104 may include: acquiring a first ratio of the area of the overlapping region between the color instance segmentation area and the text box to the area of the text box, a second ratio of the area of the overlapping region to the area of the color instance segmentation area, and a third ratio of the area of the overlapping region to the area of the union of the color instance segmentation area and the text box; when the maximum value of the first ratio, the second ratio and the third ratio is larger than a preset overlap ratio threshold value, determining that the color instance segmentation area is a text occlusion area; and when the maximum value of the first ratio, the second ratio and the third ratio is smaller than or equal to the preset overlap ratio threshold value, determining that the text box is not occluded by the color instance segmentation area.
In the embodiment of the present disclosure, a maximum value of the first ratio, the second ratio, and the third ratio may also be referred to as an overlap ratio or an overlap degree (IOU) of the color instance segmentation area and the text box.
Illustratively, the first ratio may be calculated by the following formula (1).
iou1 = (det_1 ∩ det_2) / det_1    formula (1)
In formula (1), iou1 represents the first ratio; det_1 ∩ det_2 represents the area of the overlapping region between the text box and the color instance segmentation area; det_1 represents the area of the text box.
The second ratio can be calculated by the following formula (2).
iou2 = (det_1 ∩ det_2) / det_2    formula (2)
In formula (2), iou2 represents the second ratio; det_1 ∩ det_2 represents the area of the overlapping region between the text box and the color instance segmentation area; det_2 represents the area of the color instance segmentation area.
The third ratio can be calculated by the following formula (3).
iou3 = (det_1 ∩ det_2) / (det_1 ∪ det_2)    formula (3)
In formula (3), iou3 represents the third ratio; det_1 ∩ det_2 represents the area of the overlapping region between the text box and the color instance segmentation area; det_1 ∪ det_2 represents the area of the union of the text box and the color instance segmentation area.
The overlap ratio of the color instance segmentation area and the text box can be calculated by the following formula (4).
iou = max(iou1, iou2, iou3)    formula (4)
In formula (4), iou represents the overlap ratio of the color instance segmentation area and the text box; max(iou1, iou2, iou3) represents taking the maximum value of iou1, iou2 and iou3.
After the iou is obtained through calculation, comparing the iou with a preset overlap ratio threshold, when the iou is larger than the preset overlap ratio threshold, showing that the intersection area proportion of the color example segmentation region and the text box reaches a certain degree, and determining that the color example segmentation region is a text shielding region; and when the iou is smaller than or equal to the preset overlap ratio threshold value, determining that the text box is not occluded by the color instance segmentation area.
Illustratively, the size of the preset overlap ratio threshold is related to the business scene of the target image. Illustratively, the preset overlap ratio threshold may be a value in the range of 0.1 to 0.2, such as 0.2. In the embodiment of the present disclosure, the size of the preset overlap ratio threshold may be adjusted according to the service requirement; the smaller the preset overlap ratio threshold is set, the more text occlusion areas are screened out.
It is understood that a plurality of text boxes (detection boxes) may be included in the first image, and that determining the text occlusion area in the first image from the plurality of color instance segmentation areas according to the overlapping relation between each color instance segmentation area and the text box in the first image in S104 actually means comparing each color instance segmentation area with each detection box included in the first image. For a certain color instance segmentation area, when its overlap ratio with each detection box included in the first image is smaller than or equal to the preset overlap ratio threshold, it is determined that the color instance segmentation area is not a text occlusion area; when its overlap ratio with at least one detection box included in the first image is greater than the preset overlap ratio threshold, the color instance segmentation area may be determined to be a text occlusion area.
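A sketch of this overlap-based selection, assuming axis-aligned boxes given as (x1, y1, x2, y2) and the union interpretation of formula (3), might look as follows; the default threshold 0.2 follows the example above and all names are illustrative.

def occlusion_by_overlap(region_box, text_boxes, overlap_threshold=0.2):
    # Compute iou1, iou2 and iou3 of one color instance segmentation area
    # against every text box and flag the area as a candidate text occlusion
    # area if the maximum ratio exceeds the threshold for at least one box.
    rx1, ry1, rx2, ry2 = region_box
    region_area = max(0, rx2 - rx1) * max(0, ry2 - ry1)
    for tx1, ty1, tx2, ty2 in text_boxes:
        ix1, iy1 = max(rx1, tx1), max(ry1, ty1)
        ix2, iy2 = min(rx2, tx2), min(ry2, ty2)
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # det_1 ∩ det_2
        text_area = (tx2 - tx1) * (ty2 - ty1)           # det_1
        union = region_area + text_area - inter         # det_1 ∪ det_2
        iou1 = inter / text_area if text_area else 0.0
        iou2 = inter / region_area if region_area else 0.0
        iou3 = inter / union if union else 0.0
        if max(iou1, iou2, iou3) > overlap_threshold:
            return True
    return False
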
According to the method for detecting the text occlusion area, the first image to be detected is obtained; merging the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met to obtain a plurality of clustered pixel values; carrying out example segmentation on the first image according to each clustering pixel value to obtain a plurality of color example segmentation areas which correspond to the clustering pixel values one by one; according to the overlapping relation between each color instance segmentation area and the text box in the first image, the text occlusion area in the first image is determined from the multiple color instance segmentation areas, and unified detection on different types of text occlusion areas can be achieved. When the method is used for detecting the text occlusion area of the first image, the method can have lower false detection rate. In addition, the method greatly improves the speed of detecting the text occlusion area by improving the clustering speed in the color clustering process.
In some embodiments, when the maximum value of the first ratio, the second ratio, and the third ratio is greater than the preset overlap ratio threshold, the step of determining the color instance segmentation area as the text-covered area may include: and when the maximum value of the first ratio, the second ratio and the third ratio is larger than a preset overlap ratio threshold value, inputting the image corresponding to the color instance segmentation area into a preset binary model to obtain a recognition result output by the binary model. The recognition result includes whether the color instance division area is a text occlusion area or not. And when the color instance segmentation area is the text occlusion area as a result of the identification, determining the color instance segmentation area as the text occlusion area.
That is, in this embodiment, when the maximum value of the first ratio, the second ratio, and the third ratio is greater than the preset overlap ratio threshold, the image corresponding to the color instance segmentation area may also be input into the preset binary model, so as to verify whether the detection result that the color instance segmentation area is the text occlusion area is correct. And when the detection result is correct, determining the color instance segmentation area as a text occlusion area.
Optionally, in the embodiment of the present disclosure, a sample set may also be obtained, where the sample set includes a plurality of first sample images and a plurality of second sample images, the first sample images have occlusion labels, and the occlusion labels are used to indicate that the first sample images are images corresponding to text occlusion areas; a neural network can then be trained with the sample set to obtain the above binary classification model.
For example, the first sample image may be an image of some common or user commonly used occlusion pattern, and the second sample image may be an image of text.
The above-described binary model has a function of determining whether an input image is a text occlusion region from the input image. Such as: after the image corresponding to the color instance segmentation area is input into a preset binary classification model, the binary classification model can output a classification label of the image corresponding to the color instance segmentation area, and when the classification label is an occlusion label (such as 1), the identification result indicates that the color instance segmentation area is a text occlusion area; when the classification label is empty (e.g., 0), it indicates that the color instance division area is not a text occlusion area as a recognition result.
In this embodiment, the binary classification model is used to further verify whether the detection result that the color instance segmentation area is a text occlusion area is correct, so that regions that are not actually occlusion patterns, or that were falsely detected in the color instance segmentation or OCR stages, can be screened out, thereby improving the detection accuracy of text occlusion areas and reducing the false detection rate.
Alternatively, the neural network used in training the above binary classification model may be a resnet50 network. In training the binary classification model, a weighted focal loss may be used as the loss function, which may be expressed as the following formula (5).
FL = -a(1-p)^r log(p)    formula (5)
In formula (5), FL represents the loss; a is the sample weight, which may take the value 0.25; r may take the value 2; and p represents the output of the binary classification model during training.
When the binary classification model is trained, the parameters of the model can be adjusted according to this loss function, so that the probability values predicted by the model are pushed toward the two ends of the 0-1 range.
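For illustration, a minimal NumPy sketch of the weighted focal loss of formula (5) for a batch of predicted probabilities p and binary labels y is shown below; applying the loss to the probability assigned to the true class (p_t) is an assumption about how the formula is used for both classes.

import numpy as np

def weighted_focal_loss(p, y, a=0.25, r=2.0, eps=1e-7):
    # Weighted focal loss of formula (5): FL = -a * (1 - p_t)^r * log(p_t),
    # averaged over the batch, where p_t is the predicted probability of the
    # true class (y = 1 for occlusion-pattern samples, y = 0 for text samples).
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)
    return float(np.mean(-a * (1.0 - p_t) ** r * np.log(p_t)))
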
Optionally, in another implementation manner, S103 may include: performing color dimensionality reduction on the first image according to each clustered pixel value to obtain a first image after color dimensionality reduction; performing binarization processing on the color-dimensionality-reduced first image according to each clustered pixel value to obtain a binarized image corresponding to each clustered pixel value; sequentially performing erosion processing, dilation processing and erosion processing on the binarized image; and extracting connected regions from the binarized image corresponding to each clustered pixel value to obtain the color instance segmentation area corresponding to each clustered pixel value.
That is, in this implementation, before the connected regions are extracted from the binarized image corresponding to each clustered pixel value, the binarized image may be sequentially subjected to erosion processing, dilation processing and erosion processing.
Dilation and erosion are opposite operations. The initial erosion separates the regions of the binarized image from one another, and the subsequent dilation and erosion can effectively fill small holes within the regions and smooth their boundaries, optimizing the result of the color segmentation of the first image. For example, after the binarized image is sequentially subjected to erosion, dilation and erosion, the contours of the subsequently extracted connected regions can be clearer.
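A sketch of the erosion, dilation and erosion sequence using OpenCV morphological operations might look as follows; the kernel size and iteration count are illustrative assumptions.

import cv2
import numpy as np

def clean_binary_mask(binary_img, ksize=3, iterations=1):
    # Apply erosion -> dilation -> erosion to one binarized image to separate
    # touching regions, fill small holes and smooth region boundaries before
    # the connected regions are extracted.
    kernel = np.ones((ksize, ksize), np.uint8)
    out = cv2.erode(binary_img, kernel, iterations=iterations)
    out = cv2.dilate(out, kernel, iterations=iterations)
    return cv2.erode(out, kernel, iterations=iterations)
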
Optionally, in this embodiment of the present disclosure, before determining, according to an overlapping relationship between each color instance division area and a text box in the first image, a text occlusion area in the first image from among the multiple color instance division areas, the text occlusion area detection method further includes: and carrying out optical character recognition text line detection on the first image to obtain a text box in the first image.
The process of performing the optical character recognition text line detection on the first image may adopt a mature OCR algorithm, which is not described herein again.
Optionally, in this embodiment of the present disclosure, before merging the pixel points whose distances between the pixel values are smaller than the preset distance threshold in the first image, the method for detecting a text-shielded area further includes: and compressing the first image into a preset size, and converting the format of the first image into an RGB format.
The steps of compressing the first image to a preset size and converting the format of the first image into the RGB format may also be regarded as preprocessing the first image. Illustratively, the preset size may be 224 × 224. Compressing the first image to the preset size and converting its format into the RGB format can effectively increase the speed of the subsequent text occlusion area detection on the first image.
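A minimal preprocessing sketch, assuming OpenCV is used to read, compress and convert the image, might look as follows; the preset size 224 × 224 follows the example above.

import cv2

def preprocess(image_path, size=(224, 224)):
    # Read the first image, compress it to the preset size and convert it to
    # RGB format (OpenCV loads images in BGR order).
    img = cv2.imread(image_path)                    # BGR image of shape (H, W, 3)
    img = cv2.resize(img, size)                     # compress to the preset size
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # convert to RGB format
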
Exemplarily, fig. 9 is another schematic flow chart of the text occlusion region detection method provided in the embodiment of the present disclosure. The text occlusion region detection method provided by the embodiment of the present disclosure is exemplarily described below with reference to fig. 9 as a relatively complete implementation flow. As shown in fig. 9, the text occlusion region detection method may include:
and S901, reading the picture.
S901 may refer to S101 described in the foregoing embodiment.
And S902, preprocessing the picture.
S902 may refer to the steps of compressing the first image into a preset size and converting the format of the first image into an RGB format in the foregoing embodiment.
And S903, fast agglomerative clustering.
S903 may refer to S102 described in the foregoing embodiments.
And S904, reducing the dimension of the color.
S904 may refer to S601 described in the foregoing embodiments.
S905, instance segmentation.
S905 may refer to S301-S302 or S602-S603 described in the previous embodiments.
And S906, OCR text line detection.
S906 may refer to the step of performing optical character recognition text line detection on the first image to obtain a text box in the first image in the foregoing embodiment.
And S907, selecting an occlusion area.
S907 may refer to the steps, described in the foregoing embodiments, of acquiring the first ratio of the area of the overlapping region between the color instance segmentation area and the text box to the area of the text box, the second ratio of the area of the overlapping region to the area of the color instance segmentation area, and the third ratio of the area of the overlapping region to the area of the union of the color instance segmentation area and the text box; and, when the maximum value of the first ratio, the second ratio and the third ratio is larger than the preset overlap ratio threshold value, preliminarily determining the color instance segmentation area as a text occlusion area.
And S908, classifying.
S908 may refer to the step, described in the foregoing embodiment, of using the binary classification model to verify whether the detection result that the color instance segmentation area is a text occlusion area is correct.
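Putting the steps of fig. 9 together, a high-level sketch that reuses the helper functions sketched earlier might look as follows; detect_text_lines (the OCR text line detection of S906) and classify_crop (the binary classification check of S908) are assumed helpers that are not shown, and the sketch favors clarity over efficiency.

def detect_text_occlusion_regions(image_path):
    # End-to-end sketch of the flow in fig. 9, reusing the sketches above.
    img = preprocess(image_path)                                # S901/S902: read and preprocess
    cluster_values = merge_pixels(img.reshape(-1, 3))           # S903: fast agglomerative clustering
    reduced = reduce_colors(img, cluster_values)                # S904: color dimensionality reduction
    text_boxes = detect_text_lines(img)                         # S906: OCR text line detection (assumed helper)
    occlusion_regions = []
    for value in cluster_values:                                # S905: instance segmentation per clustered value
        mask = binarize_by_cluster_color(reduced, value)
        mask = clean_binary_mask(mask)
        for x, y, w, h in extract_color_instance_regions(mask):
            box = (x, y, x + w, y + h)
            if occlusion_by_overlap(box, text_boxes):           # S907: occlusion area selection
                if classify_crop(img[y:y + h, x:x + w]):        # S908: binary classification check (assumed helper)
                    occlusion_regions.append(box)
    return occlusion_regions
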
In an exemplary embodiment, an embodiment of the present disclosure further provides a text occlusion area detection apparatus, which may be used to implement the text occlusion area detection method described in the foregoing embodiments.
Fig. 10 is a schematic composition diagram of a text occlusion region detection apparatus according to an embodiment of the present disclosure. As shown in fig. 10, the apparatus may include: an acquisition unit 1001, a clustering unit 1002, a segmentation unit 1003, and a detection unit 1004.
The acquiring unit 1001 is configured to acquire a first image to be detected;
the clustering unit 1002 is configured to merge pixels of which distances between pixel values in the first image are smaller than a preset distance threshold, and repeatedly perform the step of merging pixels of the first image until a preset condition is met, so as to obtain a plurality of clustered pixel values.
A segmentation unit 1003, configured to perform example segmentation on the first image according to each cluster pixel value to obtain a plurality of color example segmentation areas corresponding to the plurality of cluster pixel values one to one;
a detecting unit 1004, configured to determine a text occlusion region in the first image from the multiple color instance division regions according to an overlapping relationship between each color instance division region and a text box in the first image.
Optionally, the preset distance threshold includes a first distance threshold and a second distance threshold, and the preset condition includes a first preset condition and a second preset condition. A clustering unit 1002, specifically configured to divide the first image into a plurality of second images; merging the pixel points of which the distance between the pixel values in each second image is smaller than a first distance threshold value, and repeatedly executing the step of merging the pixel points of each second image until a first preset condition is met; and after the step of combining the pixel points of each second image is repeatedly executed until the first preset condition is met, combining the pixel points of which the distances between the pixel values in the plurality of second images are smaller than a second distance threshold value, and repeatedly executing the step of combining the pixel points of the plurality of second images until the second preset condition is met.
Optionally, the first preset condition includes: the distance between any two pixel points in each second image is smaller than a third distance threshold, or the number of iterations of the step of merging the pixel points of each second image reaches a first number of times. The second preset condition includes: the distance between any two pixel points in the plurality of second images is smaller than a fourth distance threshold, or the number of iterations of the step of merging the pixel points of the plurality of second images reaches a second number of times.
Optionally, the first distance threshold is equal to the second distance threshold.
Optionally, the third distance threshold is equal to the fourth distance threshold, and the first number of times is equal to the second number of times.
Optionally, the third distance threshold is equal to the first distance threshold, and the fourth distance threshold is equal to the second distance threshold.
Optionally, the segmentation unit 1003 is specifically configured to perform color dimensionality reduction on the first image according to each cluster pixel value to obtain a color-reduced first image; perform binarization processing on the color-reduced first image according to each cluster pixel value to obtain a binarized image corresponding to each cluster pixel value; and extract connected regions from the binarized image corresponding to each cluster pixel value to obtain the color instance segmentation area corresponding to each cluster pixel value.
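One way the segmentation unit could be realized is sketched below: every pixel is snapped to its nearest cluster pixel value (the color dimensionality reduction), a binary mask is built per cluster value, and the connected components of each mask become the color instance segmentation areas. OpenCV is assumed for the connected-component step, and the minimum-area filter is an illustrative assumption.

```python
import cv2
import numpy as np

def color_instance_regions(image, cluster_values, min_area=50):
    # image: H x W x 3 array; cluster_values: K x 3 array of cluster pixel values.
    # Color dimensionality reduction: label each pixel with its nearest cluster value.
    # The full pairwise distance matrix is acceptable for small images; large images
    # would need a chunked computation.
    flat = image.reshape(-1, 3).astype(np.float32)
    dists = np.linalg.norm(flat[:, None, :] - cluster_values[None, :, :], axis=2)
    labels = dists.argmin(axis=1).reshape(image.shape[:2])

    regions = []
    for k in range(len(cluster_values)):
        # Binarization: pixels belonging to cluster k become foreground
        mask = np.where(labels == k, 255, 0).astype(np.uint8)
        # Connected-region extraction for this cluster pixel value
        num, comp = cv2.connectedComponents(mask)
        for c in range(1, num):
            region_mask = comp == c
            if region_mask.sum() >= min_area:
                regions.append((k, region_mask))
    return regions
```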
Optionally, the segmentation unit 1003 is further configured to sequentially perform erosion processing, dilation processing, and erosion processing on the binarized image.
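The erosion-dilation-erosion cleanup could look like the OpenCV snippet below; the 3x3 rectangular kernel is an assumption. The first erosion removes isolated noise pixels, the dilation restores the surviving foreground, and the final erosion trims the boundary again.

```python
import cv2

def clean_mask(mask):
    # mask: single-channel binarized image (0 / 255) for one cluster pixel value
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.erode(mask, kernel)     # remove isolated noise pixels
    mask = cv2.dilate(mask, kernel)    # restore the surviving foreground
    return cv2.erode(mask, kernel)     # smooth the region boundary
```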
Optionally, the detecting unit 1004 is specifically configured to obtain a first ratio of the area of the overlapping region between the color instance segmentation area and the text box to the area of the text box, a second ratio of the area of the overlapping region to the area of the color instance segmentation area, and a third ratio of the area of the overlapping region to the sum of the area of the color instance segmentation area and the area of the text box; and to determine the color instance segmentation area to be a text occlusion area when the maximum value of the first ratio, the second ratio, and the third ratio is greater than a preset overlap ratio threshold; the size of the preset overlap ratio threshold is related to the service scene of the target image.
Optionally, the detecting unit 1004 is specifically configured to, when the maximum value of the first ratio, the second ratio, and the third ratio is greater than the preset overlap ratio threshold, input the image corresponding to the color instance segmentation area into a preset binary classification model to obtain a recognition result output by the binary classification model, where the recognition result indicates whether the color instance segmentation area is a text occlusion area; and to determine the color instance segmentation area to be a text occlusion area when the recognition result indicates that it is.
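A minimal inference sketch for this verification step, assuming a trained PyTorch binary classifier; the 224x224 input size, the [0, 1] scaling, and the convention that class 1 means "text occlusion" are all assumptions.

```python
import torch
import torch.nn.functional as F

def verify_occlusion(region_crop, model, prob_thresh=0.5):
    # region_crop: H x W x 3 uint8 RGB crop around the candidate region;
    # model: a trained two-class classifier (see the training sketch below).
    x = torch.from_numpy(region_crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    model.eval()
    with torch.no_grad():
        prob = F.softmax(model(x), dim=1)[0, 1].item()  # assumed convention: class 1 = occlusion
    return prob > prob_thresh
```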
Fig. 11 is another schematic composition diagram of the text occlusion region detection apparatus according to an embodiment of the present disclosure. As shown in fig. 11, the apparatus may further include: a training unit 1005.
A training unit 1005, configured to obtain a sample set, where the sample set includes a plurality of first sample images and a plurality of second sample images, and each first sample image carries an occlusion label indicating that the first sample image is an image corresponding to a text occlusion area; and to train a neural network with the sample set to obtain the binary classification model.
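The training unit could be implemented along the lines below; the placeholder CNN, the optimizer, the hyper-parameters, and the assumed train_loader yielding (images, labels) batches are not specified by the disclosure and are included only to make the sketch runnable.

```python
import torch
import torch.nn as nn

def build_binary_classifier():
    # Deliberately small placeholder CNN; the actual network architecture is not
    # specified by the disclosure.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 2),                  # two classes: occlusion / non-occlusion
    )

def train_binary_classifier(train_loader, epochs=10, lr=1e-4):
    # train_loader (assumed) yields (images, labels): label 1 for first sample images
    # (text occlusion crops), label 0 for second sample images.
    model = build_binary_classifier()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```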
Fig. 12 is another schematic composition diagram of the text occlusion region detection apparatus according to an embodiment of the present disclosure. As shown in fig. 12, the apparatus may further include: a text detection unit 1006.
The text detection unit 1006 is configured to perform optical character recognition text line detection on the first image, so as to obtain a text box in the first image.
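The disclosure only characterizes this step as optical character recognition text line detection; as a stand-in, the sketch below uses the open-source Tesseract engine via pytesseract to obtain word-level boxes. The choice of engine and the (x1, y1, x2, y2) box format are assumptions.

```python
import pytesseract
from pytesseract import Output

def detect_text_boxes(image_rgb):
    # Returns detected text boxes as (x1, y1, x2, y2) tuples
    data = pytesseract.image_to_data(image_rgb, output_type=Output.DICT)
    boxes = []
    for i, text in enumerate(data["text"]):
        if text.strip():
            x, y = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            boxes.append((x, y, x + w, y + h))
    return boxes
```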
Optionally, the clustering unit 1002 is specifically configured to, for any two target pixel points in the first image, when a distance between pixel values of the two target pixel points is smaller than a preset distance threshold, respectively assign the pixel values of the two target pixel points to a mean value of the pixel values of the two target pixel points.
Fig. 13 is another schematic composition diagram of the text occlusion region detection apparatus according to an embodiment of the present disclosure. As shown in fig. 13, the apparatus may further include: a preprocessing unit 1007.
The preprocessing unit 1007 is configured to compress the first image into a preset size and convert the format of the first image into an RGB format before the clustering unit 1002 merges the pixel points in the first image whose distance between the pixel values is smaller than the preset distance threshold.
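A minimal preprocessing sketch with OpenCV; the 256x256 target size is an assumption, and the color conversion assumes the image was loaded in OpenCV's default BGR order.

```python
import cv2

def preprocess(image_bgr, size=(256, 256)):
    resized = cv2.resize(image_bgr, size, interpolation=cv2.INTER_AREA)  # compress to the preset size
    return cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)                      # convert to RGB format
```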
In the technical scheme of the present disclosure, the acquisition, storage, and application of the personal information of related users all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to the above embodiments.
In an exemplary embodiment, the readable storage medium may be a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to the above embodiments.
In an exemplary embodiment, the computer program product comprises a computer program which, when executed by a processor, implements the method according to the above embodiments.
FIG. 14 shows a schematic block diagram of an example electronic device 1400 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 14, the electronic device 1400 includes a computing unit 1401 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1402 or a computer program loaded from a storage unit 1408 into a Random Access Memory (RAM) 1403. In the RAM 1403, various programs and data required for the operation of the device 1400 can also be stored. The computing unit 1401, the ROM 1402, and the RAM 1403 are connected to each other via a bus 1404. An input/output (I/O) interface 1405 is also connected to bus 1404.
A number of components in the electronic device 1400 are connected to the I/O interface 1405, including: an input unit 1406 such as a keyboard, a mouse, or the like; an output unit 1407 such as various types of displays, speakers, and the like; a storage unit 1408 such as a magnetic disk, optical disk, or the like; and a communication unit 1409 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1409 allows the electronic device 1400 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 1401 performs the respective methods and processes described above, such as the text occlusion region detection method. For example, in some embodiments, the text occlusion region detection method can be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1408. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 1400 via the ROM 1402 and/or the communication unit 1409. When a computer program is loaded into the RAM 1403 and executed by the computing unit 1401, one or more steps of the text occlusion region detection method described above may be performed. Alternatively, in other embodiments, the computing unit 1401 may be configured to perform the text occlusion region detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method for detecting a text occlusion region, the method comprising:
acquiring a first image to be detected;
merging the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met to obtain a plurality of clustered pixel values;
carrying out example segmentation on the first image according to each clustering pixel value to obtain a plurality of color example segmentation areas corresponding to a plurality of clustering pixel values one to one;
and determining a text occlusion area in the first image from a plurality of color instance segmentation areas according to the overlapping relation between each color instance segmentation area and a text box in the first image.
2. The method according to claim 1, wherein the preset distance threshold comprises a first distance threshold and a second distance threshold, and the preset condition comprises a first preset condition and a second preset condition;
the merging of the pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold, and the step of merging the pixel points of the first image is repeatedly executed until a preset condition is met, including:
segmenting the first image into a plurality of second images;
merging the pixel points of which the distance between the pixel values in each second image is smaller than the first distance threshold, and repeatedly executing the step of merging the pixel points of each second image until the first preset condition is met;
after the step of combining the pixel points of each second image is repeatedly executed until the first preset condition is met, combining a plurality of pixel points of which the distances between the pixel values in the second images are smaller than the second distance threshold value, and repeatedly executing the step of combining the pixel points of the plurality of second images until the second preset condition is met.
3. The method according to claim 2, wherein the first preset condition comprises: the distance between any two pixel points in each second image is smaller than a third distance threshold, or the number of iterations of the step of merging the pixel points of each second image reaches a first number of times;
the second preset condition comprises: the distance between any two pixel points in the plurality of second images is smaller than a fourth distance threshold, or the number of iterations of the step of merging the pixel points of the plurality of second images reaches a second number of times.
4. The method of claim 3, wherein the first distance threshold is equal to the second distance threshold.
5. The method according to claim 3 or 4, characterized in that said third distance threshold is equal to said fourth distance threshold, said first number of times being equal to said second number of times.
6. The method of any of claims 3-5, wherein the third distance threshold is equal to the first distance threshold and the fourth distance threshold is equal to the second distance threshold.
7. The method according to any one of claims 1 to 6, wherein the performing instance segmentation on the first image according to each of the cluster pixel values to obtain a plurality of color instance segmentation areas corresponding to a plurality of cluster pixel values in a one-to-one manner comprises:
performing color dimensionality reduction on the first image according to each clustering pixel value to obtain a first image after color dimensionality reduction;
according to each clustering pixel value, carrying out binarization processing on the first image subjected to color dimensionality reduction to obtain a binarization image corresponding to each clustering pixel value;
and extracting a connected region from the binary image corresponding to each cluster pixel value to obtain a color instance segmentation region corresponding to each cluster pixel value.
8. The method according to claim 7, wherein before extracting connected regions from the binarized image corresponding to each of the clustered pixel values, the method further comprises:
and sequentially carrying out corrosion treatment, expansion treatment and corrosion treatment on the binary image.
9. The method according to any one of claims 1-8, wherein determining a text occlusion region in the first image from a plurality of the color instance segmentation regions according to an overlapping relationship between each of the color instance segmentation regions and a text box in the first image comprises:
acquiring a first ratio of the area of the overlapping region between the color instance segmentation region and the text box to the area of the text box, a second ratio of the area of the overlapping region to the area of the color instance segmentation region, and a third ratio of the area of the overlapping region to the sum of the area of the color instance segmentation region and the area of the text box;
when the maximum value of the first ratio, the second ratio and the third ratio is larger than a preset overlap ratio threshold value, determining that the color example segmentation area is a text occlusion area; the size of the preset overlap ratio threshold value is related to the service scene of the target image.
10. The method according to claim 9, wherein the determining that the color instance division area is a text occlusion area when a maximum value of the first ratio, the second ratio, and the third ratio is greater than a preset overlap ratio threshold value comprises:
when the maximum value of the first ratio, the second ratio and the third ratio is larger than a preset overlap ratio threshold value, inputting the image corresponding to the color instance segmentation area into a preset binary model to obtain a recognition result output by the binary model; the recognition result comprises whether the color instance segmentation area is a text occlusion area or not;
and when the identification result is that the color example segmentation area is a text occlusion area, determining that the color example segmentation area is the text occlusion area.
11. The method according to any one of claims 1-10, wherein before determining a text occlusion region in the first image from a plurality of the color instance segmentation regions according to the overlapping relationship between each of the color instance segmentation regions and a text box in the first image, the method further comprises:
and carrying out optical character recognition text line detection on the first image to obtain a text box in the first image.
12. The method according to any one of claims 1 to 11, wherein said merging pixels in said first image for which the distance between pixel values is smaller than a preset distance threshold comprises:
and for any two target pixel points in the first image, when the distance between the pixel values of the two target pixel points is smaller than the preset distance threshold, assigning the pixel values of the two target pixel points to be the mean value of the pixel values of the two target pixel points respectively.
13. The method according to any one of claims 1-12, wherein before said merging pixels in said first image for which the distance between pixel values is less than a preset distance threshold, the method further comprises:
compressing the first image into a preset size, and converting the format of the first image into an RGB format.
14. A text occlusion region detection apparatus, the apparatus comprising:
the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring a first image to be detected;
the clustering unit is used for merging pixel points of which the distance between the pixel values in the first image is smaller than a preset distance threshold value, and repeatedly executing the step of merging the pixel points of the first image until a preset condition is met to obtain a plurality of clustered pixel values;
the segmentation unit is used for performing example segmentation on the first image according to each clustering pixel value to obtain a plurality of color example segmentation areas corresponding to a plurality of clustering pixel values one by one;
and the detection unit is used for determining a text occlusion area in the first image from the plurality of color example segmentation areas according to the overlapping relation between each color example segmentation area and the text box in the first image.
15. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to any one of claims 1-13.
17. A computer program product, characterized in that it comprises a computer program which, when being executed by a processor, carries out the method according to any one of claims 1-13.
CN202210571467.6A 2022-05-24 2022-05-24 Text occlusion area detection method and device, electronic equipment and storage medium Pending CN114998903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210571467.6A CN114998903A (en) 2022-05-24 2022-05-24 Text occlusion area detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210571467.6A CN114998903A (en) 2022-05-24 2022-05-24 Text occlusion area detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114998903A true CN114998903A (en) 2022-09-02

Family

ID=83029275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210571467.6A Pending CN114998903A (en) 2022-05-24 2022-05-24 Text occlusion area detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114998903A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination