CN113869310A - Dialog box detection method and device, electronic equipment and storage medium

Info

Publication number: CN113869310A
Application number: CN202111137112.8A
Authority: CN
Inventors: 钟东宏
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2021-12-31

Abstract

The disclosure relates to a dialog box detection method, a dialog box detection device, an electronic device and a storage medium. The method comprises the following steps: performing binarization processing on a video frame image to be detected to obtain a binarized image; performing region detection on the binarized image to obtain connected regions in the binarized image; determining, among the connected regions, a connected region that is a preset polygon as a preset polygonal area; and when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining that the preset polygonal area is a dialog box of a preset dialog box type. After the preset polygonal area is determined, the dialog box of the preset dialog box type is detected by checking for the preset shape and/or the preset color, so that dialog boxes that differ only in local details can be detected accurately and the detection accuracy is improved. No deep learning method is needed for detection and no manual data labeling is needed, which reduces labor cost.

Description

Dialog box detection method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to a dialog box detection method and apparatus, an electronic device, and a storage medium.

Background

Image target detection is a hot topic in image processing. Academia has produced many studies on both general and specific target detection, while practical work usually targets particular objects (such as human faces and vehicles). Target detection methods currently fall into traditional methods and deep learning methods, and deep learning methods have achieved strong results in target detection in recent years. Deep-learning-based target detection is now a popular approach to many detection problems: a target detection network model is trained on a large amount of training data annotated with detection boxes, and an image to be detected is then input into the model to obtain the coordinates of the target's bounding box.

Currently, dialog boxes are often added manually to video content to help describe it or to make it more interesting and richer. However, some videos re-uploaded from other platforms use dialog box templates provided by a third-party application rather than by the application itself, so these special dialog boxes need to be detected. The appearance of a special dialog box differs only very slightly from that of a dialog box provided by the application itself. If a deep-learning-based method is used for detection, the downsampling in the deep neural network makes it hard to distinguish such fine details of the detection target, so detection performance on the special dialog boxes is poor; in addition, a large amount of labeled training data is required, which incurs a high labor cost.

Disclosure of Invention

The present disclosure provides a dialog box detection method, apparatus, electronic device and storage medium, so as to at least solve the problems of poor dialog box detection accuracy and manpower waste in the related art. The technical scheme of the disclosure is as follows:

according to a first aspect of the embodiments of the present disclosure, there is provided a dialog box detection method, including:

carrying out binarization processing on a video frame image to be detected to obtain a binarized image;

carrying out region detection on the binarized image to obtain connected regions in the binarized image;

determining, among the connected regions, a connected region that is a preset polygon as a preset polygonal area;

and when a preset shape and/or a preset color exist at a preset position of the preset polygonal area, determining that the preset polygonal area is a dialog box of a preset dialog box type.

Optionally, when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining that the preset polygonal area is a dialog box of a preset dialog box type, including:

determining a preset vertex in the preset polygonal area, and extracting a rectangular area or a square area with a preset size based on the preset vertex to be used as an area to be detected;

dividing the area to be detected into a first area and a second area based on a diagonal line of the area to be detected, wherein the first area is an area located above the diagonal line, and the second area is an area located below the diagonal line;

determining the pixel mean value of the first area and the pixel mean value of the second area in the binary image;

determining the ratio of the pixel mean value of the first area to the pixel mean value of the second area as a first ratio value;

and when the first ratio value is greater than a judgment threshold, determining the preset polygonal area as a dialog box comprising a triangular sharp corner.

Optionally, when the first ratio value is greater than a judgment threshold, determining that the preset polygonal area is a dialog box including a triangular pointed corner, including:

when the diagonal line is the diagonal line between the lower left corner and the upper right corner of the area to be detected, and the first ratio value is greater than the judgment threshold, determining the preset polygonal area as a dialog box comprising a first triangular sharp corner;

and when the diagonal line is the diagonal line between the upper left corner and the lower right corner of the area to be detected, and the first ratio value is greater than the judgment threshold, determining the preset polygonal area as a dialog box comprising a second triangular sharp corner.

Optionally, the judgment threshold is determined by the following steps:

for a plurality of image samples of the dialog box with triangular sharp corners, respectively determining a pixel mean value of a first area of each image sample as a first pixel mean value, and respectively determining a pixel mean value of a second area of each image sample as a second pixel mean value;

determining the ratio of the second pixel mean value to the first pixel mean value of each image sample as a second ratio value corresponding to that image sample;

determining a plurality of candidate thresholds according to the difference between 1 and the second ratio value corresponding to each image sample;

and determining the judgment threshold according to the candidate thresholds.

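Optionally, when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining that the preset polygonal area is a dialog box of a preset dialog box type includes:

when the preset polygonal area is a rectangular area, extracting a lower boundary range area and/or a right boundary range area of the rectangular area from the video frame image to be detected, and taking the lower boundary range area and/or the right boundary range area as the area to be detected;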

and determining the pixel mean value of the region to be detected, and determining the preset polygonal region as a dialog box comprising preset colors when the pixel mean value is within a preset color range.

Optionally, the preset polygon is a rectangle;

determining a connected region that is a preset polygon among the connected regions as a preset polygonal area includes:

performing polygon approximation processing on the connected region, determining the polygon boundaries of the connected region, and determining a polygonal area in the connected region based on the polygon boundaries;

when the polygonal area has four polygon boundaries, determining the angle between each pair of intersecting edges of the polygonal area;

and when the angle between each pair of intersecting edges is within a preset degree range, determining that the polygonal area is a rectangular area.

According to a second aspect of the embodiments of the present disclosure, there is provided a dialog box detection apparatus, including:

the binarization processing module is configured to perform binarization processing on a video frame image to be detected to obtain a binarization image;

the region detection module is configured to perform region detection on the binary image to obtain a connected region in the binary image;

a region determination module configured to determine, among the connected regions, a connected region that is a preset polygon as a preset polygonal area;

a dialog box detection module configured to determine that the preset polygonal area is a dialog box of a preset dialog box type when a preset shape and/or a preset color exists at a preset position of the preset polygonal area.

Optionally, the dialog box detection module includes:

a first region-to-be-detected determining unit configured to perform determining a preset vertex in the preset polygonal region, and extract a rectangular region or a square region of a preset size as a region to be detected based on the preset vertex;

the area dividing unit is configured to divide the area to be detected into a first area and a second area based on a diagonal line of the area to be detected, wherein the first area is an area located above the diagonal line, and the second area is an area located below the diagonal line;

a pixel mean value determination unit configured to perform determination of a pixel mean value of the first region and a pixel mean value of a second region in the binarized image;

a judgment data determining unit configured to determine the ratio of the pixel mean value of the first area to the pixel mean value of the second area as a first ratio value;

a first dialog box determining unit configured to determine, when the first ratio value is greater than a judgment threshold, that the preset polygonal area is a dialog box comprising a triangular sharp corner.

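Optionally, the first dialog box determining unit is configured to:

when the diagonal line is the diagonal line between the lower left corner and the upper right corner of the area to be detected, and the first ratio value is greater than the judgment threshold, determine the preset polygonal area as a dialog box comprising a first triangular sharp corner;

and when the diagonal line is the diagonal line between the upper left corner and the lower right corner of the area to be detected, and the first ratio value is greater than the judgment threshold, determine the preset polygonal area as a dialog box comprising a second triangular sharp corner.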

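Optionally, the judgment threshold is determined by the following steps:

for a plurality of image samples of the dialog box with triangular sharp corners, respectively determining a pixel mean value of a first area of each image sample as a first pixel mean value, and respectively determining a pixel mean value of a second area of each image sample as a second pixel mean value;

determining the ratio of the second pixel mean value to the first pixel mean value of each image sample as a second ratio value corresponding to that image sample;

determining a plurality of candidate thresholds according to the second ratio value corresponding to each image sample;

and determining the judgment threshold according to the candidate thresholds.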

Optionally, the dialog box detection module includes:

a second to-be-detected region determining unit configured to extract a lower boundary range region and/or a right boundary range region of the rectangular region from the to-be-detected video frame image when the preset polygonal region is a rectangular region, and take the lower boundary range region and/or the right boundary range region as the to-be-detected region;

and the second dialog box determining unit is configured to determine a pixel mean value of the region to be detected, and when the pixel mean value is within a preset color range, determine the preset polygonal region as a dialog box comprising a preset color.

Optionally, the preset polygon is a rectangle;

the region determination module includes:

a polygon approximation unit configured to perform polygon approximation processing on the connected region, determine a polygon boundary of the connected region, and determine a polygon region in the connected region based on the polygon boundary of the connected region;

an included angle determining unit configured to determine a degree of an included angle between two intersecting edges of the polygonal area when the polygonal area has four polygonal boundaries;

a rectangular area determination unit configured to determine the polygonal area to be a rectangular area when the degree of the included angle between the two intersecting sides is within a preset degree range.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the dialog box detection method according to the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the dialog box detection method according to the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program or computer instructions which, when executed by a processor, implement the dialog box detection method of the first aspect.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

A video frame image to be detected is binarized to obtain a binarized image; region detection is performed on the binarized image to obtain the connected regions in it; a connected region that is a preset polygon is determined as a preset polygonal area; and when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, the preset polygonal area is determined to be a dialog box of a preset dialog box type. After the preset polygonal area is determined, the dialog box of the preset dialog box type is thus detected by checking for the preset shape and/or the preset color.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIGS. 1a-1c are schematic diagrams of styles of dialog boxes to be detected in the present disclosure;

FIGS. 2a-2b are schematic diagrams of styles of general dialog boxes in the present disclosure;

FIG. 3 is a flow diagram illustrating a dialog box detection method in accordance with an exemplary embodiment;

FIGS. 4a-4b are schematic diagrams of the division of the area to be detected in the embodiments of the present disclosure;

FIG. 5 is a block diagram illustrating a dialog box detection device in accordance with an exemplary embodiment;

FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

FIGS. 1a to 1c are schematic diagrams of styles of dialog boxes to be detected in the present disclosure, and FIGS. 2a to 2b are schematic diagrams of styles of general dialog boxes in the present disclosure. A general dialog box may be a dialog box provided by the application program itself. As shown in FIGS. 1a to 1c and FIGS. 2a to 2b, a dialog box to be detected differs from a general dialog box only slightly, sometimes in a single small detail. A deep-learning-based detection network cannot reliably distinguish such slight differences, so accurate detection cannot be achieved with it.

Although the appearance of the object to be detected is similar to that of the non-detected object, the object to be detected is different from the non-detected object in some detail parts, so that a reasonable detection method can be designed by combining some image processing algorithms. Embodiments of the present disclosure provide the following processing logic flow, which may address this issue.

FIG. 3 is a flowchart illustrating a dialog box detection method according to an exemplary embodiment. As shown in FIG. 3, the method is used in an electronic device such as a server or a computer and includes the following steps.

In step S31, a binarization process is performed on the video frame image to be detected, so as to obtain a binarized image.

The video frame image to be detected is a color image.

The video frame image to be detected is first binarized, that is, the color video frame image is converted into a black-and-white image, which gives the binarized image.

In an exemplary embodiment, the binarizing processing on the video frame image to be detected to obtain a binarized image includes: carrying out gray level processing on the video frame image to be detected to obtain a gray level image of the video frame image to be detected; and carrying out binarization processing on the gray level image to obtain the binarized image.

The video frame image to be detected is first converted from a three-channel color image into a single-channel grayscale image, which removes part of the noise interference. The grayscale image is then converted into a binarized image based on a gray value threshold: pixels whose gray value is greater than the threshold are assigned 255, and pixels whose gray value is less than the threshold are assigned 0. The gray value threshold may be set as needed; for example, if the dialog box has a light color such as white or beige, the threshold may be set to 250.
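As a non-limiting illustration, the grayscale conversion and thresholding described above can be sketched in plain Python. The function names `to_gray` and `binarize`, the BT.601 luma weights, and the treatment of pixels exactly equal to the threshold are assumptions of this sketch; the disclosure does not specify them.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of integer gray values (0-255)."""
    # ITU-R BT.601 luma weights; the disclosure does not name a conversion formula.
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, gray_threshold=250):
    """Map gray values above the threshold to 255 and all others to 0."""
    # Pixels equal to the threshold are not covered by the description;
    # this sketch assumes they are mapped to 0.
    return [[255 if v > gray_threshold else 0 for v in row] for row in gray_image]


# A tiny 2 x 3 color frame: white dialog pixels next to darker content.
frame = [
    [(255, 255, 255), (255, 255, 255), (10, 20, 30)],
    [(255, 255, 255), (200, 200, 200), (0, 0, 0)],
]
binary = binarize(to_gray(frame))
```

With a threshold of 250, only the pure white pixels survive as 255, which matches the light-colored dialog box case mentioned above.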

Performing the subsequent dialog box detection on the binarized image eliminates interference caused by irrelevant pixel distributions and improves detection accuracy.

In step S32, region detection is performed on the binarized image to obtain a connected region in the binarized image.

A region detection algorithm traverses the binarized image, determines the connected regions in it, and obtains the position of each connected region. The region detection algorithm may be one provided by OpenCV.
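As a non-limiting illustration, connected regions can be found with a flood fill over the white pixels. The disclosure only states that an OpenCV region detection algorithm may be used (OpenCV offers, for example, `cv2.connectedComponentsWithStats`); the pure-Python version below, its 4-connectivity, and the names `connected_regions` and `bounding_box` are assumptions made so the sketch is self-contained.

```python
from collections import deque


def connected_regions(binary):
    """Return a list of regions; each region is a list of (row, col) white pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 255 or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected white neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] == 255 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def bounding_box(region):
    """Return (top, left, bottom, right), inclusive, as the position of a region."""
    rows = [p[0] for p in region]
    cols = [p[1] for p in region]
    return min(rows), min(cols), max(rows), max(cols)
```

Regions are returned in row-major order of their first pixel, and `bounding_box` gives one simple notion of the "position" of a connected region.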

In step S33, a connected region that is a preset polygon in the connected regions is determined as a preset polygon region.

The preset polygon is the region shape of a general dialog box. The dialog box to be detected differs only slightly from a general dialog box, and the preset shape and/or the preset color generally appears around the preset polygon, so the position of the general dialog box shape, that is, the preset polygonal area, must be found first.

Each connected region is processed separately to determine whether it is a preset polygonal area. If the polygon formed by the boundaries of a connected region is the preset polygon, the connected region is determined to be a preset polygonal area; otherwise the connected region is excluded. This processing yields the preset polygonal areas among the connected regions.

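In one exemplary embodiment, the preset polygon is a rectangle; determining a connected region that is a preset polygon among the connected regions as a preset polygonal area includes: performing polygon approximation processing on the connected region, determining the polygon boundaries of the connected region, and determining a polygonal area in the connected region based on the polygon boundaries; when the polygonal area has four polygon boundaries, determining the angle between each pair of intersecting edges of the polygonal area; and if the angle between each pair of intersecting edges is within a preset degree range, determining that the polygonal area is a rectangular area.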

When the dialog box used by the application program is rectangular, the preset polygon is a rectangle, and it must be determined whether each connected region is a rectangular area. First, polygon approximation processing is performed on each connected region to determine its polygon boundaries; adjacent polygon boundaries meet at intersection points, and the area enclosed by the polygon boundaries is the polygonal area of the connected region. Polygonal areas with exactly four polygon boundaries are kept, and those with fewer or more than four are filtered out. For each remaining polygonal area, the angle between each pair of intersecting edges is calculated; if every such angle is within the preset degree range, the polygonal area is determined to be a rectangular area. The preset degree range is 90 degrees plus or minus a preset tolerance, which may be, for example, 10 degrees or 15 degrees.

Polygon approximation, the boundary count check and the angle check together select roughly rectangular regions, that is, the dialog boxes of the application program and dialog boxes similar to them, so that approximately rectangular dialog boxes can be detected accurately.
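As a non-limiting illustration, the boundary count check and the angle check can be sketched as follows. The sketch assumes that polygon approximation (for example with OpenCV's `cv2.approxPolyDP`) has already produced an ordered list of distinct vertices; the names `corner_angles` and `is_roughly_rectangular` and the default tolerance of 10 degrees are choices of this sketch.

```python
import math


def corner_angles(vertices):
    """Return the angle in degrees between the two edges meeting at each vertex."""
    n = len(vertices)
    angles = []
    for i in range(n):
        px, py = vertices[i - 1]          # previous vertex (wraps around)
        cx, cy = vertices[i]              # current vertex
        nx, ny = vertices[(i + 1) % n]    # next vertex (wraps around)
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        # Clamp to guard against rounding slightly outside [-1, 1].
        cos_angle = max(-1.0, min(1.0, dot / norm))
        angles.append(math.degrees(math.acos(cos_angle)))
    return angles


def is_roughly_rectangular(vertices, tolerance_deg=10.0):
    """Keep only four-sided polygons whose corner angles are 90 +/- tolerance."""
    if len(vertices) != 4:
        return False
    return all(abs(a - 90.0) <= tolerance_deg for a in corner_angles(vertices))
```

For example, `is_roughly_rectangular([(0, 0), (100, 0), (105, 50), (5, 50)])` accepts a slightly sheared quadrilateral, while a trapezoid with 60 degree corners is rejected.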

In step S34, when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, the preset polygonal area is determined to be a dialog box of a preset dialog box type.

Because a dialog box of the preset dialog box type to be detected generally has the preset shape and/or the preset color at the preset position of the preset polygonal area, a corresponding detection mode can be set for each preset dialog box type.

After the preset polygonal area is determined, it can be checked with the detection algorithm of at least one preset dialog box type to determine whether it is a dialog box of a preset dialog box type and, if so, which specific type. The preset position of the preset polygonal area is determined first, and then an area near the preset position is determined and checked for the preset shape and/or the preset color. If the preset shape and/or the preset color exists in that area, the preset polygonal area is determined to be a dialog box of a preset dialog box type, and the specific type is obtained from the preset shape and/or the preset color found. A preset dialog box type may be characterized by a preset shape, by a preset color, or by both.

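In an exemplary embodiment, when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining that the preset polygonal area is a dialog box of a preset dialog box type includes: determining a preset vertex in the preset polygonal area, and extracting a rectangular area or a square area with a preset size based on the preset vertex as an area to be detected; dividing the area to be detected into a first area and a second area based on a diagonal line of the area to be detected, wherein the first area is located above the diagonal line and the second area is located below the diagonal line; determining the pixel mean value of the first area and the pixel mean value of the second area in the binarized image; determining the ratio of the pixel mean value of the first area to the pixel mean value of the second area as a first ratio value; and when the first ratio value is greater than a judgment threshold, determining the preset polygonal area as a dialog box comprising a triangular sharp corner.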

If the preset dialog box type has a triangular sharp corner near a preset vertex, as shown in FIGS. 1a and 1b, the position of the preset vertex is first determined from the position of the preset polygonal area (if the preset polygonal area is a rectangular area or a hexagonal area, the preset vertex may be one of the two vertices of the lower side). In FIGS. 1a and 1b, the preset polygonal area is a rectangular area and the preset vertex is its lower left vertex. After the preset vertex is determined, a rectangular area or a square area of a preset size that includes the preset vertex is extracted and used as the area to be detected, in order to determine whether a triangular sharp corner exists in it. The preset size may be set according to the size of the triangular sharp corner to be detected and may be, for example, 10 × 10 pixels, 20 × 20 pixels or 30 × 30 pixels.

When the rectangular area or square area is extracted, the preset vertex may serve as one vertex of the extracted area, or it may lie inside the extracted area at a preset number of pixels from one of its vertices. For example, for the dialog boxes in FIGS. 1a and 1b with a square area to be extracted, the preset vertex (the lower left vertex of the rectangular area) may be used as the upper left vertex of the square area, or the point two pixels above the preset vertex may be used as the upper left vertex; in either case the square area is then extracted based on the preset size. The relative position between the preset vertex and the reference vertex of the area to be detected (the upper left vertex for the dialog boxes shown in FIGS. 1a and 1b) may be treated as a hyper-parameter and tuned in advance for the dialog boxes to be detected, so as to determine a suitable value.
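As a non-limiting illustration, extracting the square area to be detected from a preset vertex can be sketched as follows. The function name `extract_region`, the `(row, col)` coordinate convention, the `offset` parameter that stands for the hyper-parameter described above, and the clipping at the image border are all assumptions of this sketch.

```python
def extract_region(image, vertex, size=20, offset=(0, 0)):
    """Crop a size x size square whose upper left corner is vertex shifted by offset.

    image  -- list of rows (any pixel type)
    vertex -- (row, col) of the preset vertex, e.g. the lower left corner
    offset -- (d_row, d_col) hyper-parameter; (-2, 0) starts two pixels above
    """
    top = vertex[0] + offset[0]
    left = vertex[1] + offset[1]
    # Clip to the image so a vertex near the border still yields a valid crop
    # (the crop is then smaller than size x size).
    top = max(0, top)
    left = max(0, left)
    bottom = min(len(image), top + size)
    right = min(len(image[0]), left + size)
    return [row[left:right] for row in image[top:bottom]]


# 6 x 6 test image whose pixel value encodes its position as row * 10 + col.
image = [[r * 10 + c for c in range(6)] for r in range(6)]
crop = extract_region(image, vertex=(2, 1), size=3)
```

Here `crop` covers rows 2 to 4 and columns 1 to 3; passing `offset=(-2, 0)` instead would start the square two pixels above the preset vertex, as in the second variant described above.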

After the area to be detected is extracted, it is divided into a first area and a second area along one of its diagonal lines. The diagonal line is chosen according to the type of triangular sharp corner to be detected; the area above the diagonal line is the first area, and the area below it is the second area. For example, for the dialog box shown in FIG. 1a, the area to be detected is divided by the diagonal line between its lower left vertex and its upper right vertex, and for the dialog box shown in FIG. 1b, by the diagonal line between its upper left vertex and its lower right vertex.

After the area to be detected is divided, the pixel mean value of the first area and the pixel mean value of the second area are computed in the binarized image, and the ratio of the former to the latter is taken as the first ratio value. If the first ratio value is greater than the judgment threshold, the preset polygonal area is determined to be a dialog box including a triangular sharp corner.
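As a non-limiting illustration, the diagonal split and the first ratio value can be sketched for a square area to be detected taken from the binarized image. The names `diagonal_ratio` and `has_triangular_corner`, the labels `"anti"` and `"main"` for the two diagonals, the exclusion of pixels lying exactly on the diagonal, and the handling of an all-black second area are assumptions of this sketch.

```python
def diagonal_ratio(region, diagonal="anti"):
    """Return mean(first area) / mean(second area) for a square binary region.

    diagonal="anti": diagonal from the lower left to the upper right corner;
    diagonal="main": diagonal from the upper left to the lower right corner.
    The first area lies above the diagonal, the second area below it.
    The region must be at least 2 x 2 pixels.
    """
    n = len(region)
    first, second = [], []
    for y in range(n):
        for x in range(n):
            if diagonal == "anti":
                side = (n - 1) - (x + y)   # > 0 above, < 0 below
            else:
                side = x - y               # > 0 above, < 0 below
            if side > 0:
                first.append(region[y][x])
            elif side < 0:
                second.append(region[y][x])
            # Pixels exactly on the diagonal are skipped (an assumption).
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    if mean_second == 0:
        # Assumption: an all-black second area counts as an unbounded ratio.
        return float("inf") if mean_first > 0 else 0.0
    return mean_first / mean_second


def has_triangular_corner(region, threshold, diagonal="anti"):
    """Apply the rule: first ratio value greater than the judgment threshold."""
    return diagonal_ratio(region, diagonal) > threshold
```

Calling the function once per diagonal gives one ratio per candidate corner type, so the same area to be detected can be tested against both splits.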

The rule that the preset polygonal area is a dialog box including a triangular sharp corner when the first ratio value is greater than the judgment threshold applies when the dialog box region is rendered white in the binarized image.

Because the area to be detected is extracted near the preset vertex and divided into the first area and the second area along a diagonal line, a dialog box including a triangular sharp corner can be identified from the pixel mean values of the two areas, which improves the accuracy of this judgment.

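In an exemplary embodiment, when the first ratio value is greater than a judgment threshold, determining the preset polygonal area as a dialog box including a triangular sharp corner includes: when the diagonal line is the diagonal line between the lower left corner and the upper right corner of the area to be detected, and the first ratio value is greater than the judgment threshold, determining the preset polygonal area as a dialog box including a first triangular sharp corner; and when the diagonal line is the diagonal line between the upper left corner and the lower right corner of the area to be detected, and the first ratio value is greater than the judgment threshold, determining the preset polygonal area as a dialog box including a second triangular sharp corner.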

The first type of triangular sharp corner is shown in FIG. 1a and the second type in FIG. 1b. To judge whether the first triangular sharp corner is present, the area to be detected is divided by the diagonal line between its lower left corner and its upper right corner, as shown in FIG. 4a; if the first ratio value is greater than the judgment threshold, the preset polygonal area is determined to be a dialog box including the first triangular sharp corner. To judge whether the second triangular sharp corner is present, the area to be detected is divided by the diagonal line between its upper left corner and its lower right corner, as shown in FIG. 4b; if the first ratio value is greater than the judgment threshold, the preset polygonal area is determined to be a dialog box including the second triangular sharp corner. Using different diagonal lines distinguishes the different types of dialog boxes with triangular sharp corners, so that the type of each dialog box can be determined accurately.

In one exemplary embodiment, the judgment threshold is determined by the following steps: for a plurality of image samples of the dialog box with triangular sharp corners, respectively determining a pixel mean value of a first area of each image sample as a first pixel mean value, and respectively determining a pixel mean value of a second area of each image sample as a second pixel mean value; determining the ratio of the second pixel mean value to the first pixel mean value of each image sample as a second ratio value corresponding to that image sample; determining a plurality of candidate thresholds according to the second ratio value corresponding to each image sample; and determining the judgment threshold according to the candidate thresholds.

A plurality of image samples of dialog boxes with triangular sharp corners are acquired, and the position of the dialog box may be annotated in each image sample. The area to be detected is determined based on that position and divided along its diagonal line into a first area above the diagonal line and a second area below it. With the dialog box binarized to a white region in the binarized image corresponding to the image sample, the pixel mean value of the first area of each image sample is determined as the first pixel mean value, and the pixel mean value of the second area as the second pixel mean value. The ratio of the second pixel mean value to the first pixel mean value of each image sample is the second ratio value of that image sample, and the difference between 1 and the second ratio value is taken as a candidate threshold. That is, the candidate threshold corresponding to each image sample is determined according to the following formula:

$$\mathrm{thr} = 1 - \frac{P_{\mathrm{bottom}}}{P_{\mathrm{top}}}$$

where $\mathrm{thr}$ is the candidate threshold corresponding to an image sample, $P_{\mathrm{top}}$ is the first pixel mean value of the image sample, $P_{\mathrm{bottom}}$ is the second pixel mean value of the image sample, and $P_{\mathrm{bottom}}/P_{\mathrm{top}}$ is the second proportion value corresponding to the image sample.

According to the above formula, a plurality of candidate thresholds are obtained from the plurality of image samples. The distribution of the candidate thresholds is determined, the range in which most (e.g., 50% or more) of the candidate thresholds lie is found statistically, and the judgment threshold is determined based on that range. For example, if, for 100 image samples, the candidate thresholds of 50 image samples all lie between 0.88 and 0.92 while the candidate thresholds of the other image samples are relatively dispersed, the judgment threshold can be determined to be 0.9. By dynamically determining the judgment threshold based on the first pixel mean values and the second pixel mean values of the plurality of image samples, the accuracy of the dialog box type judgment can be improved.
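The threshold selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each sample region is a square list of rows holding binarized values (0 or 255), that the diagonal runs from the upper left to the lower right, and that "the range where most candidates lie" is taken to be the narrowest interval containing a given fraction of the candidates. The function names `candidate_threshold` and `judgment_threshold` and the `fraction` parameter are hypothetical.

```python
import math
from statistics import mean


def candidate_threshold(region):
    """Return thr = 1 - P_bottom / P_top for one square binarized sample.

    Pixels above the top-left to bottom-right diagonal form the first
    region; pixels below it form the second region.
    """
    n = len(region)
    top = [region[r][c] for r in range(n) for c in range(n) if c > r]
    bottom = [region[r][c] for r in range(n) for c in range(n) if c < r]
    p_top, p_bottom = mean(top), mean(bottom)
    if p_top == 0:
        raise ValueError("first region is entirely black; ratio undefined")
    return 1 - p_bottom / p_top


def judgment_threshold(candidates, fraction=0.5):
    """Midpoint of the narrowest interval holding `fraction` of the candidates.

    Assumed selection rule: the text only says the threshold is chosen
    from the range in which most candidate thresholds lie.
    """
    values = sorted(candidates)
    k = max(1, math.ceil(len(values) * fraction))
    # Slide a window of k consecutive sorted values and keep the narrowest.
    start = min(range(len(values) - k + 1),
                key=lambda i: values[i + k - 1] - values[i])
    return (values[start] + values[start + k - 1]) / 2


# One synthetic sample: white above the diagonal, black below it.
sample = [
    [255, 255, 255, 255],
    [0, 255, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 255],
]
print(candidate_threshold(sample))
print(judgment_threshold([0.88, 0.89, 0.90, 0.91, 0.92, 0.1, 0.3, 0.5, 0.6, 0.7]))
```

With half of the candidates clustered between 0.88 and 0.92, the narrowest-interval rule returns a value at the center of that cluster, which mirrors the 0.9 example in the text.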

In another exemplary embodiment, when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining that the preset polygonal area is a dialog box of a preset dialog box type includes: when the preset polygonal area is a rectangular area, extracting a lower boundary range area and/or a right boundary range area of the rectangular area from the video frame image to be detected, and taking the lower boundary range area and/or the right boundary range area as the area to be detected; and determining the pixel mean value of the region to be detected, and determining the preset polygonal region as a dialog box comprising preset colors when the pixel mean value is within a preset color range.

The dialog box shown in fig. 1c is a rectangular dialog box with specific colors in the areas near its lower boundary and right boundary. When detecting such a dialog box, the lower boundary range area and/or the right boundary range area of the rectangular area are extracted from the video frame image to be detected and used as the area to be detected. The pixel mean value of the area to be detected is computed and compared with the preset color range: if the pixel mean value lies within the preset color range, the preset polygonal area is determined to be a dialog box comprising preset colors; otherwise, the preset polygonal area is determined not to be a dialog box comprising the preset colors. By computing the pixel mean value of the lower boundary range area and/or the right boundary range area of the preset polygonal area, a dialog box with the preset color at the preset position can be judged accurately, which improves the judgment accuracy for dialog boxes with the preset color.
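The color check above might look like the following sketch. Several details are assumptions rather than statements from the text: the frame is a list of rows of (R, G, B) tuples, the rectangle is given as (x, y, w, h), the boundary range areas are strips of `band` pixels lying inside the rectangle, and the preset color range is one (low, high) pair per channel. The names `boundary_pixels` and `has_preset_color` are hypothetical.

```python
def boundary_pixels(frame, rect, band):
    """Collect pixels from the lower and right boundary strips of rect.

    The strips are assumed to lie inside the rectangle and to be `band`
    pixels thick; the shared corner is counted only once.
    """
    x, y, w, h = rect
    lower = [frame[r][c]
             for r in range(y + h - band, y + h)
             for c in range(x, x + w)]
    right = [frame[r][c]
             for r in range(y, y + h - band)
             for c in range(x + w - band, x + w)]
    return lower + right


def has_preset_color(frame, rect, color_range, band=2):
    """True if the per-channel pixel mean of the strips is within color_range."""
    pixels = boundary_pixels(frame, rect, band)
    means = [sum(p[ch] for p in pixels) / len(pixels) for ch in range(3)]
    return all(lo <= m <= hi for m, (lo, hi) in zip(means, color_range))


# Synthetic 6x8 frame: white background, blue lower and right edges of the box.
WHITE, BLUE = (255, 255, 255), (0, 0, 255)
frame = [[BLUE if (r == 4 and 1 <= c <= 5) or (c == 5 and 1 <= r <= 3) else WHITE
          for c in range(8)] for r in range(6)]
blue_range = [(0, 50), (0, 50), (200, 255)]
print(has_preset_color(frame, (1, 1, 5, 4), blue_range, band=1))
```

Comparing each channel mean against its own interval is one simple reading of "within a preset color range"; a different color space or distance measure could be substituted without changing the overall flow.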

In the dialog box detection method provided by this exemplary embodiment, binarization processing is performed on a video frame image to be detected to obtain a binarized image; region detection is performed on the binarized image to obtain the connected regions in the binarized image; a connected region that is a preset polygon is determined as a preset polygonal area; and, if a preset shape and/or a preset color exists at a preset position of the preset polygonal area, the preset polygonal area is determined to be a dialog box of a preset dialog box type. Because a dialog box of the preset dialog box type is detected by checking for the preset shape and the preset color after the preset polygonal area has been determined, dialog boxes with local details can be detected accurately, which improves detection accuracy. Detection requires neither a deep learning method nor manual data labeling, which reduces labor cost.
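The region detection and polygon filtering steps summarized above can be illustrated with a dependency-free sketch. It assumes the binarized image is a list of rows of 0/255 values, uses 4-connectivity, and treats a connected region as rectangular when it fills most of its bounding box. The fill-ratio test, the `min_fill` and `min_area` parameters, and the function names are illustrative assumptions, not the criterion stated in the disclosure.

```python
from collections import deque


def connected_regions(binary):
    """Return the 4-connected white regions as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] == 0 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, region = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(region)
    return regions


def rectangular_regions(binary, min_fill=0.9, min_area=4):
    """Return bounding boxes (x, y, w, h) of regions that look rectangular.

    Assumed test: the region covers at least `min_fill` of its bounding box.
    """
    rects = []
    for region in connected_regions(binary):
        ys = [r for r, _ in region]
        xs = [c for _, c in region]
        x, y = min(xs), min(ys)
        w, h = max(xs) - x + 1, max(ys) - y + 1
        if w * h >= min_area and len(region) / (w * h) >= min_fill:
            rects.append((x, y, w, h))
    return rects


# A 3x4 white block plus an L-shaped blob that should be rejected.
blob = {(0, 6), (1, 6), (2, 6), (2, 7)}
binary = [[255 if (1 <= r <= 3 and 1 <= c <= 4) or (r, c) in blob else 0
           for c in range(8)] for r in range(6)]
print(rectangular_regions(binary))
```

Each rectangle returned here would then be passed to the shape or color check at its preset position to decide whether it is a dialog box of the preset type.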

Fig. 5 is a block diagram illustrating a dialog box detection device according to an example embodiment. Referring to fig. 5, the device includes a binarization processing module 51, a region detection module 52, a region determination module 53, and a dialog box detection module 54.

The binarization processing module 51 is configured to perform binarization processing on a video frame image to be detected to obtain a binarized image;

the region detection module 52 is configured to perform region detection on the binarized image, resulting in a connected region in the binarized image;

the region determination module 53 is configured to determine, among the connected regions, a connected region that is a preset polygon as a preset polygonal area;

the dialog box detection module 54 is configured to determine that the preset polygonal area is a dialog box of a preset dialog box type when a preset shape and/or a preset color exists at a preset position of the preset polygonal area.

Optionally, the dialog box detection module includes:

Optionally, the dialog box determining unit is configured to perform:

and when the diagonal line is the diagonal line between the upper left corner and the lower right corner of the region to be detected, and the first proportion value is greater than the judgment threshold value, determining the preset polygonal region as a dialog box comprising a second triangular sharp corner.

Optionally, the judgment threshold is determined by the following steps:

and determining the judgment threshold according to the candidate thresholds.

Optionally, the dialog box detection module includes:

Optionally, the preset polygon is a rectangle;

the region determination module includes:

Optionally, the binarization processing module includes:

the gray processing unit is configured to perform gray processing on the video frame image to be detected to obtain a gray image of the video frame image to be detected;

and the binarization processing unit is configured to perform binarization processing on the gray level image to obtain the binarization image.
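The two units above correspond to a grayscale conversion followed by thresholding. The following sketch assumes the frame is a list of rows of (R, G, B) tuples, uses the common ITU-R BT.601 luma weights for the gray conversion, and applies a fixed threshold of 200. Both the weights and the threshold value are assumptions chosen for illustration, and `to_gray` and `binarize` are hypothetical names.

```python
def to_gray(frame):
    """Convert an RGB frame to grayscale using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in frame]


def binarize(gray, thresh=200):
    """Map pixels at or above `thresh` to 255 (white) and the rest to 0."""
    return [[255 if v >= thresh else 0 for v in row] for row in gray]


# Bright dialog-box-like pixels become white; darker content becomes black.
frame = [
    [(255, 255, 255), (0, 0, 0)],
    [(250, 250, 250), (10, 200, 10)],
]
print(binarize(to_gray(frame)))
```

A high fixed threshold suits dialog boxes that are near-white against darker video content; an adaptive threshold could be used instead where lighting varies.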

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device 600 may be provided as a server. Referring to fig. 6, the electronic device 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 622 is configured to execute the instructions to perform the dialog box detection method described above.

The electronic device 600 may also include a power component 626 configured to perform power management for the electronic device 600, a wired or wireless network interface 650 configured to connect the electronic device 600 to a network, and an input/output (I/O) interface 658. The electronic device 600 may operate based on an operating system stored in the memory 632, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a computer-readable storage medium comprising instructions is also provided, such as the memory 632 comprising instructions, where the instructions are executable by the processing component 622 of the electronic device 600 to perform the above-described dialog box detection method. Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an exemplary embodiment, a computer program product is also provided, comprising a computer program or computer instructions, which when executed by a processor, implements the dialog box detection method described above.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A dialog box detection method, comprising:

2. The method according to claim 1, wherein when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining the preset polygonal area as a dialog box of a preset dialog box type comprises:

3. The method according to claim 2, wherein when the first proportion value is greater than a judgment threshold, determining the preset polygonal area as a dialog box including a triangular cusp comprises:

4. The method of claim 2, wherein the judgment threshold is determined by:

and determining the judgment threshold according to the candidate thresholds.

5. The method according to claim 1, wherein when a preset shape and/or a preset color exists at a preset position of the preset polygonal area, determining the preset polygonal area as a dialog box of a preset dialog box type comprises:

6. The method according to any one of claims 1 to 5, wherein the preset polygon is a rectangle;

7. A dialog box detection device, comprising:

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the dialog box detection method of any of claims 1 to 6.

9. A computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the dialog box detection method of any of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the dialog box detection method of any of claims 1 to 6.