CN101799922A - Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Info

Publication number: CN101799922A
Application number: CN200910078007A
Legal status: Pending
Original language: Chinese (zh)
Inventors: 苗广艺, 徐成华, 周景超, 鲍东山
Assignee: BEIJING NUFRONT SOFTWARE TECHNOLOGY Co Ltd
Classification: Image Analysis

Abstract

The invention relates to a method for detecting strokes of characters in images, comprising the steps of: receiving an image; calculating response values of light strokes and response values of dark strokes of each pixel point in the image; processing the response values of the light strokes and the response values of the dark strokes of each pixel point respectively to obtain a light stroke map and a dark stroke map; and merging the light stroke map with the dark stroke map to obtain a combined stroke map and the distribution of the strokes. The invention also discloses a device for detecting strokes of characters in images, a method and a device for locating lines of characters in images, and a method and a device for judging repetition of subtitles.

Description

Method and device for detecting character strokes, method and device for positioning character lines, and method and device for judging repeated subtitles
Technical Field
The present invention relates to a technology for processing characters in an image, and in particular, to a method and an apparatus for detecting strokes of characters in an image, a method and an apparatus for locating lines of characters in an image, and a method and an apparatus for determining duplication of subtitles.
Background
With the growth of Internet video content and the large number of multimedia applications such as digital libraries, video on demand and remote teaching, how to retrieve the required data from massive video collections has become very important. Traditional video retrieval based on keyword descriptions cannot meet the requirements of massive video retrieval because of its limited descriptive power, strong subjectivity, reliance on manual labeling and other reasons. Therefore, since the 1990s, content-based video retrieval has become a hot research topic, and video subtitle recognition is a key technology for realizing video retrieval. The currently proposed video subtitle detection methods can be roughly classified into three types according to the features they use: region-based, edge-based and texture-based. Many algorithms actually combine two or all three of the above features.
A stroke-based subtitle detection scheme has been proposed. Subtitle detection based on strokes requires designing a stroke filter which, unlike traditional edge and texture filters, can detect strip-shaped structures of different scales in an image while remaining insensitive to edges and textures that lack such structures, and therefore has better robustness to non-text background interference.
Subtitle detection based on stroke detection is thus meaningful, but the currently designed stroke filters are applied in a very simple way: the consistency of stroke line brightness and the influence of stroke corner points and intersection points are not fully considered, which degrades the stroke detection effect.
Disclosure of Invention
In view of the above, the technical problem to be solved by the present invention is to provide a method for detecting strokes of characters in an image, so as to improve the stroke detection effect. In some optional embodiments, the method for detecting strokes of characters in an image comprises: receiving an image; calculating a response value of a bright stroke and a response value of a dark stroke of each pixel point in the image; respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke graph and a dark stroke graph; and combining the bright stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
When this embodiment is used to detect character strokes in an image, the consistency of stroke line brightness and the influence of stroke corner points and intersection points are fully taken into account, so the stroke detection effect is greatly improved.
Another technical problem to be solved by the present invention is to provide a method for locating a text line in an image. In some optional embodiments, the method of locating a text line in an image comprises: receiving an image; calculating a bright stroke map and a dark stroke map of the image; calculating a stroke density map and character distribution areas by using the bright stroke map and the dark stroke map; projecting each character distribution area in the bright stroke map in two ways; dividing each character distribution area into at least one text line; and determining the upper and lower boundaries of each text line.
When this embodiment is used to locate text lines in an image, the advantages of stroke density and double projection are combined, text lines can be located more accurately, and robustness against noise interference is better. On the basis of region aggregation, the double projection positioning method refines the text line positioning result by using the stroke distribution characteristics, so the boundaries of the text lines are more accurate.
Another technical problem to be solved by the present invention is to provide a method for determining subtitle repetition. In some optional embodiments, the method comprises: after the text lines of the previous image are located, saving the text line positions, the image content and the stroke distribution map of the previous image; before locating the text lines of the current image, judging with the saved information whether the text line distance between the current image and the previous image is larger than a fifth threshold; if so, locating the text lines of the current image; otherwise, using the text line positioning result of the previous image.
The invention aims to solve another technical problem of providing a device for detecting character strokes in an image. In some optional embodiments, the apparatus for detecting strokes of characters in an image includes a receiving unit for receiving the image, and further includes: the first unit is used for calculating a light stroke response value and a dark stroke response value of each pixel point in the image; a second unit for respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke graph and a dark stroke graph; and a third unit for combining the light stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
Another object of the present invention is to provide a device for locating lines of text in an image. In some optional embodiments, the apparatus for locating a text line in an image includes a receiving unit for receiving an image, and further includes: a first unit for calculating a bright stroke response value and a dark stroke response value of each pixel point in the image; a second unit for respectively processing the bright stroke response value and the dark stroke response value of each pixel point to obtain a bright stroke map and a dark stroke map; a fourth unit for calculating a stroke density map and character distribution areas by using the bright stroke map and the dark stroke map; a fifth unit for projecting each character distribution area in the bright stroke map in two ways; a sixth unit for dividing each character distribution area into at least one text line; and a seventh unit for determining the upper and lower boundaries of each text line.
When text lines are located, forming the stroke density map with the region aggregation algorithm works well for coarse positioning of text regions. Double projection using the stroke information of the bright stroke map then allows the upper and lower boundaries of each text line to be located accurately.
Another technical problem to be solved by the present invention is to provide an apparatus for determining subtitle repetition. In some optional embodiments, the apparatus includes a receiving unit for receiving the image, a storage unit, and a positioning unit for locating the text lines of the image, and further includes an eighth unit for saving the text line positions, the image content and the stroke distribution map of the previous image to the storage unit after the text lines of the previous image have been located; before the text lines of the current image are located, judging with the information stored in the storage unit whether the text line distance between the current image and the previous image is greater than a fifth threshold; if it is greater, starting the positioning unit to locate the text lines of the current image; otherwise, using the text line positioning result of the previous image stored in the storage unit.
It can be seen that by retaining the stroke information of the current image, and comparing whether the subtitles detected by the adjacent frames are the same or not by using the stroke difference of the adjacent frames before the character detection of the next image, a large amount of repeated subtitles can be eliminated, the repeated detection is reduced, and the character detection efficiency is further improved.
Drawings
FIG. 1 is a flow chart of a method for detecting strokes of characters in an image according to the present invention;
FIG. 2 is a schematic diagram of a stroke filter;
FIG. 3 is a flow chart of a method of locating lines of text in an image provided by the present invention;
FIG. 4 is a schematic diagram of an apparatus for detecting strokes of characters in an image according to the present invention;
FIG. 5 is a schematic diagram of an apparatus for locating lines of text in an image provided by the present invention;
fig. 6 is a schematic diagram of an apparatus for determining caption overlap according to the present invention.
Detailed Description
FIG. 1 illustrates an alternative method of detecting strokes of a word.
Step 11, receiving an image.
And step 12, calculating the response value of the light stroke and the response value of the dark stroke of each pixel point in the image.
And step 13, respectively processing the response value of the bright strokes and the response value of the dark strokes of each pixel point to obtain a bright stroke graph and a dark stroke graph.
And 14, combining the bright stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
Characters are composed of strokes, and a character line is composed of a plurality of strokes according to a certain rule. The stroke is represented by a line structure which has a certain direction, width and length, and the color of a pixel on the stroke has a greater contrast with the color of a non-stroke pixel in the neighborhood. The stroke line filter can be designed according to the characteristics of strokes. The stroke line filter generally has a plurality of directions and detection dimensions, which are specifically defined as shown in fig. 2. The black dots in fig. 2 represent the pixel points in the center of the filter, i.e., the pixel points being processed. The three strip-shaped areas (1), (2) and (3) are arranged in parallel, and the length, the width and the direction of the three strip-shaped areas are the same. The horizontal included angle alpha of the strip-shaped area can take a plurality of values, the detection scale of the stroke line filter is determined by the distance d of the strip-shaped area, and the length l of the strip-shaped area determines the minimum length of the stroke. Stroke filters of different detection scales can detect character strokes of different widths.
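To make the geometry of Fig. 2 concrete, the following Python sketch generates the integer pixel offsets of the three parallel strip-shaped areas for a given direction α, strip length l, strip width and strip spacing d. The sampling scheme (rounding rotated coordinates to integer offsets, spacing measured between strips) is an assumption; the patent describes the filter geometry only qualitatively.

```python
import numpy as np

def strip_offsets(alpha, d, length, width):
    """Pixel offsets (dx, dy) of the three parallel strips of the stroke filter,
    relative to the pixel being processed (the black dot in Fig. 2)."""
    us = np.arange(-(length // 2), length // 2 + 1)   # along the strip direction
    vs = np.arange(-(width // 2), width // 2 + 1)     # across the strip direction
    uu, vv = np.meshgrid(us, vs)

    def rotate(du, dv):
        ca, sa = np.cos(alpha), np.sin(alpha)
        dx = np.rint(du * ca - dv * sa).astype(int)
        dy = np.rint(du * sa + dv * ca).astype(int)
        return np.stack([dx.ravel(), dy.ravel()], axis=1)

    strip1 = rotate(uu, vv)        # strip (1), centered on the pixel
    strip2 = rotate(uu, vv - d)    # strip (2), offset by d on one side
    strip3 = rotate(uu, vv + d)    # strip (3), offset by d on the other side
    return strip1, strip2, strip3
```

A stroke filter at detection scale s would apply these offsets at every pixel and for each of the filter directions.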
Strokes with a brightness above the background are defined herein as bright strokes and strokes with a brightness below the background as dark strokes. Under a set detection scale s, an alternative way is to calculate the bright stroke response value of each pixel point in the whole gray image according to formula (1.1), and the dark stroke response value of each pixel point according to formula (1.2):

$$R^{B}_{\alpha,s}(x,y) = (u_1 - u_2) + (u_1 - u_3) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2 \qquad (1.1)$$

$$R^{D}_{\alpha,s}(x,y) = (u_2 - u_1) + (u_3 - u_1) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2 \qquad (1.2)$$

In formulae (1.1) and (1.2), u_1, u_2, u_3 denote the average brightness values of the strip-shaped areas (1), (2) and (3), and σ_1², σ_2², σ_3² denote the brightness variances of the strip-shaped areas (1), (2) and (3), respectively.
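As a minimal sketch of how formulas (1.1) and (1.2) could be evaluated at a single pixel, assuming the strip offsets produced by the strip_offsets() sketch above (border pixels are simply clipped, which is an assumption; the patent does not specify border handling):

```python
import numpy as np

def stroke_responses(gray, x, y, strips, d):
    """Bright- and dark-stroke responses of pixel (x, y) at one direction,
    per formulas (1.1) and (1.2)."""
    means, variances = [], []
    for offs in strips:                                   # strips (1), (2), (3)
        xs = np.clip(x + offs[:, 0], 0, gray.shape[1] - 1)
        ys = np.clip(y + offs[:, 1], 0, gray.shape[0] - 1)
        vals = gray[ys, xs].astype(float)
        means.append(vals.mean())                         # u1, u2, u3
        variances.append(vals.var())                      # sigma_1^2, sigma_2^2, sigma_3^2
    u1, u2, u3 = means
    penalty = abs(u2 - u3) + sum(variances) / d
    r_bright = (u1 - u2) + (u1 - u3) - penalty            # formula (1.1)
    r_dark = (u2 - u1) + (u3 - u1) - penalty              # formula (1.2)
    return r_bright, r_dark
```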
An alternative way is to take the transverse angle α of the strip-shaped areas in 4 directions: 0 and three further angles (given as formula images in the original; for a stroke filter these are typically π/4, π/2 and 3π/4). In this case each pixel point has 4 bright stroke response values, one per direction, and likewise 4 dark stroke response values, one per direction.
After all bright stroke response values and dark stroke response values of each pixel point have been obtained:

First, take for each pixel point the maximum bright stroke response value R^B_{maxα,s}(x, y) and the maximum dark stroke response value R^D_{maxα,s}(x, y) over the directions.

Then, take for each pixel point the bright stroke response value R^B_{maxα⊥,s}(x, y) in the direction perpendicular to the direction giving the maximum bright stroke response, and the dark stroke response value R^D_{maxα⊥,s}(x, y) in the direction perpendicular to the direction giving the maximum dark stroke response. For a given pixel point, whichever of the 4 directions yields the maximum bright stroke (or dark stroke) response, the perpendicular response is the one computed in the direction at right angles to it, and vice versa.
Finally, the bright stroke value R^B_s(x, y) and the dark stroke value R^D_s(x, y) of each pixel point are calculated according to formulae (1.3) and (1.4) respectively, giving a Bright Stroke Map (BSM) and a Dark Stroke Map (DSM):

$$R^{B}_{s}(x,y) = \left[R^{B}_{\max\alpha,s}(x,y) + R^{B}_{\max\alpha\perp,s}(x,y)\right] / 2 \qquad (1.3)$$

$$R^{D}_{s}(x,y) = \left[R^{D}_{\max\alpha,s}(x,y) + R^{D}_{\max\alpha\perp,s}(x,y)\right] / 2 \qquad (1.4)$$

The image formed by the bright stroke values R^B_s(x, y) of all pixel points is the bright stroke map BSM, and the image formed by the dark stroke values R^D_s(x, y) of all pixel points is the dark stroke map DSM.
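The fusion of the 4 directional responses into the BSM and DSM can be sketched as follows, assuming the per-direction responses are stacked into arrays of shape (4, H, W) and ordered so that direction k and direction (k + 2) % 4 are perpendicular (an assumption about the ordering, consistent with 4 evenly spaced directions):

```python
import numpy as np

def bright_dark_maps(resp_bright, resp_dark):
    """BSM and DSM per formulas (1.3) and (1.4): average of the maximum
    directional response and the response in the perpendicular direction."""
    def fuse(resp):
        best = resp.argmax(axis=0)              # direction of the maximum response
        perp = (best + 2) % 4                   # its perpendicular direction
        rows, cols = np.indices(best.shape)
        return (resp[best, rows, cols] + resp[perp, rows, cols]) / 2.0

    return fuse(resp_bright), fuse(resp_dark)   # BSM, DSM
```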
On the bright stroke image BSM, bright strokes are detected to be prominent, and meanwhile, a part of background pixels among dark strokes are also detected to be prominent; on the dark stroke pattern DSM, the dark strokes are detected to be highlighted, while a portion of the background pixels between the light strokes are also detected to be highlighted.
After the bright stroke map BSM and the dark stroke map DSM are obtained, the joint stroke value of each pixel point is calculated according to formula (1.5), giving a united stroke map (USM):

$$R^{U}_{s}(x,y) = \max\left\{R^{B}_{s}(x,y),\, R^{D}_{s}(x,y)\right\} \qquad (1.5)$$

The image formed by the joint stroke values R^U_s(x, y) of all pixel points is the united stroke map USM. The united stroke map USM integrates the results of the bright stroke map BSM and the dark stroke map DSM; on the united stroke map USM, the stroke pixels and part of the background pixels in their neighborhood are detected as salient, so the pixels of the region where text appears are highlighted as a whole, thereby revealing the distribution of the strokes.
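In code, formula (1.5) is a single element-wise maximum, assuming bsm and dsm are the arrays returned by the bright_dark_maps() sketch above:

```python
import numpy as np

usm = np.maximum(bsm, dsm)   # united stroke map USM, formula (1.5)
```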
When this embodiment is used to detect character strokes in an image, the consistency of stroke line brightness and the influence of stroke corner points and intersection points are fully taken into account, so the stroke detection effect is greatly improved.
Based on the method for detecting the strokes of the characters, the invention also provides a method for positioning the character lines in the image. FIG. 3 illustrates an alternative embodiment for locating lines of text in an image.
Step 31, receiving an image.
And step 32, calculating to obtain a light stroke graph and a dark stroke graph.
And step 33, calculating by using the light stroke graph and the dark stroke graph to obtain a stroke density graph and a character distribution area.
And step 34, projecting each character distribution area in a highlight drawing by two modes.
And 35, dividing each character distribution area into at least one character line.
Step 36, determine the upper and lower boundaries of each line of text.
After the method described in the foregoing embodiment is adopted and the bright stroke map BSM, the dark stroke map DSM and the united stroke map USM have been calculated, the stroke density map (SDM) is calculated.
In the stroke density calculation step, the stroke density map SDM can be calculated from the united stroke map USM. There are many suitable algorithms; one alternative is to use an existing morphology-based region aggregation algorithm, another is to use an existing density-based region aggregation algorithm.
The principle of one such density-based region aggregation algorithm is shown in formula (1.6).
$$\mathrm{Dens}(x,y) = \frac{1}{(2w+1)(2h+1)} \sum_{n=-h}^{h}\sum_{m=-w}^{w} R^{U}_{s}(x+m,\, y+n) \qquad (1.6)$$

where Dens(x, y) denotes the stroke density in the neighborhood region centered on the pixel point (x, y), the area of the neighborhood region is (2w+1) × (2h+1), and R^U_s(x, y) denotes the value of the pixel point (x, y) on the united stroke map USM.
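Formula (1.6) is simply a box average of the USM; a sketch using SciPy's uniform filter (zero padding at the image border is an assumption, since the patent does not specify boundary handling):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def stroke_density_map(usm, w, h):
    """Stroke density map SDM per formula (1.6): mean of the united stroke map
    over a (2w+1) x (2h+1) window centered on each pixel."""
    return uniform_filter(usm.astype(float),
                          size=(2 * h + 1, 2 * w + 1),   # (rows, cols) window
                          mode='constant', cval=0.0)
```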
Because this density-based region aggregation algorithm performs the density calculation for every pixel point, its aggregation result is relatively accurate and it is robust against low-density noise interference.
After obtaining the stroke density map SDM, candidate regions of text distribution need to be determined. The specific processing flow comprises the following steps:
the highlight map BSM is binarized using a set threshold value.
And carrying out OR operation on the binarized bright stroke graph and the stroke density graph SDM to obtain a new stroke density graph.
On the new stroke density graph, white pixels are connected into a plurality of areas, and the connected areas are candidate areas of character distribution.
The step of performing or operation on the binarized bright stroke map and the stroke density map SDM means that corresponding pixel points in the binarized bright stroke map and the stroke density map SDM are subjected to or operation.
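A sketch of the candidate region step, assuming both maps are binarized with scalar thresholds (the patent only states that the binarized bright stroke map is ORed with the stroke density map, so the SDM threshold here is an assumption) and that connected regions are extracted with SciPy's labelling:

```python
import numpy as np
from scipy.ndimage import label, find_objects

def text_candidate_regions(bsm, sdm, bsm_thresh, sdm_thresh):
    """Candidate text regions: OR of the binarized BSM and SDM, followed by
    connected-component analysis (4-connectivity by default)."""
    new_sdm = (bsm > bsm_thresh) | (sdm > sdm_thresh)   # the "new stroke density map"
    labels, num_regions = label(new_sdm)
    boxes = find_objects(labels)                        # bounding slices of each white region
    return boxes, labels, num_regions
```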
After candidate areas of the character distribution are obtained, in an original bright stroke map BSM without binarization, double projection is carried out in each candidate area:
firstly, projecting the brightness value of a pixel point in each candidate region to obtain a brightness histogram;
and then accumulating the times of changing each row of pixel points in each candidate region from zero to non-zero to obtain an intersection point histogram. The intersection histogram is called because it is equivalent to traversing a candidate region with a horizontal straight line, and counting the number of intersections of the straight line and the character strokes.
After the luminance histogram and the intersection histogram are obtained, the segmentation points are counted on the two histograms. If the value of a point on the luminance histogram is less than the first empirical threshold and the value on the intersection histogram is less than the second empirical threshold, then the point is marked as a split point. The candidate region is horizontally divided along the dividing points, i.e. a plurality of candidate character lines are formed.
For each candidate character line, finding the maximum value of the horizontal brightness histogram, respectively finding boundary points from the maximum value to the upper direction and the lower direction, and stopping finding under the conditions that: the value of the point on the luminance histogram is less than a third threshold, or the value on the intersection histogram is less than a fourth empirical threshold. And horizontally cutting along the boundary point to form the upper and lower boundaries of the character line.
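The double projection inside one candidate region of the non-binarized BSM can be sketched as follows; t1 to t4 stand for the first to fourth empirical thresholds mentioned above, whose values the patent does not give:

```python
import numpy as np

def locate_text_lines(bsm_region, t1, t2, t3, t4):
    """Split one candidate region into text lines and refine their upper and
    lower boundaries using the luminance and intersection-point histograms."""
    lum_hist = bsm_region.sum(axis=1)                    # brightness projected onto rows
    nz = bsm_region > 0
    cross_hist = (nz[:, 1:] & ~nz[:, :-1]).sum(axis=1)   # zero -> non-zero transitions per row
    is_split = (lum_hist < t1) & (cross_hist < t2)       # split points between text lines

    lines, row, n = [], 0, bsm_region.shape[0]
    while row < n:
        if is_split[row]:
            row += 1
            continue
        top = row
        while row < n and not is_split[row]:
            row += 1
        band = np.arange(top, row)                       # one candidate text line
        peak = band[np.argmax(lum_hist[band])]           # row of maximum brightness
        up = down = peak
        while up > top and lum_hist[up - 1] >= t3 and cross_hist[up - 1] >= t4:
            up -= 1                                      # search upwards for the boundary
        while down < row - 1 and lum_hist[down + 1] >= t3 and cross_hist[down + 1] >= t4:
            down += 1                                    # search downwards for the boundary
        lines.append((up, down))                         # upper and lower boundary rows
    return lines
```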
After each text line is located, two kinds of verification, shape-based and stroke-based, may be applied, since the detected lines may include false detections. The shape-based judgment rules include: the size, height and width of the text area, the aspect ratio of the text area, and the position of the text area. The stroke-based judgment rules include: the density of stroke pixels, the proportions of stroke pixels in the various directions, the length of the strokes, and so on. Since these rules are essentially self-defined heuristics, they are not described in detail here.
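A minimal sketch of such verification; every threshold below is an illustrative assumption, since the patent leaves the heuristic rules unspecified:

```python
def verify_text_line(box, stroke_map, min_height=8, max_height=80,
                     min_aspect=1.0, min_density=0.1):
    """Shape- and stroke-based check of one located text line
    (box = (top, bottom, left, right))."""
    top, bottom, left, right = box
    height, width = bottom - top, right - left
    # shape rules: size and aspect ratio of the text area
    if not (min_height <= height <= max_height and width >= min_aspect * height):
        return False
    # stroke rule: density of stroke pixels inside the line
    region = stroke_map[top:bottom, left:right]
    return (region > 0).mean() >= min_density
```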
When the embodiment is adopted to position the character lines in the image, the advantages of stroke density and double projection are combined, the character lines can be positioned more accurately, and the noise is better in anti-interference performance. The double projection positioning method optimizes the result of character line positioning by using the stroke distribution characteristics on the basis of region aggregation, so that the boundary of the character line is more accurate.
In addition, the text in the video typically stays for several seconds, and the same text line is detected in the images of consecutive frames. If stroke detection and character line positioning are carried out on multi-frame images with the same character lines, resources and processing time are consumed meaninglessly. Therefore, verification can be carried out before stroke detection and character line positioning, whether the character line of the current frame image is the same as the character line of the previous frame image or not is judged, if yes, the current frame image is skipped, and repeated stroke detection and character line positioning are not carried out. The specific verification steps include:
1) For the image of the i-th frame, detect the character strokes in the image, locate the text lines in the image, and save the relevant information of frame i, including: the color image (the RGB values of its pixel points are P^cor_i(x, y), cor = R, G, B), a stroke distribution map (an alternative way is to use the bright stroke map as the stroke distribution map, whose pixel values are R^B_i(x, y)), and the positions of the candidate regions of the character distribution (M candidate regions RECT_1 to RECT_M).
The value of each pixel on the stroke distribution map represents the probability that the pixel belongs to a character stroke. The way the stroke distribution map is calculated can be chosen according to the actual situation. In this example, assuming that the character strokes are brighter than the background, the bright stroke map is used as the stroke distribution map, and the step of normalizing the probability values to the range 0-1 is omitted.
2) For the (i+1)-th frame image, calculate the color distance between corresponding pixel points of the adjacent frame images according to formula (1.7); calculate the inter-frame distance of each candidate region RECT_m according to formula (1.8); and calculate the total text line distance between the adjacent frames according to formula (1.9), where size(RECT_m) denotes the area of candidate region RECT_m.

$$\mathrm{CorDist}_{i+1}(x,y) = \left\{\frac{1}{3}\sum_{cor=R,G,B}\left[P^{cor}_{i+1}(x,y) - P^{cor}_{i}(x,y)\right]^{2}\right\}^{\frac{1}{2}} \qquad (1.7)$$

$$\mathrm{RectDist}_{i+1}(m) = \left[\sum_{(x,y)\in RECT_m} R^{B}_{i}(x,y)\times \mathrm{CorDist}_{i+1}(x,y)\right] \Big/ \sum_{(x,y)\in RECT_m} R^{B}_{i+1}(x,y) \qquad (1.8)$$

$$\mathrm{FrameDist}_{i+1} = \left[\sum_{m=1}^{M} \mathrm{size}(RECT_m)\times \mathrm{RectDist}_{i+1}(m)\right] \Big/ \sum_{m=1}^{M} \mathrm{size}(RECT_m) \qquad (1.9)$$
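A sketch of formulas (1.7) to (1.9), assuming the frames are H x W x 3 arrays, the stroke distribution maps are the bright stroke maps R^B_i and R^B_{i+1}, and the candidate regions are given as (top, bottom, left, right) tuples:

```python
import numpy as np

def frame_distance(prev_rgb, cur_rgb, prev_stroke, cur_stroke, rects):
    """Total text line distance between adjacent frames, formulas (1.7)-(1.9)."""
    diff = prev_rgb.astype(float) - cur_rgb.astype(float)
    cor_dist = np.sqrt((diff ** 2).mean(axis=2))          # formula (1.7), per pixel

    rect_dists, sizes = [], []
    for top, bottom, left, right in rects:
        region = (slice(top, bottom), slice(left, right))
        num = (prev_stroke[region] * cor_dist[region]).sum()
        den = cur_stroke[region].sum()
        rect_dists.append(num / den if den > 0 else 0.0)  # formula (1.8), per region
        sizes.append((bottom - top) * (right - left))

    rect_dists, sizes = np.asarray(rect_dists), np.asarray(sizes)
    return (sizes * rect_dists).sum() / sizes.sum()       # formula (1.9)
```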
Equation (1.7) is a method for calculating the color distance, and other distances may be used instead of the color distance, and here, only the RGB color distance is taken as an example for convenience of description.
3) Judge the text line distance FrameDist_{i+1} against an empirical threshold w. If FrameDist_{i+1} is smaller than the empirical threshold w, the difference between the text lines of the (i+1)-th frame and the i-th frame is very small and they are the same text lines, so repeated detection is not needed and this frame can be skipped directly. If FrameDist_{i+1} is larger than the threshold, the difference between the text lines of the (i+1)-th frame and the i-th frame is large and they are different text lines, so stroke detection and text line positioning need to be carried out again.
And after verification, updating corresponding information of the ith frame by using the color image, the stroke distribution diagram and the candidate region position of the character distribution of the (i +1) th frame.
Before character detection, the stroke distributions of adjacent frames are compared to judge whether the text lines repeat those of the previous frame, and only then is it decided whether to run character detection again. Because the comparison and judgment happen before detection, the detection process for repeated text lines is avoided, which saves a great deal of detection time for video text lines; and because the algorithm fully accounts for the role of the stroke pixels, it resists interference from background pixels well and judges reliably whether text lines are repeated.
It can be seen that this method of judging repetition before locating text lines is not limited to the embodiments provided by the present invention; it can also be applied to other text line locating methods. Whichever method is used to locate text lines in an image, the character stroke information can be used, before the text line positioning operation is performed on the current image, to judge whether the text line distance between the current image and the previous image is larger than the empirical threshold w; if so, the text lines of the current image are located; otherwise, the text line positioning result of the previous image is used.
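Putting step 3) together, a sketch of the per-frame decision follows; frame_distance() is the routine sketched above, while detect_strokes_and_lines() is a placeholder for the full stroke detection and text line positioning of the earlier sections, and the cached fields are assumptions about how the previous frame's information might be stored:

```python
def process_frame(rgb, stroke_map, cache, w):
    """Skip text line positioning when the subtitles repeat the previous frame.
    `cache` holds the previous frame's color image, stroke distribution map,
    candidate regions and located text lines; `w` is the empirical threshold."""
    if cache is not None:
        dist = frame_distance(cache["rgb"], rgb,
                              cache["stroke"], stroke_map, cache["rects"])
        if dist < w:                          # same text lines as the previous frame
            cache = {"rgb": rgb, "stroke": stroke_map,        # refresh the cached frame
                     "rects": cache["rects"], "lines": cache["lines"]}
            return cache["lines"], cache      # reuse the previous positioning result
    lines, rects = detect_strokes_and_lines(rgb)              # placeholder: full detection
    cache = {"rgb": rgb, "stroke": stroke_map, "rects": rects, "lines": lines}
    return lines, cache
```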
FIG. 4 shows an apparatus 400 for detecting strokes of characters in an image. The apparatus 400 includes a receiving unit S40, a first unit S41, a second unit S42 and a third unit S43.
The receiving unit S40 is for receiving an image.
The first unit S41 is configured to calculate a light stroke response value and a dark stroke response value of each pixel point in the image; the second unit S42 is configured to process the light stroke response value and the dark stroke response value of each pixel point, respectively, to obtain a light stroke graph and a dark stroke graph; the third unit S43 is configured to merge the light stroke graph and the dark stroke graph to obtain a combined stroke graph and a distribution of strokes.
The processing of the first unit S41, the second unit S42 and the third unit S43 is described above and will not be described herein.
FIG. 5 shows an apparatus 500 for locating lines of text in an image, the apparatus 500 comprising a receiving unit S40, a first unit S41, a second unit S42, a fourth unit S54, a fifth unit S55, a sixth unit S56, and a seventh unit S57.
The fourth unit S54 calculates a stroke density map and character distribution areas using the bright stroke map and the dark stroke map; the fifth unit S55 projects each character distribution area in the bright stroke map in two ways; the sixth unit S56 divides each character distribution area into at least one text line; and the seventh unit S57 determines the upper and lower boundaries of each text line.
The processing of the fourth unit S54, the fifth unit S55, the sixth unit S56 and the seventh unit S57 is described above and will not be described herein.
The verification can be carried out before stroke detection and character line positioning, whether the character line of the current frame image is the same as the character line of the previous frame image or not is judged, if yes, the processing of the current frame image is skipped, and repeated stroke detection and character line positioning are not carried out. In this case, an eighth unit may be further added to the apparatus 400 or the apparatus 500.
The eighth unit is used for judging whether the text line distance between the current image and the previous image is greater than a fifth threshold; if so, the first unit S41 is started to locate the text lines of the current image; otherwise, the seventh unit S57 is started to output the text line positioning result of the previous image, or the third unit S43 is started to output the stroke detection result of the previous image.
The processing procedure of the eighth unit is described above, and is not described herein.
Fig. 6 shows an apparatus 600 for judging duplication of subtitles, the apparatus 600 including a receiving unit S40, a locating unit S61, a storing unit S63 and an eighth unit S62.
The positioning unit S61 is used to locate the text lines of the image. The eighth unit saves the text line positions, the image content and the stroke distribution map of the previous image to the storage unit S63 after the text lines of the previous image have been located; before the text lines of the current image are located, it judges with the information stored in the storage unit S63 whether the text line distance between the current image and the previous image is greater than a fifth threshold; if so, the positioning unit S61 is started to locate the text lines of the current image; otherwise, the text line positioning result of the previous image held by the storage unit S63 is used.
Those of skill in the art will understand that the various exemplary method steps and apparatus elements described in connection with the embodiments disclosed herein can be implemented as electronic hardware, software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative steps and elements have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative elements described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method described in connection with the embodiments disclosed above may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a subscriber station. In the alternative, the processor and the storage medium may reside as discrete components in a subscriber station.
The disclosed embodiments are provided to enable those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope or spirit of the invention. The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (36)

1. A method for detecting strokes of characters in an image, comprising:
receiving an image;
calculating a response value of a bright stroke and a response value of a dark stroke of each pixel point in the image;
respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke graph and a dark stroke graph;
and combining the bright stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
2. The method of claim 1, wherein for each pixel point:
with the pixel point as the center, 3 parallel strip-shaped areas with equal length and equal width are respectively arranged in a plurality of directions;
respectively calculating the brightness mean value and the brightness variance of each strip-shaped area in each direction;
and calculating the response value of the bright stroke and the response value of the dark stroke of the pixel point in each direction by using the brightness mean value and the brightness variance of each strip-shaped area in each direction.
3. The method of claim 2, wherein for each pixel point, the largest bright stroke response value R^B_{maxα,s}(x, y) over its directions and the bright stroke response value R^B_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one are taken, and the bright stroke value R^B_s(x, y) of the pixel point is calculated from them; the image formed by the bright stroke values R^B_s(x, y) of all pixel points is the bright stroke map.
4. The method of claim 3, wherein for each pixel point, the largest dark stroke response value R^D_{maxα,s}(x, y) over its directions and the dark stroke response value R^D_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one are taken, and the dark stroke value R^D_s(x, y) of the pixel point is calculated from them; the image formed by the dark stroke values R^D_s(x, y) of all pixel points is the dark stroke map.
5. The method of claim 4, wherein for the same pixel point (x, y) in the bright stroke map and the dark stroke map, the larger of its bright stroke value R^B_s(x, y) and its dark stroke value R^D_s(x, y) is taken as the joint stroke value R^U_s(x, y) of the pixel point; the image formed by the joint stroke values of all pixel points is the combined stroke map; and the distribution of strokes is characterized by the salient text pixels in the combined stroke map.
6. The method of claim 2, wherein, with the pixel point as the center, 3 parallel strip-shaped areas of equal length and equal width are respectively arranged in 4 directions of the transverse included angle, one of which is 0 (the other three angles are given as formula images in the original).
7. The method of claim 6, wherein the bright stroke response value R^B_{α,s}(x, y) and the dark stroke response value R^D_{α,s}(x, y) of the pixel point with coordinates (x, y) in the direction of transverse angle α are calculated according to the following formulas:

$$R^{B}_{\alpha,s}(x,y) = (u_1 - u_2) + (u_1 - u_3) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2;$$

$$R^{D}_{\alpha,s}(x,y) = (u_2 - u_1) + (u_3 - u_1) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2;$$

s is the configured detection scale and d is the spacing between the strip-shaped areas; u_1, u_2, u_3 respectively denote the brightness averages of the 3 strip-shaped areas in the direction of transverse angle α, and σ_1², σ_2², σ_3² denote the brightness variances of the strip-shaped areas in that direction, where u_1 is the brightness average of the strip-shaped area in which the pixel point (x, y) lies.
8. The method of claim 3, wherein R^B_{maxα,s}(x, y) and R^B_{maxα⊥,s}(x, y) are averaged to obtain R^B_s(x, y).
9. The method of claim 4, wherein R^D_{maxα,s}(x, y) and R^D_{maxα⊥,s}(x, y) are averaged to obtain R^D_s(x, y).
10. A method for locating lines of text in an image, comprising:
receiving an image;
calculating to obtain a light stroke image and a dark stroke image of the image;
calculating to obtain a stroke density graph and a character distribution area by using the bright stroke graph and the dark stroke graph;
projecting each character distribution area in the bright stroke map in two ways;
dividing each character distribution area into at least one character line;
the upper and lower boundaries of each line of text are determined.
11. The method according to claim 10, wherein binarization is performed on the bright stroke map, and an OR operation is performed on the binarized bright stroke map and the stroke density map;
and the regions formed by connected white pixel points in the new stroke density map are taken as character distribution areas.
12. The method of claim 10, wherein the luminance value of the pixel on the light stroke map is projected in the horizontal direction in each text distribution area to obtain a luminance histogram of each text distribution area;
and accumulating the times of changing each line of pixels from zero to non-zero on the highlight drawing in each character distribution area to obtain an intersection point histogram of each character distribution area.
13. The method of claim 12, wherein the text distribution area is divided horizontally at each text distribution area along the searched dividing points to form a plurality of text lines;
wherein the division point satisfies the following condition:
the value of the point on the luminance histogram is less than a first threshold and the value on the intersection histogram is less than a second threshold.
14. The method of claim 13, wherein for each text line, the boundary points of the text line are searched from the maximum value of the luminance histogram in the upper and lower directions, and are horizontally divided along the boundary points to form the upper and lower boundaries of the text line;
wherein the boundary points satisfy the following conditions:
the value of the point on the luminance histogram is less than a third threshold, or the value on the intersection histogram is less than a fourth threshold.
15. A method for judging subtitle repetition is characterized by comprising the following steps:
after the character line of the previous image is positioned, the character line position, the image content and the stroke distribution map of the previous image are saved;
before the text line of the current image is positioned, judging with the saved information whether the text line distance between the current image and the previous image is larger than a fifth threshold; if so, positioning the text line of the current image; otherwise, using the text line positioning result of the previous image.
16. The method of claim 15, wherein the text line distance FrameDist_{i+1} between the current image and the previous image is obtained by calculating

$$\mathrm{FrameDist}_{i+1} = \left[\sum_{m=1}^{M} \mathrm{size}(RECT_m)\times \mathrm{RectDist}_{i+1}(m)\right] \Big/ \sum_{m=1}^{M} \mathrm{size}(RECT_m),$$

where size(RECT_m) denotes the area of the text region RECT_m, RectDist_{i+1}(m) denotes the distance between the text distribution areas of the current image and the previous image, and M denotes the total number of text distribution areas.
17. The method of claim 16, wherein the distance between the text distribution areas of the current image and the previous image is obtained by calculating

$$\mathrm{RectDist}_{i+1}(m) = \left[\sum_{(x,y)\in RECT_m} R^{B}_{i}(x,y)\times \mathrm{CorDist}_{i+1}(x,y)\right] \Big/ \sum_{(x,y)\in RECT_m} R^{B}_{i+1}(x,y),$$

where R^B_i(x, y) denotes the value of the pixel point (x, y) on the stroke distribution map of the previous image, and CorDist_{i+1}(x, y) denotes the color distance between corresponding pixel points of the current image and the previous image.
18. The method of claim 17, wherein

$$\mathrm{CorDist}_{i+1}(x,y) = \left\{\frac{1}{3}\sum_{cor=R,G,B}\left[P^{cor}_{i+1}(x,y) - P^{cor}_{i}(x,y)\right]^{2}\right\}^{\frac{1}{2}},$$

where P^cor_{i+1}(x, y) and P^cor_i(x, y) respectively denote the RGB color values of corresponding pixel points of the current image and the previous image.
19. An apparatus for detecting strokes of characters in an image, comprising a receiving unit for receiving the image, characterized by further comprising:
the first unit is used for calculating a light stroke response value and a dark stroke response value of each pixel point in the image;
a second unit for respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke map and a dark stroke map; and
a third unit for combining the bright stroke map and the dark stroke map to obtain a combined stroke map and the distribution of strokes.
20. The apparatus of claim 19, wherein for each pixel:
the first unit takes the pixel point as a center, and 3 parallel strip-shaped areas with equal length and equal width are respectively arranged in a plurality of directions; respectively calculating the brightness mean value and the brightness variance of each strip-shaped area in each direction; and calculating the response value of the bright stroke and the response value of the dark stroke of the pixel point in each direction by using the brightness mean value and the brightness variance of each strip-shaped area in each direction.
21. The apparatus of claim 20, wherein for each pixel point, the second unit takes the largest bright stroke response value R^B_{maxα,s}(x, y) over its directions and the bright stroke response value R^B_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one, and calculates the bright stroke value R^B_s(x, y) of the pixel point; the image formed by the bright stroke values R^B_s(x, y) of all pixel points is the bright stroke map.
22. The apparatus of claim 21, wherein for each pixel point, the second unit takes the largest dark stroke response value R^D_{maxα,s}(x, y) over its directions and the dark stroke response value R^D_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one, and calculates the dark stroke value R^D_s(x, y) of the pixel point; the image formed by the dark stroke values R^D_s(x, y) of all pixel points is the dark stroke map.
23. The apparatus of claim 22, wherein for the same pixel point (x, y) in the bright stroke map and the dark stroke map, the third unit takes the larger of its bright stroke value R^B_s(x, y) and its dark stroke value R^D_s(x, y) as the joint stroke value R^U_s(x, y) of the pixel point; the image formed by the joint stroke values of all pixel points is the combined stroke map;
and the distribution of strokes is characterized by the salient text pixels in the combined stroke map.
24. The apparatus of claim 20, wherein, with the pixel point as the center, 3 parallel strip-shaped areas of equal length and equal width are respectively arranged in 4 directions of the transverse included angle, one of which is 0 (the other three angles are given as formula images in the original).
25. The apparatus of claim 24, wherein the bright stroke response value $R^{B}_{\alpha,s}(x,y)$ and the dark stroke response value $R^{D}_{\alpha,s}(x,y)$ of the pixel point with coordinates (x, y), in the direction whose included angle with the horizontal is α, are calculated according to the following formulas:

$$R^{B}_{\alpha,s}(x,y)=(u_{1}-u_{2})+(u_{1}-u_{3})-\left|u_{2}-u_{3}\right|-\frac{1}{d}\sum_{i=1}^{3}\sigma_{i}^{2};$$

$$R^{D}_{\alpha,s}(x,y)=(u_{2}-u_{1})+(u_{3}-u_{1})-\left|u_{2}-u_{3}\right|-\frac{1}{d}\sum_{i=1}^{3}\sigma_{i}^{2};$$

where s is the configured detection scale, d is the interval between the strip-shaped areas, $u_{1}$, $u_{2}$, $u_{3}$ respectively denote the luminance means of the 3 strip-shaped areas in the direction at angle α, $\sigma_{i}^{2}$ (i = 1, 2, 3) denote the luminance variances of those strip-shaped areas, and $u_{1}$ is the luminance mean of the strip-shaped area in which the pixel point (x, y) lies.
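The response formulas of claim 25 reduce to simple per-pixel arithmetic once the three strip-shaped areas have been sampled. The following is a minimal sketch, not part of the patent text: the strips are assumed to be supplied as pixel-luminance arrays, with strip1 the strip containing the pixel of interest, and the function and variable names are illustrative.

```python
import numpy as np

def stroke_responses(strip1, strip2, strip3, d):
    """Bright and dark stroke response values of one pixel in one direction.

    strip1 is the strip containing the pixel point; strip2 and strip3 are the
    two parallel flanking strips; d is the interval between the strips.
    """
    u1, u2, u3 = strip1.mean(), strip2.mean(), strip3.mean()
    var_penalty = (strip1.var() + strip2.var() + strip3.var()) / d
    r_bright = (u1 - u2) + (u1 - u3) - abs(u2 - u3) - var_penalty
    r_dark = (u2 - u1) + (u3 - u1) - abs(u2 - u3) - var_penalty
    return r_bright, r_dark

# Example: a bright stroke over a dark, uniform background.
centre = np.array([200.0, 205.0, 210.0])
left = np.array([40.0, 45.0, 50.0])
right = np.array([42.0, 47.0, 52.0])
print(stroke_responses(centre, left, right, d=3))  # large positive, large negative
```

For a bright stroke the centre strip is brighter than both flanking strips and the flanks are mutually similar with low variance, so the bright response is large while the dark response is strongly negative; the dark-stroke case is symmetric.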
26. The apparatus of claim 21, wherein the bright stroke value of each pixel point is obtained by computing the mean of its largest bright stroke response value and the bright stroke response value in the direction perpendicular to it.
27. The apparatus of claim 22, wherein the dark stroke value of each pixel point is obtained by computing the mean of its largest dark stroke response value and the dark stroke response value in the direction perpendicular to it.
28. An apparatus for locating lines of characters in an image, comprising a receiving unit for receiving the image, characterized by further comprising:
a first unit for calculating a bright stroke response value and a dark stroke response value of each pixel point in the image;
a second unit for processing the bright stroke response value and the dark stroke response value of each pixel point respectively to obtain a bright stroke map and a dark stroke map;
a fourth unit for calculating a stroke density map and character distribution areas using the bright stroke map and the dark stroke map;
a fifth unit for performing two kinds of projection on each character distribution area in the bright stroke map;
a sixth unit for dividing each character distribution area into at least one character line; and
a seventh unit for determining the upper and lower boundaries of each character line.
29. The apparatus according to claim 28, wherein the fourth unit binarizes the bright stroke map and performs an OR operation between the binarized bright stroke map and the stroke density map;
the areas formed by connected white pixel points in the resulting stroke density map are taken as the character distribution areas.
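Claim 29's merging of the binarized bright stroke map with the stroke density map, followed by extraction of connected white regions, can be sketched as follows; scipy.ndimage is only one possible implementation choice, and the binarization threshold is illustrative rather than taken from the patent:

```python
import numpy as np
from scipy import ndimage

def character_distribution_areas(bright_map, density_map, thresh=128):
    """Binarize the bright stroke map, OR it with the stroke density map,
    and return the bounding boxes of connected white regions."""
    bright_bin = bright_map > thresh                     # illustrative threshold
    merged = np.logical_or(bright_bin, density_map > 0)  # OR of the two maps
    labels, n = ndimage.label(merged)                    # 4-connected components
    return ndimage.find_objects(labels)                  # list of (row_slice, col_slice)
```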
30. The apparatus according to claim 28, wherein, within each character distribution area, the fifth unit projects the luminance values of the pixels of the bright stroke map in the horizontal direction to obtain a luminance histogram of the character distribution area, and accumulates, for each row of pixels of the bright stroke map, the number of times the pixels change from zero to non-zero to obtain an intersection-point histogram of the character distribution area.
31. The apparatus of claim 30, wherein the sixth unit divides each character distribution area horizontally along the located division points to form a plurality of character lines; wherein a division point satisfies the following conditions:
its value on the luminance histogram is less than a first threshold and its value on the intersection-point histogram is less than a second threshold.
32. The apparatus of claim 31, wherein, for each character line, the seventh unit searches for boundary points of the character line upward and downward from the maximum of the luminance histogram, and divides horizontally along the boundary points to form the upper and lower boundaries of the character line; wherein a boundary point satisfies the following condition:
its value on the luminance histogram is less than a third threshold, or its value on the intersection-point histogram is less than a fourth threshold.
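The two projections of claim 30 and the threshold tests of claims 31 and 32 amount to row-wise statistics over a character distribution area of the bright stroke map. A non-authoritative sketch, with t1 through t4 standing in for the first to fourth thresholds, whose values the claims do not fix:

```python
import numpy as np

def projections(region):
    """Claim 30: per-row luminance projection and zero-to-nonzero crossing
    counts for a character distribution area of the bright stroke map."""
    lum_hist = region.sum(axis=1)
    nonzero = region > 0
    crossings = ((~nonzero[:, :-1]) & nonzero[:, 1:]).sum(axis=1)
    return lum_hist, crossings

def division_rows(lum_hist, crossings, t1, t2):
    """Claim 31: rows usable as division points between character lines."""
    return np.where((lum_hist < t1) & (crossings < t2))[0]

def line_bounds(lum_hist, crossings, t3, t4):
    """Claim 32: search up and down from the histogram maximum for the first
    rows falling below the thresholds; these bound the character line."""
    peak = int(lum_hist.argmax())
    weak = (lum_hist < t3) | (crossings < t4)
    top = peak
    while top > 0 and not weak[top - 1]:
        top -= 1
    bottom = peak
    while bottom < len(lum_hist) - 1 and not weak[bottom + 1]:
        bottom += 1
    return top, bottom
```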
33. An apparatus for determining duplication of subtitles, comprising a receiving unit for receiving an image, a storage unit, and a positioning unit for positioning the character lines of the image, characterized by further comprising:
an eighth unit for storing, after the character lines of the previous image have been positioned, the character line positions, the image content and the stroke distribution map of the previous image into the storage unit; and for judging, before the character lines of the current image are positioned, whether the distance between the character lines of the current image and the previous image is greater than a fifth threshold using the information stored in the storage unit; if it is greater, the positioning unit is started to position the character lines of the current image; otherwise, the character line positioning result of the previous image stored in the storage unit is used.
34. The apparatus of claim 33, wherein the eighth unit calculates

$$\mathrm{FrameDist}_{i+1}=\left[\sum_{m=1}^{M}\mathrm{size}(RECT_{m})\times\mathrm{RectDist}_{i+1}(m)\right]\Big/\sum_{m=1}^{M}\mathrm{size}(RECT_{m})$$

to obtain the distance $\mathrm{FrameDist}_{i+1}$ between the current image and the previous image;
where $\mathrm{size}(RECT_{m})$ denotes the area of the character distribution area $RECT_{m}$, $\mathrm{RectDist}_{i+1}(m)$ denotes the distance between the m-th character distribution areas of the current image and the previous image, and M denotes the total number of character distribution areas.
35. The apparatus of claim 34, wherein

$$\mathrm{RectDist}_{i+1}(m)=\left[\sum_{(x,y)\in RECT_{m}}R^{B}_{i}(x,y)\times\mathrm{CortDist}_{i+1}(x,y)\right]\Big/\sum_{(x,y)\in RECT_{m}}R^{B}_{i+1}(x,y)$$

is calculated to obtain the distance between the m-th character distribution areas of the current image and the previous image;
where $R^{B}_{i}(x,y)$ denotes the value of the pixel point (x, y) on the stroke distribution map of the previous image, and $\mathrm{CortDist}_{i+1}(x,y)$ denotes the color distance between the corresponding pixel points of the current image and the previous image.
36. The apparatus of claim 35, wherein

$$\mathrm{CortDist}_{i+1}(x,y)=\left\{\frac{1}{3}\sum_{cor=R,G,B}\left[P^{cor}_{i+1}(x,y)-P^{cor}_{i}(x,y)\right]^{2}\right\}^{\frac{1}{2}}$$

where $P^{cor}_{i+1}(x,y)$ and $P^{cor}_{i}(x,y)$ respectively denote the RGB color values of the corresponding pixel points of the current image and the previous image.
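Claims 34–36 define an area-weighted, stroke-weighted color distance between consecutive frames, which the eighth unit of claim 33 compares against the fifth threshold. A sketch assuming RGB frames as H x W x 3 arrays, the previous and current stroke distribution maps as H x W weight arrays, and character regions given as (y0, y1, x0, x1) tuples; all names are illustrative:

```python
import numpy as np

def color_dist(frame_prev, frame_cur):
    """Claim 36: per-pixel RGB color distance between consecutive frames."""
    diff = frame_cur.astype(np.float64) - frame_prev.astype(np.float64)
    return np.sqrt((diff ** 2).mean(axis=2))

def frame_dist(frame_prev, frame_cur, stroke_prev, stroke_cur, rects):
    """Claims 34-35: area-weighted average of per-region stroke-weighted
    color distances between the previous and current frames."""
    cdist = color_dist(frame_prev, frame_cur)
    num, den = 0.0, 0.0
    for (y0, y1, x0, x1) in rects:
        size = (y1 - y0) * (x1 - x0)
        w_prev = stroke_prev[y0:y1, x0:x1]
        w_cur = stroke_cur[y0:y1, x0:x1]
        rect_dist = (w_prev * cdist[y0:y1, x0:x1]).sum() / max(w_cur.sum(), 1e-9)
        num += size * rect_dist
        den += size
    return num / max(den, 1e-9)
```

A caller following claim 33 would re-run the positioning unit only when the returned distance exceeds the fifth threshold, and otherwise reuse the stored character line positions of the previous image.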
CN200910078007A 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles Pending CN101799922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910078007A CN101799922A (en) 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910078007A CN101799922A (en) 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Publications (1)

Publication Number Publication Date
CN101799922A true CN101799922A (en) 2010-08-11

Family

ID=42595595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910078007A Pending CN101799922A (en) 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Country Status (1)

Country Link
CN (1) CN101799922A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915438A (en) * 2012-08-21 2013-02-06 北京捷成世纪科技股份有限公司 Method and device for extracting video subtitles
CN102915438B (en) * 2012-08-21 2016-11-23 北京捷成世纪科技股份有限公司 The extracting method of a kind of video caption and device
CN103679208A (en) * 2013-11-27 2014-03-26 北京中科模识科技有限公司 Broadcast and television caption recognition based automatic training data generation and deep learning method
CN107123127A (en) * 2017-04-27 2017-09-01 北京京东尚科信息技术有限公司 A kind of image subject extracting method and device
CN108573258A (en) * 2018-04-24 2018-09-25 中国科学技术大学 Chinese language word localization method is tieed up in a kind of quick complex background image
CN108573258B (en) * 2018-04-24 2020-06-26 中国科学技术大学 Method for quickly positioning dimension Chinese characters in complex background image
CN109614938A (en) * 2018-12-13 2019-04-12 深源恒际科技有限公司 A kind of text objects detection method and system based on depth network
CN109614938B (en) * 2018-12-13 2022-03-15 深源恒际科技有限公司 Text target detection method and system based on deep network
CN113240779A (en) * 2021-05-21 2021-08-10 北京达佳互联信息技术有限公司 Method and device for generating special character effect, electronic equipment and storage medium
CN113240779B (en) * 2021-05-21 2024-02-23 北京达佳互联信息技术有限公司 Method and device for generating text special effects, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111382704B (en) Vehicle line pressing violation judging method and device based on deep learning and storage medium
Zhang et al. Image segmentation based on 2D Otsu method with histogram analysis
CN107093172B (en) Character detection method and system
CN103034848B (en) A kind of recognition methods of form types
US8059868B2 (en) License plate recognition apparatus, license plate recognition method, and computer-readable storage medium
CN101334836B (en) License plate positioning method incorporating color, size and texture characteristic
CN104200210B (en) A kind of registration number character dividing method based on component
WO2016127545A1 (en) Character segmentation and recognition method
CN104298982A (en) Text recognition method and device
CN101515325A (en) Character extracting method in digital video based on character segmentation and color cluster
CN111259878A (en) Method and equipment for detecting text
CN105205488A (en) Harris angular point and stroke width based text region detection method
CN109376740A (en) A kind of water gauge reading detection method based on video
CN101799922A (en) Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles
CN104239867A (en) License plate locating method and system
WO2023279966A1 (en) Multi-lane-line detection method and apparatus, and detection device
CN108830133A (en) Recognition methods, electronic device and the readable storage medium storing program for executing of contract image picture
CN104504717A (en) Method and device for detection of image information
US20110200257A1 (en) Character region extracting apparatus and method using character stroke width calculation
CN106951896B (en) License plate image tilt correction method
CN113239733B (en) Multi-lane line detection method
CN111898491A (en) Method and device for identifying reverse driving of vehicle and electronic equipment
EP2821935B1 (en) Vehicle detection method and device
CN111126383A (en) License plate detection method, system, device and storage medium
CN115240197A (en) Image quality evaluation method, image quality evaluation device, electronic apparatus, scanning pen, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20100811

C20 Patent right or utility model deemed to be abandoned or is abandoned