CN101799922A - Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Info

Publication number: CN101799922A
Application number: CN200910078007A
Legal status: Pending
Original language: Chinese (zh)
Inventors: 苗广艺, 徐成华, 周景超, 鲍东山
Assignee: BEIJING NUFRONT SOFTWARE TECHNOLOGY Co Ltd
Classification: Image Analysis

Abstract

The invention relates to a method for detecting strokes of characters in images, comprising the steps of: receiving an image; calculating response values of light strokes and response values of dark strokes of each pixel point in the image; processing the response values of the light strokes and the response values of the dark strokes of each pixel point respectively to obtain a light stroke map and a dark stroke map; and merging the light stroke map with the dark stroke map to obtain a combined stroke map and the distribution of the strokes. The invention also discloses a device for detecting strokes of characters in images, a method and a device for locating lines of characters in images, and a method and a device for judging repetition of subtitles.

Description

Method and device for detecting character strokes, method and device for positioning character lines, and method and device for judging repeated subtitles
Technical Field
The present invention relates to a technology for processing characters in an image, and in particular, to a method and an apparatus for detecting strokes of characters in an image, a method and an apparatus for locating lines of characters in an image, and a method and an apparatus for determining duplication of subtitles.
Background
With the growth of Internet video content and the large number of multimedia applications such as digital libraries, video on demand and remote teaching, how to retrieve the required data from massive video collections has become very important. Traditional video retrieval based on keyword descriptions cannot meet the requirements of massive video retrieval because of its limited descriptive power, strong subjectivity, reliance on manual labeling and other reasons. Therefore, since the 1990s, content-based video retrieval has become a hot research topic, and video subtitle recognition is a key technology for realizing video retrieval. The currently proposed video subtitle detection methods can be roughly classified into three types according to the features they use: region-based, edge-based and texture-based. Many algorithms actually combine two or all three of the above features.
A stroke-based subtitle detection scheme has been proposed. Subtitle detection based on strokes requires designing a stroke filter which, unlike traditional edge and texture filters, can detect strip-shaped structures of different scales in an image while remaining insensitive to edges and textures that lack such structures, and therefore has better robustness to non-text background interference.
Subtitle detection based on stroke detection is thus meaningful, but the currently designed stroke filters are applied in a very simple way: the consistency of stroke line brightness and the influence of stroke corner points and intersection points are not fully considered, which degrades the stroke detection effect.
Disclosure of Invention
In view of the above, the technical problem to be solved by the present invention is to provide a method for detecting strokes of characters in an image, so as to improve the stroke detection effect. In some optional embodiments, the method for detecting strokes of characters in an image comprises: receiving an image; calculating a response value of a bright stroke and a response value of a dark stroke of each pixel point in the image; respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke graph and a dark stroke graph; and combining the bright stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
When this embodiment is used to detect character strokes in an image, the consistency of stroke line brightness and the influence of stroke corner points and intersection points are fully taken into account, so the stroke detection effect is greatly improved.
Another technical problem to be solved by the present invention is to provide a method for locating a text line in an image. In some optional embodiments, the method of locating a text line in an image comprises: receiving an image; calculating a bright stroke map and a dark stroke map of the image; calculating a stroke density map and character distribution areas by using the bright stroke map and the dark stroke map; projecting each character distribution area in the bright stroke map in two ways; dividing each character distribution area into at least one text line; and determining the upper and lower boundaries of each text line.
When this embodiment is used to locate text lines in an image, the advantages of stroke density and double projection are combined, text lines can be located more accurately, and robustness against noise interference is better. On the basis of region aggregation, the double projection positioning method refines the text line positioning result by using the stroke distribution characteristics, so the boundaries of the text lines are more accurate.
Another technical problem to be solved by the present invention is to provide a method for determining subtitle repetition. In some optional embodiments, the method comprises: after the text lines of the previous image are located, saving the text line positions, the image content and the stroke distribution map of the previous image; before locating the text lines of the current image, judging with the saved information whether the text line distance between the current image and the previous image is larger than a fifth threshold; if so, locating the text lines of the current image; otherwise, using the text line positioning result of the previous image.
The invention aims to solve another technical problem of providing a device for detecting character strokes in an image. In some optional embodiments, the apparatus for detecting strokes of characters in an image includes a receiving unit for receiving the image, and further includes: the first unit is used for calculating a light stroke response value and a dark stroke response value of each pixel point in the image; a second unit for respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke graph and a dark stroke graph; and a third unit for combining the light stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
Another object of the present invention is to provide a device for locating lines of text in an image. In some optional embodiments, the apparatus for locating a text line in an image includes a receiving unit for receiving an image, and further includes: a first unit for calculating a bright stroke response value and a dark stroke response value of each pixel point in the image; a second unit for respectively processing the bright stroke response value and the dark stroke response value of each pixel point to obtain a bright stroke map and a dark stroke map; a fourth unit for calculating a stroke density map and character distribution areas by using the bright stroke map and the dark stroke map; a fifth unit for projecting each character distribution area in the bright stroke map in two ways; a sixth unit for dividing each character distribution area into at least one text line; and a seventh unit for determining the upper and lower boundaries of each text line.
When text lines are located, forming the stroke density map with the region aggregation algorithm works well for coarse positioning of text regions. Double projection using the stroke information of the bright stroke map then allows the upper and lower boundaries of each text line to be located accurately.
Another technical problem to be solved by the present invention is to provide an apparatus for determining subtitle repetition. In some optional embodiments, the apparatus includes a receiving unit for receiving the image, a storage unit, and a positioning unit for locating the text lines of the image, and further includes an eighth unit for saving the text line positions, the image content and the stroke distribution map of the previous image to the storage unit after the text lines of the previous image have been located; before the text lines of the current image are located, judging with the information stored in the storage unit whether the text line distance between the current image and the previous image is greater than a fifth threshold; if it is greater, starting the positioning unit to locate the text lines of the current image; otherwise, using the text line positioning result of the previous image stored in the storage unit.
It can be seen that by retaining the stroke information of the current image, and comparing whether the subtitles detected by the adjacent frames are the same or not by using the stroke difference of the adjacent frames before the character detection of the next image, a large amount of repeated subtitles can be eliminated, the repeated detection is reduced, and the character detection efficiency is further improved.
Drawings
FIG. 1 is a flow chart of a method for detecting strokes of characters in an image according to the present invention;
FIG. 2 is a schematic diagram of a stroke filter;
FIG. 3 is a flow chart of a method of locating lines of text in an image provided by the present invention;
FIG. 4 is a schematic diagram of an apparatus for detecting strokes of characters in an image according to the present invention;
FIG. 5 is a schematic diagram of an apparatus for locating lines of text in an image provided by the present invention;
fig. 6 is a schematic diagram of an apparatus for determining caption overlap according to the present invention.
Detailed Description
FIG. 1 illustrates an alternative method of detecting strokes of a word.
Step 11, receiving an image.
And step 12, calculating the response value of the light stroke and the response value of the dark stroke of each pixel point in the image.
And step 13, respectively processing the response value of the bright strokes and the response value of the dark strokes of each pixel point to obtain a bright stroke graph and a dark stroke graph.
And 14, combining the bright stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
Characters are composed of strokes, and a character line is composed of a plurality of strokes according to a certain rule. The stroke is represented by a line structure which has a certain direction, width and length, and the color of a pixel on the stroke has a greater contrast with the color of a non-stroke pixel in the neighborhood. The stroke line filter can be designed according to the characteristics of strokes. The stroke line filter generally has a plurality of directions and detection dimensions, which are specifically defined as shown in fig. 2. The black dots in fig. 2 represent the pixel points in the center of the filter, i.e., the pixel points being processed. The three strip-shaped areas (1), (2) and (3) are arranged in parallel, and the length, the width and the direction of the three strip-shaped areas are the same. The horizontal included angle alpha of the strip-shaped area can take a plurality of values, the detection scale of the stroke line filter is determined by the distance d of the strip-shaped area, and the length l of the strip-shaped area determines the minimum length of the stroke. Stroke filters of different detection scales can detect character strokes of different widths.
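To make the geometry of Fig. 2 concrete, the following Python sketch generates the integer pixel offsets of the three parallel strip-shaped areas for a given direction α, strip length l, strip width and strip spacing d. The sampling scheme (rounding rotated coordinates to integer offsets, spacing measured between strips) is an assumption; the patent describes the filter geometry only qualitatively.

```python
import numpy as np

def strip_offsets(alpha, d, length, width):
    """Pixel offsets (dx, dy) of the three parallel strips of the stroke filter,
    relative to the pixel being processed (the black dot in Fig. 2)."""
    us = np.arange(-(length // 2), length // 2 + 1)   # along the strip direction
    vs = np.arange(-(width // 2), width // 2 + 1)     # across the strip direction
    uu, vv = np.meshgrid(us, vs)

    def rotate(du, dv):
        ca, sa = np.cos(alpha), np.sin(alpha)
        dx = np.rint(du * ca - dv * sa).astype(int)
        dy = np.rint(du * sa + dv * ca).astype(int)
        return np.stack([dx.ravel(), dy.ravel()], axis=1)

    strip1 = rotate(uu, vv)        # strip (1), centered on the pixel
    strip2 = rotate(uu, vv - d)    # strip (2), offset by d on one side
    strip3 = rotate(uu, vv + d)    # strip (3), offset by d on the other side
    return strip1, strip2, strip3
```

A stroke filter at detection scale s would apply these offsets at every pixel and for each of the filter directions.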
Strokes with a brightness above the background are defined herein as bright strokes and strokes with a brightness below the background as dark strokes. Under a set detection scale s, an alternative way is to calculate the bright stroke response value of each pixel point in the whole gray image according to formula (1.1), and the dark stroke response value of each pixel point according to formula (1.2):

$$R^{B}_{\alpha,s}(x,y) = (u_1 - u_2) + (u_1 - u_3) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2 \qquad (1.1)$$

$$R^{D}_{\alpha,s}(x,y) = (u_2 - u_1) + (u_3 - u_1) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2 \qquad (1.2)$$

In formulae (1.1) and (1.2), u_1, u_2, u_3 denote the average brightness values of the strip-shaped areas (1), (2) and (3), and σ_1², σ_2², σ_3² denote the brightness variances of the strip-shaped areas (1), (2) and (3), respectively.
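As a minimal sketch of how formulas (1.1) and (1.2) could be evaluated at a single pixel, assuming the strip offsets produced by the strip_offsets() sketch above (border pixels are simply clipped, which is an assumption; the patent does not specify border handling):

```python
import numpy as np

def stroke_responses(gray, x, y, strips, d):
    """Bright- and dark-stroke responses of pixel (x, y) at one direction,
    per formulas (1.1) and (1.2)."""
    means, variances = [], []
    for offs in strips:                                   # strips (1), (2), (3)
        xs = np.clip(x + offs[:, 0], 0, gray.shape[1] - 1)
        ys = np.clip(y + offs[:, 1], 0, gray.shape[0] - 1)
        vals = gray[ys, xs].astype(float)
        means.append(vals.mean())                         # u1, u2, u3
        variances.append(vals.var())                      # sigma_1^2, sigma_2^2, sigma_3^2
    u1, u2, u3 = means
    penalty = abs(u2 - u3) + sum(variances) / d
    r_bright = (u1 - u2) + (u1 - u3) - penalty            # formula (1.1)
    r_dark = (u2 - u1) + (u3 - u1) - penalty              # formula (1.2)
    return r_bright, r_dark
```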
An alternative way is to take the transverse angle α of the strip-shaped areas in 4 directions: 0 and three further angles (given as formula images in the original; for a stroke filter these are typically π/4, π/2 and 3π/4). In this case each pixel point has 4 bright stroke response values, one per direction, and likewise 4 dark stroke response values, one per direction.
After all bright stroke response values and dark stroke response values of each pixel point have been obtained:

First, take for each pixel point the maximum bright stroke response value R^B_{maxα,s}(x, y) and the maximum dark stroke response value R^D_{maxα,s}(x, y) over the directions.

Then, take for each pixel point the bright stroke response value R^B_{maxα⊥,s}(x, y) in the direction perpendicular to the direction giving the maximum bright stroke response, and the dark stroke response value R^D_{maxα⊥,s}(x, y) in the direction perpendicular to the direction giving the maximum dark stroke response. For a given pixel point, whichever of the 4 directions yields the maximum bright stroke (or dark stroke) response, the perpendicular response is the one computed in the direction at right angles to it, and vice versa.
Finally, the bright stroke value R^B_s(x, y) and the dark stroke value R^D_s(x, y) of each pixel point are calculated according to formulae (1.3) and (1.4) respectively, giving a Bright Stroke Map (BSM) and a Dark Stroke Map (DSM):

$$R^{B}_{s}(x,y) = \left[R^{B}_{\max\alpha,s}(x,y) + R^{B}_{\max\alpha\perp,s}(x,y)\right] / 2 \qquad (1.3)$$

$$R^{D}_{s}(x,y) = \left[R^{D}_{\max\alpha,s}(x,y) + R^{D}_{\max\alpha\perp,s}(x,y)\right] / 2 \qquad (1.4)$$

The image formed by the bright stroke values R^B_s(x, y) of all pixel points is the bright stroke map BSM, and the image formed by the dark stroke values R^D_s(x, y) of all pixel points is the dark stroke map DSM.
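The fusion of the 4 directional responses into the BSM and DSM can be sketched as follows, assuming the per-direction responses are stacked into arrays of shape (4, H, W) and ordered so that direction k and direction (k + 2) % 4 are perpendicular (an assumption about the ordering, consistent with 4 evenly spaced directions):

```python
import numpy as np

def bright_dark_maps(resp_bright, resp_dark):
    """BSM and DSM per formulas (1.3) and (1.4): average of the maximum
    directional response and the response in the perpendicular direction."""
    def fuse(resp):
        best = resp.argmax(axis=0)              # direction of the maximum response
        perp = (best + 2) % 4                   # its perpendicular direction
        rows, cols = np.indices(best.shape)
        return (resp[best, rows, cols] + resp[perp, rows, cols]) / 2.0

    return fuse(resp_bright), fuse(resp_dark)   # BSM, DSM
```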
On the bright stroke image BSM, bright strokes are detected to be prominent, and meanwhile, a part of background pixels among dark strokes are also detected to be prominent; on the dark stroke pattern DSM, the dark strokes are detected to be highlighted, while a portion of the background pixels between the light strokes are also detected to be highlighted.
After the bright stroke map BSM and the dark stroke map DSM are obtained, the joint stroke value of each pixel point is calculated according to formula (1.5), giving a united stroke map (USM):

$$R^{U}_{s}(x,y) = \max\left\{R^{B}_{s}(x,y),\, R^{D}_{s}(x,y)\right\} \qquad (1.5)$$

The image formed by the joint stroke values R^U_s(x, y) of all pixel points is the united stroke map USM. The united stroke map USM integrates the results of the bright stroke map BSM and the dark stroke map DSM; on the united stroke map USM, the stroke pixels and part of the background pixels in their neighborhood are detected as salient, so the pixels of the region where text appears are highlighted as a whole, thereby revealing the distribution of the strokes.
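In code, formula (1.5) is a single element-wise maximum, assuming bsm and dsm are the arrays returned by the bright_dark_maps() sketch above:

```python
import numpy as np

usm = np.maximum(bsm, dsm)   # united stroke map USM, formula (1.5)
```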
When this embodiment is used to detect character strokes in an image, the consistency of stroke line brightness and the influence of stroke corner points and intersection points are fully taken into account, so the stroke detection effect is greatly improved.
Based on the method for detecting the strokes of the characters, the invention also provides a method for positioning the character lines in the image. FIG. 3 illustrates an alternative embodiment for locating lines of text in an image.
Step 31, receiving an image.
And step 32, calculating to obtain a light stroke graph and a dark stroke graph.
And step 33, calculating by using the light stroke graph and the dark stroke graph to obtain a stroke density graph and a character distribution area.
And step 34, projecting each character distribution area in a highlight drawing by two modes.
And 35, dividing each character distribution area into at least one character line.
Step 36, determine the upper and lower boundaries of each line of text.
After the method described in the foregoing embodiment is adopted and the bright stroke map BSM, the dark stroke map DSM and the united stroke map USM have been calculated, the stroke density map (SDM) is calculated.
In the stroke density calculation step, the stroke density map SDM can be calculated from the united stroke map USM. There are many suitable algorithms; one alternative is to use an existing morphology-based region aggregation algorithm, another is to use an existing density-based region aggregation algorithm.
The principle of one such density-based region aggregation algorithm is shown in formula (1.6).
$$\mathrm{Dens}(x,y) = \frac{1}{(2w+1)(2h+1)} \sum_{n=-h}^{h}\sum_{m=-w}^{w} R^{U}_{s}(x+m,\, y+n) \qquad (1.6)$$

where Dens(x, y) denotes the stroke density in the neighborhood region centered on the pixel point (x, y), the area of the neighborhood region is (2w+1) × (2h+1), and R^U_s(x, y) denotes the value of the pixel point (x, y) on the united stroke map USM.
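Formula (1.6) is simply a box average of the USM; a sketch using SciPy's uniform filter (zero padding at the image border is an assumption, since the patent does not specify boundary handling):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def stroke_density_map(usm, w, h):
    """Stroke density map SDM per formula (1.6): mean of the united stroke map
    over a (2w+1) x (2h+1) window centered on each pixel."""
    return uniform_filter(usm.astype(float),
                          size=(2 * h + 1, 2 * w + 1),   # (rows, cols) window
                          mode='constant', cval=0.0)
```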
Because this density-based region aggregation algorithm performs the density calculation for every pixel point, its aggregation result is relatively accurate and it is robust against low-density noise interference.
After obtaining the stroke density map SDM, candidate regions of text distribution need to be determined. The specific processing flow comprises the following steps:
the highlight map BSM is binarized using a set threshold value.
And carrying out OR operation on the binarized bright stroke graph and the stroke density graph SDM to obtain a new stroke density graph.
On the new stroke density graph, white pixels are connected into a plurality of areas, and the connected areas are candidate areas of character distribution.
The step of performing or operation on the binarized bright stroke map and the stroke density map SDM means that corresponding pixel points in the binarized bright stroke map and the stroke density map SDM are subjected to or operation.
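A sketch of the candidate region step, assuming both maps are binarized with scalar thresholds (the patent only states that the binarized bright stroke map is ORed with the stroke density map, so the SDM threshold here is an assumption) and that connected regions are extracted with SciPy's labelling:

```python
import numpy as np
from scipy.ndimage import label, find_objects

def text_candidate_regions(bsm, sdm, bsm_thresh, sdm_thresh):
    """Candidate text regions: OR of the binarized BSM and SDM, followed by
    connected-component analysis (4-connectivity by default)."""
    new_sdm = (bsm > bsm_thresh) | (sdm > sdm_thresh)   # the "new stroke density map"
    labels, num_regions = label(new_sdm)
    boxes = find_objects(labels)                        # bounding slices of each white region
    return boxes, labels, num_regions
```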
After candidate areas of the character distribution are obtained, in an original bright stroke map BSM without binarization, double projection is carried out in each candidate area:
firstly, projecting the brightness value of a pixel point in each candidate region to obtain a brightness histogram;
and then accumulating the times of changing each row of pixel points in each candidate region from zero to non-zero to obtain an intersection point histogram. The intersection histogram is called because it is equivalent to traversing a candidate region with a horizontal straight line, and counting the number of intersections of the straight line and the character strokes.
After the luminance histogram and the intersection histogram are obtained, the segmentation points are counted on the two histograms. If the value of a point on the luminance histogram is less than the first empirical threshold and the value on the intersection histogram is less than the second empirical threshold, then the point is marked as a split point. The candidate region is horizontally divided along the dividing points, i.e. a plurality of candidate character lines are formed.
For each candidate character line, finding the maximum value of the horizontal brightness histogram, respectively finding boundary points from the maximum value to the upper direction and the lower direction, and stopping finding under the conditions that: the value of the point on the luminance histogram is less than a third threshold, or the value on the intersection histogram is less than a fourth empirical threshold. And horizontally cutting along the boundary point to form the upper and lower boundaries of the character line.
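The double projection inside one candidate region of the non-binarized BSM can be sketched as follows; t1 to t4 stand for the first to fourth empirical thresholds mentioned above, whose values the patent does not give:

```python
import numpy as np

def locate_text_lines(bsm_region, t1, t2, t3, t4):
    """Split one candidate region into text lines and refine their upper and
    lower boundaries using the luminance and intersection-point histograms."""
    lum_hist = bsm_region.sum(axis=1)                    # brightness projected onto rows
    nz = bsm_region > 0
    cross_hist = (nz[:, 1:] & ~nz[:, :-1]).sum(axis=1)   # zero -> non-zero transitions per row
    is_split = (lum_hist < t1) & (cross_hist < t2)       # split points between text lines

    lines, row, n = [], 0, bsm_region.shape[0]
    while row < n:
        if is_split[row]:
            row += 1
            continue
        top = row
        while row < n and not is_split[row]:
            row += 1
        band = np.arange(top, row)                       # one candidate text line
        peak = band[np.argmax(lum_hist[band])]           # row of maximum brightness
        up = down = peak
        while up > top and lum_hist[up - 1] >= t3 and cross_hist[up - 1] >= t4:
            up -= 1                                      # search upwards for the boundary
        while down < row - 1 and lum_hist[down + 1] >= t3 and cross_hist[down + 1] >= t4:
            down += 1                                    # search downwards for the boundary
        lines.append((up, down))                         # upper and lower boundary rows
    return lines
```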
After each text line is located, two kinds of verification, shape-based and stroke-based, may be applied, since the detected lines may include false detections. The shape-based judgment rules include: the size, height and width of the text area, the aspect ratio of the text area, and the position of the text area. The stroke-based judgment rules include: the density of stroke pixels, the proportions of stroke pixels in the various directions, the length of the strokes, and so on. Since these rules are essentially self-defined heuristics, they are not described in detail here.
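A minimal sketch of such verification; every threshold below is an illustrative assumption, since the patent leaves the heuristic rules unspecified:

```python
def verify_text_line(box, stroke_map, min_height=8, max_height=80,
                     min_aspect=1.0, min_density=0.1):
    """Shape- and stroke-based check of one located text line
    (box = (top, bottom, left, right))."""
    top, bottom, left, right = box
    height, width = bottom - top, right - left
    # shape rules: size and aspect ratio of the text area
    if not (min_height <= height <= max_height and width >= min_aspect * height):
        return False
    # stroke rule: density of stroke pixels inside the line
    region = stroke_map[top:bottom, left:right]
    return (region > 0).mean() >= min_density
```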
When the embodiment is adopted to position the character lines in the image, the advantages of stroke density and double projection are combined, the character lines can be positioned more accurately, and the noise is better in anti-interference performance. The double projection positioning method optimizes the result of character line positioning by using the stroke distribution characteristics on the basis of region aggregation, so that the boundary of the character line is more accurate.
In addition, the text in the video typically stays for several seconds, and the same text line is detected in the images of consecutive frames. If stroke detection and character line positioning are carried out on multi-frame images with the same character lines, resources and processing time are consumed meaninglessly. Therefore, verification can be carried out before stroke detection and character line positioning, whether the character line of the current frame image is the same as the character line of the previous frame image or not is judged, if yes, the current frame image is skipped, and repeated stroke detection and character line positioning are not carried out. The specific verification steps include:
1) For the image of the i-th frame, detect the character strokes in the image, locate the text lines in the image, and save the relevant information of frame i, including: the color image (the RGB values of its pixel points are P^cor_i(x, y), cor = R, G, B), a stroke distribution map (an alternative way is to use the bright stroke map as the stroke distribution map, whose pixel values are R^B_i(x, y)), and the positions of the candidate regions of the character distribution (M candidate regions RECT_1 to RECT_M).
The value of each pixel on the stroke distribution map represents the probability that the pixel belongs to a character stroke. The way the stroke distribution map is calculated can be chosen according to the actual situation. In this example, assuming that the character strokes are brighter than the background, the bright stroke map is used as the stroke distribution map, and the step of normalizing the probability values to the range 0-1 is omitted.
2) For the (i+1)-th frame image, calculate the color distance between corresponding pixel points of the adjacent frame images according to formula (1.7); calculate the inter-frame distance of each candidate region RECT_m according to formula (1.8); and calculate the total text line distance between the adjacent frames according to formula (1.9), where size(RECT_m) denotes the area of candidate region RECT_m.

$$\mathrm{CorDist}_{i+1}(x,y) = \left\{\frac{1}{3}\sum_{cor=R,G,B}\left[P^{cor}_{i+1}(x,y) - P^{cor}_{i}(x,y)\right]^{2}\right\}^{\frac{1}{2}} \qquad (1.7)$$

$$\mathrm{RectDist}_{i+1}(m) = \left[\sum_{(x,y)\in RECT_m} R^{B}_{i}(x,y)\times \mathrm{CorDist}_{i+1}(x,y)\right] \Big/ \sum_{(x,y)\in RECT_m} R^{B}_{i+1}(x,y) \qquad (1.8)$$

$$\mathrm{FrameDist}_{i+1} = \left[\sum_{m=1}^{M} \mathrm{size}(RECT_m)\times \mathrm{RectDist}_{i+1}(m)\right] \Big/ \sum_{m=1}^{M} \mathrm{size}(RECT_m) \qquad (1.9)$$
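A sketch of formulas (1.7) to (1.9), assuming the frames are H x W x 3 arrays, the stroke distribution maps are the bright stroke maps R^B_i and R^B_{i+1}, and the candidate regions are given as (top, bottom, left, right) tuples:

```python
import numpy as np

def frame_distance(prev_rgb, cur_rgb, prev_stroke, cur_stroke, rects):
    """Total text line distance between adjacent frames, formulas (1.7)-(1.9)."""
    diff = prev_rgb.astype(float) - cur_rgb.astype(float)
    cor_dist = np.sqrt((diff ** 2).mean(axis=2))          # formula (1.7), per pixel

    rect_dists, sizes = [], []
    for top, bottom, left, right in rects:
        region = (slice(top, bottom), slice(left, right))
        num = (prev_stroke[region] * cor_dist[region]).sum()
        den = cur_stroke[region].sum()
        rect_dists.append(num / den if den > 0 else 0.0)  # formula (1.8), per region
        sizes.append((bottom - top) * (right - left))

    rect_dists, sizes = np.asarray(rect_dists), np.asarray(sizes)
    return (sizes * rect_dists).sum() / sizes.sum()       # formula (1.9)
```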
Equation (1.7) is a method for calculating the color distance, and other distances may be used instead of the color distance, and here, only the RGB color distance is taken as an example for convenience of description.
3) Judge the text line distance FrameDist_{i+1} against an empirical threshold w. If FrameDist_{i+1} is smaller than the empirical threshold w, the difference between the text lines of the (i+1)-th frame and the i-th frame is very small and they are the same text lines, so repeated detection is not needed and this frame can be skipped directly. If FrameDist_{i+1} is larger than the threshold, the difference between the text lines of the (i+1)-th frame and the i-th frame is large and they are different text lines, so stroke detection and text line positioning need to be carried out again.
And after verification, updating corresponding information of the ith frame by using the color image, the stroke distribution diagram and the candidate region position of the character distribution of the (i +1) th frame.
Before character detection, the stroke distributions of adjacent frames are compared to judge whether the text lines repeat those of the previous frame, and only then is it decided whether to run character detection again. Because the comparison and judgment happen before detection, the detection process for repeated text lines is avoided, which saves a great deal of detection time for video text lines; and because the algorithm fully accounts for the role of the stroke pixels, it resists interference from background pixels well and judges reliably whether text lines are repeated.
It can be seen that this method of judging repetition before locating text lines is not limited to the embodiments provided by the present invention; it can also be applied to other text line locating methods. Whichever method is used to locate text lines in an image, the character stroke information can be used, before the text line positioning operation is performed on the current image, to judge whether the text line distance between the current image and the previous image is larger than the empirical threshold w; if so, the text lines of the current image are located; otherwise, the text line positioning result of the previous image is used.
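Putting step 3) together, a sketch of the per-frame decision follows; frame_distance() is the routine sketched above, while detect_strokes_and_lines() is a placeholder for the full stroke detection and text line positioning of the earlier sections, and the cached fields are assumptions about how the previous frame's information might be stored:

```python
def process_frame(rgb, stroke_map, cache, w):
    """Skip text line positioning when the subtitles repeat the previous frame.
    `cache` holds the previous frame's color image, stroke distribution map,
    candidate regions and located text lines; `w` is the empirical threshold."""
    if cache is not None:
        dist = frame_distance(cache["rgb"], rgb,
                              cache["stroke"], stroke_map, cache["rects"])
        if dist < w:                          # same text lines as the previous frame
            cache = {"rgb": rgb, "stroke": stroke_map,        # refresh the cached frame
                     "rects": cache["rects"], "lines": cache["lines"]}
            return cache["lines"], cache      # reuse the previous positioning result
    lines, rects = detect_strokes_and_lines(rgb)              # placeholder: full detection
    cache = {"rgb": rgb, "stroke": stroke_map, "rects": rects, "lines": lines}
    return lines, cache
```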
FIG. 4 shows an apparatus 400 for detecting strokes of characters in an image. The apparatus 400 includes a receiving unit S40, a first unit S41, a second unit S42 and a third unit S43.
The receiving unit S40 is for receiving an image.
The first unit S41 is configured to calculate a light stroke response value and a dark stroke response value of each pixel point in the image; the second unit S42 is configured to process the light stroke response value and the dark stroke response value of each pixel point, respectively, to obtain a light stroke graph and a dark stroke graph; the third unit S43 is configured to merge the light stroke graph and the dark stroke graph to obtain a combined stroke graph and a distribution of strokes.
The processing of the first unit S41, the second unit S42 and the third unit S43 is described above and will not be described herein.
FIG. 5 shows an apparatus 500 for locating lines of text in an image, the apparatus 500 comprising a receiving unit S40, a first unit S41, a second unit S42, a fourth unit S54, a fifth unit S55, a sixth unit S56, and a seventh unit S57.
The fourth unit S54 calculates a stroke density map and character distribution areas using the bright stroke map and the dark stroke map; the fifth unit S55 projects each character distribution area in the bright stroke map in two ways; the sixth unit S56 divides each character distribution area into at least one text line; and the seventh unit S57 determines the upper and lower boundaries of each text line.
The processing of the fourth unit S54, the fifth unit S55, the sixth unit S56 and the seventh unit S57 is described above and will not be described herein.
The verification can be carried out before stroke detection and character line positioning, whether the character line of the current frame image is the same as the character line of the previous frame image or not is judged, if yes, the processing of the current frame image is skipped, and repeated stroke detection and character line positioning are not carried out. In this case, an eighth unit may be further added to the apparatus 400 or the apparatus 500.
The eighth unit is used for judging whether the text line distance between the current image and the previous image is greater than a fifth threshold; if so, the first unit S41 is started to locate the text lines of the current image; otherwise, the seventh unit S57 is started to output the text line positioning result of the previous image, or the third unit S43 is started to output the stroke detection result of the previous image.
The processing procedure of the eighth unit is described above, and is not described herein.
Fig. 6 shows an apparatus 600 for judging duplication of subtitles, the apparatus 600 including a receiving unit S40, a locating unit S61, a storing unit S63 and an eighth unit S62.
The positioning unit S61 is used to locate the text lines of the image. The eighth unit saves the text line positions, the image content and the stroke distribution map of the previous image to the storage unit S63 after the text lines of the previous image have been located; before the text lines of the current image are located, it judges with the information stored in the storage unit S63 whether the text line distance between the current image and the previous image is greater than a fifth threshold; if so, the positioning unit S61 is started to locate the text lines of the current image; otherwise, the text line positioning result of the previous image held by the storage unit S63 is used.
Those of skill in the art will understand that the various exemplary method steps and apparatus elements described in connection with the embodiments disclosed herein can be implemented as electronic hardware, software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative steps and elements have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative elements described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method described in connection with the embodiments disclosed above may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a subscriber station. In the alternative, the processor and the storage medium may reside as discrete components in a subscriber station.
The disclosed embodiments are provided to enable those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope or spirit of the invention. The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (36)

1. A method for detecting strokes of characters in an image, comprising:
receiving an image;
calculating a response value of a bright stroke and a response value of a dark stroke of each pixel point in the image;
respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke graph and a dark stroke graph;
and combining the bright stroke graph and the dark stroke graph to obtain a combined stroke graph and the distribution of strokes.
2. The method of claim 1, wherein for each pixel point:
with the pixel point as the center, 3 parallel strip-shaped areas with equal length and equal width are respectively arranged in a plurality of directions;
respectively calculating the brightness mean value and the brightness variance of each strip-shaped area in each direction;
and calculating the response value of the bright stroke and the response value of the dark stroke of the pixel point in each direction by using the brightness mean value and the brightness variance of each strip-shaped area in each direction.
3. The method of claim 2, wherein for each pixel point, the largest bright stroke response value R^B_{maxα,s}(x, y) over its directions and the bright stroke response value R^B_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one are taken, and the bright stroke value R^B_s(x, y) of the pixel point is calculated from them; the image formed by the bright stroke values R^B_s(x, y) of all pixel points is the bright stroke map.
4. The method of claim 3, wherein for each pixel point, the largest dark stroke response value R^D_{maxα,s}(x, y) over its directions and the dark stroke response value R^D_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one are taken, and the dark stroke value R^D_s(x, y) of the pixel point is calculated from them; the image formed by the dark stroke values R^D_s(x, y) of all pixel points is the dark stroke map.
5. The method of claim 4, wherein for the same pixel point (x, y) in the bright stroke map and the dark stroke map, the larger of its bright stroke value R^B_s(x, y) and its dark stroke value R^D_s(x, y) is taken as the joint stroke value R^U_s(x, y) of the pixel point; the image formed by the joint stroke values of all pixel points is the combined stroke map; and the distribution of strokes is characterized by the salient text pixels in the combined stroke map.
6. The method of claim 2, wherein, with the pixel point as the center, 3 parallel strip-shaped areas of equal length and equal width are respectively arranged in 4 directions of the transverse included angle, one of which is 0 (the other three angles are given as formula images in the original).
7. The method of claim 6, wherein the bright stroke response value R^B_{α,s}(x, y) and the dark stroke response value R^D_{α,s}(x, y) of the pixel point with coordinates (x, y) in the direction of transverse angle α are calculated according to the following formulas:

$$R^{B}_{\alpha,s}(x,y) = (u_1 - u_2) + (u_1 - u_3) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2;$$

$$R^{D}_{\alpha,s}(x,y) = (u_2 - u_1) + (u_3 - u_1) - |u_2 - u_3| - \frac{1}{d}\sum_{i=1}^{3}\sigma_i^2;$$

s is the configured detection scale and d is the spacing between the strip-shaped areas; u_1, u_2, u_3 respectively denote the brightness averages of the 3 strip-shaped areas in the direction of transverse angle α, and σ_1², σ_2², σ_3² denote the brightness variances of the strip-shaped areas in that direction, where u_1 is the brightness average of the strip-shaped area in which the pixel point (x, y) lies.
8. The method of claim 3, wherein R^B_{maxα,s}(x, y) and R^B_{maxα⊥,s}(x, y) are averaged to obtain R^B_s(x, y).
9. The method of claim 4, wherein R^D_{maxα,s}(x, y) and R^D_{maxα⊥,s}(x, y) are averaged to obtain R^D_s(x, y).
10. A method for locating lines of text in an image, comprising:
receiving an image;
calculating to obtain a light stroke image and a dark stroke image of the image;
calculating to obtain a stroke density graph and a character distribution area by using the bright stroke graph and the dark stroke graph;
projecting each character distribution area in the bright stroke map in two ways;
dividing each character distribution area into at least one character line;
the upper and lower boundaries of each line of text are determined.
11. The method according to claim 10, wherein binarization is performed on the bright stroke map, and an OR operation is performed on the binarized bright stroke map and the stroke density map;
and the regions formed by connected white pixel points in the new stroke density map are taken as character distribution areas.
12. The method of claim 10, wherein the luminance value of the pixel on the light stroke map is projected in the horizontal direction in each text distribution area to obtain a luminance histogram of each text distribution area;
and accumulating the times of changing each line of pixels from zero to non-zero on the highlight drawing in each character distribution area to obtain an intersection point histogram of each character distribution area.
13. The method of claim 12, wherein the text distribution area is divided horizontally at each text distribution area along the searched dividing points to form a plurality of text lines;
wherein the division point satisfies the following condition:
the value of the point on the luminance histogram is less than a first threshold and the value on the intersection histogram is less than a second threshold.
14. The method of claim 13, wherein for each text line, the boundary points of the text line are searched from the maximum value of the luminance histogram in the upper and lower directions, and are horizontally divided along the boundary points to form the upper and lower boundaries of the text line;
wherein the boundary points satisfy the following conditions:
the value of the point on the luminance histogram is less than a third threshold, or the value on the intersection histogram is less than a fourth threshold.
15. A method for judging subtitle repetition is characterized by comprising the following steps:
after the character line of the previous image is positioned, the character line position, the image content and the stroke distribution map of the previous image are saved;
before the text line of the current image is positioned, judging with the saved information whether the text line distance between the current image and the previous image is larger than a fifth threshold; if so, positioning the text line of the current image; otherwise, using the text line positioning result of the previous image.
16. The method of claim 15, wherein the text line distance FrameDist_{i+1} between the current image and the previous image is obtained by calculating

$$\mathrm{FrameDist}_{i+1} = \left[\sum_{m=1}^{M} \mathrm{size}(RECT_m)\times \mathrm{RectDist}_{i+1}(m)\right] \Big/ \sum_{m=1}^{M} \mathrm{size}(RECT_m),$$

where size(RECT_m) denotes the area of the text region RECT_m, RectDist_{i+1}(m) denotes the distance between the text distribution areas of the current image and the previous image, and M denotes the total number of text distribution areas.
17. The method of claim 16, wherein the distance between the text distribution areas of the current image and the previous image is obtained by calculating

$$\mathrm{RectDist}_{i+1}(m) = \left[\sum_{(x,y)\in RECT_m} R^{B}_{i}(x,y)\times \mathrm{CorDist}_{i+1}(x,y)\right] \Big/ \sum_{(x,y)\in RECT_m} R^{B}_{i+1}(x,y),$$

where R^B_i(x, y) denotes the value of the pixel point (x, y) on the stroke distribution map of the previous image, and CorDist_{i+1}(x, y) denotes the color distance between corresponding pixel points of the current image and the previous image.
18. The method of claim 17, wherein

$$\mathrm{CorDist}_{i+1}(x,y) = \left\{\frac{1}{3}\sum_{cor=R,G,B}\left[P^{cor}_{i+1}(x,y) - P^{cor}_{i}(x,y)\right]^{2}\right\}^{\frac{1}{2}},$$

where P^cor_{i+1}(x, y) and P^cor_i(x, y) respectively denote the RGB color values of corresponding pixel points of the current image and the previous image.
19. An apparatus for detecting strokes of characters in an image, comprising a receiving unit for receiving the image, characterized by further comprising:
the first unit is used for calculating a light stroke response value and a dark stroke response value of each pixel point in the image;
a second unit for respectively processing the response value of the bright stroke and the response value of the dark stroke of each pixel point to obtain a bright stroke map and a dark stroke map; and
a third unit for combining the bright stroke map and the dark stroke map to obtain a combined stroke map and the distribution of strokes.
20. The apparatus of claim 19, wherein for each pixel:
the first unit takes the pixel point as a center, and 3 parallel strip-shaped areas with equal length and equal width are respectively arranged in a plurality of directions; respectively calculating the brightness mean value and the brightness variance of each strip-shaped area in each direction; and calculating the response value of the bright stroke and the response value of the dark stroke of the pixel point in each direction by using the brightness mean value and the brightness variance of each strip-shaped area in each direction.
21. The apparatus of claim 20, wherein for each pixel point, the second unit takes the largest bright stroke response value R^B_{maxα,s}(x, y) over its directions and the bright stroke response value R^B_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one, and calculates the bright stroke value R^B_s(x, y) of the pixel point; the image formed by the bright stroke values R^B_s(x, y) of all pixel points is the bright stroke map.
22. The apparatus of claim 21, wherein for each pixel point, the second unit takes the largest dark stroke response value R^D_{maxα,s}(x, y) over its directions and the dark stroke response value R^D_{maxα⊥,s}(x, y) in the direction perpendicular to the maximum one, and calculates the dark stroke value R^D_s(x, y) of the pixel point; the image formed by the dark stroke values R^D_s(x, y) of all pixel points is the dark stroke map.
23. The apparatus of claim 22, wherein for the same pixel point (x, y) in the bright stroke map and the dark stroke map, the third unit takes the larger of its bright stroke value R^B_s(x, y) and its dark stroke value R^D_s(x, y) as the joint stroke value R^U_s(x, y) of the pixel point; the image formed by the joint stroke values of all pixel points is the combined stroke map;
and the distribution of strokes is characterized by the salient text pixels in the combined stroke map.
24. The apparatus of claim 20, wherein, with the pixel point as the center, 3 parallel strip-shaped areas of equal length and equal width are respectively arranged in 4 directions of the transverse included angle, one of which is 0 (the other three angles are given as formula images in the original).
25. The apparatus of claim 24, wherein the bright stroke response value $R^{B}_{\alpha,s}(x,y)$ and the dark stroke response value $R^{D}_{\alpha,s}(x,y)$ of the pixel point with coordinates (x, y), in the direction whose included angle with the horizontal is α, are calculated according to the following formulas:

$$R^{B}_{\alpha,s}(x,y)=(u_{1}-u_{2})+(u_{1}-u_{3})-\left|u_{2}-u_{3}\right|-\frac{1}{d}\sum_{i=1}^{3}\sigma_{i}^{2};$$

$$R^{D}_{\alpha,s}(x,y)=(u_{2}-u_{1})+(u_{3}-u_{1})-\left|u_{2}-u_{3}\right|-\frac{1}{d}\sum_{i=1}^{3}\sigma_{i}^{2};$$

where s is the configured detection scale, d is the interval between the strip-shaped areas, $u_{1}$, $u_{2}$, $u_{3}$ respectively denote the luminance means of the 3 strip-shaped areas in the direction at angle α, $\sigma_{i}^{2}$ (i = 1, 2, 3) denote the luminance variances of those strip-shaped areas, and $u_{1}$ is the luminance mean of the strip-shaped area in which the pixel point (x, y) lies.
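The response formulas of claim 25 reduce to simple per-pixel arithmetic once the three strip-shaped areas have been sampled. The following is a minimal sketch, not part of the patent text: the strips are assumed to be supplied as pixel-luminance arrays, with strip1 the strip containing the pixel of interest, and the function and variable names are illustrative.

```python
import numpy as np

def stroke_responses(strip1, strip2, strip3, d):
    """Bright and dark stroke response values of one pixel in one direction.

    strip1 is the strip containing the pixel point; strip2 and strip3 are the
    two parallel flanking strips; d is the interval between the strips.
    """
    u1, u2, u3 = strip1.mean(), strip2.mean(), strip3.mean()
    var_penalty = (strip1.var() + strip2.var() + strip3.var()) / d
    r_bright = (u1 - u2) + (u1 - u3) - abs(u2 - u3) - var_penalty
    r_dark = (u2 - u1) + (u3 - u1) - abs(u2 - u3) - var_penalty
    return r_bright, r_dark

# Example: a bright stroke over a dark, uniform background.
centre = np.array([200.0, 205.0, 210.0])
left = np.array([40.0, 45.0, 50.0])
right = np.array([42.0, 47.0, 52.0])
print(stroke_responses(centre, left, right, d=3))  # large positive, large negative
```

For a bright stroke the centre strip is brighter than both flanking strips and the flanks are mutually similar with low variance, so the bright response is large while the dark response is strongly negative; the dark-stroke case is symmetric.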
26. The apparatus of claim 21, wherein the bright stroke value of each pixel point is obtained by computing the mean of its largest bright stroke response value and the bright stroke response value in the direction perpendicular to it.
27. The apparatus of claim 22, wherein the dark stroke value of each pixel point is obtained by computing the mean of its largest dark stroke response value and the dark stroke response value in the direction perpendicular to it.
28. An apparatus for locating lines of characters in an image, comprising a receiving unit for receiving the image, characterized by further comprising:
a first unit for calculating a bright stroke response value and a dark stroke response value of each pixel point in the image;
a second unit for processing the bright stroke response value and the dark stroke response value of each pixel point respectively to obtain a bright stroke map and a dark stroke map;
a fourth unit for calculating a stroke density map and character distribution areas using the bright stroke map and the dark stroke map;
a fifth unit for performing two kinds of projection on each character distribution area in the bright stroke map;
a sixth unit for dividing each character distribution area into at least one character line; and
a seventh unit for determining the upper and lower boundaries of each character line.
29. The apparatus according to claim 28, wherein the fourth unit binarizes the bright stroke map and performs an OR operation between the binarized bright stroke map and the stroke density map;
the areas formed by connected white pixel points in the resulting stroke density map are taken as the character distribution areas.
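Claim 29's merging of the binarized bright stroke map with the stroke density map, followed by extraction of connected white regions, can be sketched as follows; scipy.ndimage is only one possible implementation choice, and the binarization threshold is illustrative rather than taken from the patent:

```python
import numpy as np
from scipy import ndimage

def character_distribution_areas(bright_map, density_map, thresh=128):
    """Binarize the bright stroke map, OR it with the stroke density map,
    and return the bounding boxes of connected white regions."""
    bright_bin = bright_map > thresh                     # illustrative threshold
    merged = np.logical_or(bright_bin, density_map > 0)  # OR of the two maps
    labels, n = ndimage.label(merged)                    # 4-connected components
    return ndimage.find_objects(labels)                  # list of (row_slice, col_slice)
```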
30. The apparatus according to claim 28, wherein, within each character distribution area, the fifth unit projects the luminance values of the pixels of the bright stroke map in the horizontal direction to obtain a luminance histogram of the character distribution area, and accumulates, for each row of pixels of the bright stroke map, the number of times the pixels change from zero to non-zero to obtain an intersection-point histogram of the character distribution area.
31. The apparatus of claim 30, wherein the sixth unit divides each character distribution area horizontally along the located division points to form a plurality of character lines; wherein a division point satisfies the following conditions:
its value on the luminance histogram is less than a first threshold and its value on the intersection-point histogram is less than a second threshold.
32. The apparatus of claim 31, wherein, for each character line, the seventh unit searches for boundary points of the character line upward and downward from the maximum of the luminance histogram, and divides horizontally along the boundary points to form the upper and lower boundaries of the character line; wherein a boundary point satisfies the following condition:
its value on the luminance histogram is less than a third threshold, or its value on the intersection-point histogram is less than a fourth threshold.
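The two projections of claim 30 and the threshold tests of claims 31 and 32 amount to row-wise statistics over a character distribution area of the bright stroke map. A non-authoritative sketch, with t1 through t4 standing in for the first to fourth thresholds, whose values the claims do not fix:

```python
import numpy as np

def projections(region):
    """Claim 30: per-row luminance projection and zero-to-nonzero crossing
    counts for a character distribution area of the bright stroke map."""
    lum_hist = region.sum(axis=1)
    nonzero = region > 0
    crossings = ((~nonzero[:, :-1]) & nonzero[:, 1:]).sum(axis=1)
    return lum_hist, crossings

def division_rows(lum_hist, crossings, t1, t2):
    """Claim 31: rows usable as division points between character lines."""
    return np.where((lum_hist < t1) & (crossings < t2))[0]

def line_bounds(lum_hist, crossings, t3, t4):
    """Claim 32: search up and down from the histogram maximum for the first
    rows falling below the thresholds; these bound the character line."""
    peak = int(lum_hist.argmax())
    weak = (lum_hist < t3) | (crossings < t4)
    top = peak
    while top > 0 and not weak[top - 1]:
        top -= 1
    bottom = peak
    while bottom < len(lum_hist) - 1 and not weak[bottom + 1]:
        bottom += 1
    return top, bottom
```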
33. An apparatus for determining duplication of subtitles, comprising a receiving unit for receiving an image, a storage unit, and a positioning unit for positioning the character lines of the image, characterized by further comprising:
an eighth unit for storing, after the character lines of the previous image have been positioned, the character line positions, the image content and the stroke distribution map of the previous image into the storage unit; and for judging, before the character lines of the current image are positioned, whether the distance between the character lines of the current image and the previous image is greater than a fifth threshold using the information stored in the storage unit; if it is greater, the positioning unit is started to position the character lines of the current image; otherwise, the character line positioning result of the previous image stored in the storage unit is used.
34. The apparatus of claim 33, wherein the eighth unit calculates

$$\mathrm{FrameDist}_{i+1}=\left[\sum_{m=1}^{M}\mathrm{size}(RECT_{m})\times\mathrm{RectDist}_{i+1}(m)\right]\Big/\sum_{m=1}^{M}\mathrm{size}(RECT_{m})$$

to obtain the distance $\mathrm{FrameDist}_{i+1}$ between the current image and the previous image;
where $\mathrm{size}(RECT_{m})$ denotes the area of the character distribution area $RECT_{m}$, $\mathrm{RectDist}_{i+1}(m)$ denotes the distance between the m-th character distribution areas of the current image and the previous image, and M denotes the total number of character distribution areas.
35. The apparatus of claim 34, wherein

$$\mathrm{RectDist}_{i+1}(m)=\left[\sum_{(x,y)\in RECT_{m}}R^{B}_{i}(x,y)\times\mathrm{CortDist}_{i+1}(x,y)\right]\Big/\sum_{(x,y)\in RECT_{m}}R^{B}_{i+1}(x,y)$$

is calculated to obtain the distance between the m-th character distribution areas of the current image and the previous image;
where $R^{B}_{i}(x,y)$ denotes the value of the pixel point (x, y) on the stroke distribution map of the previous image, and $\mathrm{CortDist}_{i+1}(x,y)$ denotes the color distance between the corresponding pixel points of the current image and the previous image.
36. The apparatus of claim 35, wherein

$$\mathrm{CortDist}_{i+1}(x,y)=\left\{\frac{1}{3}\sum_{cor=R,G,B}\left[P^{cor}_{i+1}(x,y)-P^{cor}_{i}(x,y)\right]^{2}\right\}^{\frac{1}{2}}$$

where $P^{cor}_{i+1}(x,y)$ and $P^{cor}_{i}(x,y)$ respectively denote the RGB color values of the corresponding pixel points of the current image and the previous image.
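Claims 34–36 define an area-weighted, stroke-weighted color distance between consecutive frames, which the eighth unit of claim 33 compares against the fifth threshold. A sketch assuming RGB frames as H x W x 3 arrays, the previous and current stroke distribution maps as H x W weight arrays, and character regions given as (y0, y1, x0, x1) tuples; all names are illustrative:

```python
import numpy as np

def color_dist(frame_prev, frame_cur):
    """Claim 36: per-pixel RGB color distance between consecutive frames."""
    diff = frame_cur.astype(np.float64) - frame_prev.astype(np.float64)
    return np.sqrt((diff ** 2).mean(axis=2))

def frame_dist(frame_prev, frame_cur, stroke_prev, stroke_cur, rects):
    """Claims 34-35: area-weighted average of per-region stroke-weighted
    color distances between the previous and current frames."""
    cdist = color_dist(frame_prev, frame_cur)
    num, den = 0.0, 0.0
    for (y0, y1, x0, x1) in rects:
        size = (y1 - y0) * (x1 - x0)
        w_prev = stroke_prev[y0:y1, x0:x1]
        w_cur = stroke_cur[y0:y1, x0:x1]
        rect_dist = (w_prev * cdist[y0:y1, x0:x1]).sum() / max(w_cur.sum(), 1e-9)
        num += size * rect_dist
        den += size
    return num / max(den, 1e-9)
```

A caller following claim 33 would re-run the positioning unit only when the returned distance exceeds the fifth threshold, and otherwise reuse the stored character line positions of the previous image.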
CN200910078007A 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles Pending CN101799922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910078007A CN101799922A (en) 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910078007A CN101799922A (en) 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Publications (1)

Publication Number Publication Date
CN101799922A true CN101799922A (en) 2010-08-11

Family

ID=42595595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910078007A Pending CN101799922A (en) 2009-02-09 2009-02-09 Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles

Country Status (1)

Country Link
CN (1) CN101799922A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915438A (en) * 2012-08-21 2013-02-06 北京捷成世纪科技股份有限公司 Method and device for extracting video subtitles
CN102915438B (en) * 2012-08-21 2016-11-23 北京捷成世纪科技股份有限公司 The extracting method of a kind of video caption and device
CN103679208A (en) * 2013-11-27 2014-03-26 北京中科模识科技有限公司 Broadcast and television caption recognition based automatic training data generation and deep learning method
CN107123127A (en) * 2017-04-27 2017-09-01 北京京东尚科信息技术有限公司 A kind of image subject extracting method and device
CN108573258A (en) * 2018-04-24 2018-09-25 中国科学技术大学 Chinese language word localization method is tieed up in a kind of quick complex background image
CN108573258B (en) * 2018-04-24 2020-06-26 中国科学技术大学 Method for quickly positioning dimension Chinese characters in complex background image
CN109614938A (en) * 2018-12-13 2019-04-12 深源恒际科技有限公司 A kind of text objects detection method and system based on depth network
CN109614938B (en) * 2018-12-13 2022-03-15 深源恒际科技有限公司 Text target detection method and system based on deep network
CN113240779A (en) * 2021-05-21 2021-08-10 北京达佳互联信息技术有限公司 Method and device for generating special character effect, electronic equipment and storage medium
CN113240779B (en) * 2021-05-21 2024-02-23 北京达佳互联信息技术有限公司 Method and device for generating text special effects, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111382704B (en) Vehicle line pressing violation judging method and device based on deep learning and storage medium
Zhang et al. Image segmentation based on 2D Otsu method with histogram analysis
CN107093172B (en) Character detection method and system
CN103034848B (en) A kind of recognition methods of form types
US8059868B2 (en) License plate recognition apparatus, license plate recognition method, and computer-readable storage medium
CN101334836B (en) License plate positioning method incorporating color, size and texture characteristic
CN104200210B (en) A kind of registration number character dividing method based on component
WO2016127545A1 (en) Character segmentation and recognition method
CN104298982A (en) Text recognition method and device
CN101515325A (en) Character extracting method in digital video based on character segmentation and color cluster
CN111259878A (en) Method and equipment for detecting text
CN105205488A (en) Harris angular point and stroke width based text region detection method
CN109376740A (en) A kind of water gauge reading detection method based on video
CN101799922A (en) Method and device for detecting strokes of characters, method and device for locating lines of characters, and method and device for judging repeat of subtitles
CN104239867A (en) License plate locating method and system
WO2023279966A1 (en) Multi-lane-line detection method and apparatus, and detection device
CN108830133A (en) Recognition methods, electronic device and the readable storage medium storing program for executing of contract image picture
CN104504717A (en) Method and device for detection of image information
US20110200257A1 (en) Character region extracting apparatus and method using character stroke width calculation
CN106951896B (en) License plate image tilt correction method
CN113239733B (en) Multi-lane line detection method
CN111898491A (en) Method and device for identifying reverse driving of vehicle and electronic equipment
EP2821935B1 (en) Vehicle detection method and device
CN111126383A (en) License plate detection method, system, device and storage medium
CN115240197A (en) Image quality evaluation method, image quality evaluation device, electronic apparatus, scanning pen, and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20100811

C20 Patent right or utility model deemed to be abandoned or is abandoned