CN111507344A - Method and device for recognizing characters from image - Google Patents

Method and device for recognizing characters from image

Info

Publication number
CN111507344A
Authority
CN
China
Prior art keywords
image
text
text region
candidate text
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910092406.XA
Other languages
Chinese (zh)
Inventor
矫健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201910092406.XA priority Critical patent/CN111507344A/en
Publication of CN111507344A publication Critical patent/CN111507344A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method and a device for recognizing characters from an image. The method comprises the following steps: acquiring an image set to be recognized; extracting text regions from each image in the image set; extracting characters from each text region; and recognizing the extracted characters to obtain the character recognition results corresponding to each image. In this technical solution, the text regions are first extracted from the image instead of recognizing the image as a whole, which greatly improves recognition accuracy, avoids interference from text-like lines in the image, performs well on images such as posters in which the characters blend into the background, is compatible with a variety of service scenarios, reduces the cost of image data production, verification and operation, and automatically mines the content clues contained in images, providing important help for image-based data mining.

Description

Method and device for recognizing characters from image
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for recognizing characters from an image.
Background
With the advent of the high-tech information age and the development of Internet and computer technology, massive amounts of multimedia information appear in people's daily lives and grow exponentially, and the sharply increasing volume of image information attracts more and more attention. However, because images store the original information of objects, such as color and brightness, in the form of pixels and lack a high-level description of the image content, it is difficult to automatically identify that content through computer analysis, understanding, retrieval and reuse of the images.
Although there has been great technical progress in recognizing text by means such as OCR (Optical Character Recognition), directly recognizing text in images still has certain shortcomings. For example, images such as posters often contain characters, and character detection is affected by factors such as language, character resolution, character spacing, distribution, background, illumination and color; backgrounds with certain patterns and textures in such images are difficult to distinguish from the characters.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for recognizing characters from an image that overcomes or at least partially solves the above-mentioned problems.
According to an aspect of the present invention, there is provided a method for recognizing a text from an image, comprising:
acquiring an image set to be identified;
extracting text regions from each image in the image set respectively;
extracting characters from each text area;
and identifying the extracted characters to obtain character identification results corresponding to the images.
Optionally, the acquiring the set of images to be recognized includes:
and when the resource description information in the multimedia resource library is updated, obtaining the poster in the resource description information and putting the poster into the image set.
Optionally, the method further comprises:
acquiring character description information in the resource description information, and calculating the matching degree of the character description information and the character recognition result;
if the matching degree reaches a preset threshold value, judging that the poster passes verification;
and if the matching degree does not reach a preset threshold value, putting the poster into a set to be verified.
Optionally, the extracting the text region from each image in the image set respectively includes:
extracting candidate text regions from each image respectively;
and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
Optionally, the extracting the candidate text regions from the images respectively includes one or more of the following manners:
extracting candidate text regions from each image based on edge detection;
extracting candidate text regions from each image respectively based on the texture features;
extracting candidate text regions from each image respectively based on the color features;
extracting candidate text regions from each image based on connected components.
Optionally, the extracting the candidate text regions from the respective images based on the edge detection includes:
determining a preliminary candidate text region according to the edge information, and determining a candidate text region according to the connected component and the preliminary candidate text region.
Optionally, the determining the preliminary candidate text region according to the edge information includes:
smoothing the image according to the median filter;
carrying out edge detection on the smoothed image according to an edge detection operator to obtain an edge image;
and removing non-character edges according to an edge filter to obtain a preliminary candidate text region.
Optionally, the extracting the candidate text regions from the respective images based on the edge detection includes:
detecting edges based on a Gaussian function and a binarization mode to obtain a preliminary candidate text region;
carrying out color modeling on the preliminary candidate text region according to a Gaussian mixture model to determine background information;
identifying missing characters from the image according to the background information;
and determining a candidate text region according to the identification result of the missed characters and the preliminary candidate text region.
Optionally, the extracting the candidate text regions from the respective images based on the edge detection includes:
extracting the color edge of the image according to a Sobel edge detection operator;
carrying out binarization processing on the edge image in an entropy threshold mode, and obtaining an image block according to mathematical morphology closing operation and opening operation;
determining a preliminary candidate text region according to one or more of the height, the aspect ratio and the density of edge points in the image block;
and performing wavelet decomposition on the preliminary candidate text region, extracting features according to wavelet coefficients, and determining the candidate text region from the image.
Optionally, the extracting the candidate text regions from the images respectively based on the texture features includes:
the method comprises the steps of sliding a window with a preset size in an image, extracting one or more of an average value, a second-order central moment and a third-order central moment in the window to serve as features, classifying regions in the window according to a neural network based on the features, and obtaining candidate text regions according to classification results.
Optionally, the extracting the candidate text regions from the images respectively based on the texture features includes:
and performing wavelet transformation on the image, extracting variance from a plurality of high-frequency sub-images by utilizing a histogram as a feature, and determining a candidate text region according to a K-means algorithm.
Optionally, the extracting the candidate text regions from the images respectively based on the texture features includes:
and performing texture segmentation on the image according to a Gaussian filter, and determining a candidate text region according to a bottom-up connected domain.
Optionally, the extracting the candidate text regions from the respective images based on the color features includes:
and performing three-mean clustering on the image according to the Euclidean distance and the cosine similarity, processing each obtained sub-image according to a log-Gabor filter, and determining a candidate text region according to the filtering result.
Optionally, the extracting the candidate text regions from the respective images based on the color features includes:
and performing color clustering on the image according to the histogram of the RGB three color components, decomposing the image into a plurality of binary images according to each color obtained by clustering, and determining candidate text regions based on the connected domain.
Optionally, the extracting the candidate text regions from the respective images based on the color features includes:
carrying out local color quantization on the image;
and determining candidate text regions according to the size, the aspect ratio and the proportion of the character color in the minimum envelope rectangle of the connected component.
Optionally, the extracting candidate text regions from each image based on the connected component includes:
segmenting the image based on a nonlinear Niblack binarization algorithm, then calibrating connected domains, and extracting the characteristics of each connected domain;
and constructing a cascade classifier according to AdaBoost, screening the connected domain, and obtaining a candidate text region according to a screening result.
Optionally, the extracting candidate text regions from each image based on the connected component includes:
and generating gradient information of the image, and determining a candidate text region according to mathematical morphology operation after the gradient image is binarized.
Optionally, the extracting candidate text regions from each image based on the connected component includes:
and performing preliminary connected domain calibration according to the color difference of adjacent pixels, and iteratively judging whether the colors of boundary pixels can be merged until no more pixels can be merged, so as to obtain a candidate text region.
Optionally, the preset model is a support vector machine SVM model including one or more of the following features:
the character row projection characteristic, the character column projection characteristic, the histogram crossing characteristic, the shape matching characteristic, the co-occurrence matrix characteristic, the edge density characteristic and the direction consistency characteristic.
Optionally, the extracting the text from each text region includes:
and performing character segmentation and extraction on each text region based on the OTSU algorithm.
Optionally, the recognizing the extracted text to obtain the text recognition result corresponding to each image includes:
and for the extracted text, acquiring the height information of the characters by projection, and segmenting the text by regression-based character segmentation to obtain the characters it contains.
Optionally, the recognizing the extracted text to obtain the text recognition result corresponding to each image further includes:
OCR character recognition is performed for each character.
According to another aspect of the present invention, there is provided an apparatus for recognizing a character from an image, including:
the image set acquisition unit is suitable for acquiring an image set to be identified;
a text region extraction unit adapted to extract text regions from respective images in the image set;
a text extraction unit adapted to extract text from each text region;
and the character recognition unit is suitable for recognizing the extracted characters to obtain character recognition results corresponding to the images.
Optionally, the image collection obtaining unit is adapted to obtain the poster in the resource description information to place in the image collection when the resource description information in the multimedia resource library is updated.
Optionally, the apparatus further comprises:
the verification unit is suitable for acquiring the character description information in the resource description information and calculating the matching degree of the character description information and the character recognition result; if the matching degree reaches a preset threshold value, judging that the poster passes verification; and if the matching degree does not reach a preset threshold value, putting the poster into a set to be verified.
Optionally, the text region extracting unit is adapted to extract candidate text regions from the respective images; and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
Optionally, the text region extracting unit is adapted to extract candidate text regions from the respective images in one or more of the following manners: extracting candidate text regions from each image based on edge detection; extracting candidate text regions from each image based on texture features; extracting candidate text regions from each image based on color features; and extracting candidate text regions from each image based on connected components.
Optionally, the text region extracting unit is adapted to determine a preliminary candidate text region according to the edge information, and determine a candidate text region according to the connected component and the preliminary candidate text region.
Optionally, the text region extracting unit is adapted to perform smoothing processing on the image according to a median filter; carrying out edge detection on the smoothed image according to an edge detection operator to obtain an edge image; and removing non-character edges according to an edge filter to obtain a preliminary candidate text region.
Optionally, the text region extracting unit is adapted to detect an edge based on a gaussian function and a binarization mode to obtain a preliminary candidate text region; carrying out color modeling on the preliminary candidate text region according to a Gaussian mixture model to determine background information; identifying missing characters from the image according to the background information; and determining a candidate text region according to the identification result of the missed characters and the preliminary candidate text region.
Optionally, the text region extracting unit is adapted to extract color edges of the image according to a Sobel edge detection operator; carrying out binarization processing on the edge image in an entropy threshold mode, and obtaining an image block according to mathematical morphology closing operation and opening operation; determining a preliminary candidate text region according to one or more of the height, the aspect ratio and the density of edge points in the image block; and performing wavelet decomposition on the preliminary candidate text region, extracting features according to wavelet coefficients, and determining the candidate text region from the image.
Optionally, the text region extracting unit is adapted to slide in the image with a window of a preset size, extract one or more of an average value, a second-order central moment and a third-order central moment in the window as a feature, classify the region in the window according to a neural network based on the feature, and obtain a candidate text region according to a classification result.
Optionally, the text region extracting unit is adapted to perform wavelet transform on the image, extract variance as a feature using a histogram in a number of high-frequency sub-images, and determine candidate text regions according to a K-means algorithm.
Optionally, the text region extracting unit is adapted to perform texture segmentation on the image according to a gaussian filter, and then determine candidate text regions according to a bottom-to-top connected domain.
Optionally, the text region extracting unit is adapted to perform three-mean clustering on the image according to the Euclidean distance and the cosine similarity, process each obtained sub-image according to a log-Gabor filter, and determine the candidate text region according to the filtering result.
Optionally, the text region extracting unit is adapted to perform color clustering on the image according to a histogram of three color components of RGB, decompose the image into a plurality of binary images according to each color obtained by clustering, and determine candidate text regions based on the connected domain.
Optionally, the text region extracting unit is adapted to perform local color quantization on the image; and determining candidate text regions according to the size, the aspect ratio and the proportion of the character color in the minimum envelope rectangle of the connected component.
Optionally, the text region extracting unit is adapted to segment the image based on a nonlinear Niblack binarization algorithm, and then perform connected domain calibration to extract features of each connected domain; and constructing a cascade classifier according to AdaBoost, screening the connected domain, and obtaining a candidate text region according to a screening result.
Optionally, the text region extracting unit is adapted to generate gradient information of the image, and determine the candidate text region according to a mathematical morphology operation after binarizing the gradient image.
Optionally, the text region extracting unit is adapted to perform primary connected component calibration according to a color difference value of adjacent pixels, and iteratively determine whether the colors of the pixels at the boundary can be combined until the colors cannot be combined, so as to obtain a candidate text region.
Optionally, the preset model is a support vector machine SVM model including one or more of the following features: the character row projection characteristic, the character column projection characteristic, the histogram crossing characteristic, the shape matching characteristic, the co-occurrence matrix characteristic, the edge density characteristic and the direction consistency characteristic.
Optionally, the text extraction unit is adapted to perform character segmentation and extraction on each text region based on the OTSU algorithm.
Optionally, the text recognition unit is adapted to obtain the height information of the characters in the extracted text by projection, and then segment the extracted text by regression-based character segmentation to obtain the characters contained in the text.
Optionally, the character recognition unit is adapted to perform OCR character recognition on each character respectively.
In accordance with still another aspect of the present invention, there is provided an electronic apparatus including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.
According to a further aspect of the invention, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as any one of the above.
According to the technical scheme of the invention, after the image set to be recognized is obtained, text regions are extracted from each image in the image set, characters are extracted from each text region, and the extracted characters are recognized to obtain the character recognition results corresponding to each image. In this technical solution, the text regions are first extracted from the image instead of recognizing the image as a whole, which greatly improves recognition accuracy, avoids interference from text-like lines in the image, performs well on images such as posters in which the characters blend into the background, is compatible with a variety of service scenarios, reduces the cost of image data production, verification and operation, and automatically mines the content clues contained in images, providing important help for image-based data mining.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow diagram illustrating a method for recognizing text from an image according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of an apparatus for recognizing text from an image according to an embodiment of the present invention;
FIG. 3 shows a schematic structural diagram of an electronic device according to one embodiment of the invention;
FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to one embodiment of the invention;
FIG. 5 shows a schematic view of a text column in a poster and its projection;
FIG. 6a is a diagram of a set of characters with relatively stable edge point distances;
FIG. 6b is a schematic diagram of another set of characters with relatively stable edge point distances;
FIG. 6c is a diagram of a set of characters with relatively unstable distances of edge points;
FIG. 6d is a schematic diagram of another set of characters with relatively unstable distances of edge points;
FIG. 7a shows a schematic diagram of a matrix template for deburring according to an embodiment of the present invention;
FIG. 7b shows a schematic diagram of a matrix template for line drawing smoothing and hole filling according to one embodiment of the invention;
FIG. 8 shows a schematic diagram of a set of characters and their projections;
FIG. 9 shows a schematic diagram of the principle of the regression-based character segmentation method;
FIG. 10 is a diagram illustrating the character segmentation result corresponding to the characters in FIG. 8.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a flow chart illustrating a method for recognizing characters from an image according to an embodiment of the present invention.
As shown in fig. 1, the method includes:
step S110, an image set to be recognized is acquired. This step is to determine the images to be identified, at least one of which is to be identified in the set when identification is possible. Wherein the image may be a poster with text, a vertical drawing, a cartoon, etc. In the existing mode, information such as characters in the image is often recognized in a manual mode, the cost of labor, business, operation and the like is increased along with the increase of the information in the image, and the recognition accuracy is also closely related to the careful degree of corresponding workers.
In step S120, text regions are extracted from each image in the image set. The text area here is an image area that is considered to contain characters, which may be chinese characters, latin letters, or the like.
In step S130, characters are extracted from each text area.
Step S140 is performed to identify the extracted characters, and character identification results corresponding to the respective images are obtained.
Steps S120 to S140 embody the following design idea: first determine the regions that may contain characters in the image as text regions, then extract the parts considered to be characters from those text regions, and finally recognize the characters. The steps thus proceed progressively, layer by layer; the recognition is split into several steps, the accuracy of each step can be improved in its own way, and the overall recognition rate is therefore also significantly improved.
As can be seen, in the method shown in fig. 1, after the image set to be recognized is obtained, text regions are extracted from each image in the image set, characters are extracted from each text region, and the extracted characters are recognized to obtain the character recognition results corresponding to each image. In this technical solution, the text regions are first extracted from the image instead of recognizing the image as a whole, which greatly improves recognition accuracy, avoids interference from text-like lines in the image, performs well on images such as posters in which the characters blend into the background, is compatible with a variety of service scenarios, reduces the cost of image data production, verification and operation, and automatically mines the content clues contained in images, providing important help for image-based data mining.
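By way of illustration only, the following is a minimal sketch of the four-step pipeline of fig. 1; the helper functions are placeholders standing in for the concrete techniques described below and are not part of the claimed method.

```python
from typing import Dict, List
import numpy as np
import cv2

def detect_text_regions(image: np.ndarray) -> List[np.ndarray]:
    # S120 placeholder: any of the candidate-extraction methods described below
    # (edge, texture, colour or connected-component based) plus SVM verification.
    return []

def extract_characters(region: np.ndarray) -> List[np.ndarray]:
    # S130 placeholder: OTSU binarization and character segmentation.
    return []

def recognize_character(char_img: np.ndarray) -> str:
    # S140 placeholder: OCR on a single character image.
    return ""

def recognize_text_in_images(paths: List[str]) -> Dict[str, List[str]]:
    # S110: acquire the image set to be recognized, then run S120-S140 per image.
    results: Dict[str, List[str]] = {}
    for path in paths:
        image = cv2.imread(path)
        chars = [recognize_character(c)
                 for region in detect_text_regions(image)
                 for c in extract_characters(region)]
        results[path] = chars
    return results
```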
In an embodiment of the present invention, in the method, acquiring the image set to be recognized includes: and when the resource description information in the multimedia resource library is updated, acquiring the poster in the resource description information and putting the poster into the image set.
For example, a multimedia resource library stores the video resources of a number of films and TV series together with the corresponding resource description information, which may be film synopses, cast and crew information, etc. in text form, or posters.
In one embodiment, the multimedia resource library is implemented using the database of an existing media asset system. The media asset system injects resource description information into the OTT content management system according to a specified data injection interface specification; the central platform or local platform verifies the SecretId and SecretKey signature of the API interface (including a timestamp, a random number, the access method and the method parameters); after verification, the resource description information is updated in real time from the media asset system, and the poster is then fetched remotely from the picture (poster) source address in the source data and stored locally for use during preprocessing.
In an embodiment of the present invention, the method further includes: acquiring character description information in the resource description information, and calculating the matching degree of the character description information and a character recognition result; if the matching degree reaches a preset threshold value, judging that the poster passes verification; and if the matching degree does not reach the preset threshold value, putting the poster into the set to be verified.
In a specific embodiment this may be used to verify that a poster matches a video resource such as a film. Generally speaking, the video resource and the text description information within the resource description information are correct, and verifying them is relatively easy, e.g. through simple text and feature-code matching; verifying the poster, however, requires recognizing the poster content, for which the technical solution of the invention can be used. Since the characters in the poster also introduce the corresponding video resource, this information is usually contained in the text description information. For example, with the threshold set to 80%, if the matching degree between the characters recognized from the poster and the film title is greater than or equal to 80%, the poster passes automatically; otherwise it enters a manual review list for secondary verification. This greatly reduces the degree of manual intervention and saves operation cost.
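A minimal sketch of this verification step is given below, assuming the matching degree is computed as a simple string-similarity ratio between the recognized poster text and the text description (the similarity measure is an illustrative choice; the patent does not fix a particular matching formula).

```python
from difflib import SequenceMatcher

def matching_degree(recognized_text: str, description_text: str) -> float:
    # Similarity in [0, 1] between the text recognized from the poster and
    # the text description information (e.g. the film title).
    return SequenceMatcher(None, recognized_text, description_text).ratio()

def verify_poster(recognized_text: str, description_text: str,
                  threshold: float = 0.8) -> str:
    # Pass automatically if the matching degree reaches the preset threshold;
    # otherwise put the poster into the set to be verified manually.
    if matching_degree(recognized_text, description_text) >= threshold:
        return "verified"
    return "to_be_verified"

print(verify_poster("流浪地球", "流浪地球"))      # verified
print(verify_poster("流浪地球", "完全不同的片名"))  # to_be_verified
```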
In an embodiment of the present invention, in the method, extracting the text region from each image in the image set respectively includes: extracting candidate text regions from each image respectively; and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
Text areas and non-text areas can be distinguished by the different characteristics of text and background. For example, by training on characteristics of the text such as size, color and brightness, edges, stroke adhesion, texture, arrangement and character spacing, the image is divided into text areas and non-text areas, reducing as far as possible the influence on the text of factors such as language, text resolution, text spacing, distribution, background complexity, illumination and color. In an embodiment of the invention, the extracted region is used as a candidate text region, the candidate text region is verified using a preset model obtained by prior learning, and it is determined to be a text region once the verification passes, which further improves the recognition accuracy.
In an embodiment of the present invention, in the method, extracting the candidate text regions from the images includes one or more of the following modes: extracting candidate text regions from each image based on edge detection; extracting candidate text regions from each image based on texture features; extracting candidate text regions from each image based on color features; and extracting candidate text regions from each image based on connected components.
There are several specific implementations based on edge detection, and only three of them will be exemplified below.
The first method is as follows: in one embodiment of the present invention, the method, wherein extracting the candidate text regions from the images respectively based on the edge detection comprises: determining a preliminary candidate text region according to the edge information, and determining a candidate text region according to the connected component and the preliminary candidate text region. Specifically, in one embodiment of the present invention, the determining the preliminary candidate text region according to the edge information in the method described above includes: smoothing the image according to the median filter; carrying out edge detection on the smoothed image according to an edge detection operator to obtain an edge image; and removing non-character edges according to an edge filter to obtain a preliminary candidate text region.
The second method comprises the following steps: in one embodiment of the present invention, the method, wherein extracting the candidate text regions from the images respectively based on the edge detection comprises: detecting edges based on a Gaussian function and a binarization mode to obtain a preliminary candidate text region; carrying out color modeling on the preliminary candidate text region according to the Gaussian mixture model to determine background information; identifying missing characters from the image according to the background information; and determining a candidate text region according to the identification result of the missed characters and the preliminary candidate text region.
The third method comprises the following steps: in one embodiment of the present invention, the method, wherein extracting the candidate text regions from the images respectively based on the edge detection comprises: extracting the color edge of the image according to a Sobel edge detection operator; carrying out binarization processing on the edge image in an entropy threshold mode, and obtaining an image block according to mathematical morphology closing operation and opening operation; determining a preliminary candidate text region according to one or more of the height, the aspect ratio and the density of edge points in the image block; and performing wavelet decomposition on the preliminary candidate text region, extracting features according to wavelet coefficients, and determining the candidate text region from the image.
The above three methods may be used in combination or alternatively, and other methods for extracting candidate text regions from each image based on edge detection may be selected.
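As an illustration of the third way above, the sketch below uses OpenCV's Sobel operator, a global Otsu threshold standing in for the entropy-threshold binarization, and morphological closing and opening to form image blocks that are filtered by height, aspect ratio and edge-point density; the wavelet-feature verification step is omitted and all thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def sobel_morphology_candidates(image: np.ndarray) -> list:
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Edge extraction with the Sobel operator.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    # Otsu threshold used here in place of the entropy threshold.
    _, binary = cv2.threshold(magnitude, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
    blocks = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)   # closing
    blocks = cv2.morphologyEx(blocks, cv2.MORPH_OPEN, kernel)    # opening
    candidates = []
    contours, _ = cv2.findContours(blocks, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        density = binary[y:y + h, x:x + w].mean() / 255.0        # edge-point density
        if 8 < h < 200 and w / float(h) > 0.5 and density > 0.1:
            candidates.append((x, y, w, h))
    return candidates
```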
Next, a design idea of extracting candidate text regions from each image based on texture features is described: the character is regarded as a special texture, and whether a pixel point or a pixel block belongs to the character or not is judged by using texture characteristics of the image. Since characters are typically made up of many thinner strokes, the areas where the strokes of the text are present are also typically areas of the image where texture is rich.
The concept assumes that the text region and the background region have texture difference, extracts texture features capable of distinguishing the text region and the background region, and then performs texture classification. Common texture features are the first derivative, the second derivative, the edge intensity, the local variance, the FFT coefficients, the Gabor coefficients, and various statistical features of the wavelet coefficients such as the first moment, the second moment, the histogram, the co-occurrence matrix, etc. Since the extracted texture features are usually high in dimensionality, a classifier is usually designed by adopting a machine learning method.
How to extract candidate text regions from each image based on texture features is described below in three exemplary ways.
The first method is as follows: in an embodiment of the present invention, the method, wherein extracting the candidate text regions from the images respectively based on the texture features includes: the method comprises the steps of sliding a window with a preset size in an image, extracting one or more of an average value, a second-order central moment and a third-order central moment in the window to serve as features, classifying regions in the window according to a neural network based on the features, and obtaining candidate text regions according to classification results.
The second method comprises the following steps: in an embodiment of the present invention, the method, wherein extracting the candidate text regions from the images respectively based on the texture features includes: and performing wavelet transformation on the image, extracting variance from a plurality of high-frequency sub-images by utilizing a histogram as a feature, and determining a candidate text region according to a K-means algorithm.
The third method comprises the following steps: in an embodiment of the present invention, the method, wherein extracting the candidate text regions from the images respectively based on the texture features includes: and performing texture segmentation on the image according to a Gaussian filter, and determining a candidate text region according to a bottom-up connected domain.
The color-based character detection method utilizes the fact that most characters in the image have uniform colors, the original image can be decomposed into a plurality of sub-images with different colors by a color reduction method, and then the character detection work is independently carried out on each sub-image. How to extract candidate text regions from each image based on color features is described below in three exemplary ways.
The first method is as follows: in one embodiment of the present invention, the method wherein extracting the candidate text regions from the respective images based on the color features comprises: and performing three-mean clustering on the image according to the Euclidean distance and the cosine similarity, processing each obtained sub-control according to a log-Gabor filter, and determining a candidate text region according to a filtering result.
The second method comprises the following steps: in one embodiment of the present invention, the method wherein extracting the candidate text regions from the respective images based on the color features comprises: and performing color clustering on the image according to the histogram of the RGB three color components, decomposing the image into a plurality of binary images according to each color obtained by clustering, and determining candidate text regions based on the connected domain.
The third method comprises the following steps: in one embodiment of the present invention, the method wherein extracting the candidate text regions from the respective images based on the color features comprises: carrying out local color quantization on the image; and determining candidate text regions according to the size, the aspect ratio and the proportion of the character color in the minimum envelope rectangle of the connected component.
The design idea based on the connected component uses the character arrangement, that is, the characters are grouped in most cases, for example, a plurality of letters form a word, or a plurality of Chinese characters form a sentence. Characters belonging to the same group will have more uniform geometric size and arrangement rules. How to extract the candidate text regions from each image based on the connected component is described below in three exemplary ways.
The first method is as follows: in an embodiment of the present invention, the method, wherein extracting candidate text regions from the images respectively based on the connected component includes: segmenting the image based on a nonlinear Niblack binarization algorithm, then calibrating connected domains, and extracting the characteristics of each connected domain; and constructing a cascade classifier according to AdaBoost, screening the connected domain, and obtaining a candidate text region according to a screening result.
The second method comprises the following steps: in an embodiment of the present invention, the method, wherein extracting candidate text regions from the images respectively based on the connected component includes: and generating gradient information of the image, and determining a candidate text region according to mathematical morphology operation after the gradient image is binarized.
The third method is as follows: In an embodiment of the present invention, the method, wherein extracting candidate text regions from each image based on the connected component includes: performing preliminary connected domain calibration according to the color difference of adjacent pixels, and iteratively judging whether the colors of boundary pixels can be merged until no more pixels can be merged, so as to obtain a candidate text region.
In an embodiment of the present invention, in the method, the preset model is a support vector machine SVM model including one or more of the following features: the character row projection characteristic, the character column projection characteristic, the histogram crossing characteristic, the shape matching characteristic, the co-occurrence matrix characteristic, the edge density characteristic and the direction consistency characteristic.
As mentioned above, the purpose of the verification using the preset model is to further remove the text region that is detected by mistake, so as to improve the detection accuracy. In one embodiment, a multi-feature verification strategy based on SVM is adopted, which mainly comprises two parts of classifier design and feature selection, and the features used in the stage mainly comprise the features mentioned above. A specific model training process is briefly described below.
First, the statistics to be computed for each feature are determined. In the candidate text region stage, the entire candidate text region is taken as a unit, so some statistics are needed to describe its state. In actual operation, the probability density function P is obtained from the histogram H by normalization:
P(i) = H(i) / Σ_j H(j)
The statistics used are the mean
μ = Σ_i i·P(i)
the variance
σ² = Σ_i (i - μ)²·P(i)
the kurtosis
K = m4 / σ⁴, with the fourth-order central moment m4 = Σ_i (i - μ)⁴·P(i)
and the consistency C, which reflects the uniformity of the distribution.
The mean is defined according to the particular situation and is not necessarily the mean obtained by moment estimation. The kurtosis is an index reflecting the concentration of the distribution, i.e. it describes how sharp or flat the top of the distribution curve is; the variance can reflect the concentration of the distribution to some extent, but data with the same variance can have different kurtosis, so the fourth-order central moment m4 is used to reflect the sharpness of the distribution.
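By way of illustration only, a small sketch of these statistics computed from a histogram is given below; the uniformity-style formula used for the consistency C is an assumption, since the original expression is not reproduced here.

```python
import numpy as np

def histogram_statistics(hist: np.ndarray) -> dict:
    p = hist.astype(np.float64) / hist.sum()       # probability density P from H
    i = np.arange(len(p))
    mean = (i * p).sum()
    variance = (((i - mean) ** 2) * p).sum()
    m4 = (((i - mean) ** 4) * p).sum()             # fourth-order central moment
    kurtosis = m4 / (variance ** 2)                # peakedness of the distribution
    consistency = (p ** 2).sum()                   # uniformity-style measure (assumed)
    return {"mean": mean, "variance": variance,
            "kurtosis": kurtosis, "consistency": consistency}

print(histogram_statistics(np.array([1, 5, 20, 5, 1])))
```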
The implementation of feature selection is described below.
1) Text line projection feature and text column projection feature
Text is composed of multiple characters; adjacent characters have gaps between them, and the characters are similar in width. For a vertically arranged line of text, the vertical projection of the stroke image of the text line has many peaks and valleys, while the horizontal projection has only a few, as shown in fig. 5, which gives an example of a text column in a poster and its projection. Text lines can therefore be verified by the number and width of the peaks or valleys on the horizontal and vertical projections of the line stroke image.
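A sketch of these projection features follows, assuming a binarized stroke image of the text line with text pixels equal to 1; the peak/valley counting rule is an illustrative assumption.

```python
import numpy as np

def projection_peak_valley_counts(stroke_img: np.ndarray) -> dict:
    # Projections of the binary stroke image onto the two axes.
    horizontal = stroke_img.sum(axis=1).astype(np.float32)   # one value per row
    vertical = stroke_img.sum(axis=0).astype(np.float32)     # one value per column

    def count_extrema(signal: np.ndarray) -> int:
        d = np.sign(np.diff(signal))
        d = d[d != 0]
        return int((np.diff(d) != 0).sum())      # sign changes = peaks + valleys

    return {"horizontal_extrema": count_extrema(horizontal),
            "vertical_extrema": count_extrema(vertical)}

# A genuine text line yields many extrema along the reading direction and only
# a few in the perpendicular projection, which can be used to verify it.
```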
2) Crossing histogram feature
The character position distribution of a text line is periodic, i.e. the characters in a text line are distributed as word-space-word-space, which is an important feature for distinguishing text lines from non-text lines. The invention uses the crossing histogram feature, which directly uses crossing lines to capture this character periodicity in the spatial domain. The feature extraction can be decomposed into the following steps:
① Extract the gradient map of the text and obtain a one-dimensional projection signal by projecting all pixels in the vertical or horizontal direction.
② Smooth the gradient projection with a Gaussian filter; an approximate period and regularity can then be observed in the projection of a text line image. Assuming there are N crossing lines and the crossing count of the k-th crossing line is CC(k), k = 1, 2, 3, …, N, the crossing histogram CCH(k) is computed from these crossing counts. For example, if the maximum value of the gradient projection is 300, scanning with N = 300 crossing lines yields 300 histogram bins CCH(k), k = 1, 2, …, 300; these bins are then uniformly divided into non-overlapping windows and the bins within each window are accumulated, each window forming one dimension of the final 16-dimensional crossing histogram.
3) Shape matching features
In the local binary image, the outline of a character connected domain is smooth while the outline of background noise is irregular; after a mathematical morphology open operation, comparing the image before and after processing shows that character connected domains change little while background noise changes greatly. Using this property, the shape matching feature can be extracted to screen the coarsely detected text regions. The key points of the feature extraction are as follows:
① Since the polarity of the characters is unknown before binarization (the polarity describes the light-dark contrast between text and background, i.e. whether the current text region is dark text on a light background or light text on a dark background), a first image is obtained after binarization and a second image is obtained by inverting it; the open operation is applied to both images and the shape matching degree is computed for each.
② The open operation is performed on both images with a structuring element of size 2 × 2.
③ The shape matching degree M measures the relative change of the binary image before and after the open operation; the smaller the obtained M value, the smaller the change caused by the open operation and the more likely the region is text.
4) Co-occurrence matrix feature
The co-occurrence matrix is defined by the joint probability density of pixels at two positions and is a method for describing texture features. It is observed that the stroke width of characters is uniform, i.e. the distance between the left and right edge points (or the upper and lower edge points) of a character is relatively stable, as shown in fig. 6a and 6b, while the distance between the edge points of background noise is irregular, as shown in fig. 6c and 6d. The co-occurrence matrix feature is therefore suitable for screening candidate text regions.
5) Other features
The other features include the edge density feature and the direction consistency feature of a candidate text region. Let a candidate text region Ti have width w and height h, and contain connected domains ej, j = 1, …, Ni, where ej has Pj pixel points and direction consistency Cj. The edge density feature is the total number of connected-domain pixels normalized by the area of the region:
D(Ti) = (Σ_j Pj) / (w·h)
and the direction consistency feature is obtained from the direction consistencies Cj of the connected domains contained in Ti.
the following briefly introduces an explanation of the SVM design method. Candidate text region verification is actually a binary problem for pattern recognition: classification of text regions and non-text regions. The SVM is one of the most popular classifiers in the field of machine learning nowadays, and shows strong classification generalization capability on many real-world data sets.
For a two-class problem, a training set is given:
S = {(x_i, y_i) | i = 1, 2, …, N} ∈ (X × Y)^N
where x_i ∈ X = R^n is the i-th training sample and y_i ∈ Y = {-1, +1} is the class label of x_i. With a mapping Φ from the input feature space R^n to a high-dimensional feature space, the training set S is mapped as:
S = {(x_i, y_i) | i = 1, 2, …, N} = {(Φ(x_i), y_i) | i = 1, …, N}
The key to the SVM is to find an optimal hyperplane in the high-dimensional space that separates the two classes. The vector ω and the real number b determining this hyperplane can be obtained by minimizing the following objective function:
min_{ω,b,ξ}  (1/2)·||ω||² + C·Σ_{i=1..N} ξ_i
s.t.  y_i·(<ω, Φ(x_i)> + b) ≥ 1 - ξ_i,  ξ_i ≥ 0,  i = 1, …, N
where ξ_i is the learning error of the i-th training sample, C is a penalty factor weighting the two objectives, and <·,·> denotes the inner product.
The solution of the above optimization problem can be obtained by minimizing the following Lagrangian:
L(ω, b, ξ, α, β) = (1/2)·||ω||² + C·Σ_i ξ_i - Σ_i α_i·[y_i·(<ω, Φ(x_i)> + b) - 1 + ξ_i] - Σ_i β_i·ξ_i
where α and β are the Lagrange multipliers. Minimizing this Lagrangian is equivalent to solving its dual problem:
max_α  Σ_i α_i - (1/2)·Σ_i Σ_j α_i·α_j·y_i·y_j·K(x_i, x_j)
s.t.  Σ_i α_i·y_i = 0,  0 ≤ α_i ≤ C,  i = 1, …, N
where K(x_i, x_j) = <Φ(x_i), Φ(x_j)> is the kernel function. By solving the dual problem, the optimal hyperplane is obtained as:
f(x) = Σ_{i∈SVs} α_i·y_i·K(x_i, x) + b
where SVs denotes the set of support vectors, i.e. the samples whose corresponding α_i are non-zero. Given a sample x to be classified, its class is determined according to the sign of f(x).
A specific example of the feature training and candidate text region verification steps based on the SVM classifier is described below:
① Sample preparation:
The data set is prepared in the format required by the SVM.
Positive samples: 2905 text regions segmented from the images;
Negative samples: 3601 background regions segmented from the images.
② Feature selection:
Observe the differences between the text regions and the background regions, design the features, and write the computed results to a file in the format required by lib-SVM. The lib-SVM toolkit provides executables for data normalization, training and classification, so the classification accuracy can be obtained promptly from the feature file to measure the effectiveness of the features.
③ Text region training and verification:
In the system, an interface function of lib-SVM is called, the trained parameter file is read in, and the candidate text regions are classified. The RBF kernel function is selected:
K(x_i, x_j) = exp(-γ·||x_i - x_j||²)
The optimal parameters C and γ are chosen by cross-validation, the whole training set is trained with these optimal parameter values to obtain the support vector machine model, and the obtained model is finally used for testing and prediction, performing candidate text region verification as described above.
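A sketch of this training and verification step is given below, using scikit-learn in place of lib-SVM: the RBF kernel is used and C and γ are chosen by cross-validated grid search over pre-computed candidate-region features. The feature extraction and the parameter grid are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_region_verifier(features: np.ndarray, labels: np.ndarray):
    # features: one row of region features per sample; labels: 1 = text, 0 = background.
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    search = GridSearchCV(model, grid, cv=5)       # cross-validated choice of C and gamma
    search.fit(features, labels)
    return search.best_estimator_

def verify_candidates(model, candidate_features: np.ndarray) -> np.ndarray:
    # Keep only the candidate text regions classified as text (label 1).
    return model.predict(candidate_features) == 1
```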
The classification accuracy is the ratio of the number of correctly classified samples to the total number of samples. In one specific experiment, the results are shown in Table 1; the classification accuracy reached 91.55%. To prevent the SVM classifier from over-fitting, some features with a less obvious effect on improving the accuracy were discarded.
TABLE 1
Feature type | Number of features | Accuracy
Co-occurrence matrix feature | 6 | 69.1884%
Shape matching feature | 1 | 78.1098%
Edge density feature | 1 | 83.2486%
Gradient change feature | 4 | 91.1316%
Direction consistency feature | 1 | 95.0307%
In an embodiment of the present invention, in the method, extracting the text from each text region includes: performing character segmentation and extraction on each text region based on the OTSU algorithm.
The OTSU algorithm is a binarization algorithm that finds an optimal threshold dividing the gray-level histogram into two parts such that the variance between the two parts is maximized and the variance within each part is minimized. The method is simple and widely applicable, and since the threshold is obtained automatically it is convenient for full automation. A specific implementation example is as follows:
1) Threshold selection:
Assume the grayscale image has M gray levels, the i-th gray level contains ni pixels, and the total number of pixels in the image is N. The probability of the i-th gray level is:

pi = ni / N
Assuming the threshold of the grayscale image is k, the pixels in the image can be divided into two parts: pixels with gray levels not greater than k and pixels with gray levels greater than k, i.e.:

C0 = {1, 2, …, k},  C1 = {k+1, k+2, …, M}
The zeroth- and first-order cumulative moments of the histogram up to level k are:

ω(k) = Σ(i=1..k) pi,  μ(k) = Σ(i=1..k) i·pi

The average gray level of class C0 is:

μ0 = μ(k) / ω(k)

The average gray level of class C1 is:

μ1 = (μ − μ(k)) / (1 − ω(k))

The occurrence probabilities of classes C0 and C1 are ω0 = ω(k) and ω1 = 1 − ω(k), so the overall mean gray level of the image is:

μ = Σ(i=1..M) i·pi = ω0·μ0 + ω1·μ1

The between-class variance is:

σ²(k) = ω0(μ0 − μ)² + ω1(μ1 − μ)²

which can be simplified to:

σ²(k) = [μ·ω(k) − μ(k)]² / [ω(k)(1 − ω(k))]

When k varies from 1 to M, the value of k at which σ²(k) reaches its maximum is the optimal threshold sought.
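A minimal sketch of the threshold search derived above is given below; it directly evaluates the simplified between-class variance over the gray-level histogram. The function and variable names are chosen for illustration, and a library routine such as OpenCV's Otsu thresholding could be used instead.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold k that maximizes the between-class variance
    sigma^2(k) = [mu*omega(k) - mu(k)]^2 / (omega(k) * (1 - omega(k)))."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                    # p_i = n_i / N
    omega = np.cumsum(p)                     # omega(k): class C0 probability
    mu_k = np.cumsum(np.arange(256) * p)     # mu(k): first-order cumulative moment
    mu = mu_k[-1]                            # overall mean gray level

    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan               # avoid division by zero at the ends
    sigma2 = (mu * omega - mu_k) ** 2 / denom
    return int(np.nanargmax(sigma2))

def binarize(gray):
    """Binarize a grayscale image with the automatically obtained threshold."""
    k = otsu_threshold(gray)
    return np.where(gray > k, 255, 0).astype(np.uint8)
```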
2) Filtering and denoising:
After binarization, some noise remains in the segmented text region image, so filtering and denoising are required. The noise at this stage generally consists of isolated noise points, burrs, uneven line edges and the like, without the interference of large gaps or thin lines.
The direct noise removal method generally uses an auxiliary template of n × n (n is generally 3 to 5) to match the binary image row by row and column by column; according to the matching result, the pixel at the center of the template is changed from "0" to "255" or from "255" to "0", thereby removing the noise.
① Deburring
The matrix template is moved over the binary image; whenever the template matches the binary image, the gray value of the pixel corresponding to the center of the template is changed from "0" to "255", where "×" in the template represents an arbitrary (don't-care) match.
② Line smoothing and hole filling
The matrix template is moved over the binary image; whenever the template matches the binary image, the gray value of the pixel corresponding to the center of the template is changed from "255" to "0", where "×" in the template represents an arbitrary (don't-care) match.
③ Removal of isolated spots
An isolated spot (small blob) is one whose surrounding pixels all have gray value "255" and whose size can be covered by an n × n matrix. An n × n matrix template is therefore built whose border elements are "255"; when the template matches, the pixel values in the central area of the matrix are set to "255", removing the spot.
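The template-matching idea for removing isolated spots can be sketched as follows, assuming the polarity used above (text = "0", background = "255"); the window size n and the brute-force scan are simplifications for illustration.

```python
import numpy as np

def remove_isolated_spots(binary, n=3):
    """Remove small foreground blobs (value 0) whose n-by-n neighbourhood
    border consists entirely of background pixels (value 255)."""
    out = binary.copy()
    h, w = binary.shape
    half = n // 2
    for y in range(half, h - half):
        for x in range(half, w - half):
            win = binary[y - half:y + half + 1, x - half:x + half + 1]
            border = np.concatenate(
                [win[0, :], win[-1, :], win[1:-1, 0], win[1:-1, -1]])
            # if the border of the window is all background, the centre area
            # is an isolated spot and is set to background
            if np.all(border == 255):
                out[y - half:y + half + 1, x - half:x + half + 1] = 255
    return out
```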
In an embodiment of the present invention, recognizing the extracted text to obtain the text recognition result corresponding to each image includes: for the extracted text, obtaining the height information of the characters in the text by projection, and then performing regression-based character segmentation to obtain the plurality of characters contained in the text.
In a specific embodiment, a method combining the projection method and regression-based character segmentation can be adopted to realize single-character segmentation. Besides being simple and fast, the combination of the two methods also handles special cases such as broken and touching characters well. In this implementation, horizontal projection is first used for segmentation to obtain the height information of the characters, and regression-based character segmentation then completes the character segmentation; since the difference between the height and the width of a character is roughly within a certain range, the character width can be estimated from the height information and the position of the next character predicted.
In the projection method, the pixels of a digital image are accumulated along a certain direction; when applied to character segmentation, usually only the projections in the horizontal and vertical directions are used. For example, let the projections of the character image in the horizontal and vertical directions be Px and Py respectively, and let the function f(x, y) indicate whether the pixel (x, y) in the binarized image belongs to the image foreground:

f(x, y) = 1 if (x, y) is a foreground pixel; f(x, y) = 0 otherwise.

Then

Px(x) = Σy f(x, y),  Py(y) = Σx f(x, y)

Px and Py record the accumulated values of the foreground pixels of the character image along the x-axis and the y-axis respectively, i.e., they represent the distribution of the foreground pixels of the image along the x-axis and the y-axis. The projection effect of the character image is shown in fig. 8.
As can be seen from fig. 8, the binarized character image shows large gaps between rows in the horizontal pixel distribution, which can serve as the criterion for line segmentation. Since noise may occur in the image, after horizontal projection the positions where the number of foreground pixels in the horizontal direction falls below a certain threshold are taken as the start or end of a line, and the midpoint between them is used as the segmentation point. After line segmentation, character segmentation can be performed using vertical projection on the same principle.
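A brief sketch of the projection computation and threshold-based line splitting described above follows; the foreground polarity (text = 0) and the noise threshold value are assumptions for illustration.

```python
import numpy as np

def projections(binary):
    """Px, Py: counts of foreground pixels (value 0) along each column / row."""
    fg = (binary == 0).astype(np.int32)
    Px = fg.sum(axis=0)   # distribution along the x-axis (per column)
    Py = fg.sum(axis=1)   # distribution along the y-axis (per row)
    return Px, Py

def split_lines(binary, noise_thresh=2):
    """Split text lines where the row projection falls below noise_thresh."""
    _, Py = projections(binary)
    lines, start = [], None
    for y, count in enumerate(Py):
        if count > noise_thresh and start is None:
            start = y                      # start of a text line
        elif count <= noise_thresh and start is not None:
            lines.append(binary[start:y])  # end of the text line
            start = None
    if start is not None:
        lines.append(binary[start:])
    return lines
```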
After line segmentation is completed by the projection method, each character must still be segmented from the image using regression-based character segmentation; one embodiment of the invention uses a maximum-width regression character segmentation algorithm, implemented as follows:
Let L(i, j) be the dot matrix of one line of the text image and WM be the maximum character width; WM, obtained by projection, is the maximum line height over all lines. The regression range is denoted by d, and the starting position of the j-th character is jA, as shown in fig. 9.
① In the range jA ≤ j ≤ jA + WM, find the first column for which

Σ(i=1..N) L(i, j) = 0

and set it as jB; the part between jA and jB is cut out, where N denotes the row height of the line.
② If jB − jA < (a constant), judge it as interference and ignore it; otherwise go to ④.
③ If no such column exists, then in the range jA + WM − d ≤ j ≤ jA + WM, take the point jB at which

Σ(i=1..N) L(i, j)

reaches its minimum value.
④ Draw the segmentation line at jB as the boundary of the character; the width of this character is (jB − jA).
⑤ Starting from jB, compute Σ(i=1..N) L(i, j); the first column where the value is not 0 (set it as jA′, with jA′ > jB) is the left boundary of the next character, and the above steps are repeated.
The regression-based character segmentation algorithm is relatively simple and easy to understand, has good applicability, and is particularly well suited to segmenting Chinese characters; it is effective for the left-middle-right broken structures that occur in Chinese characters. The experimental results corresponding to fig. 8 are shown in fig. 10.
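A simplified sketch of steps ① to ⑤ is given below; WM is approximated by the line height, and the regression range d and the interference constant are assumed values for illustration.

```python
import numpy as np

def regression_segment(line, w_max=None, d=8, min_width=4):
    """Segment one binarized text line (foreground = 0) into characters
    following the maximum-width regression steps sketched above."""
    fg = (line == 0).astype(np.int32)
    col = fg.sum(axis=0)               # column sums: sum_i L(i, j)
    w_max = w_max or line.shape[0]     # character width bounded by line height
    width = line.shape[1]
    chars, j_a = [], 0

    while j_a < width:
        end = min(j_a + w_max, width)
        blanks = np.where(col[j_a:end] == 0)[0]
        if blanks.size:                              # step 1: first blank column
            j_b = j_a + int(blanks[0])
        else:                                        # step 3: touching characters,
            lo = min(max(j_a + w_max - d, j_a + 1), end - 1)
            j_b = lo + int(np.argmin(col[lo:end]))   # minimum inside range d
        if j_b - j_a >= min_width:                   # step 2: drop interference
            chars.append(line[:, j_a:j_b])           # step 4: cut at j_b
        nxt = np.where(col[j_b:] > 0)[0]             # step 5: next left boundary
        if nxt.size == 0:
            break
        j_a = j_b + int(nxt[0])
    return chars
```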
In an embodiment of the present invention, in the method, recognizing the extracted text to obtain the text recognition result corresponding to each image further includes: OCR character recognition is performed for each character. This step can use well-established OCR text recognition techniques and will not be described in detail herein.
Fig. 2 is a schematic structural diagram of an apparatus for recognizing characters from an image according to an embodiment of the present invention. As shown in fig. 2, the apparatus 200 for recognizing characters from an image includes:
an image set obtaining unit 210 adapted to obtain an image set to be identified. This step determines the images to be recognized; the set contains at least one image to be recognized. The image may be a poster with text, a vertical drawing, a cartoon, or the like. In the existing approach, information such as characters in an image is often recognized manually; the labor, business and operation costs rise as the information in the images increases, and the recognition accuracy is also closely related to how careful the corresponding workers are.
The text region extracting unit 220 is adapted to extract text regions from the images in the image set respectively. The text region here is an image region that is considered to contain characters, which may be Chinese characters, Latin letters, or the like.
The text extraction unit 230 is adapted to extract text from each text region.
The character recognition unit 240 is adapted to recognize the extracted characters to obtain character recognition results corresponding to the images.
Specifically, the design idea is as follows: a region that may contain characters is determined from the image as a text region, the part considered to be characters is extracted from the text region, and finally the characters are recognized. The steps thus proceed progressively layer by layer; the recognition is split into several steps, the recognition accuracy of each step can be improved in an appropriate manner, and the overall recognition rate is consequently also markedly improved.
As can be seen, in the apparatus shown in fig. 2, through the cooperation of the units, after the image set to be recognized is obtained, text regions are extracted from the images in the image set, characters are extracted from the text regions, and the extracted characters are recognized to obtain the character recognition results corresponding to the images. In this technical scheme the text regions are extracted from the image rather than recognizing the image as a whole, so the recognition accuracy can be greatly improved and interference from character-like lines in the image is avoided; the recognition effect is excellent for images, such as posters, in which the characters blend closely with the background; various service scenarios can be supported, the production, verification and operation costs of image data are reduced, the content clues contained in the images can be explored automatically, and important help is provided for image-based data mining.
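Purely as an illustration of how the units of fig. 2 cooperate, the control flow can be sketched as follows; the unit implementations are placeholders and only the order of processing is shown.

```python
class TextFromImagePipeline:
    """Sketch of the apparatus in fig. 2: acquisition -> region extraction
    -> text extraction -> recognition, applied image by image."""

    def __init__(self, acquire, extract_regions, extract_text, recognize):
        # each argument is a callable standing in for one unit (210-240)
        self.acquire = acquire
        self.extract_regions = extract_regions
        self.extract_text = extract_text
        self.recognize = recognize

    def run(self):
        results = {}
        for name, image in self.acquire():                      # unit 210
            regions = self.extract_regions(image)               # unit 220
            texts = [self.extract_text(r) for r in regions]     # unit 230
            results[name] = [self.recognize(t) for t in texts]  # unit 240
        return results
```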
In an embodiment of the present invention, in the above apparatus, the image set obtaining unit 210 is adapted to, when the resource description information in the multimedia resource library is updated, obtain the poster in the resource description information and place it in the image set.
In an embodiment of the present invention, the apparatus further includes: the verification unit is suitable for acquiring the character description information in the resource description information and calculating the matching degree of the character description information and the character recognition result; if the matching degree reaches a preset threshold value, judging that the poster passes verification; and if the matching degree does not reach the preset threshold value, putting the poster into the set to be verified.
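As one possible sketch of the verification unit's matching-degree calculation, a simple sequence-similarity ratio is used below; the similarity measure and the threshold value are assumptions, since the embodiment does not fix a particular metric.

```python
from difflib import SequenceMatcher

def verify_poster(description_text, recognized_text, threshold=0.6):
    """Return True if the recognized text matches the textual description
    in the resource description information closely enough."""
    matching_degree = SequenceMatcher(None, description_text, recognized_text).ratio()
    return matching_degree >= threshold

# Posters whose matching degree does not reach the threshold would be
# placed into the set to be verified instead of passing verification.
```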
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to extract candidate text regions from each image; and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to extract the candidate text regions from the respective images in one or more of the following manners: extracting candidate text regions from each image based on edge detection; extracting candidate text regions from each image respectively based on texture features; extracting candidate text regions from each image respectively based on color features; extracting candidate text regions from each image respectively based on connected components.
In an embodiment of the present invention, in the above apparatus, the text region extraction unit 220 is adapted to determine the preliminary candidate text region based on the edge information, and determine the candidate text region based on the connected component and the preliminary candidate text region.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to perform smoothing processing on the image according to a median filter; carrying out edge detection on the smoothed image according to an edge detection operator to obtain an edge image; and removing non-character edges according to an edge filter to obtain a preliminary candidate text region.
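A rough sketch of this median-filter and edge-detection pipeline using OpenCV follows; the Canny operator and the geometric filtering rules stand in for the unspecified edge detection operator and edge filter, so they are assumptions for illustration.

```python
import cv2
import numpy as np

def preliminary_text_regions(image, min_area=80, min_aspect=0.2, max_aspect=15.0):
    """Median filtering, edge detection and simple edge filtering to obtain
    preliminary candidate text regions (bounding boxes)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.medianBlur(gray, 3)               # smoothing with a median filter
    edges = cv2.Canny(smoothed, 50, 150)             # edge image

    # group edges and discard non-character-like ones by size and aspect ratio
    dilated = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=1)
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        aspect = w / float(h)
        if w * h >= min_area and min_aspect <= aspect <= max_aspect:
            regions.append((x, y, w, h))
    return regions
```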
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to detect an edge based on a gaussian function and a binarization method to obtain a preliminary candidate text region; carrying out color modeling on the preliminary candidate text region according to the Gaussian mixture model to determine background information; identifying missing characters from the image according to the background information; and determining a candidate text region according to the identification result of the missed characters and the preliminary candidate text region.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to extract color edges of the image according to a Sobel edge detection operator; carrying out binarization processing on the edge image in an entropy threshold mode, and obtaining an image block according to mathematical morphology closing operation and opening operation; determining a preliminary candidate text region according to one or more of the height, the aspect ratio and the density of edge points in the image block; and performing wavelet decomposition on the preliminary candidate text region, extracting features according to wavelet coefficients, and determining the candidate text region from the image.
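For this Sobel-based variant, a rough sketch of the color edge extraction and morphological block forming follows; Otsu's threshold is used here in place of the entropy threshold for brevity, the wavelet verification step is omitted, and the geometric limits are assumptions for illustration.

```python
import cv2
import numpy as np

def sobel_edge_blocks(image, min_height=8, min_edge_density=0.1):
    """Color edge extraction with Sobel, binarization, closing/opening,
    and selection of preliminary candidate blocks."""
    edges = np.zeros(image.shape[:2], dtype=np.float64)
    for ch in cv2.split(image):                       # color edges: max over channels
        gx = cv2.Sobel(ch, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(ch, cv2.CV_64F, 0, 1, ksize=3)
        edges = np.maximum(edges, np.hypot(gx, gy))
    edges = cv2.convertScaleAbs(edges)

    # binarize (Otsu here instead of the entropy threshold), then close and open
    _, binary = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
    blocks = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    blocks = cv2.morphologyEx(blocks, cv2.MORPH_OPEN, kernel)

    regions = []
    contours, _ = cv2.findContours(blocks, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        density = cv2.countNonZero(binary[y:y + h, x:x + w]) / float(w * h)
        if h >= min_height and density >= min_edge_density:
            regions.append((x, y, w, h))
    return regions
```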
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to slide in the image in a window with a preset size, extract one or more of an average value, a second-order central moment and a third-order central moment in the window as a feature, classify the region in the window according to a neural network based on the feature, and obtain the candidate text region according to the classification result.
In one embodiment of the present invention, in the above apparatus, the text region extraction unit 220 is adapted to perform wavelet transform on the image, extract variance as a feature using histograms in several high-frequency sub-images, and determine candidate text regions according to a K-means algorithm.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to perform texture segmentation on the image according to a gaussian filter, and then determine the candidate text regions according to a bottom-up connected component.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to perform three-mean clustering on the image according to the Euclidean distance and the cosine similarity, process each obtained cluster according to a log-Gabor filter, and determine the candidate text region according to the filtering result.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to perform color clustering on the image according to a histogram of three color components of RGB, decompose the image into a plurality of binary images according to each color obtained by clustering, and determine the candidate text regions based on the connected component.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to perform local color quantization on the image; and determining candidate text regions according to the size, the aspect ratio and the proportion of the character color in the minimum envelope rectangle of the connected component.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to segment the image based on a nonlinear Niblack binarization algorithm, and then perform connected domain calibration to extract features of each connected domain; and constructing a cascade classifier according to AdaBoost, screening the connected domain, and obtaining a candidate text region according to a screening result.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to generate gradient information of the image, and determine the candidate text regions according to a mathematical morphology operation after binarizing the gradient image.
In an embodiment of the present invention, in the above apparatus, the text region extracting unit 220 is adapted to perform a first connected component calibration according to a color difference between adjacent pixels, and perform an iterative determination until a candidate text region is obtained according to whether the colors of the pixels at the boundary can be merged.
In an embodiment of the present invention, in the above apparatus, the preset model is a support vector machine SVM model including one or more of the following features: the character row projection characteristic, the character column projection characteristic, the histogram crossing characteristic, the shape matching characteristic, the co-occurrence matrix characteristic, the edge density characteristic and the direction consistency characteristic.
In an embodiment of the present invention, in the apparatus, the text extraction unit 230 is adapted to perform text segmentation and extraction on each text region based on the OTSU algorithm.
In an embodiment of the present invention, in the above apparatus, the text recognition unit 240 is adapted to obtain height information of the characters in the extracted text by projection, and then perform regression-based character segmentation to obtain the plurality of characters included in the text.
In an embodiment of the present invention, in the above apparatus, the character recognition unit 240 is adapted to perform OCR character recognition on each character.
It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
In summary, according to the technical solution of the present invention, after the image set to be identified is obtained, text regions are extracted from each image in the image set, characters are extracted from each text region, and the extracted characters are then identified to obtain the character identification result corresponding to each image. In this technical scheme the text regions are extracted from the image rather than recognizing the image as a whole, so the recognition accuracy can be greatly improved and interference from character-like lines in the image is avoided; the recognition effect is excellent for images, such as posters, in which the characters blend closely with the background; various service scenarios can be supported, the production, verification and operation costs of image data are reduced, the content clues contained in the images can be explored automatically, and important help is provided for image-based data mining.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of an apparatus for recognizing text from images according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the invention. The electronic device comprises a processor 310 and a memory 320 arranged to store computer executable instructions (computer readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer readable program code 331 for performing any of the method steps described above. For example, the storage space 330 for storing the computer readable program code may comprise respective computer readable program codes 331 for respectively implementing various steps in the above method. The computer readable program code 331 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 4. Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention. The computer readable storage medium 400 has stored thereon a computer readable program code 331 for performing the steps of the method according to the invention, readable by a processor 310 of the electronic device 300, which computer readable program code 331, when executed by the electronic device 300, causes the electronic device 300 to perform the steps of the method described above, in particular the computer readable program code 331 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 331 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
The embodiment of the invention discloses A1, a method for recognizing characters from images, which comprises the following steps:
acquiring an image set to be identified;
extracting text regions from each image in the image set respectively;
extracting characters from each text area;
and identifying the extracted characters to obtain character identification results corresponding to the images.
A2, the method as in A1, wherein the acquiring the set of images to be recognized comprises:
and when the resource description information in the multimedia resource library is updated, obtaining the poster in the resource description information and putting the poster into the image set.
A3, the method of a2, wherein the method further comprises:
acquiring character description information in the resource description information, and calculating the matching degree of the character description information and the character recognition result;
if the matching degree reaches a preset threshold value, judging that the poster passes verification;
and if the matching degree does not reach a preset threshold value, putting the poster into a set to be verified.
A4, the method of A1, wherein the extracting text regions from each image in the set of images respectively comprises:
extracting candidate text regions from each image respectively;
and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
A5, the method of a4, wherein the extracting candidate text regions from respective images comprises one or more of:
extracting candidate text regions from each image based on edge detection;
extracting candidate text regions from each image respectively based on the texture features;
extracting candidate text regions from each image respectively based on the color features;
extracting candidate text regions from each image respectively based on connected components.
A6, the method of a5, wherein the extracting candidate text regions from respective images based on edge detection comprises:
determining a preliminary candidate text region according to the edge information, and determining a candidate text region according to the connected component and the preliminary candidate text region.
A7, the method of A6, wherein the determining preliminary candidate text regions from edge information comprises:
smoothing the image according to the median filter;
carrying out edge detection on the smoothed image according to an edge detection operator to obtain an edge image;
and removing non-character edges according to an edge filter to obtain a preliminary candidate text region.
A8, the method of a5, wherein the extracting candidate text regions from respective images based on edge detection comprises:
detecting edges based on a Gaussian function and a binarization mode to obtain a preliminary candidate text region;
carrying out color modeling on the preliminary candidate text region according to a Gaussian mixture model to determine background information;
identifying missing characters from the image according to the background information;
and determining a candidate text region according to the identification result of the missed characters and the preliminary candidate text region.
A9, the method of a5, wherein the extracting candidate text regions from respective images based on edge detection comprises:
extracting the color edge of the image according to a Sobel edge detection operator;
carrying out binarization processing on the edge image in an entropy threshold mode, and obtaining an image block according to mathematical morphology closing operation and opening operation;
determining a preliminary candidate text region according to one or more of the height, the aspect ratio and the density of edge points in the image block;
and performing wavelet decomposition on the preliminary candidate text region, extracting features according to wavelet coefficients, and determining the candidate text region from the image.
A10, the method of a5, wherein the extracting candidate text regions from respective images based on texture features comprises:
the method comprises the steps of sliding a window with a preset size in an image, extracting one or more of an average value, a second-order central moment and a third-order central moment in the window to serve as features, classifying regions in the window according to a neural network based on the features, and obtaining candidate text regions according to classification results.
A11, the method of a5, wherein the extracting candidate text regions from respective images based on texture features comprises:
and performing wavelet transformation on the image, extracting variance from a plurality of high-frequency sub-images by utilizing a histogram as a feature, and determining a candidate text region according to a K-means algorithm.
A12, the method of a5, wherein the extracting candidate text regions from respective images based on texture features comprises:
and performing texture segmentation on the image according to a Gaussian filter, and determining a candidate text region according to a bottom-up connected domain.
A13, the method of a5, wherein the extracting candidate text regions from each image based on color features respectively comprises:
and performing three-mean clustering on the image according to the Euclidean distance and the cosine similarity, processing each obtained cluster according to a log-Gabor filter, and determining a candidate text region according to a filtering result.
A14, the method of a5, wherein the extracting candidate text regions from each image based on color features respectively comprises:
and performing color clustering on the image according to the histogram of the RGB three color components, decomposing the image into a plurality of binary images according to each color obtained by clustering, and determining candidate text regions based on the connected domain.
A15, the method of a5, wherein the extracting candidate text regions from each image based on color features respectively comprises:
carrying out local color quantization on the image;
and determining candidate text regions according to the size, the aspect ratio and the proportion of the character color in the minimum envelope rectangle of the connected component.
A16, the method of a5, wherein the extracting candidate text regions from each image based on connected components comprises:
segmenting the image based on a nonlinear Niblack binarization algorithm, then calibrating connected domains, and extracting the characteristics of each connected domain;
and constructing a cascade classifier according to AdaBoost, screening the connected domain, and obtaining a candidate text region according to a screening result.
A17, the method of a5, wherein the extracting candidate text regions from each image based on connected components comprises:
and generating gradient information of the image, and determining a candidate text region according to mathematical morphology operation after the gradient image is binarized.
A18, the method of a5, wherein the extracting candidate text regions from each image based on connected components comprises:
and performing primary connected domain calibration according to the color difference value of the adjacent pixels, and performing iterative judgment until the pixels cannot be combined according to whether the colors of the pixels at the boundary can be combined or not to obtain a candidate text region.
A19, the method as recited in a4, wherein the preset model is a support vector machine SVM model including one or more of the following features:
the character row projection characteristic, the character column projection characteristic, the histogram crossing characteristic, the shape matching characteristic, the co-occurrence matrix characteristic, the edge density characteristic and the direction consistency characteristic.
A20, the method as in a1, wherein the extracting the text from each text region comprises:
and performing character segmentation and extraction on each text region based on an OTSU algorithm.
A21 the method of a1, wherein the recognizing the extracted characters to obtain the character recognition result corresponding to each image includes:
and for the extracted text, acquiring height information of the characters in the text by projection, and performing regression-based character segmentation to obtain a plurality of characters contained in the text.
A22 the method of a21, wherein the recognizing the extracted characters to obtain the character recognition result corresponding to each image further comprises:
OCR character recognition is performed for each character.
The embodiment of the invention also discloses B23, a device for recognizing characters from images, which comprises:
the image set acquisition unit is suitable for acquiring an image set to be identified;
a text region extraction unit adapted to extract text regions from respective images in the image set;
a text extraction unit adapted to extract text from each text region;
and the character recognition unit is suitable for recognizing the extracted characters to obtain character recognition results corresponding to the images.
B24, the device of B23, wherein,
the image set obtaining unit is suitable for obtaining posters in the resource description information to be put in the image set when the resource description information in the multimedia resource library is updated.
B25, the apparatus of B24, wherein the apparatus further comprises:
the verification unit is suitable for acquiring the character description information in the resource description information and calculating the matching degree of the character description information and the character recognition result; if the matching degree reaches a preset threshold value, judging that the poster passes verification; and if the matching degree does not reach a preset threshold value, putting the poster into a set to be verified.
B26, the device of B23, wherein,
the text region extraction unit is suitable for extracting candidate text regions from the images respectively; and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
B27, the apparatus according to B26, wherein the text region extracting unit is adapted to extract candidate text regions from each image separately in one or more of: extracting candidate text regions from each image based on edge detection; extracting candidate text regions from each image respectively based on the texture features; extracting candidate text regions from each image respectively based on the color features; extracting candidate text regions from each image respectively based on connected components.
B28, the device of B27, wherein,
the text region extraction unit is adapted to determine a preliminary candidate text region from the edge information and to determine a candidate text region from the connected component and the preliminary candidate text region.
B29, the device of B28, wherein,
the text region extraction unit is suitable for carrying out smoothing processing on the image according to a median filter; carrying out edge detection on the smoothed image according to an edge detection operator to obtain an edge image; and removing non-character edges according to an edge filter to obtain a preliminary candidate text region.
B30, the device of B27, wherein,
the text region extraction unit is suitable for detecting edges based on a Gaussian function and a binarization mode to obtain a preliminary candidate text region; carrying out color modeling on the preliminary candidate text region according to a Gaussian mixture model to determine background information; identifying missing characters from the image according to the background information; and determining a candidate text region according to the identification result of the missed characters and the preliminary candidate text region.
B31, the device of B27, wherein,
the text region extraction unit is suitable for extracting the color edge of the image according to a Sobel edge detection operator; carrying out binarization processing on the edge image in an entropy threshold mode, and obtaining an image block according to mathematical morphology closing operation and opening operation; determining a preliminary candidate text region according to one or more of the height, the aspect ratio and the density of edge points in the image block; and performing wavelet decomposition on the preliminary candidate text region, extracting features according to wavelet coefficients, and determining the candidate text region from the image.
B32, the device of B27, wherein,
the text region extraction unit is suitable for sliding a window with a preset size in the image, extracting one or more of an average value, a second-order central moment and a third-order central moment in the window as features, classifying regions in the window according to a neural network based on the features, and obtaining candidate text regions according to a classification result.
B33, the device of B27, wherein,
the text region extraction unit is suitable for performing wavelet transformation on the image, extracting variance from the histograms in a plurality of high-frequency sub-images as features, and determining candidate text regions according to a K-means algorithm.
B34, the device of B27, wherein,
the text region extraction unit is suitable for performing texture segmentation on the image according to a Gaussian filter and determining candidate text regions according to a bottom-up connected domain.
B35, the device of B27, wherein,
the text region extraction unit is suitable for performing three-mean clustering on the image according to Euclidean distance and cosine similarity, processing each obtained cluster according to a log-Gabor filter, and determining candidate text regions according to a filtering result.
B36, the device of B27, wherein,
the text region extraction unit is suitable for carrying out color clustering on the images according to histograms of RGB three color components, decomposing the images into a plurality of binary images according to each color obtained by clustering, and determining candidate text regions based on the connected domain.
B37, the device of B27, wherein,
the text region extraction unit is suitable for carrying out local color quantization on the image; and determining candidate text regions according to the size, the aspect ratio and the proportion of the character color in the minimum envelope rectangle of the connected component.
B38, the device of B27, wherein,
the text region extraction unit is suitable for segmenting the image based on a nonlinear Niblack binarization algorithm, then calibrating connected domains and extracting the characteristics of each connected domain; and constructing a cascade classifier according to AdaBoost, screening the connected domain, and obtaining a candidate text region according to a screening result.
B39, the device of B27, wherein,
the text region extraction unit is suitable for generating gradient information of the image, and determining candidate text regions according to mathematical morphology operation after the gradient image is binarized.
B40, the device of B27, wherein,
and the text region extraction unit is suitable for carrying out primary connected domain calibration according to the color difference value of adjacent pixels, and carrying out iterative judgment until the pixels cannot be combined according to whether the colors of the pixels at the boundary can be combined or not so as to obtain a candidate text region.
B41, the device of B26, wherein,
the preset model is a Support Vector Machine (SVM) model containing one or more of the following characteristics: the character row projection characteristic, the character column projection characteristic, the histogram crossing characteristic, the shape matching characteristic, the co-occurrence matrix characteristic, the edge density characteristic and the direction consistency characteristic.
B42, the device of B23, wherein,
and the character extraction unit is suitable for carrying out character segmentation and extraction on each text region based on an OTSU algorithm.
B43, the device of B23, wherein,
the character recognition unit is suitable for acquiring height information of the characters in the extracted text by projection, and performing regression-based character segmentation to obtain a plurality of characters contained in the text.
B44, the device of B43, wherein,
and the character recognition unit is suitable for performing OCR character recognition on each character respectively.
The embodiment of the invention also discloses C45 and electronic equipment, wherein the electronic equipment comprises: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any one of a1-a 22.
Embodiments of the invention also disclose D46, a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method as any one of a1-a 22.

Claims (10)

1. A method of recognizing text from an image, comprising:
acquiring an image set to be identified;
extracting text regions from each image in the image set respectively;
extracting characters from each text area;
and identifying the extracted characters to obtain character identification results corresponding to the images.
2. The method of claim 1, wherein the acquiring a set of images to be identified comprises:
and when the resource description information in the multimedia resource library is updated, obtaining the poster in the resource description information and putting the poster into the image set.
3. The method of claim 2, wherein the method further comprises:
acquiring character description information in the resource description information, and calculating the matching degree of the character description information and the character recognition result;
if the matching degree reaches a preset threshold value, judging that the poster passes verification;
and if the matching degree does not reach a preset threshold value, putting the poster into a set to be verified.
4. The method of claim 1, wherein the extracting text regions from each image in the set of images comprises:
extracting candidate text regions from each image respectively;
and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
5. An apparatus for recognizing text from an image, comprising:
the image set acquisition unit is suitable for acquiring an image set to be identified;
a text region extraction unit adapted to extract text regions from respective images in the image set;
a text extraction unit adapted to extract text from each text region;
and the character recognition unit is suitable for recognizing the extracted characters to obtain character recognition results corresponding to the images.
6. The apparatus of claim 5, wherein,
the image set obtaining unit is suitable for obtaining posters in the resource description information to be put in the image set when the resource description information in the multimedia resource library is updated.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the verification unit is suitable for acquiring the character description information in the resource description information and calculating the matching degree of the character description information and the character recognition result; if the matching degree reaches a preset threshold value, judging that the poster passes verification; and if the matching degree does not reach a preset threshold value, putting the poster into a set to be verified.
8. The apparatus of claim 5, wherein,
the text region extraction unit is suitable for extracting candidate text regions from the images respectively; and inputting the candidate text region into a preset model for verification, and determining the text region according to a verification result.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-4.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-4.
CN201910092406.XA 2019-01-30 2019-01-30 Method and device for recognizing characters from image Pending CN111507344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910092406.XA CN111507344A (en) 2019-01-30 2019-01-30 Method and device for recognizing characters from image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910092406.XA CN111507344A (en) 2019-01-30 2019-01-30 Method and device for recognizing characters from image

Publications (1)

Publication Number Publication Date
CN111507344A true CN111507344A (en) 2020-08-07

Family

ID=71873914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910092406.XA Pending CN111507344A (en) 2019-01-30 2019-01-30 Method and device for recognizing characters from image

Country Status (1)

Country Link
CN (1) CN111507344A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744365A (en) * 2021-07-19 2021-12-03 稿定(厦门)科技有限公司 Intelligent document layout method, system and storage medium based on significance perception
CN113744365B (en) * 2021-07-19 2024-04-26 稿定(厦门)科技有限公司 Text intelligent layout method, system and storage medium based on significance perception
CN113660535A (en) * 2021-08-18 2021-11-16 海看网络科技(山东)股份有限公司 System and method for monitoring content change of EPG column of IPTV service
CN113822275A (en) * 2021-09-27 2021-12-21 北京有竹居网络技术有限公司 Image language identification method and related equipment thereof
WO2023045721A1 (en) * 2021-09-27 2023-03-30 北京有竹居网络技术有限公司 Image language identification method and related device thereof
CN116453030A (en) * 2023-04-07 2023-07-18 郑州工程技术学院 Building material recycling method based on computer vision
CN116453030B (en) * 2023-04-07 2024-07-05 郑州大学 Building material recycling method based on computer vision
CN116468640A (en) * 2023-06-20 2023-07-21 山东正禾大教育科技有限公司 Video image enhancement method for Internet teaching
CN116468640B (en) * 2023-06-20 2023-08-29 山东正禾大教育科技有限公司 Video image enhancement method for Internet teaching
CN118135552A (en) * 2024-05-06 2024-06-04 深圳市力生视觉智能科技有限公司 Method, device, equipment and storage medium for identifying non-tag commodity information

Similar Documents

Publication Publication Date Title
Gonçalves et al. Benchmark for license plate character segmentation
US8442319B2 (en) System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
Pan et al. A robust system to detect and localize texts in natural scene images
Bhowmik et al. Text and non-text separation in offline document images: a survey
Shahab et al. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images
CN111507344A (en) Method and device for recognizing characters from image
Yang et al. A framework for improved video text detection and recognition
Cohen et al. Robust text and drawing segmentation algorithm for historical documents
Fabrizio et al. Text detection in street level images
CN101122953A (en) Picture words segmentation method
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
US11144799B2 (en) Image classification method, computer device and medium
CN106203539B (en) Method and device for identifying container number
CN111259893A (en) Intelligent tool management method based on deep learning
Lu et al. Video text detection
De Automatic data extraction from 2D and 3D pie chart images
Forczmański et al. Stamps detection and classification using simple features ensemble
Ghai et al. Comparative analysis of multi-scale wavelet decomposition and k-means clustering based text extraction
Kasar et al. Multi-script and multi-oriented text localization from scene images
Wicht et al. Camera-based sudoku recognition with deep belief network
CN110533049B (en) Method and device for extracting seal image
Alhéritière et al. A document straight line based segmentation for complex layout extraction
CN114581928A (en) Form identification method and system
Alaei et al. Logo detection using painting based representation and probability features
Aghajari et al. A text localization algorithm in color image via new projection profile

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination