CN113139537A - Image processing method, electronic circuit, visual impairment assisting apparatus, and medium - Google Patents


Info

Publication number
CN113139537A
Authority
CN
China
Prior art keywords: text, text line, line, curved, image
Prior art date
Legal status: Pending
Application number
CN202110523036.8A
Other languages
Chinese (zh)
Inventor
高增辉
喻以明
高敬乾
王欢
周骥
冯歆鹏
Current Assignee
NextVPU Shanghai Co Ltd
Original Assignee
NextVPU Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by NextVPU Shanghai Co Ltd
Priority to CN202110523036.8A
Publication of CN113139537A
Priority to PCT/CN2022/092625 (WO2022237893A1)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Character Input (AREA)

Abstract

Provided is an image processing method including: performing text line detection on an input image to obtain a text line image comprising a curved text line; determining a plurality of reference points for the curved text line in the text line image; determining a text line curve for the curved text line based on the plurality of reference points; and adjusting the curved text line with an adjustment parameter determined based on the text line curve to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line comprises a plurality of characters displayed horizontally. With the method provided by the embodiments of the present disclosure, curve fitting can be performed on a curved text line conveniently and accurately. By processing the curved text line in segments, a curved text line comprising a plurality of characters can be adjusted into a horizontally displayed text line whose characters are easier to recognize.

Description

Image processing method, electronic circuit, visual impairment assisting apparatus, and medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to an image processing method, an electronic circuit, a visual impairment assisting apparatus, an electronic device, a storage medium, and a program product.
Background
Characters present in an image can be recognized by means of image processing, and such a character recognition function has wide application in various fields.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
According to an aspect of the present disclosure, there is provided an image processing method including: performing text line detection on an input image to obtain a text line image comprising a curved text line; determining a plurality of reference points for the curved text line in the text line image; determining a text line curve for the curved text line based on the plurality of reference points; and adjusting the curved text line with an adjustment parameter determined based on the text line curve to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line comprises a plurality of characters displayed horizontally.
According to another aspect of the present disclosure, there is provided an electronic circuit comprising: circuitry configured to perform the steps of the above-described method.
According to another aspect of the present disclosure, there is also provided a visual impairment assisting apparatus including: a camera configured to acquire an image, wherein the image includes a curved text line; a curved text line correction circuit implemented by an electronic circuit as described above; a circuit configured to perform text detection and/or recognition on the recognized text line produced by the curved text line correction circuit to obtain text data; a circuit configured to convert the text data into sound data; and a circuit configured to output the sound data.
According to another aspect of the present disclosure, there is also provided an electronic device including: a processor; and a memory storing a program comprising instructions which, when executed by the processor, cause the processor to perform the method described above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing a program, the program comprising instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the above-described method.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the above-mentioned method when executed by a processor.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure;
FIG. 2 shows an exemplary flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 3A illustrates one example of a text line image including curved text lines;
FIG. 3B shows an example of a plurality of character detection boxes in a text line image obtained by character target detection;
FIG. 3C shows an example of a text line region in a text line image obtained by image segmentation;
FIG. 3D illustrates an example of determining a reference point based on the height of a text line region and a predetermined step size;
FIG. 3E shows an example of a text line curve obtained by a method using B-spline interpolation;
FIG. 4 illustrates an exemplary flow diagram of a method of adjusting a curved line of text in accordance with an embodiment of the present disclosure;
FIG. 5 illustrates an exemplary flow diagram of a method of determining a plurality of text sub-regions in a curved line of text in accordance with an embodiment of the present disclosure;
FIG. 6A illustrates an example of slopes determined at locations on a text line curve corresponding to reference points;
FIG. 6B shows an example of a plurality of text sub-regions divided based on the slopes at the locations on the text line curve corresponding to the respective reference points;
FIG. 7 illustrates an exemplary flow diagram of a method of adjusting a curved line of text in accordance with an embodiment of the present disclosure;
FIG. 8 illustrates an example of a recognized text line obtained by stitching together the multiple text sub-regions adjusted by the method described in FIG. 7;
FIG. 9 illustrates another exemplary flow diagram of a method of adjusting a curved line of text in accordance with an embodiment of the disclosure;
FIGS. 10A-10C illustrate an example of determining a recognized text line according to the method described in FIG. 9;
FIG. 11 shows an exemplary flow diagram of a text recognition process according to an embodiment of the present disclosure;
FIG. 12 shows an exemplary block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 13 is a block diagram illustrating an example of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
When recognizing character information contained in an image, if the surface on which the character information is located is bent or deformed, the character information to be recognized appears in the image with a curved shape. Recognizing such curved text information is a challenge. Since characters in a curved text line are displayed irregularly, directly training a text recognition model to recognize the text data in curved text lines would make the model rather complex and less accurate than recognition of horizontally displayed text lines.
The principle of the present disclosure is described below using an example in which the characters of a text line are arranged in order in the horizontal direction. The "horizontal direction" refers to a direction that coincides with the character arrangement direction of the text line. The "vertical direction" refers to a direction perpendicular to the character arrangement direction of the text line. A "curved text line" means that the characters of the text line are not on a horizontal line, e.g., individual characters deviate from the same horizontal line by a distance exceeding a predetermined distance threshold. With the method provided by the present disclosure, a curved text line may be straightened into a plurality of characters displayed horizontally.
It is understood that "horizontal" and "vertical" may also be interchanged to correct for skew of a longitudinally arranged text column without departing from the principles of the present disclosure.
For horizontally arranged text, the term "horizontally displayed" means that the characters lie substantially on the same horizontal line, that is, each character deviates from that horizontal line by a distance not exceeding a predetermined distance threshold. For vertically arranged text, "horizontally displayed" correspondingly means that the characters lie substantially on the same vertical line, that is, each character deviates from that vertical line by no more than a predetermined distance threshold.
To accurately and efficiently correct curved text in a text line image, the present disclosure provides a new image processing method. The principles of the present disclosure will be described below with reference to the drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more terminal devices 101, a server 120, and one or more communication networks 110 coupling the one or more terminal devices 101 to the server 120. Terminal device 101 may be configured to execute one or more applications.
In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the execution of the method for image processing according to the present disclosure. In some embodiments, terminal device 101 may also be used to run one or more services or software applications for the method for image processing according to the present disclosure. In some implementations, terminal device 101 may be implemented as a visual impairment accessory.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of the terminal devices 101 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating terminal device 101 may, in turn, utilize one or more terminal applications to interact with server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
Terminal device 101 may provide an interface that enables a user of the terminal device to interact with the terminal device. The terminal device may also output information to the user via the interface. Although fig. 1 depicts only one terminal device, those skilled in the art will appreciate that any number of terminal devices may be supported by the present disclosure.
The terminal device 101 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., Google Chrome OS); or include various Mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The terminal device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some embodiments, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from a user of the terminal device 101. The server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of the terminal device 101.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and poor service scalability found in traditional physical hosts and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database 130 used by the server 120 may be a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Fig. 2 illustrates an exemplary flow diagram of an image processing method 200 according to an embodiment of the disclosure. The method illustrated in fig. 2 may be performed by the terminal device 101 or the server 120 illustrated in fig. 1. The image containing the curved line of text may be processed using the method 200 shown in fig. 2 to correct the characters in the curved line of text in the image to a horizontal display for further text recognition processing.
In step S202, text line detection may be performed on the input image to obtain a text line image comprising a curved text line.
In some embodiments, the input image may be acquired by an image acquisition unit (e.g., a camera) mounted on the terminal device. In other embodiments, a pre-acquired image may be read from memory as the input image. The input image may comprise one or more text lines, and one or more of the text lines in the input image may be curved text lines.
The input image may be processed by a previously obtained image processing model for detecting text lines in an image, to obtain a text line image comprising a single curved text line. In some embodiments, the input image may be processed using a pre-trained neural-network-based text line detection model to obtain, as the text line image, a sub-image of the input image that includes one text line.
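As an illustration of this step, the following is a minimal sketch of cropping one text line image per detected line. The function detect_text_lines is a hypothetical stand-in for any pre-trained text line detector; it is only assumed to return one axis-aligned box (x, y, w, h) per detected text line in a NumPy image array.

```python
def crop_text_line_images(input_image, detect_text_lines):
    # detect_text_lines is a hypothetical detector returning (x, y, w, h) boxes.
    text_line_images = []
    for (x, y, w, h) in detect_text_lines(input_image):
        # Each cropped sub-image contains a single (possibly curved) text line.
        text_line_images.append(input_image[y:y + h, x:x + w])
    return text_line_images
```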
FIG. 3A illustrates one example of a text line image including a curved text line. As shown in FIG. 3A, by performing text line detection on the input image, a text line image including one and only one text line can be detected from the input image. The text line included in the example shown in FIG. 3A is a curved text line, i.e., the plurality of characters included in the text line are not displayed horizontally.
In step S204, a plurality of reference points for the curved text line in the text line image may be determined.
The position of each reference point may coincide with the position of at least one character included in the curved text line, or may be a simulated character position and need not correspond to the position of a real character. A character in the text line can be any unit of the text line, such as an English word, an English letter, a Chinese character, a punctuation mark, or the like. The form of the text in the text line is not limited herein.
In some embodiments, character target detection may be performed on the text line image to obtain the plurality of reference points for the curved text line, where the respective reference points indicate the positions of the respective characters included in the curved text line.
In some implementations, the text line image can be processed using a pre-trained neural network-based target detection model for recognizing characters to obtain a character detection box for each character included in the curved text line.
FIG. 3B shows an example of a plurality of character detection boxes in a text line image obtained by character target detection. Each character detection box 301 may include at least one character. As shown in FIG. 3B, most of the character detection boxes include only one character. However, some character detection boxes may include a plurality of characters due to detection errors of the target detection model. According to the principles of the present disclosure, the number of characters included in each character detection box obtained by character target detection is not restricted, as long as the character detection result can substantially reflect the trend of the characters in the curved text line.
The positions of the plurality of reference points for the curved text line may be determined based on the positions of the plurality of character detection boxes shown in FIG. 3B. For example, the center point of at least one of the recognized character detection boxes may be determined as a reference point, that is, the position of the center point of the at least one character detection box may be determined as the position of the corresponding reference point.
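The following is a minimal sketch of this step. The (x1, y1, x2, y2) box format and the example coordinates are assumptions; any character detector that returns axis-aligned boxes can be handled the same way.

```python
def reference_points_from_boxes(char_boxes):
    """Return the center point of each character detection box as a reference point."""
    points = []
    for x1, y1, x2, y2 in char_boxes:
        points.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
    # Sort left to right so the reference points follow the reading order of the line.
    return sorted(points, key=lambda p: p[0])

# Hypothetical boxes for a curved line: the centers drift vertically.
boxes = [(0, 40, 20, 70), (22, 30, 44, 60), (46, 24, 66, 52), (70, 26, 92, 56)]
print(reference_points_from_boxes(boxes))
```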
In other embodiments, image segmentation may be performed on the text line image to obtain a text line region in the text line image that corresponds to the curved text line. For example, the text line image may be segmented pixel by pixel using a pre-trained neural-network-based image segmentation model for text line segmentation, to obtain a segmentation result indicating whether each pixel in the text line image belongs to the text line, so as to determine the text line region corresponding to the curved text line.
FIG. 3C shows an example of a text line region in a text line image obtained by image segmentation. The black area represents the image area determined not to include characters, and the white area represents the text line region where characters are determined to be located.
Using the height of the text line region shown in FIG. 3C and a predetermined step size, the plurality of reference points in the text line region can be determined.
FIG. 3D shows an example of determining the reference points based on the height of the text line region and a predetermined step size, where the predetermined step size may indicate a predetermined character width. It is understood that the value of the predetermined step size can be set arbitrarily by those skilled in the art according to the actual situation, and may differ from the actual width of the characters in the curved text line. The predetermined step size shown in FIG. 3D is smaller than the true width of the characters in the curved text line; in other embodiments, the predetermined step size may also be greater than the true character width.
As shown in FIG. 3D, the text line region may be divided based on the predetermined step size to obtain simulated character boxes 302 for a plurality of simulated character positions. The position of the center point of each simulated character box may be determined as the position of a reference point. In some embodiments, the abscissa of the reference point may be the average of the abscissas of the left and right boundaries of the corresponding simulated character box, and the ordinate of the reference point may be the average of the ordinates of the points within the area of the simulated character box.
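A sketch of this variant follows, under the assumption that the segmentation result is a binary mask (1 = text line pixel) and that "the points within the area of the simulated character box" are its text pixels; the toy mask and the step size of 16 pixels are illustrative.

```python
import numpy as np

def reference_points_from_mask(mask, step):
    """Derive reference points from a binary text line mask and a step size."""
    points = []
    _, w = mask.shape
    for x0 in range(0, w, step):
        box = mask[:, x0:min(x0 + step, w)]      # one simulated character box
        ys = np.nonzero(box)[0]
        if ys.size == 0:                          # no text pixels in this box
            continue
        cx = x0 + (box.shape[1] - 1) / 2.0        # mean of left/right boundaries
        cy = ys.mean()                            # mean ordinate of text pixels
        points.append((cx, cy))
    return points

# Toy mask: a thin band of "text" pixels that rises from left to right.
mask = np.zeros((60, 200), dtype=np.uint8)
for x in range(200):
    y = 45 - x // 10
    mask[y - 3:y + 3, x] = 1
print(reference_points_from_mask(mask, step=16)[:3])
```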
In step S206, a text line curve for the curved text line may be determined based on the plurality of reference points determined in step S204. The text line curve may indicate the specific shape of the arrangement of the characters in the curved text line. By mathematically analyzing the text line curve representing the shape of the curved text line, adjustment parameters for straightening the curved text line can be obtained.
As previously described, the positions of a plurality of reference points indicating the position of the curved text line may be obtained in step S204. By curve-fitting these reference points, a text line curve approximating the curve on which the characters of the curved text line lie can be obtained.
In some embodiments, the positions of the reference points may be curve-fitted by using B-spline interpolation to obtain a mathematical expression of the text line curve. In other embodiments, the positions of the reference points may be curve-fitted using any curve fitting method, such as polynomial fitting.
FIG. 3E shows an example of a text line curve obtained by using B-spline interpolation. As can be seen from the example shown in FIG. 3E, the text line curve 303 obtained by B-spline interpolation can accurately fit the curve on which the characters of the curved text line lie. It is understood that, without departing from the principles of the present disclosure, one skilled in the art may use any mathematical method that fits curved text lines well.
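A minimal sketch of the B-spline fitting step follows, assuming the reference points have strictly increasing abscissas (as they do when taken column by column along the line); the coordinate values are illustrative.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Reference points (x, y) taken along a curved text line, sorted by abscissa.
ref_points = [(8.0, 44.0), (24.0, 40.5), (40.0, 36.2), (56.0, 33.0),
              (72.0, 31.5), (88.0, 32.0), (104.0, 34.8)]
xs = np.array([p[0] for p in ref_points])
ys = np.array([p[1] for p in ref_points])

tck = splrep(xs, ys, s=0)            # interpolating cubic B-spline through the points
x_dense = np.linspace(xs[0], xs[-1], 200)
y_dense = splev(x_dense, tck)        # points on the fitted text line curve
slopes = splev(x_dense, tck, der=1)  # first derivative, used later to divide sub-regions
```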
In step S208, the curved text line may be adjusted using the adjustment parameters determined based on the text line curve, to obtain the recognized text line corresponding to the curved text line, where the recognized text line includes a plurality of characters displayed horizontally.
Because the characters included in the recognized text line are displayed substantially on the same horizontal line, the recognized text line can be processed by a trained character recognition model to obtain the character data in the recognized text line. It is understood that the text data in the recognized text line is the same as the text data in the curved text line, because the recognized text line is the result of straightening out the curved text line.
With the image processing method provided by the embodiments of the present disclosure, a text line curve that accurately represents the curved text line can be obtained based on the positions of the reference points for the curved text line. The curved text line may then be straightened out using the adjustment parameters derived from the text line curve. Because the text line curve obtained from the reference points can accurately represent the position of the curved text line, the method provided by the present disclosure can achieve a better straightening effect. In the subsequent character recognition process, the character recognition algorithm can directly perform character recognition on the recognized text line in which the characters are displayed on the same horizontal line. For example, an end-to-end seq2seq deep learning model can be used to recognize the text sequence.
FIG. 4 illustrates an exemplary flow diagram of a method 400 of adjusting a curved line of text in accordance with an embodiment of the present disclosure.
In step S402, a plurality of text sub-regions of the curved text line may be determined.
To straighten out the curved text line, the curved text line may be divided into a plurality of text sub-regions, so that the curved text line can be processed in segments. For example, the display of each text sub-region can be adjusted separately, so that the characters in every text sub-region are displayed on the same horizontal line.
In some embodiments, each text sub-region of the plurality of text sub-regions may include a single character. In other embodiments, each text sub-region of the plurality of text sub-regions may include at least two characters. The number of characters included in each text sub-region may be the same or different. In still other embodiments, each text sub-region of the plurality of text sub-regions may comprise a width of a single column of pixels. It is to be understood that the above description is intended only as an exemplary illustration of segmenting a curved line of text, and not as a limitation of the present disclosure.
In step S404, for each text sub-region of the plurality of text sub-regions, the text sub-region may be adjusted based on the adjustment parameter for the text sub-region determined using the text line curve.
It is understood that, in the curved text line, characters in different areas are displayed at different positions in the image, and the respective characters are not displayed on the same horizontal line. Based on the text line curve obtained by the method described with reference to FIG. 2, a corresponding adjustment parameter may be determined for each text sub-region, so as to adjust at least one of the display direction and the position of the characters in the text sub-region, so that the characters of every text sub-region are displayed horizontally, thereby straightening the curved text line.
In some embodiments, the adjustment parameter for each text sub-region may include the angle between the arrangement direction of the characters of the text sub-region and the horizontal direction. A specific process for adjusting text sub-regions will be described below with reference to FIG. 7 and is not detailed here.
In step S406, the recognized text line corresponding to the curved text line may be determined based on the adjusted text sub-regions.
In some embodiments, the adjusted plurality of text sub-regions may be scaled such that the adjusted plurality of text sub-regions have the same height.
Since the height of the adjusted text sub-region depends on the rotation angle during the adjustment, the heights of the adjusted text sub-regions may be different.
In order to enable the adjusted text sub-regions to be spliced into a text line, the adjusted text sub-regions can be scaled in size so as to have the same height.
In some implementations, the adjusted size of the plurality of text sub-regions may be scaled in the height direction only. In other implementations, the adjusted sizes of the plurality of text sub-regions may be scaled equally in the height direction and the length direction so that the adjusted plurality of text sub-regions have the same height. For example, the adjusted plurality of text sub-regions may be scaled based on a predetermined reference height such that the scaled plurality of text sub-regions all have the reference height.
The scaled plurality of text sub-regions may be stitched in a horizontal direction to obtain a line of recognized text, wherein characters in the line of recognized text are displayed horizontally.
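The scale-and-stitch step can be sketched as follows: every adjusted sub-region image is resized to a common reference height (scaled equally in both directions so its aspect ratio is kept) and the results are concatenated horizontally. The reference height of 32 pixels is an illustrative assumption.

```python
import cv2
import numpy as np

def stitch_sub_regions(sub_images, ref_height=32):
    """Scale adjusted text sub-regions to one height and stitch them horizontally."""
    scaled = []
    for img in sub_images:
        h, w = img.shape[:2]
        new_w = max(1, int(round(w * ref_height / h)))  # keep the aspect ratio
        scaled.append(cv2.resize(img, (new_w, ref_height)))
    return np.hstack(scaled)  # the recognized text line image
```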
With the method provided by the present disclosure, the plurality of characters in the curved text line can be processed in segments, and the curved text line can be straightened based on the adjustment parameters determined from the text line curve. The image processing method provided by the present disclosure can straighten a curved text line of arbitrary length and obtain a recognized text line with a plurality of characters displayed horizontally.
FIG. 5 illustrates an exemplary flow diagram of a method 500 of determining a plurality of text sub-regions in a curved line of text in accordance with an embodiment of the present disclosure.
In step S502, the slope at a location on the text line curve corresponding to at least one reference point may be determined.
In step S504, the curved text line may be divided based on the slopes at the locations corresponding to the at least one reference point to obtain a plurality of text sub-regions, wherein adjacent text sub-regions correspond to different slopes.
As previously described, the text line curve fitted from the plurality of reference points can simulate the trend and position of the characters in the curved text line. By determining the slope at the position of at least one point on the text line curve, the trend of the characters at that point can be obtained.
If the slopes at the two positions on the text line curve corresponding to two adjacent reference points are similar, the trend of the characters between those two reference points is similar. Characters with similar trends may therefore be grouped into the same text sub-region based on the corresponding slopes.
FIG. 6A illustrates an example of slopes determined at locations on the text line curve corresponding to reference points. When the mathematical expression of the text line curve has been obtained by the foregoing method, the point on the text line curve having the same abscissa as a reference point may be taken as the location corresponding to that reference point. The arrows shown in FIG. 6A indicate the different slopes of the text line curve at different locations.
In some embodiments, the rate of change of the slope between adjacent reference points may be determined based on the slope at each point on the text line curve, and regions between adjacent points whose rate of slope change is less than a change threshold may be grouped into the same text sub-region. In this case, the characters in each text sub-region can be considered to correspond to the same slope, that is, the trend of the characters in each text sub-region is substantially the same.
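The grouping just described can be sketched as follows: reference points whose curve slopes differ by less than a threshold are merged into one text sub-region. The fitted spline, the coordinate values, and the threshold of 0.15 are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Reference points of a curved text line and the B-spline fitted through them.
xs = np.array([8.0, 24.0, 40.0, 56.0, 72.0, 88.0, 104.0])
ys = np.array([44.0, 40.5, 36.2, 33.0, 31.5, 32.0, 34.8])
tck = splrep(xs, ys, s=0)

def split_by_slope(xs, tck, change_threshold=0.15):
    """Group indices of adjacent reference points whose slopes change little."""
    slopes = splev(xs, tck, der=1)
    groups, current = [], [0]
    for i in range(1, len(xs)):
        if abs(slopes[i] - slopes[i - 1]) < change_threshold:
            current.append(i)          # same trend: keep in the current sub-region
        else:
            groups.append(current)     # trend changed: start a new sub-region
            current = [i]
    groups.append(current)
    return groups                       # each group defines one text sub-region

print(split_by_slope(xs, tck))
```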
FIG. 6B shows an example of a plurality of text sub-regions divided based on the slopes at locations on the text line curve. As shown in FIG. 6B, within each text sub-region the characters are arranged at substantially the same angle with respect to the horizontal direction. In FIG. 6B, each text sub-region is represented by a slanted quadrilateral whose left and right boundaries are perpendicular to the horizontal direction, and whose upper and lower boundaries form, with the horizontal direction, the same angle as the character trend indicated by the slope corresponding to that text sub-region. Further, the height of each text sub-region may be derived from the height of the characters in the curved text line. For example, the height of a text sub-region may be determined based on the height of the character detection boxes obtained by target detection. As another example, the height of a text sub-region may be determined based on the height of the text line in the text line segmentation result.
FIG. 7 illustrates an exemplary flow diagram of a method of adjusting a curved line of text in accordance with an embodiment of the disclosure. The sub-regions of text shown in FIG. 6B may be adjusted using the method 700 shown in FIG. 7.
In step S702, an adjustment parameter for each of the plurality of text sub-regions of the curved text line may be determined, where the adjustment parameter may include the angle between the arrangement direction of the characters included in the text sub-region and the horizontal direction, determined based on the slope of the text line curve corresponding to that text sub-region.
In step S704, the text sub-region may be adjusted based on the angle between the arrangement direction of the characters included in the text sub-region and the horizontal direction determined in step S702 so that the characters in the text sub-region are horizontally displayed.
The entire text line image may be rotated in the reverse direction based on the angle, determined in step S702, between the arrangement direction of the characters included in the text sub-region and the horizontal direction, so that the characters in the text sub-region are displayed horizontally, and the four vertex positions of the corresponding text sub-region in the rotated text line image may be obtained based on the rotation angle. The minimum bounding rectangle of the four vertices of the rotated text sub-region may then be cropped from the rotated text line image to obtain the adjusted text sub-region, where the upper and lower boundaries of the minimum bounding rectangle are parallel to the horizontal direction and the left and right boundaries are parallel to the vertical direction.
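A sketch of this rotate-and-crop operation is given below, assuming image is the text line image and quad holds the four vertices of one text sub-region. The angle may be derived from the slope of the text line curve in that sub-region, e.g. angle_deg = degrees(arctan(slope)); the sign convention of the rotation depends on the image coordinate system and is an assumption here.

```python
import cv2
import numpy as np

def straighten_sub_region(image, quad, angle_deg):
    """Rotate the whole text line image, then crop the sub-region's bounding rectangle."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    # Rotate the entire text line image so that this sub-region becomes horizontal.
    M = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    # Map the four sub-region vertices with the same affine matrix ...
    pts = cv2.transform(np.array([quad], dtype=np.float32), M)[0]
    # ... and crop their axis-aligned minimum bounding rectangle.
    x, y, bw, bh = cv2.boundingRect(pts.astype(np.int32))
    return rotated[max(y, 0):y + bh, max(x, 0):x + bw]
```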
FIG. 8 illustrates an example of a recognized text line obtained by stitching together the multiple text sub-regions adjusted by the method described in FIG. 7. The adjusted text sub-regions are stitched in the horizontal direction to obtain a recognized text line for text recognition.
FIG. 9 sets forth another exemplary flow chart illustrating a method for adjusting curved lines of text according to embodiments of the present disclosure.
In step S902, for each column of pixels in the text line image, an adjustment parameter for that column of pixels is determined, where the adjustment parameter for a column of pixels comprises the offset between the ordinate of the point of the text line curve within that column and a reference position. In some embodiments, the reference position in the text line image may be predetermined. For example, the position of the horizontal center line of the text line image may be determined as the reference position. As another example, the position of the horizontal line on which any character in the text line image is located may be determined as the reference position. As yet another example, the average ordinate of the character detection boxes obtained by character detection in the text line image may be determined as the reference position.
In step S904, the display of each column of pixels in the text line image may be adjusted using the adjustment parameter. For example, the position of the point of the text line curve within a column of pixels may be adjusted in the vertical direction, based on the offset between the ordinate of that point and the reference position, so that the adjusted vertical position of that point coincides with the reference position.
In step S906, the recognized text line may be determined based on the adjusted text line image. For example, the image background of the adjusted text line image may be cropped based on the character height to obtain the recognized text line.
FIGS. 10A-10C illustrate an example of determining a recognized text line according to the method described in FIG. 9.
As shown in FIG. 10A, for a column of pixels 1001, it can be determined that the point of the text line curve within that column is located a distance d below the reference line 1002. In this case, a pixel sequence of height d on the opposite side of the curve point (for the pixel column 1001, i.e., above the reference line 1002) may be clipped, the portion of the pixel column 1001 remaining after clipping may be moved upward by the distance d, and the clipped pixel sequence of height d may be filled back in below the curve point, so that the position of the point of the text line curve within the pixel column 1001 is adjusted to coincide with the reference position.
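Clipping the run of pixels on one side of the curve point and filling it back in on the other side, as described above, amounts to a circular shift of the column. A minimal sketch follows, assuming image is the text line image, tck the fitted B-spline, and ref_y the reference ordinate (e.g. the horizontal center line).

```python
import numpy as np
from scipy.interpolate import splev

def straighten_by_columns(image, tck, ref_y):
    """Shift each pixel column so the text line curve point lands on the reference line."""
    out = image.copy()
    for x in range(image.shape[1]):
        curve_y = float(splev(x, tck))        # ordinate of the curve in this column
        shift = int(round(ref_y - curve_y))   # negative shift moves pixels upward
        out[:, x] = np.roll(image[:, x], shift, axis=0)
    return out
```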
FIG. 10B shows the result of adjusting each column of pixels in the text line image by the method described in conjunction with FIG. 10A. As shown in FIG. 10B, all characters of the curved text line are adjusted to be displayed horizontally. FIG. 10C shows the recognized text line obtained by cropping the result shown in FIG. 10B according to the character height.
With the method for adjusting the text line image provided by the present disclosure, the pixels of the text line image can be shifted and back-filled column by column based on the parameters of the text line curve and the reference position, so that the pixels corresponding to the characters in each column are displayed substantially at the reference position. This method conveniently straightens curved text lines of any length.
Fig. 11 shows an exemplary flow diagram of a text recognition process 1100 according to an embodiment of the disclosure.
In step S1102, text line detection may be performed on the acquired input image to obtain a text line image including a single text line.
In step S1104, the text line in the text line image may be subjected to curvature correction to obtain a recognized text line, wherein the recognized text line includes a plurality of characters displayed horizontally. The curvature correction of the curved text line in the text line image may be performed using the methods described above in conjunction with FIGS. 2 to 10C, which are not described in detail again here.
In step S1106, character recognition may be performed on the recognized text line to obtain character data included in the text line.
The recognized text line may be processed by a trained neural-network-based text recognition model. Because the characters in the recognized text line are displayed horizontally, the text recognition model does not need to recognize content directly in the curved text line, which reduces the complexity of the text recognition model and improves the accuracy of text recognition.
With the text recognition method provided by the present disclosure, the curved text line is corrected to obtain a recognized text line displayed substantially on the same horizontal line, which relieves the burden on the text recognition model when recognizing heavily curved or long text and improves text recognition performance.
Fig. 12 shows an exemplary block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in FIG. 12, the image processing apparatus 1200 may include a text line detection unit 1210, a reference point determination unit 1220, a curve determination unit 1230, and a recognized text determination unit 1240. The text line detection unit 1210 may be configured to perform text line detection on an input image to obtain a text line image including a curved text line. The reference point determination unit 1220 may be configured to determine a plurality of reference points for the curved text line in the text line image. The curve determination unit 1230 may be configured to determine a text line curve for the curved text line based on the plurality of reference points. The recognized text determination unit 1240 may be configured to adjust the curved text line using an adjustment parameter determined based on the text line curve to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line includes a plurality of characters displayed horizontally.
The text line detection unit 1210, the reference point determination unit 1220, the curve determination unit 1230 and the recognized text determination unit 1240 may be used to implement the steps of the image processing method described in conjunction with fig. 2 to 10C, and are not described herein again.
With the image processing apparatus provided by the embodiments of the present disclosure, a text line curve that accurately represents the curved text line can be obtained based on the positions of the reference points for the curved text line. The curved text line may then be straightened out using the adjustment parameters derived from the text line curve. Because the text line curve obtained from the reference points can accurately represent the position of the curved text line, the apparatus provided by the present disclosure can achieve a better straightening effect. In the subsequent character recognition process, the character recognition algorithm can directly perform character recognition on the recognized text line in which the characters are displayed on the same horizontal line. For example, an end-to-end seq2seq deep learning model can be used to recognize the text sequence.
Exemplary methods according to the present disclosure have been described above in connection with the accompanying drawings. Exemplary embodiments utilizing the electronic circuit, electronic device, and the like of the present disclosure will be further described with reference to the accompanying drawings.
According to another aspect of the present disclosure, there is provided an electronic circuit comprising: circuitry configured to perform the steps of the methods described in this disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method described in this disclosure.
According to another aspect of the present disclosure, there is provided a computer readable storage medium storing a program, the program comprising instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the method described in the present disclosure.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program realizes the method described in the disclosure when executed by a processor.
Fig. 13 is a block diagram illustrating an example of an electronic device according to an exemplary embodiment of the present disclosure. It is noted that the structure shown in fig. 13 is only one example, and the electronic device of the present disclosure may include only one or more of the constituent parts shown in fig. 13 according to a specific implementation.
The electronic device 1300 may be, for example, a general purpose computer (e.g., various computers such as a laptop computer, a tablet computer, etc.), a mobile phone, or a personal digital assistant. According to some embodiments, the electronic device 1300 may be a visual impairment assisting apparatus. The electronic device 1300 may include a camera and electronic circuitry for curved text line correction, where the camera may be configured to acquire an image that includes a curved text line, and the electronic circuitry may be configured to perform the image processing method for text line correction described in connection with FIGS. 2-10C.
According to some embodiments, the electronic device 1300 may be configured to be included in, or removably mounted on, a spectacle frame (e.g., the rim of the frame, a connector connecting two rims, a temple, or any other portion) so as to be able to capture an image that approximately covers the field of view of the user.
The electronic device 1300 may also be mounted to or integrated with other wearable devices, according to some embodiments. The wearable device may be, for example: a head-mounted device (e.g., a helmet or hat, etc.), an ear-wearable device, etc. According to some embodiments, the electronic device may be implemented as an accessory attachable to a wearable device, for example as an accessory attachable to a helmet or cap, or the like.
The electronic device 1300 may also have other forms according to some embodiments. For example, the electronic device 1300 may be a mobile phone, a general purpose computing device (e.g., a laptop computer, a tablet computer, etc.), a personal digital assistant, and so forth. The electronic device 1300 may also have a base so that it can be placed on a table top.
The electronic device 1300 may include a camera 1304 for acquiring images. The camera 1304 may include, but is not limited to, a webcam, a still camera, or the like. The electronic device 1300 may also include a curved text line correction circuit (electronic circuit) 1400, which includes circuitry configured to perform the steps of the image processing method for text line correction described above (e.g., the method steps described in conjunction with FIGS. 2-10C).
The electronic device 1300 may also comprise a text recognition circuit 1305 configured to perform text detection and/or recognition (e.g., OCR processing) on the corrected text line contained in the image output by the curved text line correction circuit, thereby obtaining text data. The text recognition circuit 1305 may be implemented, for example, by a dedicated chip. The electronic device 1300 may also include a sound conversion circuit 1306 configured to convert the text data into sound data. The sound conversion circuit 1306 may be implemented, for example, by a dedicated chip. The electronic device 1300 may further include a sound output circuit 1307 configured to output the sound data. The sound output circuit 1307 may include, but is not limited to, an earphone, a loudspeaker, or a vibrator, together with its corresponding drive circuit.
According to some embodiments, the electronic device 1300 may also include an image processing circuit 1308, which may include circuitry configured to perform various kinds of image processing on images. The image processing circuit 1308 may include, for example, but is not limited to, one or more of the following: circuitry configured to reduce noise in an image, circuitry configured to deblur an image, circuitry configured to geometrically correct an image, circuitry configured to extract features from an image, circuitry configured to detect and/or identify objects in an image, circuitry configured to detect text contained in an image, circuitry configured to extract text lines from an image, circuitry configured to extract text coordinates from an image, circuitry configured to extract object boxes from an image, circuitry configured to extract text boxes from an image, circuitry configured to perform layout analysis (e.g., paragraph segmentation) based on an image, and so forth.
According to some embodiments, the electronic device 1300 may further comprise a text processing circuit 1309, which may be configured to perform various kinds of processing based on the extracted text-related information (e.g., text data, text boxes, paragraph coordinates, text line coordinates, word coordinates, etc.) to obtain processing results such as paragraph ordering, text semantic analysis, and layout analysis results.
One or more of the various circuits described above (e.g., the text recognition circuit 1305, the sound conversion circuit 1306, the sound output circuit 1307, the image processing circuit 1308, the text processing circuit 1309, and the curved text line correction circuit (electronic circuit) 1400) may be implemented using custom hardware, and/or may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, one or more of the various circuits described above can be implemented by programming hardware (e.g., programmable logic circuits including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using logic and algorithms according to the present disclosure.
According to some embodiments, the electronic device 1300 may also include a communication circuit 1310, which may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
According to some embodiments, the electronic device 1300 may also include an input device 1311, which input device 1311 may be any type of device capable of inputting information to the electronic device 1300, and may include, but is not limited to, various sensors, mice, keyboards, touch screens, buttons, joysticks, microphones, and/or remote controls, and the like.
According to some embodiments, the electronic device 1300 may also include an output device 1312, which may be any type of device capable of presenting information and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer, among others. Although the electronic device 1300 is used as a visual impairment assisting apparatus according to some embodiments, a vision-based output device may still facilitate a user's family members or service personnel, etc., in obtaining output information from the electronic device 1300.
According to some embodiments, the electronic device 1300 may also include a processor 1301. The processor 1301 may be any type of processor and may include, but is not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., special purpose processing chips). The processor 1301 may be, for example but not limited to, a central processing unit (CPU) or a microprocessor (MPU). The electronic device 1300 may also include a working memory 1302, which may store programs (including instructions) and/or data (e.g., images, text, sound, and other intermediate data) useful for the operation of the processor 1301, and may include, but is not limited to, a random access memory and/or a read only memory device. The electronic device 1300 may also include a storage device 1303, which may be any non-transitory storage device usable for data storage and may include, but is not limited to, a disk drive, an optical storage device, solid-state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape, or any other magnetic medium, an optical disc or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions, and/or code. The working memory 1302 and the storage device 1303 may be collectively referred to as "memory", and in some cases may be used interchangeably.
According to some embodiments, the processor 1301 may control and schedule at least one of the camera 1304, the text recognition circuit 1305, the sound conversion circuit 1306, the sound output circuit 1307, the image processing circuit 1308, the word processing circuit 1309, the communication circuit 1310, the curved text line correction circuit (electronic circuit) 1400, and the other devices and circuits included in the electronic device 1300. According to some embodiments, at least some of the components described in fig. 13 may be interconnected and/or communicate with one another via a bus 1313.
Software elements (programs) may reside in the working memory 1302, including, but not limited to, an operating system 1302a, one or more application programs 1302b, drivers, and/or other data and code.
According to some embodiments, instructions for performing the aforementioned control and scheduling may be included in the operating system 1302a or in one or more application programs 1302b.
According to some embodiments, instructions to perform the method steps described in this disclosure (e.g., the method steps described in conjunction with fig. 2-10C) may be included in one or more application programs 1302b, and the various modules of the electronic device 1300 described above may be implemented by the processor 1301 reading and executing the instructions of the one or more application programs 1302b. In other words, the electronic device 1300 may comprise a processor 1301 and a memory (e.g., the working memory 1302 and/or the storage device 1303) storing a program comprising instructions which, when executed by the processor 1301, cause the processor 1301 to perform the method according to the various embodiments of the present disclosure.
According to some embodiments, some or all of the operations performed by at least one of the text recognition circuit 1305, the sound conversion circuit 1306, the image processing circuit 1308, the word processing circuit 1309, and the curved text line correction circuit (electronic circuit) 1400 may instead be implemented by the processor 1301 reading and executing the instructions of the one or more application programs 1302b.
Executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (e.g., the storage device 1303) and, when executed, may be stored (possibly after being compiled and/or installed) in the working memory 1302. Accordingly, the present disclosure provides a computer-readable storage medium storing a program comprising instructions that, when executed by a processor of an electronic device (e.g., a visual impairment assisting apparatus), cause the electronic device to perform a method as described in the various embodiments of the present disclosure. According to another embodiment, the executable code or source code of the instructions of the software elements (programs) may also be downloaded from a remote location.
It will also be appreciated that various modifications may be made according to specific requirements. For example, customized hardware might be used, and/or particular circuits, units, modules, or elements might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the circuits, units, modules, or elements encompassed by the disclosed methods and apparatus may be implemented by programming hardware (e.g., programmable logic circuits including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using logic and algorithms in accordance with the present disclosure.
According to some embodiments, the processor 1301 in the electronic device 1300 may be distributed over a network. For example, some processes may be performed using one processor while other processes may be performed by another processor that is remote from the one processor. Other modules of the electronic device 1300 may also be similarly distributed. As such, the electronic device 1300 may be interpreted as a distributed computing system that performs processing at multiple locations.
Some exemplary aspects of the disclosure are described below.
Aspect 1. An image processing method, comprising:
performing text line detection on an input image to obtain a text line image comprising a curved text line;
determining a plurality of reference points in the text line image for the curved text line;
determining a text line curve for the curved text line based on the plurality of reference points;
adjusting the curved text line with an adjustment parameter determined based on the text line curve to obtain an identified text line corresponding to the curved text line, wherein the identified text line comprises a plurality of characters displayed horizontally.
Aspect 2. The image processing method of aspect 1, wherein determining a plurality of reference points in the text line image for the curved text line comprises:
performing character target detection on the text line image to obtain a plurality of reference points for the curved text line, wherein each reference point indicates the position of a respective character in the curved text line.
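For illustration only, the sketch below stands in for such a character target detector using plain connected-component analysis on a binarized text line image; an actual embodiment would more likely use a trained detector, and the function name, Otsu binarization, and minimum-area filter are assumptions made for this example (Python with OpenCV and NumPy).

```python
import cv2
import numpy as np

def detect_character_reference_points(text_line_image, min_area=20):
    """Crude stand-in for a character target detector: one reference point per connected component."""
    gray = cv2.cvtColor(text_line_image, cv2.COLOR_BGR2GRAY) if text_line_image.ndim == 3 else text_line_image
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
    points = [tuple(centroids[i]) for i in range(1, n)          # label 0 is the background
              if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return sorted(points)                                        # (x, y) per character-like blob, left to right
```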
Aspect 3. The image processing method of aspect 1, wherein determining a plurality of reference points in the text line image for the curved text line comprises:
performing image segmentation on the text line image to obtain a text line region corresponding to the curved text line in the text line image;
determining a plurality of reference points in the text line region for the curved text line based on a height of the text line region and a predetermined step size.
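A minimal sketch of such reference-point sampling, assuming the text line region is given as a binary mask; the fixed column step and the use of the region's vertical centre per column are illustrative assumptions (in practice the step might be derived from the region height, as this aspect suggests).

```python
import numpy as np

def sample_reference_points(region_mask, step=16):
    """Sample one reference point every `step` columns along a segmented text line region."""
    ys, xs = np.nonzero(region_mask)                 # pixel coordinates belonging to the region
    points = []
    for x in range(int(xs.min()), int(xs.max()) + 1, step):
        col_ys = ys[xs == x]
        if col_ys.size:                              # use the vertical centre of the region in this column
            points.append((x, float(col_ys.mean())))
    return points                                    # list of (x, y) reference points along the curved text line
```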
Aspect 4. The image processing method of any of aspects 1-3, wherein determining a text line curve for the curved text line based on the plurality of reference points comprises:
performing curve fitting on the positions of the plurality of reference points based on a B-spline interpolation method to obtain the text line curve.
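A minimal sketch of such a B-spline fit, assuming SciPy is available; the spline degree, smoothing factor, and sample count are illustrative choices rather than values prescribed by the disclosure.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_text_line_curve(ref_points, samples=200, smooth=0.0):
    """Fit a B-spline through the reference points and return a dense polyline along the text line curve."""
    pts = np.asarray(ref_points, dtype=float)        # shape (N, 2): one (x, y) per reference point
    k = min(3, len(pts) - 1)                         # cubic where possible, lower degree for very few points
    tck, _ = splprep([pts[:, 0], pts[:, 1]], s=smooth, k=k)
    u = np.linspace(0.0, 1.0, samples)
    x, y = splev(u, tck)
    return np.stack([x, y], axis=1)                  # (samples, 2) points sampled along the text line curve
```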
Aspect 5. The image processing method of aspect 1, wherein adjusting the curved text line with adjustment parameters determined based on the text line curve to obtain an identified text line corresponding to the curved text line comprises:
determining a plurality of text sub-regions for the curved text line;
for each text sub-region in the plurality of text sub-regions, adjusting the text sub-region based on the adjustment parameter for the text sub-region determined using the text line curve;
determining the identified text line corresponding to the curved text line based on the adjusted text sub-regions.
Aspect 6. The image processing method of aspect 5, wherein determining a plurality of text sub-regions for the curved text line comprises:
determining a slope at a location of at least one point on the text line curve;
dividing the curved text line based on the slope at the location corresponding to the at least one point to obtain the plurality of text sub-regions, wherein adjacent text sub-regions correspond to different slopes.
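One way this slope-based division might look, sketched under the assumption that the curve samples are ordered left to right; the angle bin width used to decide when the slope has "changed" is an illustrative parameter, not part of the disclosure.

```python
import numpy as np

def split_by_slope(curve_pts, angle_bin_deg=10.0):
    """Group the curve into x-ranges whose local slope falls into the same angle bin."""
    x, y = curve_pts[:, 0], curve_pts[:, 1]
    angles = np.degrees(np.arctan(np.gradient(y, x)))    # local inclination of the text line curve
    bins = np.round(angles / angle_bin_deg)
    regions, start = [], 0
    for i in range(1, len(bins)):
        if bins[i] != bins[start]:                        # slope changed enough: close the current sub-region
            regions.append((float(x[start]), float(x[i]), float(angles[start:i].mean())))
            start = i
    regions.append((float(x[start]), float(x[-1]), float(angles[start:].mean())))
    return regions                                        # list of (x_begin, x_end, mean_angle_deg) per sub-region
```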
Aspect 7. The image processing method according to aspect 6, wherein the adjustment parameter for each of the plurality of text sub-regions includes an angle between the arrangement direction of the characters included in the text sub-region and the horizontal direction, the angle being determined based on the slope corresponding to the text sub-region.
Aspect 8. The image processing method of aspect 7, wherein adjusting the text sub-region based on the adjustment parameter for the text sub-region determined using the text line curve comprises:
adjusting the text sub-region based on the angle so that the characters in the adjusted text sub-region are displayed horizontally.
Aspect 9. The image processing method of any of aspects 5-8, wherein determining the identified text line corresponding to the curved text line based on the adjusted text sub-regions comprises:
scaling the adjusted plurality of text sub-regions so that the adjusted plurality of text sub-regions have the same height;
splicing the scaled plurality of text sub-regions in the horizontal direction to obtain the identified text line, wherein characters in the identified text line are displayed horizontally.
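The adjustment, scaling, and splicing of aspects 8 and 9 could be sketched as follows with OpenCV; cropping each sub-region over the full image height, filling borders with white, and the target height of 48 pixels are assumptions made for illustration, not requirements of the disclosure.

```python
import cv2
import numpy as np

def straighten_and_join(text_line_image, regions, target_height=48):
    """Rotate each text sub-region so its characters lie horizontally, then scale and splice them."""
    pieces = []
    for x0, x1, angle_deg in regions:                     # regions as produced by split_by_slope()
        crop = text_line_image[:, int(x0):max(int(x1), int(x0) + 1)]
        h, w = crop.shape[:2]
        m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
        upright = cv2.warpAffine(crop, m, (w, h), borderValue=(255, 255, 255))
        scale = target_height / float(upright.shape[0])   # bring every sub-region to the same height
        new_w = max(1, int(round(upright.shape[1] * scale)))
        pieces.append(cv2.resize(upright, (new_w, target_height)))
    return np.hstack(pieces)                              # the identified text line, characters displayed horizontally
```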
Aspect 10. The image processing method of aspect 1, wherein adjusting the curved text line with adjustment parameters determined based on the text line curve to obtain an identified text line corresponding to the curved text line comprises:
for each column of pixels in the text line image, determining an adjustment parameter for the column of pixels;
adjusting the display of each column of pixels in the text line image by using the adjustment parameter for that column;
determining the identified text line based on the adjusted text line image.
Aspect 11. The image processing method of aspect 10, wherein, for each column of pixels in the text line image, the adjustment parameter for the column of pixels comprises an offset between a vertical coordinate of a point on the text line curve in the column of pixels and a reference position.
Aspect 12. The image processing method of aspect 11, wherein adjusting the display of the column of pixels using the adjustment parameter comprises:
adjusting, based on the offset, the position of the point on the text line curve in the column of pixels in the vertical direction, so that the adjusted vertical position of that point coincides with the reference position.
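A minimal sketch of the per-column correction of aspects 10-12, assuming `curve_y` holds the fitted curve's vertical coordinate at every pixel column of the text line image; using the image's vertical centre as the default reference position and a white fill are illustrative assumptions.

```python
import numpy as np

def flatten_by_column_shift(text_line_image, curve_y, reference_y=None):
    """Shift every pixel column vertically so the text line curve maps onto a flat reference row."""
    h, w = text_line_image.shape[:2]
    reference_y = h // 2 if reference_y is None else reference_y
    out = np.full_like(text_line_image, 255)              # start from a blank (white) canvas
    for col in range(w):
        offset = int(round(reference_y - curve_y[col]))   # offset between the curve point and the reference position
        if abs(offset) >= h:
            continue                                      # nothing visible to copy for extreme offsets
        src_lo, src_hi = max(0, -offset), min(h, h - offset)
        dst_lo, dst_hi = max(0, offset), min(h, h + offset)
        out[dst_lo:dst_hi, col] = text_line_image[src_lo:src_hi, col]
    return out                                            # adjusted text line image: the curve now lies on reference_y
```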
Aspect 13. An electronic circuit, comprising:
circuitry configured to perform the steps of the method of any of aspects 1-12.
Aspect 14. A visual impairment assisting apparatus, comprising:
a camera configured to acquire an image, wherein the image includes a curved line of text therein;
a curved text line correction circuit implemented by the electronic circuit of aspect 13;
a circuit configured to perform text detection and/or recognition on the identified text line obtained by the curved text line correction circuit to obtain text data;
a circuit configured to convert the text data into sound data; and
a circuit configured to output the sound data.
Aspect 15. An electronic device, comprising:
a processor; and
a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of any of aspects 1-12.
Aspect 16. A non-transitory computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method of any of aspects 1-12.
Aspect 17. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any of aspects 1-12.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatuses are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the claims as granted and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. It should be noted that, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (10)

1. An image processing method comprising:
performing text line detection on an input image to obtain a text line image comprising a curved text line;
determining a plurality of reference points in the text line image for the curved text line;
determining a text line curve for the curved text line based on the plurality of reference points;
adjusting the curved text line with an adjustment parameter determined based on the text line curve to obtain an identified text line corresponding to the curved text line, wherein the identified text line comprises a plurality of characters displayed horizontally.
2. The image processing method of claim 1, wherein determining a plurality of reference points in the text line image for the curved text line comprises:
performing character target detection on the text line image to obtain a plurality of reference points for the curved text line, wherein each reference point indicates the position of a respective character in the curved text line.
3. The image processing method of claim 1, wherein determining a plurality of reference points in the text line image for the curved text line comprises:
performing image segmentation on the text line image to obtain a text line region corresponding to the curved text line in the text line image;
determining a plurality of reference points in the text line region for the curved text line based on a height of the text line region and a predetermined step size.
4. The image processing method of any of claims 1-3, wherein determining a text line curve for the curved text line based on the plurality of reference points comprises:
performing curve fitting on the positions of the plurality of reference points based on a B-spline interpolation method to obtain the text line curve.
5. The image processing method of claim 1, wherein adjusting the curved text line with adjustment parameters determined based on the text line curve to obtain an identified text line corresponding to the curved text line comprises:
determining a plurality of text sub-regions for the curved text line;
for each text sub-region in the plurality of text sub-regions, adjusting the text sub-region based on the adjustment parameter for the text sub-region determined using the text line curve;
determining the identified text line corresponding to the curved text line based on the adjusted text sub-regions.
6. An electronic circuit, comprising:
circuitry configured to perform the steps of the method of any of claims 1-5.
7. A visual impairment assisting apparatus, comprising:
a camera configured to acquire an image, wherein the image includes a curved line of text therein;
a curved text line correction circuit implemented by the electronic circuit of claim 6;
circuitry configured to perform text detection and/or recognition on the identified text line obtained by the curved text line correction circuit to obtain text data;
circuitry configured to convert the text data into sound data; and
a circuit configured to output the sound data.
8. An electronic device, comprising:
a processor; and
a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-5.
9. A non-transitory computer readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method of any of claims 1-5.
10. A computer program product comprising a computer program, wherein the computer program realizes the method of any one of claims 1-5 when executed by a processor.
CN202110523036.8A 2021-05-13 2021-05-13 Image processing method, electronic circuit, visual impairment assisting apparatus, and medium Pending CN113139537A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110523036.8A CN113139537A (en) 2021-05-13 2021-05-13 Image processing method, electronic circuit, visual impairment assisting apparatus, and medium
PCT/CN2022/092625 WO2022237893A1 (en) 2021-05-13 2022-05-13 Image processing method, electronic circuit, visual impairment assistance device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110523036.8A CN113139537A (en) 2021-05-13 2021-05-13 Image processing method, electronic circuit, visual impairment assisting apparatus, and medium

Publications (1)

Publication Number Publication Date
CN113139537A true CN113139537A (en) 2021-07-20

Family

ID=76817540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110523036.8A Pending CN113139537A (en) 2021-05-13 2021-05-13 Image processing method, electronic circuit, visual impairment assisting apparatus, and medium

Country Status (2)

Country Link
CN (1) CN113139537A (en)
WO (1) WO2022237893A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022237893A1 (en) * 2021-05-13 2022-11-17 上海肇观电子科技有限公司 Image processing method, electronic circuit, visual impairment assistance device and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171104B (en) * 2016-12-08 2022-05-10 腾讯科技(深圳)有限公司 Character detection method and device
CN109753971B (en) * 2017-11-06 2023-04-28 阿里巴巴集团控股有限公司 Correction method and device for distorted text lines, character recognition method and device
CN111191649A (en) * 2019-12-31 2020-05-22 上海眼控科技股份有限公司 Method and equipment for identifying bent multi-line text image
CN113139537A (en) * 2021-05-13 2021-07-20 上海肇观电子科技有限公司 Image processing method, electronic circuit, visual impairment assisting apparatus, and medium

Also Published As

Publication number Publication date
WO2022237893A1 (en) 2022-11-17

Similar Documents

Publication Publication Date Title
CN110232311B (en) Method and device for segmenting hand image and computer equipment
US11315281B2 (en) Pupil positioning method and apparatus, VR/AR apparatus and computer readable medium
US10210415B2 (en) Method and system for recognizing information on a card
CN110610453B (en) Image processing method and device and computer readable storage medium
US10467466B1 (en) Layout analysis on image
CN108090450B (en) Face recognition method and device
US20180007259A1 (en) Photo-taking prompting method and apparatus, an apparatus and non-volatile computer storage medium
WO2019196745A1 (en) Face modelling method and related product
EP3940589A1 (en) Layout analysis method, electronic device and computer program product
CN109274891B (en) Image processing method, device and storage medium thereof
CN111126394A (en) Character recognition method, reading aid, circuit and medium
EP3998576A2 (en) Image stitching method and apparatus, device, and medium
CN113780201B (en) Hand image processing method and device, equipment and medium
CN110619334B (en) Portrait segmentation method based on deep learning, architecture and related device
CN111163261A (en) Target detection method, circuit, visual impairment assistance device, electronic device, and medium
KR20180111639A (en) Information processing apparatus, control method thereof, and storage medium
CN115761826A (en) Palm vein effective area extraction method, system, medium and electronic device
WO2022237893A1 (en) Image processing method, electronic circuit, visual impairment assistance device and medium
CN109376618B (en) Image processing method and device and electronic equipment
CN110827301A (en) Method and apparatus for processing image
CN110751004A (en) Two-dimensional code detection method, device, equipment and storage medium
CN113722692B (en) Identity recognition device and method thereof
WO2022121843A1 (en) Text image correction method and apparatus, and device and medium
US20220012482A1 (en) Layout analysis
US8989492B2 (en) Multi-resolution spatial feature extraction for automatic handwriting recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination