WO2022237893A1 - Image processing method, electronic circuit, visually impaired assistive device, and medium - Google Patents

Image processing method, electronic circuit, visually impaired assistive device, and medium

Info

Publication number
WO2022237893A1
Authority
WO
WIPO (PCT)
Prior art keywords
text line
text
curved
image
line
Prior art date
Application number
PCT/CN2022/092625
Other languages
English (en)
French (fr)
Inventor
王欢
周骥
冯歆鹏
Original Assignee
上海肇观电子科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海肇观电子科技有限公司
Publication of WO2022237893A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Definitions

  • the present disclosure relates to the field of image processing, and in particular to an image processing method, an electronic circuit, an auxiliary device for the visually impaired, an electronic device, a storage medium and a program product.
  • Text present in an image can be recognized by means of image processing, and such a text recognition function has wide applications in various fields.
  • An image processing method including: performing text line detection on an input image to obtain a text line image including a curved text line; determining a plurality of reference points of the curved text line in the text line image; determining a text line curve for the curved text line based on the plurality of reference points; and adjusting the curved text line by using an adjustment parameter determined based on the text line curve, to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line includes a plurality of characters displayed horizontally.
  • an electronic circuit comprising: a circuit configured to perform the steps of the above method.
  • A visually impaired assistive device comprising: a camera configured to acquire an image, wherein the image includes a curved text line; a curved text line correction circuit implemented by the electronic circuit as described above; a circuit configured to perform text detection and/or recognition on the recognized text line obtained by the curved text line correction circuit to obtain text data; a circuit configured to convert the text data into sound data; and a circuit configured to output the sound data.
  • An electronic device including: a processor; and a memory storing a program, the program including instructions which, when executed by the processor, cause the processor to perform the above method.
  • A non-transitory computer-readable storage medium storing a program, the program including instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the above method.
  • a computer program product including a computer program, wherein the computer program implements the above method when executed by a processor.
  • FIG. 1 shows a schematic diagram of an exemplary system in which various methods and apparatus described herein may be implemented according to an embodiment of the present disclosure;
  • FIG. 2 shows an exemplary flowchart of an image processing method according to an embodiment of the present disclosure;
  • FIG. 3A shows an example of a text line image including a curved text line;
  • FIG. 3B shows an example of a plurality of character detection frames in the text line image obtained by character object detection;
  • FIG. 3C shows an example of a text line region in a text line image obtained through image segmentation;
  • FIG. 3D shows an example of determining a reference point based on the height of the text line region and a predetermined step size;
  • FIG. 3E shows an example of a text line curve obtained by using the B-spline interpolation method;
  • FIG. 4 shows an exemplary flowchart of a method for adjusting curved text lines according to an embodiment of the present disclosure;
  • FIG. 5 shows an exemplary flowchart of a method for determining multiple text sub-regions in a curved text line according to an embodiment of the present disclosure;
  • FIG. 6A shows an example of a slope determined on a text line curve at a location corresponding to at least one point;
  • FIG. 6B shows an example of a plurality of text sub-regions obtained by division based on the slope at the position corresponding to each reference point on the text line curve;
  • FIG. 7 shows an exemplary flowchart of a method for adjusting curved text lines according to an embodiment of the present disclosure;
  • FIG. 8 shows an example of a recognized text line obtained by splicing multiple adjusted text sub-regions according to the method described in FIG. 7;
  • FIG. 9 shows another exemplary flowchart of a method for adjusting a curved text line according to an embodiment of the present disclosure;
  • FIGS. 10A to 10C show an example of determining a recognized text line according to the method described in FIG. 9;
  • FIG. 11 shows an exemplary flowchart of a text recognition process according to an embodiment of the present disclosure;
  • FIG. 12 shows an exemplary block diagram of an image processing device according to an embodiment of the present disclosure;
  • FIG. 13 is a block diagram illustrating an example of an electronic device according to an exemplary embodiment of the present disclosure.
  • The use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are only used to distinguish one element from another.
  • In some cases, the first element and the second element may refer to the same instance of the element, and in other cases, based on the context, they may refer to different instances.
  • the text information to be recognized in the image has a curved shape.
  • Recognition of curved text is a challenge. Since the characters in a curved text line are arranged irregularly, directly training a text recognition model to recognize the text data in a curved text line makes the model quite complicated, and the recognition accuracy is lower than that for horizontally displayed text lines.
  • “Horizontal direction” refers to a direction that coincides with the direction in which characters of a text line are arranged.
  • “Vertical direction” refers to a direction perpendicular to the character arrangement direction of a text line.
  • “Curved text line” means that the characters in the text line do not lie on the same horizontal line, for example, the distance of the characters from the same horizontal line exceeds a predetermined distance threshold.
  • “Displaying horizontally” means that each character is substantially located on the same horizontal line, that is, the distance between each horizontally displayed character and the same horizontal line does not exceed a predetermined distance threshold.
  • Alternatively, displaying horizontally may mean that each character is substantially located on the same vertical line, that is, the distance between each such character and the same vertical line does not exceed a predetermined distance threshold.
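  • The threshold-based definition of horizontal display can be illustrated with a minimal sketch. The function name `is_displayed_horizontally`, the use of character center ordinates, and the choice of the mean ordinate as the common horizontal line are assumptions made for illustration; the text does not fix how the line is chosen.

```python
def is_displayed_horizontally(char_centers_y, distance_threshold):
    """Return True if every character lies within the threshold of one horizontal line.

    Assumption: the common horizontal line is the mean ordinate of the
    character centers. The source text does not specify this choice.
    """
    baseline_y = sum(char_centers_y) / len(char_centers_y)
    return all(abs(y - baseline_y) <= distance_threshold for y in char_centers_y)


# Hypothetical ordinates of character centers, in pixels.
straight = [20.0, 21.0, 19.5, 20.5]
curved = [20.0, 28.0, 36.0, 28.0, 20.0]

print(is_displayed_horizontally(straight, distance_threshold=3.0))
print(is_displayed_horizontally(curved, distance_threshold=3.0))
```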
  • FIG. 1 shows a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented according to an embodiment of the present disclosure.
  • The system 100 includes one or more terminal devices 101, a server 120, and one or more communication networks 110 coupling the one or more terminal devices 101 to the server 120.
  • the terminal device 101 may be configured to execute one or more application programs.
  • the server 120 may run one or more services or software applications enabling execution of the method for image processing according to the present disclosure.
  • the terminal device 101 may also be used to run one or more services or software applications according to the method for image processing of the present disclosure.
  • the terminal device 101 may be implemented as a visually impaired assistive device.
  • server 120 may also provide other services or software applications that may include non-virtualized environments and virtualized environments.
  • These services may be provided as web-based services or cloud services, e.g., under a Software as a Service (SaaS) model, to users of the terminal device 101.
  • Server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or combinations thereof executable by one or more processors. A user operating the terminal device 101 may in turn utilize one or more terminal application programs to interact with the server 120 to utilize the services provided by these components. It should be understood that various different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein, and is not intended to be limiting.
  • the terminal device 101 may provide an interface enabling a user of the terminal device to interact with the terminal device.
  • the terminal can also output information to the user via this interface.
  • Although FIG. 1 depicts only one terminal device, those skilled in the art will understand that the present disclosure may support any number of terminal devices.
  • Terminal devices 101 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, etc. These computer devices can run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, and Linux or Linux-like operating systems (such as Google Chrome OS), and may include various mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, and Android.
  • Portable handheld devices may include cellular phones, smart phones, tablet computers, personal digital assistants (PDAs), and the like.
  • Wearable devices can include head-mounted displays and other devices.
  • Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices, and the like. Terminal devices are capable of executing various applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use various communication protocols.
  • Network 110 can be any type of network known to those skilled in the art that can support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, and the like.
  • The one or more networks 110 may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.
  • Server 120 may include one or more general purpose computers, dedicated server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination.
  • Server 120 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain the server's virtual storage devices).
  • server 120 may run one or more services or software applications that provide the functionality described below.
  • Computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems.
  • Server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
  • Server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of terminal devices 101.
  • Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of terminal device 101.
  • the server 120 may be a server of a distributed system, or a server combined with blockchain.
  • the server 120 can also be a cloud server, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology.
  • A cloud server is a host product in a cloud computing service system that addresses the difficult management and weak business scalability of traditional physical host and virtual private server (VPS) services.
  • System 100 may also include one or more databases 130.
  • these databases may be used to store data and other information.
  • databases 130 may be used to store information such as audio files and video files.
  • Database 130 may reside in various locations.
  • the data store used by server 120 may be local to server 120, or may be remote from server 120 and may communicate with server 120 via a network-based or dedicated connection.
  • Database 130 can be of different types.
  • database 130 used by server 120 may be a relational database.
  • One or more of these databases may store, update, and retrieve data in response to commands.
  • databases 130 may also be used by applications to store application data.
  • Databases used by applications can be different types of databases such as key-value stores, object stores or regular stores backed by a file system.
  • the system 100 of FIG. 1 may be configured and operated in various ways to enable application of the various methods and apparatuses described in accordance with this disclosure.
  • FIG. 2 shows an exemplary flowchart of an image processing method 200 according to an embodiment of the present disclosure.
  • The method shown in FIG. 2 can be executed by the terminal device 101 or the server 120 shown in FIG. 1.
  • An image containing curved text lines can be processed using the method 200 shown in FIG. 2 so that the characters in the curved text lines are corrected to be displayed horizontally for subsequent character recognition.
  • In step S202, text line detection may be performed on the input image to obtain a text line image including curved text lines.
  • the input image may be acquired by an image acquisition unit (such as a camera) installed on the terminal device.
  • pre-acquired images may be read from memory as input images.
  • the input image may include one or more text lines, and one or more text lines in the input image are curved text lines.
  • the input image may be processed by a pre-obtained image processing model for detecting text lines in the image to obtain a text line image including a single curved text line.
  • the input image may be processed by using a pre-trained neural network-based text line detection model to obtain a sub-image including a text line in the input image as a text line image.
  • FIG. 3A shows an example of a text line image including curved text lines.
  • A text line image including one and only one text line can be detected from the input image.
  • the text line included in the example shown in FIG. 3A is a curved text line, that is, a plurality of characters included in the text line are not displayed horizontally.
  • In step S204, a plurality of reference points for the curved text line in the text line image may be determined.
  • the position of each of the above reference points may be the same as the position of at least one character included in the curved text line, or may be a simulated position of the character included in the curved text line, and does not necessarily correspond to the position of the real character.
  • the characters in the text line mentioned here may be units of any form in the text line, such as English words, English letters, Chinese characters, punctuation marks, and the like. The form of the text in the text line is not restricted here.
  • Character object detection may be performed on the text line image to obtain multiple reference points for the curved text line, where each reference point indicates the position of a character included in the curved text line.
  • the text line image may be processed by using a pre-trained neural network-based object detection model for character recognition, so as to obtain character detection frames for each character included in the curved text line.
  • FIG. 3B shows an example of a plurality of character detection frames in a text line image obtained through character object detection.
  • Each character detection frame 301 may include at least one character.
  • most character detection frames only include one character.
  • some character detection boxes may also include multiple characters.
  • the number of characters included in the character detection frame obtained by the character object detection is not required, as long as the character detection result can basically reflect the trend of the characters in the curved text line.
  • The positions of multiple reference points in the curved text line may be determined based on the positions of multiple character detection frames as shown in FIG. 3B.
  • the center point of at least one character detection frame among the recognized character detection frames may be determined as the reference point, that is, the position of the center point of at least one character detection frame may be determined as the position of the corresponding reference point.
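  • As a minimal sketch of this step, the center of each character detection frame can be computed and used as a reference point. The `(x_min, y_min, x_max, y_max)` box format, the function name `box_centers`, and the sample coordinates are illustrative assumptions; the detection model that produces the frames is not shown.

```python
def box_centers(boxes):
    """Map (x_min, y_min, x_max, y_max) detection frames to their center points.

    The centers are sorted by abscissa so that they follow the reading order
    of a left-to-right text line (an assumption for this sketch).
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    return sorted(centers)


# Hypothetical character detection frames, in pixel coordinates.
boxes = [(40, 12, 60, 36), (10, 30, 30, 54), (70, 28, 90, 52)]
reference_points = box_centers(boxes)
print(reference_points)
```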
  • image segmentation may be performed on the text line image to obtain the text line area corresponding to the curved text line in the text line image.
  • A pre-trained neural network-based image segmentation model for text line segmentation can be used to segment the text line image pixel by pixel, so as to obtain a segmentation result indicating whether each pixel in the text line image belongs to the text line, thereby determining the text line region corresponding to the curved text line.
  • FIG. 3C shows an example of a text line region in a text line image obtained through image segmentation.
  • The black area indicates the image area determined not to include characters in the text line image, and the white area indicates the text line area determined to contain characters in the text line image.
  • FIG. 3D shows an example of determining a reference point based on the height of the text line area and a predetermined step size.
  • the predetermined step size may indicate a predetermined character width. It can be understood that those skilled in the art can arbitrarily set the value of the predetermined step according to the actual situation, and the predetermined step here may be different from the actual width of the characters in the curved text line.
  • the predetermined step size shown in FIG. 3D is smaller than the actual width of the characters in the curved text line. In other embodiments, the predetermined step size may also be larger than the actual width of the characters in the curved text line.
  • the text line region may be segmented based on a predetermined step size to obtain a plurality of simulated character boxes 302 simulating character positions.
  • the position of the center point of each simulated character frame may be determined as the position of the reference point.
  • The abscissa of the reference point can be the average of the abscissas of the left and right borders of the corresponding simulated character frame, and the ordinate of the reference point can be the mean of the ordinates of the points in the area of the simulated character frame.
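  • The step-size approach can be sketched as follows, assuming the segmentation result is available as a binary mask (a list of rows, with 1 marking text line pixels). The function name `reference_points_from_mask`, the mask representation, and the choice to skip frames without text pixels are assumptions for illustration.

```python
def reference_points_from_mask(mask, step):
    """Derive reference points from a binary text line mask.

    mask: list of rows, 1 = text line pixel, 0 = background.
    step: predetermined step size, i.e. the width in pixels of each
          simulated character frame.
    """
    height, width = len(mask), len(mask[0])
    points = []
    for left in range(0, width, step):
        right = min(left + step, width) - 1  # inclusive right border
        # Ordinates of all text pixels that fall inside this simulated frame.
        ys = [y for y in range(height) for x in range(left, right + 1) if mask[y][x]]
        if not ys:
            continue  # assumption: frames without text pixels yield no point
        # Abscissa: average of left and right borders; ordinate: mean of ys.
        points.append(((left + right) / 2.0, sum(ys) / len(ys)))
    return points


# Hypothetical 6x6 mask of a text line rising from left to right.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
points = reference_points_from_mask(mask, step=2)
print(points)
```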
  • A text line curve for the curved text line may be determined based on the plurality of reference points determined in step S204.
  • Text line curves can indicate the specific shape of the arrangement of characters in a curved text line.
  • Adjustment parameters for straightening the curved text line can be obtained by mathematically analyzing the text line curve representing the shape of the curved text line.
  • the positions of multiple reference points indicating the positions of the curved text lines can be obtained by using step S204.
  • a text line curve for simulating the curve of the characters in the curved text line can be obtained.
  • the B-spline interpolation method can be used to perform curve fitting on the positions of multiple reference points, so as to obtain the mathematical expression of the text line curve.
  • any curve fitting method such as polynomial fitting may also be used to perform curve fitting on the positions of multiple reference points.
  • FIG. 3E shows an example of a text line curve obtained by using the B-spline interpolation method. It can be seen from the example shown in FIG. 3E that the text line curve 303 obtained by using the B-spline interpolation method can accurately fit the curve where the characters in the curved text line are located. It can be understood that, without departing from the principles of the present disclosure, those skilled in the art can adopt any mathematical method that can well fit curved text lines to perform fitting.
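  • A minimal sketch of fitting a text line curve to the reference points follows. It uses least-squares polynomial fitting, which the text names as an alternative, rather than B-spline interpolation, so that it needs no external library. The function names `fit_polynomial` and `evaluate`, the polynomial degree, and the sample points are illustrative assumptions.

```python
def fit_polynomial(points, degree):
    """Least-squares fit of y = c[0] + c[1]*x + ... + c[degree]*x**degree.

    Note: this is polynomial fitting, used here in place of B-spline
    interpolation purely for illustration.
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[float(sum(x ** (i + j) for x, _ in points)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in points)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Ordinate of the fitted text line curve at abscissa x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical reference points lying on y = 0.01*x^2 - x + 40.
reference_points = [(x, 0.01 * x * x - x + 40.0) for x in range(0, 60, 10)]
coeffs = fit_polynomial(reference_points, degree=2)
print(evaluate(coeffs, 25))
```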
  • the curved text line may be adjusted by using the adjustment parameter determined based on the text line curve, so as to obtain a recognized text line corresponding to the curved text line.
  • the recognized text line includes multiple characters displayed horizontally.
  • the trained character recognition model can be used to process the recognized text line to obtain text data in the recognized text line. It can be understood that since the recognized text line is obtained by straightening the curved text line, the text data in the recognized text line is the same as the text data in the curved text line.
  • a text line curve that can accurately represent a curved text line can be obtained based on the position of the reference point for the curved text line.
  • Curved text lines can be straightened by using the adjustment parameters derived from the text line curve. Since the curve of the text line obtained based on the reference point can accurately represent the position of the curved text line, a better straightening effect can be obtained by using the method provided in the present disclosure.
  • the text recognition algorithm can directly perform text recognition on the recognized text lines in which the characters are basically displayed on the same horizontal line. For example, an end-to-end seq2seq deep learning model can be used to recognize text sequences.
  • FIG. 4 shows an exemplary flowchart of a method 400 for adjusting curved text lines according to an embodiment of the present disclosure.
  • In step S402, a plurality of text sub-regions for the curved text line may be determined.
  • the curved text line can be divided into multiple text sub-regions, so that the curved text line can be segmented.
  • the display effect of each text sub-area can be adjusted respectively, so that the characters in each text sub-area are basically displayed on the same horizontal line.
  • each text sub-region of the plurality of text sub-regions may include a single character. In other embodiments, each text sub-region of the plurality of text sub-regions may include at least two characters. The number of characters included in each text sub-area may be the same or different. In still other embodiments, each text sub-region of the plurality of text sub-regions may comprise a width of a single column of pixels. It can be understood that the above description is only used as an exemplary description of segmenting a curved text line, rather than as a limitation of the present disclosure.
  • the text sub-region may be adjusted based on the adjustment parameters for the text sub-region determined by using the text line curve.
  • In a curved text line, characters in different regions are displayed at different positions on the image, and the characters are not displayed on the same horizontal line.
  • Corresponding adjustment parameters may be determined for each text sub-region, for adjusting at least one of the direction and position of the characters in the text sub-region, so that the characters in each text sub-region are displayed horizontally, thereby straightening the curved text line.
  • the adjustment parameters for each text sub-region may include an angle between the arrangement direction of characters in the text sub-region and the horizontal direction.
  • the recognized text line corresponding to the curved text line may be determined based on the adjusted text sub-region.
  • the adjusted multiple text sub-regions may be scaled so that the adjusted multiple text sub-regions have the same height.
  • The heights of the adjusted text sub-regions may differ.
  • In that case, the sizes of the adjusted text sub-regions may be scaled so that they have the same height.
  • the adjusted sizes of the plurality of text sub-regions may be scaled only in the height direction.
  • the adjusted dimensions of the multiple text subregions may be proportionally scaled in the height direction and the length direction, so that the adjusted multiple text subregions have the same height.
  • The adjusted multiple text sub-regions may be scaled based on a predetermined reference height, so that the scaled text sub-regions all have the reference height.
  • the multiple scaled text sub-regions may be spliced in the horizontal direction to obtain a recognized text line, wherein the characters in the recognized text line are displayed horizontally.
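  • The scaling and splicing described above can be sketched as follows, with each adjusted text sub-region represented as a list of pixel rows. Nearest-neighbor resampling, proportional scaling of the width, and the function names `resize_nearest` and `splice_horizontally` are assumptions chosen to keep the example dependency-free.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2D pixel grid with nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def splice_horizontally(regions, reference_height):
    """Scale every sub-region to the reference height, then join them left to right."""
    scaled = []
    for region in regions:
        height, width = len(region), len(region[0])
        # Proportional scaling: the width changes by the same factor as the height.
        new_width = max(1, round(width * reference_height / height))
        scaled.append(resize_nearest(region, reference_height, new_width))
    # Concatenate the rows of all scaled sub-regions in order.
    return [sum((s[r] for s in scaled), []) for r in range(reference_height)]


# Two hypothetical adjusted sub-regions with different heights.
region_a = [[1, 1], [1, 1]]                  # 2 rows x 2 columns
region_b = [[2, 2], [2, 2], [2, 2], [2, 2]]  # 4 rows x 2 columns
line = splice_horizontally([region_a, region_b], reference_height=4)
for row in line:
    print(row)
```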
  • multiple characters in a curved text line can be segmented, and the curved text line can be straightened based on the adjustment parameters determined based on the curve of the text line.
  • the image processing method provided by the present disclosure can straighten a curved text line of any length and obtain a recognized text line with multiple characters displayed horizontally.
  • FIG. 5 shows an exemplary flowchart of a method 500 for determining multiple text subregions in a curved text line according to an embodiment of the present disclosure.
  • In step S502, the slope at a position corresponding to at least one point on the text line curve may be determined.
  • the curved text line may be divided based on the slope at the position corresponding to the at least one point to obtain multiple text sub-regions, wherein adjacent text sub-regions correspond to different slopes.
  • the text line curve obtained by fitting multiple reference points can simulate the trend and position of the characters in the curved text line.
  • the trend of the characters at the position can be obtained.
  • FIG. 6A shows an example of slopes determined on a text line curve at positions corresponding to at least one reference point.
  • the point on the text line curve that is the same as the abscissa of each reference point can be determined as the position corresponding to the reference point.
  • the arrows shown in FIG. 6A indicate different slopes of the text line curve at different positions.
  • The rate of change of the slope between adjacent reference points can be determined based on the slope at each point on the text line curve, and the area between adjacent points whose rate of change of the slope is less than a change threshold can be assigned to the same text sub-region.
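  • A minimal sketch of slope-based division follows. It assumes the text line curve is a polynomial given by its coefficients, and it measures the change of slope as the absolute difference between slopes at adjacent reference points; both choices, the function names `slope_at` and `split_by_slope`, and the sample values are assumptions for illustration.

```python
def slope_at(coeffs, x):
    """Derivative of the polynomial c[0] + c[1]*x + c[2]*x**2 + ... at x."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)


def split_by_slope(xs, slopes, change_threshold):
    """Group adjacent reference points into text sub-regions.

    A new sub-region starts whenever the slope change between two adjacent
    reference points reaches the threshold (assumed measure: absolute
    difference of the two slopes).
    """
    groups = [[xs[0]]]
    for i in range(1, len(xs)):
        if abs(slopes[i] - slopes[i - 1]) < change_threshold:
            groups[-1].append(xs[i])
        else:
            groups.append([xs[i]])
    return groups


# Slope of a hypothetical curve y = 40 - x + 0.01*x^2 at x = 25.
print(slope_at([40.0, -1.0, 0.01], 25))

# Hypothetical abscissas of reference points and the curve slopes there.
xs = [0, 10, 20, 30, 40, 50]
slopes = [-0.5, -0.48, -0.1, -0.08, 0.3, 0.32]
groups = split_by_slope(xs, slopes, change_threshold=0.1)
print(groups)
```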
  • FIG. 6B shows an example of a plurality of text sub-regions divided based on the slope of at least one point on the text line curve.
  • the included angles between the arrangement direction of characters in each text subregion and the horizontal direction are basically the same.
  • The boundary of each text sub-region is represented by an angled quadrilateral, wherein the left boundary and right boundary of each text sub-region are perpendicular to the horizontal direction, and the angle between each of the upper and lower boundaries and the horizontal direction equals the angle between the horizontal direction and the character trend indicated by the slope of the corresponding portion of the text line curve.
  • the height of each text sub-region may be based on the height of characters in the curved text line.
  • the height of the text sub-region may be determined based on the height of the character detection frame obtained from object detection.
  • the height of the text sub-region may be determined based on the height of the text line in the text line segmentation result.
  • Fig. 7 shows an exemplary flowchart of a method for adjusting curved text lines according to an embodiment of the present disclosure.
  • The text sub-region shown in FIG. 6B can be adjusted using the method 700 shown in FIG. 7.
  • In step S702, the adjustment parameters of each text sub-region in the plurality of text sub-regions of the curved text line may be determined.
  • the adjustment parameter may include the angle between the arrangement direction of the characters included in the text sub-region and the horizontal direction determined based on the slope of the text line curve corresponding to the text sub-region.
  • the text subregion may be adjusted based on the angle between the arrangement direction of the characters included in the text subregion determined in step S702 and the horizontal direction, so that the characters in the text subregion are displayed horizontally.
  • the entire text line image may be reversely rotated based on the angle between the arrangement direction of the characters included in the text subregion determined in step S702 and the horizontal direction, so that the characters in the text subregion are displayed horizontally, and based on the rotation
  • the four vertex positions of the corresponding text sub-region in the rotated text line image are obtained by the angle.
  • the minimum circumscribing rectangle of the four vertices of the rotated text sub-region may be taken, and the minimum circumscribing rectangle may be cropped from the rotated text line image to obtain the adjusted text sub-region.
  • the upper boundary and the lower boundary of the minimum circumscribed rectangle are parallel to the horizontal direction
  • the left boundary and the right boundary are parallel to the vertical direction.
  • FIG. 8 shows an example of a recognized text line obtained by splicing multiple adjusted text subregions according to the method described in FIG. 7 .
  • the adjusted multiple text subregions are spliced in the horizontal direction, so as to obtain recognized text lines for text recognition.
  • Fig. 9 shows another exemplary flowchart of a method for adjusting a curved text line according to an embodiment of the present disclosure.
  • In step S902, for each column of pixels in the text line image, an adjustment parameter for the column of pixels is determined.
  • the adjustment parameter for the column of pixels includes the offset between the ordinate of the point on the text line curve in the column of pixels and the reference position.
  • the reference position in the text line image may be predetermined. For example, the position of the horizontal centerline of the text line image may be determined as the reference position. As another example, the position of the horizontal line on which any character in the text line image is located may be determined as the reference position. As yet another example, the average ordinate of the character detection boxes obtained through character detection in the text line image may be determined as the reference position.
  • In step S904, the display of each column of pixels in the text line image may be adjusted using the adjustment parameters.
  • the position of the point on the text line curve in the column of pixels can be adjusted in the vertical direction based on the offset between the ordinate of that point and the reference position, so that the adjusted vertical position of the point on the text line curve in the column of pixels is consistent with the reference position.
  • In step S906, the recognized text line may be determined based on the adjusted text line image.
  • the image background of the adjusted text line image may be cropped based on the character height to obtain the recognized text line.
  • FIGS. 10A-10C show an example of determining a recognized text line according to the method described in FIG. 9 .
  • As shown in FIG. 10A, for the pixel column 1001, it can be determined that the point on the text line curve in that column is located at a distance d below the reference line 1002 .
  • the pixel sequence with a height of d on the opposite side from the point on the text line curve (for the pixel column 1001, the side above the reference line 1002) can be clipped, the remaining part of the pixel column 1001 can be moved up by a distance of d, and the clipped pixel sequence with a height of d can be filled back in below the point on the text line curve, so that the position of the point on the text line curve in the pixel column 1001 is adjusted to be consistent with the reference position.
  • FIG. 10B shows the result obtained after adjusting the pixel points of each column in the text line image using the method described in conjunction with FIG. 10A . As shown in FIG. 10B, all characters of the curved text line are adjusted to be displayed horizontally.
  • FIG. 10C shows the result of recognizing text lines after cropping the result shown in FIG. 10B according to the character height.
  • the pixels in the text line image can be filled in reverse, column by column, based on the parameters of the text line curve and the reference position, so that the pixels corresponding to characters in each column of pixels are basically displayed at the reference position.
  • the straightening of curved text lines of any length can be conveniently realized by using the above method.
  • FIG. 11 shows an exemplary flowchart of a text recognition process 1100 according to an embodiment of the present disclosure.
  • In step S1102, text line detection may be performed on the acquired input image to obtain a text line image including a single text line.
  • In step S1104, bending correction may be performed on the text line in the text line image to obtain a recognized text line, wherein the recognized text line includes a plurality of characters displayed horizontally.
  • the curved text line in the text line image can be corrected by using the process of the method described above in conjunction with FIG. 2-FIG. 10C , which will not be repeated here.
  • In step S1106, character recognition may be performed on the recognized text line to obtain the character data included in the text line.
  • Recognition of text lines can be processed by a trained text recognition model based on a neural network. Since the characters in the recognized text line are displayed in a horizontal manner, the text recognition model does not need to directly recognize the content in the curved text line, thus reducing the complexity of the text recognition model and improving the accuracy of text recognition.
  • with the text recognition method provided by the present disclosure, the curved text line is first corrected to obtain a recognized text line whose characters are basically displayed on the same horizontal line, which relieves the pressure on the text recognition model when recognizing large or long texts with a large degree of curvature and improves text recognition performance.
  • FIG. 12 shows an exemplary block diagram of an image processing device according to an embodiment of the present disclosure.
  • the image processing apparatus 1200 may include a text line detection unit 1210 , a reference point determination unit 1220 , a curve determination unit 1230 , and a recognized text determination unit 1240 .
  • the text line detection unit 1210 may be configured to perform text line detection on the input image to obtain a text line image including curved text lines.
  • the reference point determining unit 1220 may be configured to determine a plurality of reference points for the curved text line in the text line image.
  • the curve determining unit 1230 may be configured to determine a text line curve for the curved text line based on the plurality of reference points.
  • the recognized text determination unit 1240 may be configured to adjust the curved text line using the adjustment parameters determined based on the text line curve to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line includes a plurality of characters displayed horizontally.
  • the text line detection unit 1210, the reference point determination unit 1220, the curve determination unit 1230, and the recognized text determination unit 1240 can be used to implement the steps of the image processing method described above in conjunction with FIG. 2-FIG.
  • a text line curve that can accurately represent a curved text line can be obtained based on the position of a reference point for a curved text line.
  • Curved text lines can be straightened by using the adjustment parameters derived from the text line curve. Since the curve of the text line obtained based on the reference point can accurately represent the position of the curved text line, a better straightening effect can be obtained by using the method provided in the present disclosure.
  • the text recognition algorithm can directly perform text recognition on the recognized text lines in which the characters are basically displayed on the same horizontal line. For example, an end-to-end seq2seq deep learning model can be used to recognize text sequences.
  • an electronic circuit comprising: a circuit configured to perform the steps of the method described in the present disclosure.
  • an electronic device comprising: a processor; and a memory storing a program, the program including instructions which, when executed by the processor, cause the processor to perform the method described in the present disclosure.
  • a computer-readable storage medium storing a program, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the operations described in the present disclosure.
  • a computer program product comprising a computer program, wherein the computer program implements the method described in the present disclosure when executed by a processor.
  • FIG. 13 is a block diagram illustrating an example of an electronic device according to an exemplary embodiment of the present disclosure. It should be noted that the structure shown in FIG. 13 is only an example, and according to a specific implementation manner, the electronic device of the present disclosure may only include one or more of the components shown in FIG. 13 .
  • the electronic device 1300 may be, for example, a general-purpose computer (eg, various computers such as a laptop computer, a tablet computer, etc.), a mobile phone, or a personal digital assistant. According to some embodiments, the electronic device 1300 may be a visually impaired assistive device. Electronic device 1300 may include a camera and electronic circuitry for curved text line correction. Wherein, the camera can be configured to acquire images, wherein the image includes curved text lines, and the electronic circuit can be configured to execute the image processing method for text line correction described in conjunction with FIGS. 2-10C .
  • the electronic device 1300 may be configured to include a spectacle frame or be configured to be detachably mounted to a spectacle frame (for example, to a rim of the spectacle frame, a connector connecting the two rims, a temple, or any other part), so that an image approximately covering the user's field of view can be captured.
  • the electronic device 1300 can also be installed on other wearable devices, or be integrated with other wearable devices.
  • the wearable device may be, for example: a head-mounted device (such as a helmet or a hat, etc.), a device that can be worn on the ear, and the like.
  • the electronic device may be implemented as an accessory attachable to a wearable device, such as an accessory attachable to a helmet or a hat, and the like.
  • the electronic device 1300 may also have other forms.
  • electronic device 1300 may be a mobile phone, a general computing device (eg, laptop computer, tablet computer, etc.), a personal digital assistant, and the like.
  • the electronic device 1300 may also have a base so that it can be placed on a table.
  • the electronic device 1300 may include a camera 1304 for capturing images.
  • the camera 1304 may include, but not limited to, a video camera or a camera, and the like.
  • the electronic device 1300 may further include a curved text line correction circuit (electronic circuit) 1400, which includes circuits configured to perform the steps of the image processing method for text line correction described above (for example, the method steps described in conjunction with FIGS. 2-10C).
  • the electronic device 1300 may further include a character recognition circuit 1305 configured to perform character detection and/or recognition (such as OCR processing) on the corrected text line contained in the image output by the curved text line correction circuit, to obtain text data.
  • the character recognition circuit 1305 can be realized by a dedicated chip, for example.
  • the electronic device 1300 may further include a sound conversion circuit 1306 configured to convert the text data into sound data.
  • the sound conversion circuit 1306 can be realized by a dedicated chip, for example.
  • the electronic device 1300 may also include a sound output circuit 1307 configured to output the sound data.
  • the sound output circuit 1307 may include but not limited to earphones, speakers, or vibrators, etc., and their corresponding driving circuits.
  • the electronic device 1300 may further include an image processing circuit 1308, and the image processing circuit 1308 may include a circuit configured to perform various image processing on the image.
  • the image processing circuit 1308 may include, but is not limited to, one or more of the following: a circuit configured to denoise an image, a circuit configured to deblur an image, a circuit configured to geometrically correct an image, a circuit configured to perform feature extraction on an image, a circuit configured to perform object detection and/or recognition on an object in an image, a circuit configured to perform text detection on text contained in an image, a circuit configured to extract text lines from an image, a circuit configured to extract text coordinates from an image, a circuit configured to extract object boxes from an image, a circuit configured to extract text boxes from an image, a circuit configured to perform layout analysis (e.g., paragraph division) based on an image, and so on.
  • the electronic device 1300 may further include a word processing circuit 1309, which may be configured to perform various processing based on extracted text-related information (such as text data, text boxes, paragraph coordinates, text line coordinates, text coordinates, etc.), so as to obtain processing results such as paragraph sorting, text semantic analysis, and layout analysis results.
  • One or more of the above-mentioned circuits may use custom hardware, and/or may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
  • for example, one or more of the above-mentioned circuits can be implemented by programming hardware (for example, programmable logic circuits including field programmable gate arrays (FPGA) and/or programmable logic arrays (PLA)) in assembly language or a hardware programming language (such as VERILOG, VHDL, C++) using logic and algorithms according to the present disclosure.
  • the electronic device 1300 may also include a communication circuit 1310, which may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, infrared communication devices, wireless communication devices and/or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
  • the electronic device 1300 may also include an input device 1311, which may be any type of device capable of inputting information to the electronic device 1300, and may include, but is not limited to, various sensors, a mouse, a keyboard, a touch screen, buttons, a joystick, a microphone and/or a remote control, etc.
  • the electronic device 1300 may also include an output device 1312, which may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer, etc.
  • vision-based output devices may facilitate obtaining output information from the electronic device 1300 by the user's family members or maintenance workers, etc.
  • the electronic device 1300 may further include a processor 1301 .
  • the processor 1301 may be any type of processor, and may include, but is not limited to, one or more general-purpose processors and/or one or more special-purpose processors (eg, special processing chips).
  • the processor 1301 may be, for example but not limited to, a central processing unit CPU or a microprocessor MPU or the like.
  • the electronic device 1300 may also include a working memory 1302, which may be a working memory that stores programs (including instructions) and/or data (such as images, text, sound, and other intermediate data) useful for the work of the processor 1301, and may include, but is not limited to, random access memory and/or read-only memory devices.
  • the electronic device 1300 may also include a storage device 1303.
  • the storage device 1303 may include any non-transitory storage device.
  • the non-transitory storage device may be any storage device that is non-transitory and capable of storing data, and may include, but is not limited to, disk drives, optical storage devices, solid state memory, floppy disks, flexible disks, hard disks, tapes or any other magnetic media, optical disks or any other optical media, ROM (read only memory), RAM (random access memory), cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions and/or code.
  • the work memory 1302 and the storage device 1303 may be collectively referred to as "storage", and may be used in conjunction with each other in some cases.
  • the processor 1301 can control and schedule at least one of the camera 1304, the character recognition circuit 1305, the sound conversion circuit 1306, the sound output circuit 1307, the image processing circuit 1308, the word processing circuit 1309, the communication circuit 1310, the curved text line correction circuit (electronic circuit) 1400, and the various other devices and circuits included in the electronic device 1300.
  • at least some of the various components described in FIG. 13 may be connected and/or communicate with each other through the bus 1313 .
  • Software elements may reside in the working memory 1302, including but not limited to an operating system 1302a, one or more application programs 1302b, drivers, and/or other data and code.
  • instructions for performing the aforementioned control and scheduling may be included in the operating system 1302a or one or more application programs 1302b.
  • instructions for executing the method steps described in the present disclosure (for example, the method steps described in conjunction with FIGS. This is achieved by the processor 1301 reading and executing instructions of one or more application programs 1302b.
  • the electronic device 1300 may include a processor 1301 and a memory (such as the working memory 1302 and/or the storage device 1303) storing a program, the program including instructions that, when executed by the processor 1301, cause the processor 1301 to execute the methods described in various embodiments of the present disclosure.
  • the operations performed by at least one of the character recognition circuit 1305, the sound conversion circuit 1306, the image processing circuit 1308, the word processing circuit 1309, and the curved text line correction circuit (electronic circuit) 1400 may be implemented by the processor 1301 reading and executing instructions of one or more application programs 1302.
  • the executable code or source code of the instructions of the software element (program) may be stored in a non-transitory computer-readable storage medium (such as the storage device 1303), and may be stored in the working memory 1302 at execution time (possibly after being compiled and/or installed). Accordingly, the present disclosure provides a computer-readable storage medium storing a program comprising instructions that, when executed by a processor of an electronic device (such as a visually impaired assistive device), cause the electronic device to perform the methods described in the embodiments of the present disclosure. According to another embodiment, the executable code or the source code of the instructions of the software element (program) can also be downloaded from a remote location.
  • circuits, units, modules or elements may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages or any combination thereof.
  • some or all of the circuits, units, modules, or elements included in the disclosed methods and devices can be implemented by programming hardware (e.g., programmable logic circuits including field programmable gate arrays (FPGA) and/or programmable logic arrays (PLA)) in assembly language or a hardware programming language (such as VERILOG, VHDL, C++) using logic and algorithms according to the present disclosure.
  • the processors 1301 in the electronic device 1300 may be distributed over a network. For example, some processing may be performed using one processor while other processing may be performed by another processor remote from the one processor. Other modules of the electronic device 1300 may also be distributed similarly. As such, electronic device 1300 may be interpreted as a distributed computing system that performs processing at multiple locations.

Abstract

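An image processing method is provided, comprising: performing text line detection on an input image to obtain a text line image including a curved text line; determining a plurality of reference points for the curved text line in the text line image; determining a text line curve for the curved text line based on the plurality of reference points; and adjusting the curved text line using adjustment parameters determined based on the text line curve to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line includes a plurality of characters displayed horizontally. With the method provided by the embodiments of the present disclosure, curve fitting can be performed on a curved text line conveniently and accurately. By processing the curved text line in segments, a curved text line including multiple characters can be adjusted into a horizontally displayed text line that is easier to recognize.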

Description

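Image processing method, electronic circuit, visually impaired assistive device and medium
Technical Field
The present disclosure relates to the field of image processing, and in particular to an image processing method, an electronic circuit, a visually impaired assistive device, an electronic device, a storage medium, and a program product.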
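Background
Text present in an image can be recognized by means of image processing, and such a text recognition function has broad applications in various fields.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any approach described in this section qualifies as prior art merely by virtue of its inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be assumed to have been recognized in any prior art.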
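Summary
According to one aspect of the present disclosure, an image processing method is provided, comprising: performing text line detection on an input image to obtain a text line image including a curved text line; determining a plurality of reference points for the curved text line in the text line image; determining a text line curve for the curved text line based on the plurality of reference points; and adjusting the curved text line using adjustment parameters determined based on the text line curve to obtain a recognized text line corresponding to the curved text line, wherein the recognized text line includes a plurality of characters displayed horizontally.
According to another aspect of the present disclosure, an electronic circuit is provided, comprising: a circuit configured to perform the steps of the above method.
According to another aspect of the present disclosure, a visually impaired assistive device is further provided, comprising: a camera configured to acquire an image, wherein the image includes a curved text line; a curved text line correction circuit implemented by the electronic circuit described above; a circuit configured to perform text detection and/or recognition on the recognized text line obtained by the curved text line correction circuit to obtain text data; a circuit configured to convert the text data into sound data; and a circuit configured to output the sound data.
According to another aspect of the present disclosure, an electronic device is further provided, comprising: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform the above method.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing a program is further provided, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the above method.
According to another aspect of the present disclosure, a computer program product is further provided, comprising a computer program, wherein the computer program implements the above method when executed by a processor.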
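Brief Description of the Drawings
The accompanying drawings illustrate embodiments by way of example and form a part of the specification; together with the written description, they serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.
FIG. 1 shows a schematic diagram of an exemplary system in which the various methods and apparatuses described herein may be implemented according to an embodiment of the present disclosure;
FIG. 2 shows an exemplary flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 3A shows an example of a text line image including a curved text line;
FIG. 3B shows an example of multiple character detection boxes in a text line image obtained through character object detection;
FIG. 3C shows an example of a text line region in a text line image obtained through image segmentation;
FIG. 3D shows an example of determining reference points based on the height of the text line region and a predetermined step;
FIG. 3E shows an example of a text line curve obtained using B-spline interpolation;
FIG. 4 shows an exemplary flowchart of a method for adjusting a curved text line according to an embodiment of the present disclosure;
FIG. 5 shows an exemplary flowchart of a method for determining multiple text sub-regions in a curved text line according to an embodiment of the present disclosure;
FIG. 6A shows an example of slopes determined at positions on the text line curve corresponding to at least one point;
FIG. 6B shows an example of multiple text sub-regions obtained by division based on the slopes at positions on the text line curve corresponding to the respective reference points;
FIG. 7 shows one exemplary flowchart of a method for adjusting a curved text line according to an embodiment of the present disclosure;
FIG. 8 shows an example of a recognized text line obtained by splicing multiple adjusted text sub-regions obtained according to the method described in FIG. 7;
FIG. 9 shows another exemplary flowchart of a method for adjusting a curved text line according to an embodiment of the present disclosure;
FIGS. 10A-10C show an example of determining a recognized text line according to the method described in FIG. 9;
FIG. 11 shows an exemplary flowchart of a text recognition process according to an embodiment of the present disclosure;
FIG. 12 shows an exemplary block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 13 is a block diagram showing an example of an electronic device according to an exemplary embodiment of the present disclosure.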
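Detailed Description
In the present disclosure, unless otherwise stated, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are only used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various examples in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. In addition, the term "and/or" as used in the present disclosure covers any one of the listed items and all possible combinations thereof.
When recognizing text information included in an image, if the surface on which the text is located is curved or deformed, the text information to be recognized in the image has a curved shape. Recognizing curved text is a challenge. Because the characters in a curved text line are displayed irregularly, directly training a text recognition model to recognize the text data in a curved text line makes the model considerably more complex, and the recognition accuracy is lower than that for horizontally displayed text lines.
In the following, the principles of the present disclosure are described using an example in which the characters of a text line are arranged sequentially in the lateral direction. The "horizontal direction" refers to the direction consistent with the arrangement direction of the characters in the text line. The "vertical direction" refers to the direction perpendicular to the arrangement direction of the characters in the text line. A "curved text line" is a text line whose characters do not lie on one horizontal line; for example, the distances by which the characters deviate from a common horizontal line exceed a predetermined distance threshold. With the method provided by the present disclosure, a curved text line can be straightened to obtain a plurality of horizontally displayed characters.
It can be understood that, without departing from the principles of the present disclosure, the "horizontal direction" and the "vertical direction" may also be interchanged so as to perform curvature correction on vertically arranged text columns.
In laterally arranged text, "displayed horizontally" means that the characters lie substantially on the same horizontal line; that is, the distance by which each horizontally displayed character deviates from that horizontal line does not exceed a predetermined distance threshold. In vertically arranged text, "displayed horizontally" means that the characters lie substantially on the same vertical line; that is, the distance by which each horizontally displayed character deviates from that vertical line does not exceed a predetermined distance threshold.
To correct curved text in a text line image accurately and efficiently, the present disclosure provides a new image processing method. The principles of the present disclosure are described below with reference to the accompanying drawings.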
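FIG. 1 shows a schematic diagram of an exemplary system 100 in which the various methods and apparatuses described herein may be implemented according to an embodiment of the present disclosure. Referring to FIG. 1, the system 100 includes one or more terminal devices 101, a server 120, and one or more communication networks 110 coupling the one or more terminal devices 101 to the server 120. The terminal device 101 may be configured to execute one or more application programs.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable execution of the image processing method according to the present disclosure. In some embodiments, the terminal device 101 may also be used to run one or more services or software applications of the image processing method according to the present disclosure. In some implementations, the terminal device 101 may be implemented as a visually impaired assistive device.
In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, to users of the terminal device 101 under a software-as-a-service (SaaS) model.
In the configuration shown in FIG. 1, the server 120 may include one or more components that implement the functions performed by the server 120. These components may include software components executable by one or more processors, hardware components, or a combination thereof. A user operating the terminal device 101 may in turn use one or more terminal applications to interact with the server 120 and use the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from the system 100. Therefore, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The terminal device 101 may provide an interface that enables a user of the terminal device to interact with it. The terminal device may also output information to the user via the interface. Although FIG. 1 depicts only one terminal device, those skilled in the art will understand that the present disclosure can support any number of terminal devices.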
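The terminal device 101 may include various types of computer devices, such as portable handheld devices, general-purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux or Linux-like operating systems (such as Google Chrome OS); or include various mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular phones, smartphones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head-mounted displays and other devices. Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The terminal device can execute a variety of applications, such as various Internet-related applications, communication applications (such as e-mail applications), and Short Message Service (SMS) applications, and can use various communication protocols.
The network 110 may be any type of network well known to those skilled in the art, which may support data communication using any of a variety of available protocols (including but not limited to TCP/IP, SNA, IPX, etc.). By way of example only, the one or more networks 110 may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (such as Bluetooth or WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general-purpose computers, dedicated server computers (e.g., PC (personal computer) servers, UNIX servers, mid-range servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization (e.g., one or more flexible pools of logical storage devices that can be virtualized to maintain virtual storage devices of the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functions described below.
The computing units in the server 120 may run one or more operating systems, including any of the operating systems described above and any commercially available server operating system. The server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.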
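In some implementations, the server 120 may include one or more applications to analyze and merge data feeds and/or event updates received from users of the terminal device 101. The server 120 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of the terminal device 101.
In some implementations, the server 120 may be a server of a distributed system, or a server combined with a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak business scalability found in traditional physical host and virtual private server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in various locations. For example, a data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with it via a network-based or dedicated connection. The databases 130 may be of different types. In some embodiments, the database 130 used by the server 120 may be a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by applications may be of different types, such as key-value stores, object stores, or conventional stores backed by a file system.
The system 100 of FIG. 1 may be configured and operated in various ways to enable application of the various methods and apparatuses described according to the present disclosure.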
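FIG. 2 shows an exemplary flowchart of an image processing method 200 according to an embodiment of the present disclosure. The method shown in FIG. 2 may be performed by the terminal device 101 or the server 120 shown in FIG. 1. The method 200 shown in FIG. 2 can be used to process an image containing a curved text line so as to correct the characters in the curved text line to be displayed horizontally for a subsequent text recognition process.
In step S202, text line detection may be performed on the input image to obtain a text line image including a curved text line.
In some embodiments, the input image may be acquired by an image acquisition unit (such as a camera) installed on the terminal device. In other embodiments, a previously acquired image may be read from a memory as the input image. The input image may include one or more text lines, and one or more of those text lines are curved text lines.
The input image may be processed by a previously obtained image processing model for detecting text lines in images, to obtain a text line image including a single curved text line. In some embodiments, a pre-trained neural-network-based text line detection model may be used to process the input image, to obtain a sub-image of the input image that includes one text line as the text line image.
FIG. 3A shows an example of a text line image including a curved text line. As shown in FIG. 3A, by performing text line detection on the input image, a text line image that includes one and only one text line can be detected from the input image. The text line included in the example shown in FIG. 3A is a curved text line; that is, the multiple characters in the text line are not displayed horizontally.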
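In step S204, a plurality of reference points for the curved text line in the text line image may be determined.
The position of each reference point may coincide with the position of at least one character included in the curved text line, or may be a simulated position of a character in the curved text line that does not necessarily correspond to the position of a real character. A character in the text line here may be a unit of any form in the text line, such as an English word, an English letter, a Chinese character, or a punctuation mark. The form of the text in the text line is not limited here.
In some embodiments, character object detection may be performed on the text line image to obtain the plurality of reference points for the curved text line, where each reference point indicates the position of a character included in the curved text line.
In some implementations, a pre-trained neural-network-based object detection model for recognizing characters may be used to process the text line image, to obtain a character detection box for each character included in the curved text line.
FIG. 3B shows an example of multiple character detection boxes in a text line image obtained through character object detection. Each character detection box 301 may include at least one character. As shown in FIG. 3B, most character detection boxes include only one character. However, because the detection of the object detection model has a certain error, some character detection boxes may include multiple characters. According to the principles of the present disclosure, there is no requirement on the number of characters contained in a character detection box obtained by character object detection, as long as the character detection result substantially reflects the trend of the characters in the curved text line.
The positions of the plurality of reference points for the curved text line may be determined based on the positions of the character detection boxes as shown in FIG. 3B. For example, the center point of at least one of the detected character detection boxes may be determined as a reference point; that is, the position of the center point of at least one character detection box may be determined as the position of the corresponding reference point.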
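As a minimal sketch of taking detection-box centers as reference points, the snippet below assumes each character detection box is given as a hypothetical `(x_min, y_min, x_max, y_max)` tuple; the function name `box_centers` and the box format are illustrative assumptions and do not come from the disclosure.

```python
def box_centers(boxes):
    """Return the center of each detection box, sorted from left to right.

    boxes: iterable of (x_min, y_min, x_max, y_max) tuples (assumed format).
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    # Sorting by x gives the reference points in reading order for a
    # laterally arranged text line.
    return sorted(centers)
```

Sorting the centers is a convenience for the later curve fitting step; a detector may return boxes in arbitrary order.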
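In other embodiments, image segmentation may be performed on the text line image to obtain the text line region corresponding to the curved text line. For example, a pre-trained neural-network-based image segmentation model for text line segmentation may be used to segment the text line image pixel by pixel, to obtain a segmentation result indicating whether each pixel in the text line image belongs to the text line, thereby determining the text line region corresponding to the curved text line.
FIG. 3C shows an example of the text line region in a text line image obtained through image segmentation. The black region represents the image region determined not to include characters, and the white region represents the text line region where the characters are determined to be located.
Using the height of the text line region as shown in FIG. 3C and a predetermined step, the positions of a plurality of reference points in the text line region can be determined.
FIG. 3D shows an example of determining reference points based on the height of the text line region and a predetermined step. The predetermined step may indicate a predetermined character width. It can be understood that those skilled in the art may set the value of the predetermined step arbitrarily according to the actual situation, and the predetermined step may differ from the true width of the characters in the curved text line. The predetermined step shown in FIG. 3D is smaller than the true width of the characters in the curved text line. In other embodiments, the predetermined step may also be larger than the true width of the characters in the curved text line.
As shown in FIG. 3D, the text line region may be divided based on the predetermined step to obtain multiple simulated character boxes 302 that simulate character positions. The position of the center point of each simulated character box may be determined as the position of a reference point. In some embodiments, the abscissa of a reference point may be the average of the abscissas of the left and right boundaries of the corresponding simulated character box, and the ordinate of the reference point may be the mean ordinate of the points within the region of the simulated character box.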
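The step-based reference point computation can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the segmentation result is assumed to be a nested list `mask[row][col]` with 1 for text line pixels and 0 for background, the simulated boxes are assumed to start at the left edge of the image, and the name `reference_points_from_mask` is hypothetical.

```python
def reference_points_from_mask(mask, step):
    """Compute one reference point per simulated character box.

    mask: list of rows, mask[row][col] is truthy for text line pixels.
    step: predetermined step in pixels (width of a simulated character box).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    height = len(mask)
    width = len(mask[0]) if height else 0
    points = []
    for left in range(0, width, step):
        right = min(left + step, width) - 1  # inclusive right boundary
        # Row indices of all text pixels that fall inside this box.
        ys = [row
              for row in range(height)
              for col in range(left, right + 1)
              if mask[row][col]]
        if not ys:
            continue  # the box contains no text pixels
        # x: mean of the left and right boundaries; y: mean row of text pixels.
        points.append(((left + right) / 2.0, sum(ys) / len(ys)))
    return points
```

A smaller `step` yields more reference points and follows the curve more closely, at the cost of more sensitivity to segmentation noise.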
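In step S206, a text line curve for the curved text line may be determined based on the plurality of reference points determined in step S204. The text line curve may indicate the specific shape of the character arrangement in the curved text line. By mathematically analyzing the text line curve that represents the shape of the curved text line, the adjustment parameters for straightening the curved text line can be obtained.
As described above, step S204 yields the positions of a plurality of reference points indicating the position of the curved text line. By performing curve fitting on these reference points, a text line curve that models the curve along which the characters of the curved text line lie can be obtained.
In some embodiments, B-spline interpolation may be used to perform curve fitting on the positions of the reference points to obtain a mathematical expression of the text line curve. In other embodiments, any other curve fitting approach, such as polynomial fitting, may be used to perform curve fitting on the positions of the reference points.
FIG. 3E shows an example of a text line curve obtained using B-spline interpolation. As can be seen from the example shown in FIG. 3E, the text line curve 303 obtained by B-spline interpolation accurately fits the curve along which the characters of the curved text line lie. It can be understood that, without departing from the principles of the present disclosure, those skilled in the art may use any mathematical method capable of fitting the curved text line well.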
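The sketch below illustrates the curve fitting step with a least-squares polynomial fit, which the text names as an alternative to B-spline interpolation; it is not the B-spline method itself. The function names, the default degree of 2, and the normal-equation solver are assumptions made to keep the example dependency-free.

```python
def fit_polynomial(points, degree=2):
    """Least-squares fit of y = c0 + c1*x + ... + cn*x**n to (x, y) points.

    Returns the coefficient list [c0, c1, ..., cn].
    """
    n = degree + 1
    if len(points) < n:
        raise ValueError("need at least degree + 1 reference points")
    # Normal equations A c = b with A[i][j] = sum(x**(i+j)), b[i] = sum(y * x**i).
    a = [[float(sum(x ** (i + j) for x, _ in points)) for j in range(n)]
         for i in range(n)]
    b = [float(sum(y * x ** i for x, y in points)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("reference points do not determine the curve")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def curve_value(coeffs, x):
    """Evaluate the fitted text line curve at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def curve_slope(coeffs, x):
    """Evaluate the derivative (slope) of the fitted curve at x."""
    return sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k > 0)
```

The slope returned by `curve_slope` is the quantity the later steps use to derive per-region rotation angles; for long or strongly curved lines a piecewise fit such as a spline would likely be the better choice.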
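In step S208, the curved text line may be adjusted using adjustment parameters determined based on the text line curve, to obtain a recognized text line corresponding to the curved text line, where the recognized text line includes a plurality of horizontally displayed characters.
Because the characters included in the recognized text line are displayed substantially on the same horizontal line, a trained text recognition model can be used to process the recognized text line to obtain the text data in it. It can be understood that, because the recognized text line is the result of straightening the curved text line, the text data in the recognized text line is the same as that in the curved text line.
With the image processing method provided by the embodiments of the present disclosure, a text line curve that accurately represents the curved text line can be obtained based on the positions of the reference points for the curved text line. The curved text line can be straightened using the adjustment parameters derived from the text line curve. Because the text line curve obtained from the reference points accurately represents the position of the curved text line, the method provided by the present disclosure can achieve a better straightening effect. In the subsequent text recognition process, the text recognition algorithm can directly perform recognition on the recognized text line, in which the characters are displayed substantially on the same horizontal line. For example, an end-to-end seq2seq deep learning model may be used to recognize the text sequence.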
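FIG. 4 shows an exemplary flowchart of a method 400 for adjusting a curved text line according to an embodiment of the present disclosure.
In step S402, a plurality of text sub-regions for the curved text line may be determined.
To straighten the curved text line, it may be divided into multiple text sub-regions so that the curved text line is processed in segments. For example, the display of each text sub-region may be adjusted separately so that the characters in the text sub-regions are displayed substantially on the same horizontal line.
In some embodiments, each of the text sub-regions may include a single character. In other embodiments, each of the text sub-regions may include at least two characters. The number of characters included in each text sub-region may be the same or different. In still other embodiments, each of the text sub-regions may have the width of a single column of pixels. It can be understood that the above description is merely an illustrative example of processing the curved text line in segments and does not limit the present disclosure.
In step S404, for each of the text sub-regions, the text sub-region may be adjusted based on the adjustment parameter for that text sub-region determined using the text line curve.
It can be understood that, in a curved text line, characters in different regions are displayed at different positions in the image, and the characters are not displayed on the same horizontal line. Based on the text line curve obtained by the method described in conjunction with FIG. 2, a corresponding adjustment parameter can be determined for each text sub-region and used to adjust at least one of the display direction and the position of the characters in that text sub-region, so that the characters in each text sub-region are displayed horizontally, thereby straightening the curved text line.
In some embodiments, the adjustment parameter for each text sub-region may include the angle between the arrangement direction of the characters in the text sub-region and the horizontal direction. The specific procedure for adjusting a text sub-region is described below in conjunction with FIG. 7 and is not detailed here.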
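In step S406, the recognized text line corresponding to the curved text line may be determined based on the adjusted text sub-regions.
In some embodiments, the adjusted text sub-regions may be scaled so that they have the same height.
Because the height of an adjusted text sub-region depends on the rotation angle used in the adjustment, the heights of the adjusted text sub-regions may differ.
So that the adjusted text sub-regions can be spliced into one text line, their sizes may be scaled so that they have the same height.
In some implementations, the adjusted text sub-regions may be scaled in the height direction only. In other implementations, the adjusted text sub-regions may be scaled proportionally in both the height and length directions so that they have the same height. For example, the adjusted text sub-regions may be scaled based on a predetermined base height so that all scaled text sub-regions have the base height.
The scaled text sub-regions may be spliced in the horizontal direction to obtain the recognized text line, in which the characters are displayed horizontally.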
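The scaling and splicing step might look like the sketch below, which uses the proportional-scaling variant. It assumes each adjusted sub-region is a non-empty nested list `region[row][col]` of pixel values and uses nearest-neighbor resampling purely for brevity; the function names and the resampling method are assumptions, not part of the disclosure.

```python
def resize_nearest(region, new_height, new_width):
    """Resize a nested-list image with nearest-neighbor sampling."""
    old_h, old_w = len(region), len(region[0])
    return [[region[r * old_h // new_height][c * old_w // new_width]
             for c in range(new_width)]
            for r in range(new_height)]


def splice_regions(regions, base_height):
    """Scale each region proportionally to base_height, then join left to right."""
    scaled = []
    for region in regions:
        old_h, old_w = len(region), len(region[0])
        # Keep the aspect ratio: width scales by the same factor as height.
        new_w = max(1, round(old_w * base_height / old_h))
        scaled.append(resize_nearest(region, base_height, new_w))
    # Concatenate row by row so the regions sit side by side.
    return [sum((s[r] for s in scaled), []) for r in range(base_height)]
```

Proportional scaling preserves character shapes, whereas scaling in the height direction only would keep each region's width unchanged.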
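With the method provided by the present disclosure, the characters of a curved text line can be processed in segments, and the curved text line can be straightened using adjustment parameters determined from the text line curve. The image processing method provided by the present disclosure can straighten a curved text line of any length and obtain a recognition text line with a plurality of horizontally displayed characters.
FIG. 5 shows an exemplary flowchart of a method 500 for determining a plurality of text sub-regions in a curved text line according to an embodiment of the present disclosure.
In step S502, the slope of the text line curve at a position corresponding to at least one point may be determined.
In step S504, the curved text line may be divided based on the slope at the position corresponding to the at least one point, to obtain a plurality of text sub-regions, where adjacent text sub-regions correspond to different slopes.
As described above, the text line curve fitted from the plurality of reference points models the trend and position of the characters in the curved text line. Determining the slope of the text line curve at one or more points gives the trend of the characters at those points.
If the slopes at the two positions on the text line curve that correspond to two adjacent reference points are close, the characters between those two adjacent reference points follow a similar trend. Characters with a similar trend may be assigned to the same text sub-region based on the corresponding slopes.
FIG. 6A shows an example of slopes determined at positions on the text line curve that correspond to at least one reference point. Where a mathematical expression of the text line curve has been obtained by the foregoing method, the point on the text line curve with the same abscissa as a reference point may be taken as the position corresponding to that reference point. The arrows in FIG. 6A indicate the different slopes of the text line curve at different positions.
In some embodiments, the rate of change of the slope between adjacent reference points may be determined from the slopes at the points on the text line curve, and the region between adjacent points whose rate of change of slope is below a change threshold may be assigned to the same text sub-region. In this case, the characters within each text sub-region may be regarded as corresponding to the same slope; that is, the characters within each text sub-region follow substantially the same trend.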
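The threshold rule for dividing the text line can be sketched as follows. The sketch assumes that the rate of change of slope between two adjacent points is the absolute slope difference divided by their horizontal distance, and that the abscissas are strictly increasing; the source does not fix either detail. The function name `group_points_by_slope` is hypothetical.

```python
from typing import List


def group_points_by_slope(xs: List[float], slopes: List[float],
                          threshold: float) -> List[List[int]]:
    """Group indices of adjacent curve points whose slope changes slowly.

    xs      -- abscissas of the points on the text line curve (strictly increasing)
    slopes  -- slope of the text line curve at each of those points
    Each returned group of indices corresponds to one text sub-region.
    """
    if len(xs) != len(slopes) or not xs:
        raise ValueError("xs and slopes must be non-empty and of equal length")
    groups = [[0]]
    for i in range(1, len(xs)):
        # Assumed definition: slope difference per unit of horizontal distance.
        rate = abs(slopes[i] - slopes[i - 1]) / (xs[i] - xs[i - 1])
        if rate < threshold:
            groups[-1].append(i)   # similar trend: same text sub-region
        else:
            groups.append([i])     # trend changes: start a new sub-region
    return groups
```

For example, slopes `[0.5, 0.52, 0.1, 0.08, -0.4]` sampled every 10 pixels with a threshold of `0.01` fall into three groups, so the text line would be divided into three sub-regions.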
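FIG. 6B shows an example of a plurality of text sub-regions obtained by division based on the slope at at least one point on the text line curve. As shown in FIG. 6B, within each text sub-region the arrangement direction of the characters makes substantially the same angle with the horizontal direction. The boundary of each text sub-region is represented as an angled quadrilateral: the left and right boundaries are perpendicular to the horizontal direction, and the upper and lower boundaries make the same angle with the horizontal direction as the character trend indicated by the slope corresponding to that text sub-region. In addition, the height of each text sub-region may be derived from the height of the characters in the curved text line. For example, the height of a text sub-region may be determined from the height of the character detection boxes obtained by object detection. As another example, it may be determined from the height of the text line in the result of text line segmentation.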
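FIG. 7 shows an exemplary flowchart of one method for adjusting a curved text line according to an embodiment of the present disclosure. The method 700 shown in FIG. 7 may be used to adjust the text sub-regions shown in FIG. 6B.
In step S702, an adjustment parameter may be determined for each of the plurality of text sub-regions of the curved text line. The adjustment parameter may include the angle between the arrangement direction of the characters in the text sub-region and the horizontal direction, determined from the slope of the text line curve corresponding to that text sub-region.
In step S704, the text sub-region may be adjusted based on the angle determined in step S702 between the arrangement direction of its characters and the horizontal direction, so that the characters in the text sub-region are displayed horizontally.
The entire text line image may be rotated in the reverse direction by the angle determined in step S702, so that the characters in the text sub-region are displayed horizontally, and the positions of the four vertices of the corresponding text sub-region in the rotated text line image may be obtained from the rotation angle. The minimum bounding rectangle of the four vertices of the rotated text sub-region may then be determined and cropped from the rotated text line image to obtain the adjusted text sub-region. The upper and lower boundaries of this minimum bounding rectangle are parallel to the horizontal direction, and its left and right boundaries are parallel to the vertical direction.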
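The geometry of steps S702 and S704 can be sketched as below. The sketch only transforms the four vertices of one text sub-region; rotating and cropping the actual pixels is left out. It assumes image coordinates with the origin as the rotation center and takes the angle as `atan(slope)`; the function name `level_sub_region` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def level_sub_region(vertices: List[Point], slope: float) -> Tuple[List[Point], Box]:
    """Rotate sub-region vertices so the character direction becomes horizontal.

    vertices -- four corners of the quadrilateral text sub-region
    slope    -- slope of the text line curve for this sub-region (dy/dx)
    Returns the rotated vertices and their axis-aligned bounding rectangle.
    """
    angle = math.atan(slope)  # angle between character direction and horizontal
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Reverse rotation by `angle` about the origin: the direction
    # (cos_a, sin_a) is mapped onto the horizontal axis (1, 0).
    rotated = [(x * cos_a + y * sin_a, -x * sin_a + y * cos_a) for x, y in vertices]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return rotated, (min(xs), min(ys), max(xs), max(ys))
```

With vertices `(0, 0)`, `(10, 10)`, `(10, 14)`, `(0, 4)` and slope `1.0`, the upper and lower edges of the rotated quadrilateral become horizontal, and the bounding rectangle is the region that would be cropped from the rotated text line image.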
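FIG. 8 shows an example of a recognition text line obtained by stitching a plurality of adjusted text sub-regions obtained by the method described with reference to FIG. 7. The adjusted text sub-regions are stitched in the horizontal direction to obtain the recognition text line used for text recognition.
FIG. 9 shows an exemplary flowchart of another method for adjusting a curved text line according to an embodiment of the present disclosure.
In step S902, an adjustment parameter is determined for each column of pixels in the text line image. For each column of pixels, the adjustment parameter includes the offset between the ordinate of the point on the text line curve in that column and a base position. In some embodiments, the base position in the text line image may be determined in advance. For example, the position of the horizontal center line of the text line image may be taken as the base position. As another example, the position of the horizontal line on which any character in the text line image lies may be taken as the base position. As yet another example, the average ordinate of the character detection boxes obtained by character detection in the text line image may be taken as the base position.
In step S904, the display of each column of pixels in the text line image may be adjusted using the adjustment parameter. For example, the position of the point on the text line curve in a column may be adjusted in the vertical direction based on the offset between its ordinate and the base position, so that the adjusted vertical position of that point coincides with the base position.
In step S906, the recognition text line may be determined based on the adjusted text line image. For example, the image background of the adjusted text line image may be cropped based on the character height to obtain the recognition text line.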
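FIGS. 10A to 10C show an example of determining a recognition text line by the method described with reference to FIG. 9.
As shown in FIG. 10A, for pixel column 1001, it may be determined that the point on the text line curve within that column lies a distance d below the base line 1002. In this case, a run of pixels of height d may be cut from the side opposite the point on the text line curve (for pixel column 1001, the side above the base line 1002), the remainder of pixel column 1001 may be moved up by the distance d, and the cut run of pixels of height d may be filled back in below the point on the text line curve, so that the position of the point on the text line curve within pixel column 1001 is adjusted to coincide with the base position.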
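The per-column adjustment can be sketched as a circular shift of each pixel column. This is an interpretation, not the disclosed implementation: it assumes the run of height d is taken from the edge of the column opposite the curve point and re-inserted at the other edge, that row indices grow downward, and that the curve ordinates are integer row indices. The names `straighten_columns`, `curve_y`, and `base_row` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; row 0 is the top of the image


def straighten_columns(img: Image, curve_y: List[int], base_row: int) -> Image:
    """Shift every pixel column so its text line curve point lands on base_row.

    curve_y[x] is the row of the text line curve in column x.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        # d > 0: the curve point lies d rows below the base position,
        # so the column content moves up by d rows.
        d = curve_y[x] - base_row
        for y in range(height):
            # Rows pushed past one edge wrap around and fill the other edge.
            out[y][x] = img[(y + d) % height][x]
    return out
```

Because the shift wraps around, no pixel is discarded at this stage; background rows are removed only later, when the result is cropped according to the character height.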
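FIG. 10B shows the result of adjusting each column of pixels in the text line image by the method described with reference to FIG. 10A. As shown in FIG. 10B, all characters of the curved text line have been adjusted to be displayed horizontally. FIG. 10C shows the recognition text line obtained by cropping the result shown in FIG. 10B according to the character height.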
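The final cropping step can be sketched as keeping a band of rows around the base position. The sketch assumes that the base position is the vertical center of the characters, which the source does not state; if the base position were, say, the bottom of the characters, the band would have to be placed differently. The name `crop_to_characters` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; row 0 is the top of the image


def crop_to_characters(img: Image, base_row: int, char_height: int) -> Image:
    """Keep only a band of char_height rows centered on base_row."""
    # Clamp the band to the image so the crop never runs past an edge.
    top = max(0, base_row - char_height // 2)
    bottom = min(len(img), top + char_height)
    return [row[:] for row in img[top:bottom]]
```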
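With the above method for adjusting a text line image, the pixels of the text line image can be filled back column by column based on the parameters of the text line curve and the base position, so that the pixels corresponding to characters in each column are displayed substantially at the base position. This method makes it straightforward to straighten a curved text line of any length.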
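FIG. 11 shows an exemplary flowchart of a text recognition process 1100 according to an embodiment of the present disclosure.
In step S1102, text line detection may be performed on an acquired input image to obtain a text line image that includes a single text line.
In step S1104, curvature correction may be performed on the text line in the text line image to obtain a recognition text line, where the recognition text line includes a plurality of characters displayed horizontally. The curved text line in the text line image may be corrected using the method described above with reference to FIGS. 2 to 10C, which is not repeated here.
In step S1106, character recognition may be performed on the recognition text line to obtain the text data contained in the text line.
The recognition text line may be processed by a trained, neural-network-based character recognition model. Because the characters in the recognition text line are displayed horizontally, the character recognition model does not need to recognize the content of the curved text line directly, which reduces the complexity of the character recognition model and improves the accuracy of character recognition.
With the text recognition method provided by the present disclosure, first correcting the curved text line into a recognition text line displayed substantially on the same horizontal line eases the burden that strongly curved or long text places on the text recognition model and improves text recognition performance.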
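The three stages of process 1100 compose naturally as a small pipeline. The sketch below only fixes the order of the stages; the detector, the corrector, and the recognizer are passed in as callables, and all names (`recognize_text`, `detect_text_lines`, `rectify_text_line`, `recognize_characters`) are hypothetical placeholders rather than parts of any real library.

```python
from typing import Callable, Iterable, List, TypeVar

Img = TypeVar("Img")


def recognize_text(
    input_image: Img,
    detect_text_lines: Callable[[Img], Iterable[Img]],
    rectify_text_line: Callable[[Img], Img],
    recognize_characters: Callable[[Img], str],
) -> List[str]:
    """Run detection, curvature correction, and recognition in sequence."""
    results = []
    # Step S1102: one text line image per detected text line.
    for line_image in detect_text_lines(input_image):
        # Step S1104: straighten the (possibly curved) text line.
        straight_line = rectify_text_line(line_image)
        # Step S1106: recognize the horizontally displayed characters.
        results.append(recognize_characters(straight_line))
    return results
```

Keeping the correction stage separate from the recognizer reflects the point made above: the recognition model only ever sees horizontally displayed text.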
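FIG. 12 shows an exemplary block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in FIG. 12, the image processing apparatus 1200 may include a text line detection unit 1210, a reference point determination unit 1220, a curve determination unit 1230, and a recognition text determination unit 1240. The text line detection unit 1210 may be configured to perform text line detection on an input image to obtain a text line image that includes a curved text line. The reference point determination unit 1220 may be configured to determine a plurality of reference points for the curved text line in the text line image. The curve determination unit 1230 may be configured to determine a text line curve for the curved text line based on the plurality of reference points. The recognition text determination unit 1240 may be configured to adjust the curved text line using adjustment parameters determined from the text line curve, to obtain a recognition text line corresponding to the curved text line, where the recognition text line includes a plurality of characters displayed horizontally.
The text line detection unit 1210, the reference point determination unit 1220, the curve determination unit 1230, and the recognition text determination unit 1240 may be used to implement the steps of the image processing method described above with reference to FIGS. 2 to 10C, which are not repeated here.
With the image processing apparatus provided by the embodiments of the present disclosure, a text line curve that accurately represents the curved text line can be obtained from the positions of the reference points for the curved text line. The curved text line can then be straightened using adjustment parameters derived from the text line curve. Because the text line curve obtained from the reference points accurately represents the position of the curved text line, the apparatus can achieve a better straightening result. In the subsequent character recognition process, the character recognition algorithm can operate directly on a recognition text line whose characters are displayed substantially on the same horizontal line. For example, an end-to-end seq2seq deep learning model may be used to recognize the character sequence.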
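Exemplary methods according to the present disclosure have been described above with reference to the drawings. Exemplary embodiments of electronic circuits, electronic devices, and the like that use the present disclosure are further described below with reference to the drawings.
According to another aspect of the present disclosure, an electronic circuit is provided, including circuitry configured to perform the steps of the method described in the present disclosure.
According to another aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform the method described in the present disclosure.
According to another aspect of the present disclosure, a computer-readable storage medium storing a program is provided, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method described in the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described in the present disclosure.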
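FIG. 13 is a block diagram showing an example of an electronic device according to an exemplary embodiment of the present disclosure. Note that the structure shown in FIG. 13 is only an example; depending on the specific implementation, the electronic device of the present disclosure may include only one or more of the components shown in FIG. 13.
The electronic device 1300 may be, for example, a general-purpose computer (such as a laptop computer, a tablet computer, or another kind of computer), a mobile phone, or a personal digital assistant. According to some embodiments, the electronic device 1300 may be a visual impairment assistance device. The electronic device 1300 may include a camera and an electronic circuit for curved text line correction. The camera may be configured to acquire an image that includes a curved text line, and the electronic circuit may be configured to perform the image processing method for text line correction described with reference to FIGS. 2 to 10C.
According to some implementations, the electronic device 1300 may be configured to include an eyeglass frame or to be detachably mountable on an eyeglass frame (for example, on a rim of the frame, on the connector joining the two rims, on a temple, or on any other part), so that it can capture an image that approximately covers the user's field of view.
According to some implementations, the electronic device 1300 may also be mounted on, or integrated with, another wearable device. The wearable device may be, for example, a head-mounted device (such as a helmet or a hat) or a device worn on the ear. According to some embodiments, the electronic device may be implemented as an accessory attachable to a wearable device, for example an accessory attachable to a helmet or a hat.
According to some implementations, the electronic device 1300 may also take other forms. For example, the electronic device 1300 may be a mobile phone, a general-purpose computing device (such as a laptop computer or a tablet computer), a personal digital assistant, and so on. The electronic device 1300 may also have a base so that it can be placed on a desktop.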
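The electronic device 1300 may include a camera 1304 for acquiring images. The camera 1304 may include, but is not limited to, a video camera or a still camera. The electronic device 1300 may further include a curved text line correction circuit (electronic circuit) 1400, which includes circuitry configured to perform the steps of the image processing method for text line correction described above (for example, the method steps described with reference to FIGS. 2 to 10C).
The electronic device 1300 may further include a character recognition circuit 1305 configured to perform character detection and/or recognition (for example, OCR processing) on the corrected text line contained in the image output by the curved text line correction circuit, to obtain text data. The character recognition circuit 1305 may be implemented, for example, by a dedicated chip. The electronic device 1300 may further include a sound conversion circuit 1306 configured to convert the text data into sound data. The sound conversion circuit 1306 may be implemented, for example, by a dedicated chip. The electronic device 1300 may further include a sound output circuit 1307 configured to output the sound data. The sound output circuit 1307 may include, but is not limited to, an earphone, a speaker, or a vibrator, together with the corresponding drive circuit.
According to some implementations, the electronic device 1300 may further include an image processing circuit 1308, which may include circuitry configured to perform various kinds of image processing on an image. The image processing circuit 1308 may include, for example and without limitation, one or more of the following: a circuit configured to denoise an image, a circuit configured to deblur an image, a circuit configured to geometrically correct an image, a circuit configured to extract features from an image, a circuit configured to detect and/or recognize objects in an image, a circuit configured to detect characters contained in an image, a circuit configured to extract text lines from an image, a circuit configured to extract character coordinates from an image, a circuit configured to extract object boxes from an image, a circuit configured to extract text boxes from an image, a circuit configured to perform layout analysis (for example, paragraph division) based on an image, and so on.
According to some implementations, the electronic device 1300 may further include a text processing circuit 1309, which may be configured to perform various kinds of processing based on extracted text-related information (for example, text data, text boxes, paragraph coordinates, text line coordinates, and character coordinates), to obtain processing results such as paragraph ordering, semantic analysis of the text, and layout analysis results.
One or more of the circuits described above (for example, the character recognition circuit 1305, the sound conversion circuit 1306, the sound output circuit 1307, the image processing circuit 1308, the text processing circuit 1309, and the curved text line correction circuit (electronic circuit) 1400) may use custom hardware and/or may be implemented in hardware, software, firmware, middleware, microcode, a hardware description language, or any combination thereof. For example, one or more of these circuits may be implemented by programming hardware (for example, programmable logic circuitry including a field-programmable gate array (FPGA) and/or a programmable logic array (PLA)) in assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using the logic and algorithms according to the present disclosure.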
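According to some implementations, the electronic device 1300 may further include a communication circuit 1310, which may be any type of device or system that enables communication with external devices and/or a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
According to some implementations, the electronic device 1300 may further include an input device 1311, which may be any type of device capable of inputting information to the electronic device 1300, and may include, but is not limited to, various sensors, a mouse, a keyboard, a touchscreen, buttons, a joystick, a microphone, and/or a remote control.
According to some implementations, the electronic device 1300 may further include an output device 1312, which may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer. Although the electronic device 1300 is used as a visual impairment assistance device according to some embodiments, a vision-based output device can make it easier for the user's family members, maintenance staff, and others to obtain output information from the electronic device 1300.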
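According to some implementations, the electronic device 1300 may further include a processor 1301. The processor 1301 may be any type of processor and may include, but is not limited to, one or more general-purpose processors and/or one or more special-purpose processors (for example, special processing chips). The processor 1301 may be, for example and without limitation, a central processing unit (CPU) or a microprocessor (MPU). The electronic device 1300 may further include a working memory 1302, which may store programs (including instructions) and/or data (for example, images, text, sound, and other intermediate data) useful for the operation of the processor 1301, and may include, but is not limited to, random access memory and/or read-only memory devices. The electronic device 1300 may further include a storage device 1303, which may include any non-transitory storage device, that is, any storage device that is non-transitory and capable of storing data, and may include, but is not limited to, a disk drive, an optical storage device, solid-state memory, a floppy disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, an optical disc or any other optical medium, ROM (read-only memory), RAM (random access memory), cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. The working memory 1302 and the storage device 1303 may be collectively referred to as "memory" and in some cases may be used interchangeably.
According to some implementations, the processor 1301 may control and schedule at least one of the camera 1304, the character recognition circuit 1305, the sound conversion circuit 1306, the sound output circuit 1307, the image processing circuit 1308, the text processing circuit 1309, the communication circuit 1310, the curved text line correction circuit (electronic circuit) 1400, and the other devices and circuits included in the electronic device 1300. According to some implementations, at least some of the components shown in FIG. 13 may be interconnected and/or communicate via a bus 1313.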
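Software elements (programs) may reside in the working memory 1302, including but not limited to an operating system 1302a, one or more application programs 1302b, drivers, and/or other data and code.
According to some implementations, the instructions for the aforementioned control and scheduling may be included in the operating system 1302a or in the one or more application programs 1302b.
According to some implementations, the instructions for performing the method steps described in the present disclosure (for example, the method steps described with reference to FIGS. 2 to 10C) may be included in the one or more application programs 1302b, and the modules of the electronic device 1300 may be implemented by the processor 1301 reading and executing the instructions of the one or more application programs 1302b. In other words, the electronic device 1300 may include the processor 1301 and a memory storing a program (for example, the working memory 1302 and/or the storage device 1303), the program including instructions that, when executed by the processor 1301, cause the processor 1301 to perform the methods described in the various embodiments of the present disclosure.
According to some implementations, some or all of the operations performed by at least one of the character recognition circuit 1305, the sound conversion circuit 1306, the image processing circuit 1308, the text processing circuit 1309, and the curved text line correction circuit (electronic circuit) 1400 may be implemented by the processor 1301 reading and executing the instructions of the one or more application programs 1302b.
The executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (for example, the storage device 1303) and, when executed, may be loaded into the working memory 1302 (possibly after being compiled and/or installed). The present disclosure therefore provides a computer-readable storage medium storing a program, the program including instructions that, when executed by a processor of an electronic device (for example, a visual impairment assistance device), cause the electronic device to perform the methods described in the various embodiments of the present disclosure. According to another implementation, the executable code or source code of the instructions of the software elements (programs) may also be downloaded from a remote location.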
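It should also be understood that various modifications may be made according to specific requirements. For example, custom hardware may be used, and/or the circuits, units, modules, or elements may be implemented in hardware, software, firmware, middleware, microcode, a hardware description language, or any combination thereof. For example, some or all of the circuits, units, modules, or elements included in the disclosed methods and devices may be implemented by programming hardware (for example, programmable logic circuitry including a field-programmable gate array (FPGA) and/or a programmable logic array (PLA)) in assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using the logic and algorithms according to the present disclosure.
According to some implementations, the processor 1301 of the electronic device 1300 may be distributed over a network. For example, some processing may be performed by one processor while other processing is performed by another processor remote from the first. Other modules of the electronic device 1300 may be distributed similarly. The electronic device 1300 may thus be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and the scope of the invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described here may be replaced by equivalent elements that appear after the present disclosure.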

Claims (17)

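  1. An image processing method, comprising:
    performing text line detection on an input image to obtain a text line image that includes a curved text line;
    determining a plurality of reference points for the curved text line in the text line image;
    determining a text line curve for the curved text line based on the plurality of reference points; and
    adjusting the curved text line using adjustment parameters determined based on the text line curve, to obtain a recognition text line corresponding to the curved text line, wherein the recognition text line includes a plurality of characters displayed horizontally.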
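  2. The image processing method of claim 1, wherein determining the plurality of reference points for the curved text line in the text line image comprises:
    performing character object detection on the text line image to obtain the plurality of reference points for the curved text line, wherein each reference point indicates the position of a respective character in the curved text line.
  3. The image processing method of claim 1, wherein determining the plurality of reference points for the curved text line in the text line image comprises:
    performing image segmentation on the text line image to obtain a text line region in the text line image corresponding to the curved text line; and
    determining the plurality of reference points for the curved text line in the text line region based on the height of the text line region and a predetermined step size.
  4. The image processing method of any one of claims 1 to 3, wherein determining the text line curve for the curved text line based on the plurality of reference points comprises:
    performing curve fitting on the positions of the plurality of reference points based on B-spline interpolation to obtain the text line curve.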
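  5. The image processing method of claim 1, wherein adjusting the curved text line using adjustment parameters determined based on the text line curve, to obtain the recognition text line corresponding to the curved text line, comprises:
    determining a plurality of text sub-regions for the curved text line;
    for each of the plurality of text sub-regions, adjusting the text sub-region based on an adjustment parameter for the text sub-region determined using the text line curve; and
    determining the recognition text line corresponding to the curved text line based on the adjusted text sub-regions.
  6. The image processing method of claim 5, wherein determining the plurality of text sub-regions for the curved text line comprises:
    determining the slope at the position of at least one point on the text line curve; and
    dividing the curved text line based on the slope at the position corresponding to the at least one point, to obtain the plurality of text sub-regions, wherein adjacent text sub-regions correspond to different slopes.
  7. The image processing method of claim 6, wherein the adjustment parameter of each of the plurality of text sub-regions includes the angle between the arrangement direction of the characters included in the text sub-region and the horizontal direction, determined based on the slope corresponding to the text sub-region.
  8. The image processing method of claim 7, wherein adjusting the text sub-region based on the adjustment parameter for the text sub-region determined using the text line curve comprises:
    adjusting the text sub-region based on the angle so that the characters in the adjusted text sub-region are displayed horizontally.
  9. The image processing method of any one of claims 5 to 8, wherein determining the recognition text line corresponding to the curved text line based on the adjusted text sub-regions comprises:
    scaling the adjusted text sub-regions so that the adjusted text sub-regions have the same height; and
    stitching the scaled text sub-regions in the horizontal direction to obtain the recognition text line, wherein the characters in the recognition text line are displayed horizontally.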
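  10. The image processing method of claim 1, wherein adjusting the curved text line using adjustment parameters determined based on the text line curve, to obtain the recognition text line corresponding to the curved text line, comprises:
    for each column of pixels in the text line image, determining an adjustment parameter for the column of pixels;
    adjusting the display of each column of pixels in the text line image using the adjustment parameter; and
    determining the recognition text line based on the adjusted text line image.
  11. The image processing method of claim 10, wherein, for each column of pixels in the text line image, the adjustment parameter for the column of pixels includes the offset between the ordinate of the point on the text line curve in the column of pixels and a base position.
  12. The image processing method of claim 11, wherein adjusting the display of the column of pixels using the adjustment parameter comprises:
    adjusting, based on the offset, the position of the point on the text line curve in the column of pixels in the vertical direction, so that the adjusted vertical position of the point on the text line curve in the column of pixels coincides with the base position.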
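  13. An electronic circuit, comprising:
    circuitry configured to perform the steps of the method according to any one of claims 1 to 12.
  14. A visual impairment assistance device, comprising:
    a camera configured to acquire an image, wherein the image includes a curved text line;
    a curved text line correction circuit implemented by the electronic circuit of claim 13;
    a circuit configured to perform character detection and/or recognition on the recognition text line obtained by the curved text line correction circuit to obtain text data;
    a circuit configured to convert the text data into sound data; and
    a circuit configured to output the sound data.
  15. An electronic device, comprising:
    a processor; and
    a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 12.
  16. A non-transitory computer-readable storage medium storing a program, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 12.
  17. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 12.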
PCT/CN2022/092625 2021-05-13 2022-05-13 图像处理方法、电子电路、视障辅助设备和介质 WO2022237893A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110523036.8A CN113139537A (zh) 2021-05-13 2021-05-13 图像处理方法、电子电路、视障辅助设备和介质
CN202110523036.8 2021-05-13

Publications (1)

Publication Number Publication Date
WO2022237893A1 true WO2022237893A1 (zh) 2022-11-17

Family

ID=76817540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092625 WO2022237893A1 (zh) 2021-05-13 2022-05-13 图像处理方法、电子电路、视障辅助设备和介质

Country Status (2)

Country Link
CN (1) CN113139537A (zh)
WO (1) WO2022237893A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139537A (zh) * 2021-05-13 2021-07-20 上海肇观电子科技有限公司 图像处理方法、电子电路、视障辅助设备和介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753971A (zh) * 2017-11-06 2019-05-14 阿里巴巴集团控股有限公司 扭曲文字行的矫正方法及装置、字符识别方法及装置
US20190188528A1 (en) * 2016-12-08 2019-06-20 Tencent Technology (Shenzhen) Company Limited Text detection method and apparatus, and storage medium
CN111191649A (zh) * 2019-12-31 2020-05-22 上海眼控科技股份有限公司 一种识别弯曲多行文本图像的方法与设备
CN113139537A (zh) * 2021-05-13 2021-07-20 上海肇观电子科技有限公司 图像处理方法、电子电路、视障辅助设备和介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188528A1 (en) * 2016-12-08 2019-06-20 Tencent Technology (Shenzhen) Company Limited Text detection method and apparatus, and storage medium
CN109753971A (zh) * 2017-11-06 2019-05-14 阿里巴巴集团控股有限公司 扭曲文字行的矫正方法及装置、字符识别方法及装置
CN111191649A (zh) * 2019-12-31 2020-05-22 上海眼控科技股份有限公司 一种识别弯曲多行文本图像的方法与设备
CN113139537A (zh) * 2021-05-13 2021-07-20 上海肇观电子科技有限公司 图像处理方法、电子电路、视障辅助设备和介质

Also Published As

Publication number Publication date
CN113139537A (zh) 2021-07-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806855

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE