CN113920286A - Character positioning method and device

Info

Publication number: CN113920286A
Application number: CN202010575360.XA
Authority: CN (China)
Original/current assignee: Beijing ByteDance Network Technology Co Ltd
Inventor: Not disclosed
Original language: Chinese (zh)
Legal status: Pending
Classification: Character Input (AREA)
Abstract

The embodiments of the present disclosure disclose a character positioning method and apparatus. One embodiment of the method comprises: acquiring a target image; performing text line detection on the target image to obtain text line position information of a text line in the target image; performing character recognition on an image area of the target image located at the position indicated by the text line position information to obtain a character sequence included in the text line; and generating character position information of the characters in the text line according to the character sequence. This implementation determines character positions through text line detection and character recognition, which increases the speed of single-character positioning, reduces the amount of computation it requires, and reduces the consumption of computing resources during positioning.

Description

Character positioning method and device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a character positioning method and device.
Background
Character recognition is a technology for automatically recognizing characters by using a computer, and is an important field of pattern recognition application.
In the prior art, character recognition methods fall broadly into three categories: statistical, logic-judgment, and syntactic. Among them, common methods include the template matching method and the geometric feature extraction method. The template matching method matches the input character against a given standard character (template) for each category, calculates the degree of similarity between the input character and each template, and takes the category with the greatest similarity as the recognition result. The geometric feature extraction method extracts geometric features of a character, such as end points, branch points, concave and convex parts, line segments in the horizontal, vertical, and slanted directions, closed loops, and the like, and performs logical combination judgments according to their positions and interrelations to obtain a recognition result.
Disclosure of Invention
The disclosure provides a character positioning method and device.
In a first aspect, an embodiment of the present disclosure provides a character positioning method, including: acquiring a target image; carrying out text line detection on the target image to obtain text line position information of a text line in the target image; performing character recognition on an image area located at the position indicated by the text line position information in the target image to obtain a character sequence included in the text line; and generating character position information of the characters in the text line according to the character sequence.
In some embodiments, the character sequence includes placeholder characters and non-placeholder characters; and generating character position information of the characters in the text line according to the character sequence includes: generating non-placeholder character position information of each non-placeholder character in the text line according to the placeholder character position information of the placeholder characters in the character sequence.
In some embodiments, the character sequence includes placeholder characters and non-placeholder characters; and generating character position information of the characters in the text line according to the character sequence includes: generating placeholder character position information and non-placeholder character position information of each non-placeholder character in the character sequence according to the character sequence; and, for two adjacent placeholder characters in the character sequence, in response to the non-placeholder characters between the two adjacent placeholder characters being identical, generating character position information of the character in the text line corresponding to those non-placeholder characters according to the non-placeholder character position information of each non-placeholder character between the two adjacent placeholder characters.
In some embodiments, the text line position information is represented by a quadrilateral frame, and the character sequence includes placeholder characters and non-placeholder characters; and generating character position information of the characters in the text line according to the character sequence includes: generating placeholder character position information of the placeholder characters in the character sequence according to the character sequence; for two adjacent non-placeholder characters in the character sequence, determining segmentation position information for segmenting the quadrilateral frame according to the placeholder character position information of the placeholder characters between the two adjacent non-placeholder characters; and dividing the quadrilateral frame according to the positions indicated by the obtained segmentation position information, and determining the division results as the character position information of the characters in the text line.
In some embodiments, generating the character position information of the characters in the text line according to the character sequence includes: generating position information of the characters in the text line within the image area according to the character sequence; and determining character position information of the characters in the target image according to the text line position information and the position information of the characters within the image area.
In some embodiments, acquiring the target image includes: acquiring an image presented by a target display screen as the target image; and the method further includes: in response to detecting a user operation on a character in the target image, determining the character to be operated indicated by the user operation from the text line indicated by the text line position information, according to the operation position of the user operation on the target display screen and the character position information of the characters in the text line.
In some embodiments, the method further includes: outputting output information pre-associated with the character to be operated.
In some embodiments, the method further includes: processing the character to be operated according to the instruction of the user operation.
In a second aspect, an embodiment of the present disclosure provides a character positioning apparatus, including: an acquisition unit configured to acquire a target image; the detection unit is configured to detect text lines of the target image to obtain text line position information of the text lines in the target image; a recognition unit configured to perform character recognition on an image area located at a position indicated by the text line position information in the target image to obtain a character sequence included in the text line; and a generating unit configured to generate character position information of the characters in the text line according to the character sequence.
In some embodiments, the character sequence includes placeholder characters and non-placeholder characters; and the generating unit includes: a first generating subunit configured to generate non-placeholder character position information of each non-placeholder character in the text line according to the placeholder character position information of the placeholder characters in the character sequence.
In some embodiments, the character sequence includes placeholder characters and non-placeholder characters; and the generating unit includes: a second generating subunit configured to generate placeholder character position information and non-placeholder character position information of each non-placeholder character in the character sequence according to the character sequence; and a third generating subunit configured to, for two adjacent placeholder characters in the character sequence, in response to the non-placeholder characters between the two adjacent placeholder characters being identical, generate character position information of the character in the text line corresponding to those non-placeholder characters according to the non-placeholder character position information of each non-placeholder character between the two adjacent placeholder characters.
In some embodiments, the text line position information is represented by a quadrilateral frame, and the character sequence includes placeholder characters and non-placeholder characters; and the generating unit includes: a fourth generating subunit configured to generate placeholder character position information of the placeholder characters in the character sequence according to the character sequence; a first determining subunit configured to determine, for two adjacent non-placeholder characters in the character sequence, segmentation position information for segmenting the quadrilateral frame according to the placeholder character position information of the placeholder characters between the two adjacent non-placeholder characters; and a second determining subunit configured to divide the quadrilateral frame according to the positions indicated by the obtained segmentation position information and determine the division results as the character position information of the characters in the text line.
In some embodiments, the generating unit includes: a fifth generating subunit configured to generate position information of the characters in the text line in the image area, based on the character sequence; and a third determining subunit configured to determine character position information of a character in the target image based on the text line position information and the position information of the character in the image area.
In some embodiments, the obtaining unit includes: an acquisition subunit configured to acquire an image presented by a target display screen as the target image; and the apparatus further includes: a determining unit configured to, in response to detecting a user operation on a character in the target image, determine the character to be operated indicated by the user operation from the text line indicated by the text line position information, according to the operation position of the user operation on the target display screen and the character position information of the characters in the text line.
In some embodiments, the apparatus further includes: an output unit configured to output output information pre-associated with the character to be operated.
In some embodiments, the apparatus further includes: a processing unit configured to process the character to be operated according to the instruction of the user operation.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the character positioning method described above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which when executed by a processor, implements the method of any of the embodiments of the character location method described above.
According to the character positioning method and apparatus provided by the embodiments of the present disclosure, a target image is first acquired; text line detection is then performed on the target image to obtain text line position information of a text line in the target image; character recognition is then performed on the image area of the target image located at the position indicated by the text line position information to obtain the character sequence included in the text line; and finally, character position information of the characters in the text line is generated according to the character sequence. The positions of characters are thus determined through text line detection and character recognition, which increases the speed of single-character positioning, reduces the amount of computation it requires, and reduces the consumption of computing resources during positioning.
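For orientation only, the four steps above can be sketched as a short pipeline. The following Python sketch is illustrative and not part of the disclosure; every helper (detect_text_lines, crop_region, recognize_characters, generate_char_positions) is a hypothetical stand-in for the detection and recognition models discussed in the detailed description.

```python
# Illustrative sketch of the claimed four-step flow; all helpers are
# hypothetical stand-ins (e.g., an EAST detector, a CRNN recognizer).
from dataclasses import dataclass
from typing import List, Tuple

Quad = List[Tuple[float, float]]  # four (x, y) vertices of a quadrilateral frame

@dataclass
class LocatedCharacter:
    char: str   # a non-placeholder character of the text line
    box: Quad   # its character position information

def locate_characters(target_image) -> List[LocatedCharacter]:
    located: List[LocatedCharacter] = []
    # Step 2: text line detection -> one quadrilateral frame per text line.
    for line_box in detect_text_lines(target_image):                  # hypothetical
        region = crop_region(target_image, line_box)                  # hypothetical
        # Step 3: character recognition -> a character sequence with
        # placeholder and non-placeholder characters.
        char_seq = recognize_characters(region)                       # hypothetical
        # Step 4: derive per-character position information.
        located.extend(generate_char_positions(char_seq, line_box))   # hypothetical
    return located
```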
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a character location method according to the present disclosure;
FIGS. 3A-3C are schematic diagrams of an application scenario of a character location method according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a character location method according to the present disclosure;
FIGS. 5A-5B are schematic diagrams of yet another application scenario of a character location method according to the present disclosure;
FIG. 6 is a schematic structural diagram of one embodiment of a character-locating device according to the present disclosure;
FIG. 7 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of a character localization method or a character localization apparatus of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or transmit data (e.g., target images), etc. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as an image processing application, video playing software, news information application, web browser application, shopping application, search application, instant messaging tool, mailbox client, social platform software, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background server that performs character localization on a target image transmitted by the terminal devices 101, 102, 103. The background server may perform text line detection on the target image to obtain text line position information of the target image, perform character recognition on an image area in the target image at a position indicated by the text line position information to obtain a character sequence included in the text line, and generate character position information of characters in the text line according to the character sequence. Optionally, after generating the character position information, the background server may also feed back the character position information to the terminal device. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the character positioning method provided by the embodiments of the present disclosure may be executed by a server, may also be executed by a terminal device, and may also be executed by the server and the terminal device in cooperation with each other. Accordingly, each part (for example, each unit and sub-unit) included in the character positioning apparatus may be entirely disposed in the server, may be entirely disposed in the terminal device, and may be disposed in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. When the electronic device on which the character localization method operates does not need to perform data transmission with other electronic devices, the system architecture may include only the electronic device (e.g., a server or a terminal device) on which the character localization method operates.
With continued reference to FIG. 2, a flow 200 of one embodiment of a character location method according to the present disclosure is shown. The character positioning method comprises the following steps:
step 201, acquiring a target image.
In this embodiment, an execution subject of the character positioning method (for example, a server or a terminal device shown in fig. 1) may acquire the target image from other electronic devices or locally through a wired connection manner or a wireless connection manner. Wherein the target image may be an image containing characters.
In practice, the characters contained in the target image may be any symbols. As an example, characters in the target image may include, but are not limited to, at least one of the following: words, letters, numbers, operators, punctuation marks, and other symbols, as well as some functional symbols. For example, the characters contained in the target image may be placeholder characters (e.g., spaces) or non-placeholder characters (e.g., Chinese characters). In general, the characters contained in the target image may be arranged in lines. Here, characters arranged in a line may be referred to as a text line.
Step 202, performing text line detection on the target image to obtain text line position information of a text line in the target image.
In this embodiment, the executing entity may perform text line detection on the target image acquired in step 201 to obtain text line position information of a text line in the target image.
Text line detection may be used to determine the text line position information of text lines in an image. As an example, the execution body may adopt text line detection based on EAST (an Efficient and Accurate Scene Text detector), text line detection based on PixelLink, or other text line detection methods to obtain the text line position information of the text line in the target image.
The text line position information may be used to indicate the position of the text line in the target image. In practice, the representation of the text line position information can be determined according to actual needs. Illustratively, the text line position information may be characterized by a quadrilateral frame, for example a rectangle or another quadrilateral such as a parallelogram. Here, the sides of the quadrilateral frame may or may not be parallel to the sides of the rectangle indicated by the boundary of the target image (the boundary of a normal image is a rectangle). The text line position information can also be represented by two line segments: when all characters in the text line have the same font size, by two parallel line segments; when the font sizes of the characters increase or decrease monotonically along the line, by two non-parallel line segments. Optionally, the text line position information may also be represented by coordinates.
In some cases, the text line position information may indicate a position of the curved text. In this case, the quadrangular frame may not include a side parallel to the side of the rectangle indicated by the boundary range of the target image.
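As a concrete illustration of one such detector, the sketch below runs the publicly released EAST frozen graph through OpenCV's DNN module. The model file name and the decoding omitted at the end are assumptions for illustration, not requirements of this disclosure.

```python
# Hedged sketch: EAST text line detection via OpenCV DNN, assuming the
# publicly available frozen_east_text_detection.pb model is on disk.
import cv2
import numpy as np

net = cv2.dnn.readNet("frozen_east_text_detection.pb")

def detect_text_lines(image: np.ndarray):
    # EAST expects input width/height that are multiples of 32.
    blob = cv2.dnn.blobFromImage(image, 1.0, (320, 320),
                                 (123.68, 116.78, 103.94),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                    "feature_fusion/concat_3"])
    # Decoding `geometry` into rotated quadrilateral frames and applying
    # non-maximum suppression is omitted here; each surviving frame is one
    # piece of text line position information as described above.
    return scores, geometry
```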
Step 203, performing character recognition on the image area in the target image, where the image area is located at the position indicated by the text line position information, to obtain a character sequence included in the text line.
In this embodiment, the execution main body may perform character recognition on an image area in the target image, the image area being located at a position indicated by the text line position information, to obtain a character sequence included in the text line.
Character recognition may be used to determine the sequence of characters in a text line in an image region. As an example, when an image region contains the text line "离离原上草，一岁一枯荣" (a line of classical Chinese verse), the character sequence included in the text line may be "离离原上草，一岁一枯荣". It will be appreciated that an image renders a text line through the arrangement and color of its pixels, so the execution body cannot directly read the content of the text line from the image. Through character recognition, the execution body obtains the character sequence included in the text line, and thereby the content of the text line in the image.
As an example, the execution body may obtain the character sequence included in the text line by using CNN (Convolutional Neural Networks) character recognition, CRNN (Convolutional Recurrent Neural Networks) character recognition, or other character recognition methods.
It is to be understood that the above character sequence is obtained by the execution body through character recognition, so the character sequence may include overlapping (repeatedly recognized) characters. For example, for the text line "离离原上草", the execution body may obtain the character sequence "□□□离离□□离离□□□原原原□□上上□□□草草□□□": there are 3 placeholder characters before the 1st Chinese character, 2 placeholder characters between the 2nd and 3rd Chinese characters, 3 placeholder characters between the 4th and 5th Chinese characters, 2 placeholder characters between the 7th and 8th Chinese characters, 3 placeholder characters between the 9th and 10th Chinese characters, and 3 placeholder characters after the 11th Chinese character.
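Such a sequence arises naturally from CTC-style decoding of a CRNN output: the blank label plays the role of the placeholder character, and a character spanning several time steps is emitted several times. The following sketch assumes a per-timestep score matrix with the blank at index 0; it is an illustration, not the recognizer defined by this disclosure.

```python
import numpy as np

PLACEHOLDER = "□"  # stands in for the CTC blank label

def greedy_sequence(scores: np.ndarray, alphabet: str) -> str:
    """Per-timestep greedy decoding that KEEPS blanks and repeats.

    scores: array of shape (timesteps, len(alphabet) + 1); index 0 is
    assumed to be the blank. The result is a sequence such as
    "□□□离离□□离离□□□原原原□□上上□□□草草□□□", in which both the repeated
    emissions and the placeholder characters carry position information.
    """
    best = scores.argmax(axis=1)
    return "".join(PLACEHOLDER if k == 0 else alphabet[k - 1] for k in best)

def collapse(seq: str) -> str:
    """Standard CTC post-processing: merge repeats, drop blanks.
    collapse("□□□离离□□离离□□□原原原□□上上□□□草草□□□") == "离离原上草"."""
    out, prev = [], None
    for ch in seq:
        if ch != PLACEHOLDER and ch != prev:
            out.append(ch)
        prev = ch
    return "".join(out)
```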
And step 204, generating character position information of the characters in the text line according to the character sequence.
In this embodiment, the execution main body may generate character position information of characters in the text line according to the character sequence.
Here, the execution body may generate character position information of all or part of the characters in the text line. The character position information of the characters in the text line may represent the position of the character in the target image, and may also represent the position of the character in the image area where the text line containing the character is located.
In some optional implementations of the embodiment, the sequence of characters includes placeholder characters and non-placeholder characters. Wherein a placeholder character may characterize a distance between two adjacent non-placeholder characters in the sequence of characters. Generally, the greater the distance between two adjacent non-placeholder characters, the greater the number of placeholder characters between the two adjacent non-placeholder characters. Illustratively, the placeholder character may be a space and the non-placeholder character may be any character other than the placeholder character.
Here, the placeholder character may be a predetermined one of the known characters. For example, the placeholder characters may be spaces, may be "□," or may be other predetermined characters.
Based on this, the execution main body may execute the step 204 by:
and generating non-placeholder character position information of each non-placeholder character in the text line according to placeholder character position information of placeholder characters in the character sequence.
The character position information of the placeholder characters in the character sequence can represent the appearance positions of the placeholder characters in the character sequence. Optionally, the character position information of the placeholder character in the character sequence may represent a position of the placeholder character in the target image, and may also represent a position of the placeholder character in an image area where the text line containing the placeholder character is located.
Here, the execution body may first determine the placeholder character position information of all or some placeholder characters in the character sequence. On that basis, the execution body may calculate the distance between two determined target adjacent placeholder characters (the distance may be characterized by the number of non-placeholder characters, or the number of pixels, between them). If the distance between the two target adjacent placeholder characters is greater than a first preset distance threshold, the execution body may determine that a non-placeholder character exists between them, and may then determine an arbitrary position (e.g., the middle position) between the two target adjacent placeholder characters as the segmentation position of that non-placeholder character, thereby obtaining the position information of the non-placeholder characters in the text line. Two target adjacent placeholder characters are two placeholder characters in the character sequence with no other placeholder character (though possibly non-placeholder characters) between them. For example, for the character sequence "□离□□离□□□", the pairs of target adjacent placeholder characters include: the first □ and the second □, the second □ and the third □, the third □ and the fourth □, the fourth □ and the fifth □, and the fifth □ and the sixth □.
Optionally, the execution body may also determine the positions of groups of at least two placeholder characters (i.e., placeholder character groups) in the character sequence that contain no non-placeholder character between them.
For example, suppose the character sequence recognized from the text line "离离原上草" is "□□□离离□□离离□□□原原原□□上上□□□草草□□□". There are 3 placeholder characters before the 1st Chinese character (hereinafter the first placeholder character group), 2 placeholder characters between the 2nd and 3rd Chinese characters (the second placeholder character group), 3 placeholder characters between the 4th and 5th Chinese characters (the third placeholder character group), 2 placeholder characters between the 7th and 8th Chinese characters (the fourth placeholder character group), 3 placeholder characters between the 9th and 10th Chinese characters (the fifth placeholder character group), and 3 placeholder characters after the 11th Chinese character (the sixth placeholder character group). It will be appreciated that the position of a placeholder character group may be determined based on the positions of the placeholder characters in the group. A placeholder character group may be a subsequence of the character sequence that contains no non-placeholder character and whose preceding and following characters are both non-placeholder characters.
Then, the execution main body may generate non-placeholder character position information of each non-placeholder character in the text line according to the obtained position of the placeholder character group.
It can be understood that the above optional implementation manner may generate the non-placeholder character position information of each non-placeholder character in the text line according to the placeholder character position information of the placeholder character in the character sequence, thereby improving the accuracy of character positioning.
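A minimal sketch of this implementation follows, under three stated assumptions: each sequence slot carries an x-position on the image area, the "arbitrary position" rule is instantiated as the centre of each placeholder character group, and each maximal run of non-placeholder characters corresponds to one character of the text line.

```python
from typing import List, Tuple

PLACEHOLDER = "□"

def char_spans(seq: List[Tuple[str, float]],
               line_left: float, line_right: float
               ) -> List[Tuple[str, float, float]]:
    """seq: (character, x-position) per sequence slot.
    Returns (run of recognized characters, left x, right x) for each
    character of the text line, cutting at the centre of each
    placeholder character group."""
    # Split the sequence into alternating placeholder / non-placeholder blocks.
    blocks: List[Tuple[bool, list]] = []
    for ch, x in seq:
        is_ph = (ch == PLACEHOLDER)
        if not blocks or blocks[-1][0] != is_ph:
            blocks.append((is_ph, []))
        blocks[-1][1].append((ch, x))

    def centre(block):  # middle of a placeholder group = segmentation position
        xs = [x for _, x in block]
        return sum(xs) / len(xs)

    spans = []
    for k, (is_ph, block) in enumerate(blocks):
        if is_ph:
            continue
        left = centre(blocks[k - 1][1]) if k > 0 else line_left
        right = centre(blocks[k + 1][1]) if k + 1 < len(blocks) else line_right
        spans.append(("".join(c for c, _ in block), left, right))
    return spans
```

For the sequence "□□□离离□□离离□□□原原原□□上上□□□草草□□□" this yields five spans, one per character of the text line "离离原上草".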
In some optional implementations of the embodiment, the sequence of characters includes placeholder characters and non-placeholder characters. Wherein a placeholder character may characterize a distance between two adjacent non-placeholder characters in the sequence of characters. Generally, the greater the distance between two adjacent non-placeholder characters, the greater the number of placeholder characters between the two adjacent non-placeholder characters. Illustratively, the placeholder characters may be spaces and the non-placeholder characters may include any characters other than placeholder characters.
Based on this, the executing body may also execute the step 204 by:
firstly, generating character position information of placeholder characters and character position information of non-placeholder characters in the character sequence according to the character sequence.
Then, for two adjacent placeholder characters in the character sequence, in response to the non-placeholder characters between the two adjacent placeholder characters being identical, character position information of the character in the text line corresponding to those non-placeholder characters is generated according to the non-placeholder character position information of each non-placeholder character between the two adjacent placeholder characters. Here, two adjacent placeholder characters are two placeholder characters adjacent in the character sequence, between which non-placeholder characters may or may not be present.
It will be appreciated that, in general, a single character in the text line of the target image may correspond to two or more characters in the character sequence obtained by the character recognition in step 203. For example, if the text line in the target image is "离离原上草", then for the single character "离" in the text line, the character sequence obtained after the execution body performs character recognition may contain two characters "离" corresponding to that single character. Illustratively, for the text line "离离原上草", the execution body may obtain the character sequence "□□□离离□□离离□□□原原原□□上上□□□草草□□□".
In this scenario, the execution body may determine the position of the 1st Chinese character of the text line "离离原上草" (i.e., the first "离" of the text line) according to the 1st Chinese character (the first "离" of the character sequence) and the 2nd Chinese character (the second "离" of the character sequence), thereby obtaining the character position information of the non-placeholder character "离".
It can be understood that, for the case that each non-placeholder character between two adjacent placeholder characters in the character sequence is the same, the above alternative implementation may further improve the accuracy of character positioning.
In some optional implementations of this embodiment, the executing main body may execute the step 204 by:
first, position information of the characters in the text line in the image area is generated based on the character sequence.
Then, character position information of the character in the target image is determined according to the text line position information and the position information of the character in the image area.
It is understood that since the image area belongs to a part of the target image, there is a positional relationship between the image area and the target image, and according to the positional relationship, the character position information of the character in the target image can be determined according to the position information of the character in the image area.
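A sketch of this coordinate transfer for the simple case in which the image area is an axis-aligned crop of the target image whose top-left corner is known (for a rotated quadrilateral frame, the inverse of the rectification transform would be applied instead):

```python
from typing import Tuple

def region_to_image(point_in_region: Tuple[float, float],
                    region_top_left: Tuple[float, float]) -> Tuple[float, float]:
    """Map a character position measured inside the text line's image area
    to a position in the target image by adding the crop offset."""
    x, y = point_in_region
    ox, oy = region_top_left
    return (x + ox, y + oy)

# e.g. a character corner at (10, 2) inside an image area cropped at
# (120, 56) lies at (130, 58) in the target image.
```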
In some optional implementation manners of this embodiment, the step 201 may be: and acquiring an image presented by the target display screen as a target image. The target display screen may be a display screen of the execution main body, or may be a display screen of an electronic device communicatively connected to the execution main body.
Based on this, when the user operation on the character in the target image is detected, the execution main body may further determine the character to be operated, which is indicated by the user operation, from the text line indicated by the text line position information according to the operation position of the user operation on the target display screen and the character position information of the character in the text line. Wherein, the user operation may include, but is not limited to, at least one of the following: word fetching operations, point reading operations, and the like.
It can be understood that the above alternative implementation manner can more accurately determine the character to be operated indicated by the user operation, so that the corresponding processing, such as word fetching operation, point reading operation, and the like, can be more accurately executed according to the indication of the user operation.
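A minimal hit-test sketch for this implementation, assuming the character position information has already been mapped to screen coordinates and simplified to axis-aligned boxes:

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) on screen

def char_under_operation(op_pos: Tuple[float, float],
                         chars: List[Tuple[str, Box]]) -> Optional[str]:
    """Return the character to be operated, i.e. the character whose
    position information contains the user's operation position."""
    x, y = op_pos
    for ch, (left, top, right, bottom) in chars:
        if left <= x <= right and top <= y <= bottom:
            return ch
    return None  # the operation position hit no character
```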
In some optional implementation manners of this embodiment, the execution main body may further output information pre-associated with the character to be operated.
As an example, the output information pre-associated with the character to be operated may be translation information, introduction information, or audio information of the character to be operated. For example, if the character to be operated is "苹果" (Chinese for "apple"): when the output information is translation information of the character to be operated, it may be "apple"; when the output information is introduction information of the character to be operated, it may be "Nutritional value of apples: rich in minerals and vitamins, …"; and when the output information is audio information of the character to be operated, it may be the pronunciation audio of the character.
It can be understood that the optional implementation manner can determine the output information pre-associated with the character to be operated more quickly, and further improve the speed of outputting the information.
In some optional implementation manners of this embodiment, the execution main body may further perform processing of the user operation instruction on the character to be operated according to the instruction of the user operation.
As an example, when the user operation indicates a word fetching operation, the execution body may perform the word fetching operation on the character to be operated and carry out corresponding processing, such as translation or interpretation, of the character selected by the word fetching operation; when the user operation indicates a point reading operation, the execution body may perform the point reading operation on the character to be operated and carry out corresponding processing of the character selected by the point reading operation, for example, playing the speech corresponding to it.
It can be understood that the above alternative implementation manner can improve the speed of processing the user operation instruction on the character to be operated.
With continuing reference to FIGS. 3A-3C, FIGS. 3A-3C are schematic diagrams of an application scenario of the character positioning method according to the present embodiment. In FIG. 3A, the terminal device 31 first acquires a target image 301. Then, referring to FIG. 3B, the terminal device 31 performs text line detection on the target image 301 to obtain the text line position information 3011-3016 of the text lines in the target image. Next, the terminal device 31 performs character recognition on the image areas of the target image 301 located at the positions indicated by the text line position information 3011-3016 to obtain the character sequences included in the text lines. Finally, as shown in FIG. 3C, the terminal device 31 generates character position information for the characters in the text lines (in this example, for each character of the classical Chinese poem shown in the image, including its title, author, and verse lines), the character position information of each character being represented by a quadrilateral box in FIG. 3C.
In general, character recognition performed with methods based on single-character detection and single-character recognition (for example, single-character detection based on connected component analysis combined with single-character recognition based on CNN (Convolutional Neural Networks)) is prone to problems such as false recalls. Character recognition performed with methods based on text line detection and character sequence recognition (e.g., text line detection based on EAST (an Efficient and Accurate Scene Text detector) combined with text line recognition based on CRNN (Convolutional Recurrent Neural Network)), however, cannot obtain the positions of single characters.
The method provided by the embodiments of the present disclosure acquires a target image; performs text line detection on the target image to obtain text line position information of a text line in the target image; performs character recognition on the image area of the target image located at the position indicated by the text line position information to obtain the character sequence included in the text line; and finally generates character position information of the characters in the text line according to the character sequence. The positions of characters are thus determined through text line detection and character recognition, which increases the speed of single-character positioning, reduces the amount of computation it requires, and reduces the consumption of computing resources during positioning.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a character location method is shown. The process 400 of the character location method includes the following steps:
step 401, a target image is acquired.
In this embodiment, an execution subject of the character positioning method (for example, a server or a terminal device shown in fig. 1) may acquire the target image from other electronic devices or locally through a wired connection manner or a wireless connection manner.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
Step 402, performing text line detection on the target image to obtain text line position information of a text line in the target image.
In this embodiment, the execution body may perform text line detection on the target image acquired in step 401 to obtain text line position information of a text line in the target image. The text line position information is represented by a quadrilateral frame, which may be a quadrilateral frame containing the text line. In some cases, the two sides of the quadrilateral frame along the arrangement direction of the characters in the text line are parallel.
In this embodiment, step 402 is substantially the same as step 202 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 403, performing character recognition on the image area located at the position indicated by the text line position information in the target image to obtain a character sequence included in the text line.
In this embodiment, the execution main body may perform character recognition on an image area in the target image, the image area being located at a position indicated by the text line position information, to obtain a character sequence included in the text line. The character sequence comprises placeholder characters and non-placeholder characters. The placeholder characters may characterize a distance between two adjacent non-placeholder characters in the sequence of characters. Generally, the greater the distance between two adjacent non-placeholder characters, the greater the number of placeholder characters between the two adjacent non-placeholder characters. Illustratively, the placeholder characters may be spaces and the non-placeholder characters may include any characters other than placeholder characters.
In this embodiment, step 403 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
And step 404, generating character position information of the placeholder characters in the character sequence according to the character sequence.
In this embodiment, the execution main body may generate the character position information of the placeholder character in the character sequence according to the character sequence.
Here, for each placeholder character in the character sequence, the execution body may determine a position of the placeholder character, thereby generating character position information of the placeholder character in the character sequence. The character position information of the placeholder characters in the character sequence can represent the appearance positions of the placeholder characters in the character sequence.
Step 405: for two adjacent non-placeholder characters in the character sequence, determining segmentation position information for segmenting the quadrilateral frame according to the placeholder character position information of the placeholder characters between the two adjacent non-placeholder characters.
In this embodiment, for two adjacent non-placeholder characters in the character sequence, the execution body may further determine, according to the placeholder character position information of the placeholder characters between the two adjacent non-placeholder characters, segmentation position information for segmenting the quadrilateral frame. Two adjacent non-placeholder characters are two non-placeholder characters adjacent in the character sequence, between which placeholder characters may be included. For example, for the character sequence "□离□□离□□□原□□上□草□□", the pairs of adjacent non-placeholder characters include: the first "离" and the second "离", the second "离" and "原", "原" and "上", and "上" and "草".
The segmentation position information may be represented by a line segment, and when the text line position information of the text line is represented by a quadrilateral frame including two parallel edges, the line segment may be perpendicular to the two parallel edges of the quadrilateral frame.
Illustratively, assume the size of the image area is C × H × W, where C is the number of channels of the image area, H is its height, and W is its length. After CNN encoding, reshaping, and transposing, a feature map of shape w × c is obtained from the image area, where w is the length of the feature map and c is the number of kernels of the last convolution of the CNN. The feature map of shape w × c is passed through the recognition model (CTC) to obtain the position and confidence of each non-placeholder character on the feature map, thereby obtaining the text line recognition result. The position idx of a placeholder character on the feature map can then be obtained from the positions of the non-placeholder characters on the feature map, and the position of the placeholder character on the image area is obtained as idx / w × W, from the position idx on the feature map, the length w of the feature map, and the length W of the image area.
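The final mapping of the previous paragraph can be written directly; the surrounding CNN/CTC models are outside the scope of this sketch:

```python
def placeholder_x_on_region(idx: float, w: int, W: int) -> float:
    """Map a placeholder character's position idx on the feature map of
    length w back to an x-coordinate on the image area of length W."""
    return idx / w * W

# e.g. feature-map position idx = 20 with w = 80 and an image area of
# length W = 640 gives x = 20 / 80 * 640 = 160.
```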
As an example, the position indicated by the segmentation position information may be any position occupied by the placeholder characters between the two adjacent non-placeholder characters. For example, the line segment characterizing the segmentation position information may be the line segment through the top-left and bottom-left vertices (i.e., the left edge) of a rectangular box containing the target placeholder character between the two adjacent non-placeholder characters, or the line segment through the top-right and bottom-right vertices (i.e., the right edge) of such a box. The target placeholder character may be any one of the placeholder characters between the two adjacent non-placeholder characters, for example the first or the last of them. Optionally, the line segment characterizing the segmentation position information may also pass through the central position of the placeholder characters between the two adjacent non-placeholder characters.
In addition, the line segments characterizing the segmentation position information may further include: the line segment through the top-left and bottom-left vertices of the rectangular box containing the first placeholder character between the two adjacent non-placeholder characters, and the line segment through the top-right and bottom-right vertices of the rectangular box containing the last placeholder character between them.
Here, the first placeholder character between two adjacent non-placeholder characters is immediately preceded by a non-placeholder character, and the last placeholder character between them is immediately followed by a non-placeholder character.
Step 406: dividing the quadrilateral frame according to the positions indicated by the obtained segmentation position information, and determining the division results as the character position information of the characters in the text line.
In this embodiment, the execution body may divide the quadrilateral frame according to the positions indicated by the obtained segmentation position information, and determine the division results as the character position information of the characters in the text line.
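A sketch of step 406 for the simple case in which the quadrilateral frame is an axis-aligned rectangle and each segmentation position is a vertical cut (the rotated-quadrilateral case divides along the segmentation line segments instead):

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)

def split_frame(frame: Rect, cut_xs: List[float]) -> List[Rect]:
    """Divide the text line's rectangle at each segmentation position;
    each resulting sub-rectangle is one character's position information."""
    left, top, right, bottom = frame
    xs = [left] + sorted(cut_xs) + [right]
    return [(l, top, r, bottom) for l, r in zip(xs, xs[1:])]

# e.g. split_frame((0, 0, 100, 24), [20, 42, 61, 80]) yields five boxes
# for a five-character text line such as "离离原上草".
```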
As an example, please refer to FIGS. 5A and 5B, which are schematic diagrams of still another application scenario of the character positioning method according to the present disclosure. In FIG. 5A, the terminal device 51 performs text line detection on the target image and obtains the text line position information of the text line in the target image, characterized by a quadrilateral frame in which the two sides along the arrangement direction of the characters in the text line are parallel: as shown in FIG. 5A, sides 501 and 502 of the quadrilateral frame are parallel. The terminal device 51 then generates the placeholder character position information of the placeholder characters in the character sequence according to the character sequence. Next, for two adjacent non-placeholder characters in the character sequence, the terminal device 51 determines, according to the placeholder character position information of the placeholder characters between them, segmentation position information for segmenting the quadrilateral frame; as shown in FIG. 5B, the segmentation position information determined by the terminal device 51 indicates positions 503, 504, and 505. Finally, the terminal device 51 divides the quadrilateral frame according to the positions indicated by the obtained segmentation position information and determines the division results as the character position information of the characters in the text line.
Returning now to FIG. 4.
It should be noted that, besides the above-mentioned contents, the embodiment of the present disclosure may also include the same or similar features and effects as the embodiment corresponding to fig. 2, and no further description is provided herein.
As can be seen from fig. 4, in the process 400 of the character positioning method in this embodiment, the quadrangular frame is used to represent the position information of the text line, and the position of the character in the text line is obtained by determining the segmentation position information, so that the accuracy of character positioning is further improved.
With further reference to FIG. 6, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a character positioning apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2 and, in addition to the features described below, may include the same or corresponding features as that method embodiment and produce the same or corresponding effects. The apparatus can be applied to various electronic devices.
As shown in fig. 6, the character positioning apparatus 600 of the present embodiment includes: an acquisition unit 601 configured to acquire a target image; a detecting unit 602 configured to perform text line detection on the target image to obtain text line position information of a text line in the target image; a recognition unit 603 configured to perform character recognition on an image area located at a position indicated by the text line position information in the target image, and obtain a character sequence included in the text line; a generating unit 604 configured to generate character position information of the characters in the text line according to the character sequence.
In the present embodiment, the acquisition unit 601 of the character localization apparatus 600 may acquire the target image.
In this embodiment, the detecting unit 602 may perform text line detection on the target image acquired by the acquiring unit 601 to obtain text line position information of a text line in the target image.
In this embodiment, the recognition unit 603 may perform character recognition on the image area of the target image (obtained by the acquisition unit 601) located at the position indicated by the text line position information obtained by the detection unit 602, to obtain the character sequence included in the text line.
In this embodiment, the generating unit 604 may generate the character position information of the characters in the text line based on the character sequence obtained by the recognition unit 603.
In some optional implementations of this embodiment, the character sequence includes placeholder characters and non-placeholder characters. Based on this, the generating unit 604 may include: a first generating subunit (not shown in the figure) configured to generate non-placeholder character position information of each non-placeholder character in the text line according to the placeholder character position information of the placeholder characters in the character sequence.
In some optional implementations of this embodiment, the character sequence includes placeholder characters and non-placeholder characters. Based on this, the generating unit 604 may include: a second generating subunit (not shown in the figure) configured to generate, according to the character sequence, placeholder character position information of the placeholder characters and non-placeholder character position information of each non-placeholder character in the character sequence; and a third generating subunit (not shown in the figure) configured to, for two adjacent placeholder characters in the character sequence, in response to the non-placeholder characters between the two adjacent placeholder characters being all the same character, generate, according to the non-placeholder character position information of each non-placeholder character between the two adjacent placeholder characters, character position information of the character in the text line corresponding to those non-placeholder characters.
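This behaviour resembles CTC-style decoding, where the placeholder plays the role of the blank symbol: identical characters emitted between two placeholders are collapsed into a single character of the text line. The sketch below is one hedged rendering of that step; the union-of-spans rule for the merged box is an assumption, not the disclosure's specified rule.

```python
# A minimal sketch: collapse runs of identical non-placeholder characters
# (as between two placeholders in CTC-style output) into single characters,
# fusing their spans. The union-of-spans rule is an illustrative assumption.

def merge_repeats(char_seq, spans, placeholder="_"):
    """char_seq: e.g. "_AA_B_"; spans: per-index (x0, x1) position info.
    Returns (text, boxes) with repeats merged into one positioned character."""
    text, boxes = [], []
    i = 0
    while i < len(char_seq):
        if char_seq[i] == placeholder:
            i += 1
            continue
        j = i
        while j < len(char_seq) and char_seq[j] == char_seq[i]:
            j += 1  # run of identical non-placeholder characters
        text.append(char_seq[i])
        boxes.append((spans[i][0], spans[j - 1][1]))  # union of the run's spans
        i = j
    return "".join(text), boxes

print(merge_repeats("_AA_B_", [(0, 1), (1, 3), (3, 5), (5, 6), (6, 8), (8, 9)]))
# -> ('AB', [(1, 5), (6, 8)])
```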
In some optional implementations of this embodiment, the text line position information is characterized by a quadrilateral frame, and the character sequence includes placeholder characters and non-placeholder characters. Based on this, the generating unit 604 may include: a fourth generating subunit (not shown in the figure) configured to generate placeholder character position information of the placeholder characters in the character sequence according to the character sequence; a first determining subunit (not shown in the figure) configured to determine, for two adjacent non-placeholder characters in the character sequence, segmentation position information for segmenting the quadrilateral frame according to the placeholder character position information of the placeholder character between the two adjacent non-placeholder characters; and a second determining subunit (not shown in the figure) configured to segment the quadrilateral frame according to the positions indicated by the obtained segmentation position information, and determine the segmentation results as the character position information of the characters in the text line.
In some optional implementations of this embodiment, the generating unit 604 may include: a fifth generating subunit (not shown in the figure) configured to generate position information of the characters in the text line in the image area according to the character sequence; a third determining subunit (not shown in the figure) configured to determine character position information of a character in the target image according to the text line position information and the position information of the character in the image area.
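Since the recognizer works on the cropped image area, the character positions it yields are local to that crop; mapping them back to the target image only requires the line's offset. The sketch below assumes an axis-aligned line box whose first two coordinates are its top-left corner; the names are illustrative.

```python
# A minimal sketch, assuming the text line is an axis-aligned crop whose
# top-left corner in the target image is (line_box[0], line_box[1]).

def to_image_coords(line_box, local_char_boxes):
    dx, dy = line_box[0], line_box[1]
    return [(x0 + dx, y0 + dy, x1 + dx, y1 + dy)
            for (x0, y0, x1, y1) in local_char_boxes]

print(to_image_coords((100, 40, 400, 70), [(0, 0, 35, 30), (35, 0, 70, 30)]))
# -> [(100, 40, 135, 70), (135, 40, 170, 70)]
```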
In some optional implementations of this embodiment, the acquisition unit 601 may include: an acquisition subunit (not shown in the figure) configured to acquire the image presented by a target display screen as the target image. Based on this, the apparatus 600 further includes: a determining unit (not shown in the figure) configured to, in response to detection of a user operation on a character in the target image, determine the character to be operated indicated by the user operation from the text line indicated by the text line position information, according to the operation position of the user operation on the target display screen and the character position information of the characters in the text line.
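One plausible hit test for such a determining unit is shown below, assuming the operation position has already been mapped into target-image coordinates and each character carries an axis-aligned box; the names and the containment rule are illustrative assumptions.

```python
# A minimal sketch of the determining unit's hit test. Assumes the user
# operation position is already in target-image coordinates.

def hit_character(op_pos, char_boxes):
    """op_pos: (x, y); char_boxes: list of (character, (x0, y0, x1, y1)).
    Returns the character whose box contains the operation, else None."""
    x, y = op_pos
    for ch, (x0, y0, x1, y1) in char_boxes:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return ch
    return None

print(hit_character((120, 50), [("A", (100, 40, 135, 70)),
                                ("B", (135, 40, 170, 70))]))
# -> 'A'
```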
In some optional implementations of this embodiment, the apparatus 600 further includes: an output unit (not shown in the figure) configured to output information pre-associated with the character to be operated.
In some optional implementations of this embodiment, the apparatus 600 further includes: a processing unit (not shown in the figure) configured to perform, on the character to be operated, the processing indicated by the user operation.
In the apparatus provided by the above embodiment of the present disclosure, the acquisition unit 601 first acquires a target image; the detection unit 602 then performs text line detection on the target image to obtain text line position information of a text line in the target image; the recognition unit 603 then performs character recognition on the image area in the target image located at the position indicated by the text line position information to obtain a character sequence included in the text line; and finally the generating unit 604 generates character position information of the characters in the text line according to the character sequence. The positions of the characters are thus determined through text line detection and character recognition, which can increase the speed of single-character positioning, reduce the amount of calculation required for single-character positioning, and reduce the consumption of computing resources in the positioning process.
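Putting the four units together, the data flow of the apparatus can be sketched as follows. The detector, recognizer and position generator are caller-supplied stand-ins (any text-line detector and sequence recognizer would do); only the wiring mirrors the embodiment, and the crop call assumes a PIL-style image object.

```python
# A minimal sketch of the unit pipeline in fig. 6. detect_lines, recognize and
# generate_positions are caller-supplied stand-ins; image.crop assumes a
# PIL-style image object produced by the acquisition unit 601.

def locate_characters(image, detect_lines, recognize, generate_positions):
    results = []
    for line_box in detect_lines(image):        # detection unit 602
        crop = image.crop(line_box)             # image area of the text line
        char_seq, spans = recognize(crop)       # recognition unit 603
        results.append(generate_positions(      # generation unit 604
            line_box, char_seq, spans))
    return results
```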
Referring now to fig. 7, there is shown a schematic structural diagram of an electronic device 700 (e.g., the server or terminal device of fig. 1) suitable for implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (e.g., a car navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The terminal device/server shown in fig. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 709, or installed from the storage device 708, or installed from the ROM 702. When executed by the processing device 701, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target image; carrying out text line detection on the target image to obtain text line position information of a text line in the target image; performing character recognition on an image area located at the position indicated by the text line position information in the target image to obtain a character sequence included in the text line; and generating character position information of the characters in the text line according to the character sequence.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a detection unit, an identification unit, and a generation unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the acquisition unit may also be described as a "unit that acquires a target image".
The foregoing description presents only preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (11)

1. A character location method, comprising:
acquiring a target image;
performing text line detection on the target image to obtain text line position information of a text line in the target image;
performing character recognition on an image area in the target image, which is located at the position indicated by the text line position information, to obtain a character sequence included in the text line;
and generating character position information of the characters in the text line according to the character sequence.
2. The method of claim 1, wherein the sequence of characters includes placeholder characters and non-placeholder characters; and
generating character position information of characters in the text line according to the character sequence, wherein the character position information comprises:
and generating non-placeholder character position information of each non-placeholder character in the text line according to placeholder character position information of placeholder characters in the character sequence.
3. The method of claim 1, wherein the sequence of characters includes placeholder characters and non-placeholder characters; and
generating character position information of characters in the text line according to the character sequence, wherein the character position information comprises:
generating character position information of placeholder characters and character position information of each non-placeholder character in the character sequence according to the character sequence;
and for two adjacent placeholder characters in the character sequence, in response to the non-placeholder characters between the two adjacent placeholder characters being all the same character, generating, according to the non-placeholder character position information of each non-placeholder character between the two adjacent placeholder characters, character position information of the character in the text line corresponding to those non-placeholder characters.
4. The method of claim 1, wherein the text line position information is characterized by a quadrilateral frame, and the character sequence includes placeholder characters and non-placeholder characters; and
generating character position information of characters in the text line according to the character sequence, wherein the character position information comprises:
generating placeholder character position information of the placeholder characters in the character sequence according to the character sequence;
for two adjacent non-placeholder characters in the character sequence, determining segmentation position information for segmenting the quadrilateral frame according to the placeholder character position information of the placeholder character between the two adjacent non-placeholder characters;
and segmenting the quadrilateral frame according to the positions indicated by the obtained segmentation position information, and determining the segmentation results as character position information of the characters in the text line.
5. The method of claim 1, wherein the generating character position information for characters in the text line from the character sequence comprises:
generating position information of the characters in the text line in the image area according to the character sequence;
and determining character position information of the character in the target image according to the text line position information and the position information of the character in the image area.
6. The method of one of claims 1 to 5, wherein said acquiring a target image comprises:
acquiring an image presented by a target display screen as a target image; and
the method further comprises the following steps:
in response to the detection of the user operation on the characters in the target image, determining the characters to be operated indicated by the user operation from the text lines indicated by the text line position information according to the operation position of the user operation on the target display screen and the character position information of the characters in the text lines.
7. The method of claim 6, wherein the method further comprises:
and outputting information pre-associated with the character to be operated.
8. The method of claim 6, wherein the method further comprises:
and performing, on the character to be operated, the processing indicated by the user operation.
9. A character-locating device comprising:
an acquisition unit configured to acquire a target image;
the detection unit is configured to detect text lines of the target image to obtain text line position information of the text lines in the target image;
the recognition unit is configured to perform character recognition on an image area, located at the position indicated by the text line position information, in the target image to obtain a character sequence included in the text line;
a generating unit configured to generate character position information of characters in the text line according to the character sequence.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-8.
CN202010575360.XA 2020-06-22 2020-06-22 Character positioning method and device Pending CN113920286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575360.XA CN113920286A (en) 2020-06-22 2020-06-22 Character positioning method and device

Publications (1)

Publication Number Publication Date
CN113920286A true CN113920286A (en) 2022-01-11

Family

ID=79231221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575360.XA Pending CN113920286A (en) 2020-06-22 2020-06-22 Character positioning method and device

Country Status (1)

Country Link
CN (1) CN113920286A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130308862A1 (en) * 2012-05-15 2013-11-21 Fuji Xerox Co., Ltd. Image processing apparatus, image processing method, and computer readable medium
US20180107359A1 (en) * 2016-10-18 2018-04-19 Smartisan Digital Co., Ltd. Text processing method and device
CN110619326A (en) * 2019-07-02 2019-12-27 安徽七天教育科技有限公司 English test paper composition detection and identification system and method based on scanning
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN110569853A (en) * 2019-09-12 2019-12-13 南京红松信息技术有限公司 Target positioning-based independent formula segmentation method
CN110895696A (en) * 2019-11-05 2020-03-20 泰康保险集团股份有限公司 Image information extraction method and device
CN110969161A (en) * 2019-12-02 2020-04-07 上海肇观电子科技有限公司 Image processing method, circuit, visual impairment assisting apparatus, electronic apparatus, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination