CN110991445B - Vertical text recognition method, device, equipment and medium - Google Patents


Info

Publication number
CN110991445B
CN110991445B (application CN201911147784.XA)
Authority
CN
China
Prior art keywords
image
transverse
vertical
text
characters
Prior art date
Legal status
Active
Application number
CN201911147784.XA
Other languages
Chinese (zh)
Other versions
CN110991445A (en)
Inventor
张水发
李岩
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority claimed from CN201911147784.XA
Publication of CN110991445A
Application granted
Publication of CN110991445B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G06V10/242: Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/768: Arrangements using context analysis, e.g. recognition aided by known co-occurring patterns
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Abstract

The disclosure relates to a vertical text recognition method, apparatus, device, and medium, and belongs to the technical field of multimedia. A vertical image of the vertical text to be recognized is rotated to obtain a horizontal image corresponding to the vertical text; the image features of the horizontal text corresponding to the vertical text are acquired based on the image features of the horizontal image; and the image features of the horizontal text are recognized. Compared with directly recognizing the vertical image of the vertical text, adjusting the vertical text to the image features of horizontal text makes recognition easier, greatly increases the probability of recognizing the text, and also improves recognition accuracy.

Description

Vertical text recognition method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of multimedia, and in particular relates to a vertical text recognition method, a device, equipment and a medium.
Background
Text recognition technology can be applied to recognizing text in various pictures. With the continuous development of the technology, the precision of text recognition has become higher and higher, and people can perform text recognition with the aid of a text recognition tool. Text in pictures includes horizontal text regions and vertical text regions; correspondingly, the text recognition process also includes horizontal text recognition and vertical text recognition.
In the related art, the vertical text recognition method may be as follows: a vertical image of the vertical text to be recognized is input into a recognition model, and the recognition model recognizes the vertical image. However, the recognition model is designed to recognize horizontal text, and the difference between the image features of horizontal text and vertical text is large, so the probability of recognizing the text is small, the accuracy is low, and the recognition effect is poor.
Disclosure of Invention
The disclosure provides a vertical text recognition method, apparatus, device, and medium, which at least solve the problems of low recognition probability and low accuracy in recognizing vertical text in the related art. The technical solution of the present disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a vertical text recognition method, including:
rotating a vertical image of the vertical text to be recognized to obtain a horizontal image corresponding to the vertical text;
acquiring, based on image features of the horizontal image, image features of the horizontal text corresponding to the vertical text;
and recognizing the image features of the horizontal text to obtain a text recognition result of the vertical text.
In one possible implementation, the acquiring, based on the image features of the horizontal image, the image features of the horizontal text corresponding to the vertical text includes:
performing feature extraction on the horizontal image to obtain image features;
acquiring, according to the image features, first transformation information for transforming from target image features to the image features, where the target image features represent the image features of the horizontal image after the vertical text is adjusted to horizontal text;
and sampling the image features of the horizontal image based on the first transformation information to obtain the image features of the horizontal text corresponding to the vertical text.
In one possible implementation, the acquiring, according to the image features, the first transformation information for transforming from the target image features to the image features includes:
acquiring the target image features based on the image features and second transformation information, where the second transformation information is used to adjust the vertical text in the horizontal image to horizontal text;
and acquiring, based on the image features and the target image features, the first transformation information for transforming from the target image features to the image features.
In one possible implementation, the sampling the image features of the horizontal image based on the first transformation information to obtain the image features of the horizontal text corresponding to the vertical text includes:
generating a grid based on the resolution of the horizontal image;
acquiring, based on the grid and the first transformation information, the positions of the image features of the horizontal text corresponding to the vertical text within the image features of the horizontal image;
and sampling the image features of the horizontal image based on those positions to obtain the image features of the horizontal text corresponding to the vertical text.
In one possible implementation, the rotating the vertical image of the vertical text to be recognized to obtain the horizontal image corresponding to the vertical text includes any one of the following:
rotating the vertical image counterclockwise to obtain a first horizontal image, and taking the first horizontal image as the horizontal image corresponding to the vertical text;
rotating the vertical image clockwise to obtain a second horizontal image, and taking the second horizontal image as the horizontal image corresponding to the vertical text;
and rotating the vertical image counterclockwise to obtain a first horizontal image, rotating the vertical image clockwise to obtain a second horizontal image, acquiring a third horizontal image based on the first horizontal image and the second horizontal image, and taking the third horizontal image as the horizontal image corresponding to the vertical text.
In one possible implementation, the acquiring a third horizontal image based on the first horizontal image and the second horizontal image includes:
performing channel stitching on the first horizontal image and the second horizontal image to obtain the third horizontal image.
In one possible implementation, the recognizing the image features of the horizontal text to obtain the text recognition result corresponding to the vertical text includes:
recognizing the image features of the horizontal text based on the image features of the horizontal text and the bidirectional semantic relations of the characters in the horizontal text, to obtain the text recognition result corresponding to the vertical text.
According to a second aspect of the embodiments of the present disclosure, there is provided a vertical text recognition apparatus, including:
a rotation unit configured to rotate a vertical image of the vertical text to be recognized to obtain a horizontal image corresponding to the vertical text;
an acquisition unit configured to acquire, based on image features of the horizontal image, image features of the horizontal text corresponding to the vertical text;
and a recognition unit configured to recognize the image features of the horizontal text to obtain a text recognition result of the vertical text.
In one possible implementation, the apparatus further includes:
an extraction unit configured to perform feature extraction on the horizontal image to obtain image features;
the acquisition unit is further configured to acquire, according to the image features, first transformation information for transforming from target image features to the image features, where the target image features represent the image features of the horizontal image after the vertical text is adjusted to horizontal text;
and a sampling unit configured to sample the image features of the horizontal image based on the first transformation information to obtain the image features of the horizontal text corresponding to the vertical text.
In one possible implementation, the acquisition unit is further configured to:
acquire the target image features based on the image features and second transformation information, where the second transformation information is used to adjust the vertical text in the horizontal image to horizontal text;
and acquire, based on the image features and the target image features, the first transformation information for transforming from the target image features to the image features.
In one possible implementation, the apparatus further includes:
a generation unit configured to generate a grid based on the resolution of the horizontal image;
the acquisition unit is further configured to acquire, based on the grid and the first transformation information, the positions of the image features of the horizontal text corresponding to the vertical text within the image features of the horizontal image;
and the sampling unit is further configured to sample the image features of the horizontal image based on those positions to obtain the image features of the horizontal text corresponding to the vertical text.
In one possible implementation, the rotation unit is further configured to perform any one of the following:
rotating the vertical image counterclockwise to obtain a first horizontal image, and taking the first horizontal image as the horizontal image corresponding to the vertical text;
rotating the vertical image clockwise to obtain a second horizontal image, and taking the second horizontal image as the horizontal image corresponding to the vertical text;
and rotating the vertical image counterclockwise to obtain a first horizontal image, rotating the vertical image clockwise to obtain a second horizontal image, acquiring a third horizontal image based on the first horizontal image and the second horizontal image, and taking the third horizontal image as the horizontal image corresponding to the vertical text.
In one possible implementation, the apparatus further includes:
a stitching unit configured to perform channel stitching on the first horizontal image and the second horizontal image to obtain the third horizontal image.
In one possible implementation, the recognition unit is further configured to:
recognize the image features of the horizontal text based on the image features of the horizontal text and the bidirectional semantic relations of the characters in the horizontal text, to obtain the text recognition result corresponding to the vertical text.
According to a third aspect of embodiments of the present disclosure, there is provided a computer device comprising:
one or more processors;
one or more memories for storing the one or more processor-executable instructions;
wherein the one or more processors are configured to execute the instructions to implement the above-described vertical text recognition method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium comprising:
when the instructions in the storage medium are executed by a processor of a computer device, the computer device is enabled to perform the above-described vertical text recognition method.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product storing at least one instruction, the at least one instruction being loaded and executed by a processor to perform the operations of the above-described vertical text recognition method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the method and the device provided by the embodiment of the disclosure, the vertical image of the vertical text to be identified is rotated to obtain the horizontal image corresponding to the vertical text, the image characteristic of the horizontal text corresponding to the vertical text is obtained based on the image characteristic of the horizontal image, the image characteristic of the horizontal text is identified, compared with the method for directly identifying the vertical image of the vertical text, the method and the device have the advantages that the vertical text is adjusted to the image characteristic of the horizontal text, the identification is easy, the probability of identifying the text is greatly improved, and meanwhile, the identification accuracy is also improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flow chart illustrating a method of vertical text recognition according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of vertical text recognition according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating an example of acquiring image features of horizontal text corresponding to vertical text in accordance with an example embodiment;
FIG. 4 is a block diagram illustrating a vertical text recognition device according to an exemplary embodiment;
FIG. 5 is a schematic structural diagram of a terminal according to an exemplary embodiment;
FIG. 6 is a schematic structural diagram of a server according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The user information referred to in the present disclosure may be information authorized by the user or sufficiently authorized by each party.
FIG. 1 is a flowchart illustrating a vertical text recognition method according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps.
In step S11, the vertical image of the vertical text to be recognized is rotated to obtain a horizontal image corresponding to the vertical text.
In step S12, the image features of the horizontal text corresponding to the vertical text are acquired based on the image features of the horizontal image.
In step S13, the image features of the horizontal text are recognized to obtain a text recognition result of the vertical text.
According to the method provided by the embodiments of the present disclosure, the vertical image of the vertical text to be recognized is rotated to obtain the horizontal image corresponding to the vertical text, the image features of the horizontal text corresponding to the vertical text are acquired based on the image features of the horizontal image, and the image features of the horizontal text are recognized.
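The three steps above can be sketched as follows. This is a minimal sketch assuming NumPy arrays for images, with stand-in callables for the learned feature extractor and recognizer (the disclosure does not fix their concrete form at this point):

```python
import numpy as np

def recognize_vertical_text(vertical_img, extract_features, recognize):
    """Sketch of the three-step pipeline: rotate, extract, recognize.

    `extract_features` and `recognize` stand in for the learned models
    (a CNN feature extractor and a text recognizer in the disclosure).
    """
    # Step S11: rotate the vertical image 90 degrees to get a horizontal image.
    horizontal_img = np.rot90(vertical_img, k=1)  # counterclockwise
    # Step S12: acquire image features of the horizontal text.
    features = extract_features(horizontal_img)
    # Step S13: recognize the features to get the text recognition result.
    return recognize(features)

# Toy stand-ins for demonstration only.
img = np.zeros((32, 8, 3))            # tall vertical-text image (H > W)
feats = recognize_vertical_text(img,
                                lambda x: x.mean(axis=2),   # dummy extractor
                                lambda f: f.shape)          # dummy recognizer
```

After the 90 degree rotation the image is 8 pixels tall and 32 wide, so the dummy recognizer returns `(8, 32)`.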
In one possible implementation, the acquiring, based on the image features of the horizontal image, the image features of the horizontal text corresponding to the vertical text includes:
performing feature extraction on the horizontal image to obtain image features;
acquiring, according to the image features, first transformation information for transforming from target image features to the image features, where the target image features represent the image features of the horizontal image after the vertical text is adjusted to horizontal text;
and sampling the image features of the horizontal image based on the first transformation information to obtain the image features of the horizontal text corresponding to the vertical text.
In one possible implementation, the acquiring, according to the image features, the first transformation information for transforming from the target image features to the image features includes:
acquiring the target image features based on the image features and second transformation information, where the second transformation information is used to adjust the vertical text in the horizontal image to horizontal text;
and acquiring, based on the image features and the target image features, the first transformation information for transforming from the target image features to the image features.
In one possible implementation, the sampling the image features of the horizontal image based on the first transformation information to obtain the image features of the horizontal text corresponding to the vertical text includes:
generating a grid based on the resolution of the horizontal image;
acquiring, based on the grid and the first transformation information, the positions of the image features of the horizontal text corresponding to the vertical text within the image features of the horizontal image;
and sampling the image features of the horizontal image based on those positions to obtain the image features of the horizontal text corresponding to the vertical text.
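The grid-based sampling described above can be sketched as follows, assuming the first transformation information takes the form of a 2x3 affine matrix (an assumption; the disclosure leaves its form open) and using nearest-neighbour lookup for simplicity:

```python
import numpy as np

def affine_grid(height, width, theta):
    """Generate a sampling grid at the given resolution and map it through a
    2x3 affine transform `theta` (standing in for the first transformation
    information), yielding source positions in the input feature map."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    ones = np.ones_like(xs)
    coords = np.stack([xs, ys, ones], axis=-1)    # (H, W, 3) homogeneous
    return coords @ theta.T                       # (H, W, 2) source x, y

def sample(feature_map, grid):
    """Nearest-neighbour sampling of `feature_map` at `grid` positions;
    out-of-range positions are clamped to the border."""
    h, w = feature_map.shape[:2]
    xs = np.clip(np.round(grid[..., 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(grid[..., 1]).astype(int), 0, h - 1)
    return feature_map[ys, xs]

# Identity transform: sampling should reproduce the feature map unchanged.
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
fmap = np.arange(12.0).reshape(3, 4)
out = sample(fmap, affine_grid(3, 4, theta))
```

A learned transform would warp the grid instead of leaving it in place; the identity case simply verifies the plumbing.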
In one possible implementation, the rotating the vertical image of the vertical text to be recognized to obtain the horizontal image corresponding to the vertical text includes any one of the following:
rotating the vertical image counterclockwise to obtain a first horizontal image, and taking the first horizontal image as the horizontal image corresponding to the vertical text;
rotating the vertical image clockwise to obtain a second horizontal image, and taking the second horizontal image as the horizontal image corresponding to the vertical text;
and rotating the vertical image counterclockwise to obtain a first horizontal image, rotating the vertical image clockwise to obtain a second horizontal image, acquiring a third horizontal image based on the first horizontal image and the second horizontal image, and taking the third horizontal image as the horizontal image corresponding to the vertical text.
In one possible implementation, the acquiring a third horizontal image based on the first horizontal image and the second horizontal image includes:
performing channel stitching on the first horizontal image and the second horizontal image to obtain the third horizontal image.
In one possible implementation, the recognizing the image features of the horizontal text to obtain the text recognition result corresponding to the vertical text includes:
recognizing the image features of the horizontal text based on the image features of the horizontal text and the bidirectional semantic relations of the characters in the horizontal text, to obtain the text recognition result corresponding to the vertical text.
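One common way to use the bidirectional semantic relations of characters is to give each character position both a left-to-right and a right-to-left context. Below is a minimal sketch in which simple running means stand in for learned bidirectional recurrent units (the disclosure does not specify the mechanism here, so this is illustrative only):

```python
import numpy as np

def bidirectional_context(char_features):
    """For each character position, concatenate a forward context (everything
    read left to right up to that position) with a backward context
    (everything read right to left), mimicking how a bidirectional recurrent
    layer lets each character's prediction see both the preceding and the
    following characters. Running means stand in for learned units."""
    n = len(char_features)
    fwd = np.array([char_features[:i + 1].mean(axis=0) for i in range(n)])
    bwd = np.array([char_features[i:].mean(axis=0) for i in range(n)])
    return np.concatenate([fwd, bwd], axis=-1)

seq = np.arange(8.0).reshape(4, 2)    # 4 character positions, 2-dim features
ctx = bidirectional_context(seq)      # (4, 4): forward + backward context
```

The first position's forward context is just its own feature, while its backward context already summarizes the whole sequence, which is exactly the asymmetry a bidirectional pass provides.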
Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
FIG. 2 is a flowchart illustrating a vertical text recognition method according to an exemplary embodiment. As shown in FIG. 2, the method includes the following steps.
In step S21, the computer device acquires a vertical image of the vertical text to be recognized.
In the embodiments of the present disclosure, the characters in the vertical image of the vertical text are arranged vertically, and the fonts, the number of characters, and the like in the vertical image may be arbitrary; that is, the height and width of the vertical image and the number of characters it contains may be arbitrary.
The computer device may be a terminal or a server. The vertical image may be obtained by the computer device in a variety of ways, such as extracting it from an image database, downloading it from an image website, or shooting it directly, which is not limited in the embodiments of the present disclosure.
In step S22, the computer device rotates the vertical image of the vertical text to be recognized to obtain a horizontal image corresponding to the vertical text.
If the computer device directly performed feature extraction and recognition on the vertical image, the large difference between the features of vertical text and the features of horizontal text would make the accuracy of the recognition result very low. Therefore, the computer device can rotate the vertical image to obtain a horizontal image, so that the characters in the image are all arranged in the horizontal direction and the difference between the extracted features is reduced.
After the computer device obtains the vertical image of the vertical text to be recognized, it may rotate the vertical image in a number of ways. Specifically, the computer device may rotate the vertical image in any one of the following modes one to three:
In the first mode, the computer device rotates the vertical image counterclockwise to obtain a first horizontal image, and takes the first horizontal image as the horizontal image corresponding to the vertical text.
In the second mode, the computer device rotates the vertical image clockwise to obtain a second horizontal image, and takes the second horizontal image as the horizontal image corresponding to the vertical text.
In the third mode, the computer device rotates the vertical image counterclockwise to obtain a first horizontal image, rotates the vertical image clockwise to obtain a second horizontal image, acquires a third horizontal image based on the first horizontal image and the second horizontal image, and takes the third horizontal image as the horizontal image corresponding to the vertical text.
In the first to third modes, the vertical image is rotated in a certain direction to obtain the horizontal image. The characters in the horizontal image are still those of a vertical column, but the positional relationship between adjacent characters in the image features becomes horizontal, which is better suited to semantic judgment based on preceding and following characters.
During rotation, the vertical image may be rotated by 90 degrees counterclockwise or clockwise into a horizontal image. The text direction in a vertical image is not fixed: it may run from top to bottom (forward) or from bottom to top (reverse). In the third mode, the image is rotated in both directions and the two resulting horizontal images are combined to represent the vertical image. For vertical images whose text runs in different directions, after the computer device rotates them clockwise and counterclockwise, the third horizontal image acquired from the resulting first and second horizontal images can be the same. This process amounts to data augmentation of vertical text images, which can increase the amount of data and thus improve the accuracy of text recognition.
The computer device may acquire the third horizontal image based on the first and second horizontal images in a number of ways. In one possible implementation, the computer device may perform channel stitching on the first horizontal image and the second horizontal image to obtain the third horizontal image.
In one possible implementation, the number of channels of the third horizontal image may be the sum of the numbers of channels of the first and second horizontal images. For example, when the first and second horizontal images each have three channels, the third horizontal image may have six channels. The numbers of channels of the first and second horizontal images are not limited in the embodiments of the present disclosure.
In one specific possible embodiment, the computer device may implement the above channel stitching through a concatenation (concat) function. The manner in which the computer device implements channel stitching is not limited in the embodiments of the present disclosure.
In one specific example, the computer device may rotate the vertical image counterclockwise to obtain a first horizontal image and rotate it clockwise to obtain a second horizontal image, each with three channels. Through the concat function, the computer device may then acquire a third horizontal image with six channels and take it as the horizontal image corresponding to the vertical text. Because the text direction in a vertical image is not fixed and may run from top to bottom (forward) or from bottom to top (reverse), the vertical image is rotated both counterclockwise and clockwise, and the two resulting images are channel-stitched; after this rotate-and-stitch step, vertical images with different text directions can yield the same third image. For example, when the characters in the vertical image run from top to bottom, rotating the image counterclockwise and clockwise yields a first and a second horizontal image with different column directions: the columns in the first horizontal image may run from left to right, and those in the second from right to left.
When the characters in the vertical image run from bottom to top, rotating the image counterclockwise and clockwise likewise yields a first and a second horizontal image with different column directions: the columns in the first horizontal image may run from right to left, and those in the second from left to right. After vertical images with top-to-bottom and bottom-to-top text directions are rotated to obtain their first and second horizontal images, the third images obtained by channel-stitching them with the concat function can be identical. This step is a data augmentation process, which can increase the amount of data and thereby improve text recognition accuracy.
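The rotate-and-stitch step of the third mode can be sketched with NumPy, where `np.rot90` stands in for the rotation and `np.concatenate` along the channel axis plays the role of the concat function:

```python
import numpy as np

def third_horizontal_image(vertical_img):
    """Rotate the vertical image both ways and channel-stitch the results
    into a third image whose channel count is the sum of the two inputs'
    channel counts (the concat step described in the disclosure)."""
    first = np.rot90(vertical_img, k=1)    # counterclockwise rotation
    second = np.rot90(vertical_img, k=-1)  # clockwise rotation
    return np.concatenate([first, second], axis=-1)

img = np.random.rand(32, 8, 3)             # 3-channel vertical-text image
third = third_horizontal_image(img)        # 6-channel horizontal image
```

Starting from a 32x8 three-channel image, both rotations produce 8x32 images, so the stitched result is 8x32 with six channels, matching the example in the text.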
In step S23, the computer device performs feature extraction on the horizontal image to obtain image features.
After the computer device acquires the horizontal image, it can perform feature extraction on it to obtain image features, which serve as the data basis for acquiring the image features of the horizontal text, rather than taking the image features of the vertical image directly as that basis. Compared with the image features of the vertical image, the image features of the horizontal image are closer to the image features of horizontal text, so text recognition accuracy can be improved.
The computer device may perform the feature extraction step in a number of ways; for example, it may input the transverse image into a feature extraction model and extract the image features with that model.
The feature extraction model may have different structures. In one specific example, the feature extraction model may be a convolutional neural network (Convolutional Neural Network, CNN), and the feature extraction process may be: the computer device inputs the transverse image into the convolutional neural network, which extracts and outputs the image features. The specific feature extraction method is not limited in the embodiments of the present disclosure.
The convolutional neural network may be of various types; for example, it may be a Visual Geometry Group 16 (VGG16) network, a GoogLeNet (Google Inception Network), or a residual network (ResNet), which is not limited by the embodiments of the present disclosure.
The image features extracted by the computer device may be color features, texture features, spatial relationship features, or other features of the transverse image, which are not limited in the embodiments of the present disclosure.
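As a minimal, hedged sketch of the feature extraction step, the following uses a single convolution layer plus ReLU standing in for a full backbone such as VGG16; the kernel here is random rather than trained, and all shapes are illustrative:

```python
import numpy as np

def conv2d_relu(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Valid 2D convolution followed by ReLU.
    x: H x W x Cin image, w: kH x kW x Cin x Cout filter bank."""
    kh, kw, cin, cout = w.shape
    H, W, _ = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1, cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]            # kH x kW x Cin window
            out[i, j] = np.tensordot(patch, w, axes=3)  # sum over kH, kW, Cin
    return np.maximum(out, 0.0)                          # ReLU nonlinearity

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 32, 6))      # toy six-channel transverse image
kernel = rng.standard_normal((3, 3, 6, 16))  # 16 filters (sizes are assumptions)
features = conv2d_relu(image, kernel)
print(features.shape)  # (6, 30, 16)
```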
In step S24, the computer device acquires, based on the image features, first transformation information for transforming from the target image features to the image features.
The target image features represent the image features of the transverse image after the vertical text in it has been adjusted to horizontal text. The first transformation information describes how the target image features transform into the image features; with it, the transformation from the target image features to the image features is known, and the subsequent sampling step can be performed.
The computer device may obtain the first transformation information according to the image feature in a plurality of ways, and in one possible implementation, the specific process may be the following steps one to two:
Step one, the computer device acquires the target image features based on the image features and second transformation information.
The second transformation information is used for adjusting the vertical text in the transverse image into horizontal text. It describes the change from the image features to the target image features; that is, the transformation indicated by the first transformation information and the transformation indicated by the second transformation information are inverses of each other.
The computer device may obtain, from the second transformation information and the image features, the target image features of the transverse image after the vertical text is adjusted to horizontal text. Naturally, the first transformation information can then be obtained from the image features and the target image features.
The second transformation information may be set by a technician as required, or trained in advance so that the computer device can acquire it directly at use time.
In one possible implementation, a specific example is provided here of how the computer device obtains the second transformation information through training. The computer device acquires sample transverse images of a plurality of vertical texts, together with, for each sample transverse image, a target transverse image in which the vertical text has been adjusted to horizontal text. For each sample transverse image, the computer device predicts, based on the sample transverse image and candidate second transformation information, a predicted transverse image in which the vertical text is adjusted to horizontal text, and then adjusts the candidate second transformation information according to the similarity between the predicted transverse image and the target transverse image until the candidate meets a target condition, at which point it is taken as the second transformation information. The target condition may be that the similarity converges, or that the number of iterations reaches a target number, which is not limited by the embodiments of the present disclosure.
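The training loop just described can be sketched as follows. The finite-difference update, the mean-squared-error similarity measure, and the toy one-parameter `predict` function are all illustrative assumptions, not the patent's actual procedure; only the loop structure (predict, compare, adjust until a target condition holds) comes from the text:

```python
import numpy as np

def train_second_transform(samples, targets, predict, theta0,
                           lr=0.2, max_iters=100, tol=1e-6):
    """samples/targets: paired sample and target transverse images.
    predict(sample, theta) -> predicted transverse image.
    Adjusts theta until the similarity converges or the iteration
    count reaches the target number (the two target conditions)."""
    theta = theta0.astype(float).copy()
    prev_loss = np.inf
    for _ in range(max_iters):
        loss = np.mean([np.mean((predict(s, theta) - t) ** 2)
                        for s, t in zip(samples, targets)])
        if abs(prev_loss - loss) < tol:      # similarity has converged
            break
        prev_loss = loss
        grad = np.zeros_like(theta)          # finite-difference gradient estimate
        eps = 1e-4
        for k in np.ndindex(theta.shape):
            th = theta.copy()
            th[k] += eps
            loss_k = np.mean([np.mean((predict(s, th) - t) ** 2)
                              for s, t in zip(samples, targets)])
            grad[k] = (loss_k - loss) / eps
        theta -= lr * grad                   # adjust the candidate transformation
    return theta

# Toy check: the "transformation" is a single per-pixel gain; the true gain is 2.
predict = lambda s, theta: theta[0, 0] * s
samples = [np.ones((2, 2))]
targets = [2.0 * np.ones((2, 2))]
theta = train_second_transform(samples, targets, predict, np.array([[1.0]]))
print(round(float(theta[0, 0]), 2))  # 2.0
```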
And step two, the computer equipment acquires first transformation information transformed from the target image characteristic to the image characteristic based on the image characteristic and the target image characteristic.
The computer device may obtain transformation information, i.e. the first transformation information, for transforming the target image feature into the image feature based on the image feature and the target image feature.
The above process of obtaining the first transformation information may be implemented by a model, or by a network layer inside a model; for example, it may be implemented by a spatial transformation layer inside the model. The training process that yields the second transformation information is, in effect, the process of teaching the spatial transformation layer to obtain accurate first transformation information from image features; that is, the computation of the first transformation information from image features is learned during model training, and when using the model, the computer device may input the image features into the spatial transformation layer and obtain the corresponding first transformation information as output.
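As an illustrative sketch (not the patent's actual network), the parameter-prediction step of such a spatial transformation layer might look like the following, where the linear weights, the global-average-pooling choice, and the identity-transform initialization are all assumptions; the identity initialization is a common starting point for layers of this kind:

```python
import numpy as np

def localization_layer(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """features: H x W x C image feature map.
    Returns a 2 x 3 affine matrix as the first transformation information."""
    pooled = features.mean(axis=(0, 1))  # global average pool -> (C,)
    theta = W @ pooled + b               # linear map to six parameters
    return theta.reshape(2, 3)

C = 16
W = np.zeros((6, C))                           # untrained weights (illustrative)
b = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # identity-transform initialization
feat = np.ones((6, 30, C))                     # toy feature map
theta = localization_layer(feat, W, b)
print(theta)  # identity transform: [[1, 0, 0], [0, 1, 0]]
```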
In step S25, the computer device samples the image features of the lateral image based on the first transformation information, to obtain the image features of the lateral text corresponding to the vertical text.
The sampling process is the process by which the computer device adjusts the vertical text in the transverse image into horizontal text. The computer device may sample the image features of the transverse image directly based on the first transformation information, or may sample the image features in other ways. In one possible implementation, the computer device may complete the above sampling step through the following steps one to three:
Step one, the computer device generates a grid based on the resolution of the transverse image.
The resolution may refer to the amount of information stored in the transverse image, i.e., how many pixels the image contains per inch. The computer device may generate the grid based on the resolution of the transverse image in a number of ways. In one possible implementation, the computer device divides the grid according to the resolution, and the coordinate information of each location can be obtained from the grid. The grid exists as a set of coordinate positions, and this coordinate information is used for placing pixel points so as to form the target image converted into horizontal text.
Step two, the computer device acquires, based on the grid and the first transformation information, the positions of the image features of the horizontal text corresponding to the vertical text within the image features of the transverse image.
The computer device can obtain from the grid the positions of the image features of the horizontal text corresponding to the vertical text, and can then obtain, based on those positions and the first transformation information, the positions of those image features within the image features of the transverse image.
The grid is composed of coordinates. Processing the grid with the first transformation information yields a processed grid, in which each piece of coordinate information indicates the position, within the image features of the transverse image, that corresponds to that grid coordinate; from the processed grid it is therefore known which position among the image features should be sampled to fill each position in the grid.
In one possible implementation manner, the first transformation information is in matrix form, and the computer device may multiply the coordinate information in the grid by the first transformation information in matrix form to obtain the coordinate information, within the image features of the transverse image, of the image features of the horizontal text corresponding to the vertical text. In one specific example, the computation may be the following equation one:
Equation one:

$$\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}$$

In equation one, $x_i^{t}$ and $y_i^{t}$ represent a position in the image features of the horizontal text corresponding to the vertical text, $x_i^{s}$ and $y_i^{s}$ represent the coordinate position of that image feature of the horizontal text within the image features of the transverse image, $i$ denotes the feature index, the $2 \times 3$ matrix represents the first transformation information in matrix form, and $\theta_{11}$ to $\theta_{23}$ are the parameters of the matrix.
Step three, the computer device samples the image features of the transverse image based on the positions of the image features of the horizontal text within the image features of the transverse image, so as to obtain the image features of the horizontal text corresponding to the vertical text.
Having obtained the positions of the image features of the horizontal text within the image features of the transverse image, the computer device can sample the corresponding positions to obtain the image features of the horizontal text.
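Steps one to three above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions: coordinates are normalized to $[-1, 1]$ and nearest-neighbor sampling is used for brevity, whereas bilinear sampling is more common in practice; the grid coordinates are mapped through the $2 \times 3$ matrix of equation one:

```python
import numpy as np

def affine_sample(src: np.ndarray, theta: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """src: H x W x C feature map of the transverse image.
    theta: 2 x 3 matrix (first transformation information) in normalized
    [-1, 1] coordinates. Returns an out_h x out_w x C sampled feature map."""
    H, W, C = src.shape
    # step one: generate the grid from the output resolution
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (x_t, y_t, 1) per cell
    # step two: equation one maps each grid coordinate to a source position
    src_xy = grid @ theta.T                                # (x_s, y_s) per cell
    # step three: sample the source features at the mapped positions
    px = np.clip(((src_xy[..., 0] + 1) * 0.5 * (W - 1)).round().astype(int), 0, W - 1)
    py = np.clip(((src_xy[..., 1] + 1) * 0.5 * (H - 1)).round().astype(int), 0, H - 1)
    return src[py, px]

identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
src = np.arange(4 * 4 * 1, dtype=float).reshape(4, 4, 1)
out = affine_sample(src, identity, 4, 4)
print(np.array_equal(out, src))  # True: identity theta reproduces the input
```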
Steps S23 to S25 constitute the process by which the computer device obtains the image features of the horizontal text corresponding to the vertical text based on the image features of the transverse image. The computer device may obtain these features through steps S23 to S25, or may obtain them directly from the image features of the transverse image, which is not limited in this embodiment of the disclosure.
In a specific example, the computer device may take the third transverse image as the transverse image corresponding to the vertical text and perform feature extraction on it to obtain image features. Through the spatial transformation layer, it obtains from the image features the first transformation information for transforming from the target image features to the image features, that is, step S24. It then generates a grid based on the resolution of the transverse image, obtains from the grid and the first transformation information the positions, within the image features of the transverse image, of the image features of the horizontal text corresponding to the vertical text, and samples the image features of the transverse image at those positions to obtain the image features of the horizontal text.
Steps one to three above constitute the process by which the computer device uses deformation parameters to obtain the adjusted features of the vertical text in the transverse image, where the deformation parameters can be learned automatically by the spatial transformation layer. This process solves the problem that the vertical text becomes inverted after the vertical image is rotated into the transverse image: if the transverse image were not processed by these steps and character recognition were performed directly on the inverted vertical text, the probability of recognizing the text would be small and the accuracy of the recognized result low. Based on the deformation parameters, the adjusted features of the vertical text in the transverse image can be obtained, greatly improving the recognition rate of vertical text. Moreover, the spatial transformation layer meets this need with only a small increase in computation and no increase in computation time; it can be inserted into a network conveniently and simply without large changes, and it greatly reduces the requirement on the samples, namely the transverse images.
For example, fig. 3 is a flowchart illustrating a method for acquiring the image features of horizontal text corresponding to vertical text according to an exemplary embodiment. As shown in fig. 3, the computer device performs feature extraction on the third transverse image to obtain image features, which are input into the spatial transformation layer as input features. The spatial transformation layer outputs the first transformation information; a grid is then generated based on the resolution of the third transverse image; the positions of the image features of the horizontal text corresponding to the vertical text within the input features are obtained based on the grid and the first transformation information; and a sampling layer samples the input features according to those positions to obtain output features, which are the image features of the horizontal text corresponding to the vertical text.
In step S26, the computer device identifies the image features of the horizontal text, and obtains the text identification result of the vertical text.
The computer device may recognize the image features of the horizontal text in a number of ways; for example, the image features may be input into a target recognition model that performs the recognition, or the recognition may be performed by a target recognition algorithm. The specific types of the target recognition model and the target recognition algorithm are not limited in the embodiments of the disclosure.
In one possible implementation manner, the computer device may recognize the image features of the horizontal text based on those features and the bidirectional semantic relationship of the text in the horizontal text, so as to obtain the text recognition result corresponding to the vertical text. The bidirectional semantic relationship of the text may comprise the forward and backward semantic dependency relationships of the text; based on it, the order of words within sentences can be considered comprehensively, making the recognized result more accurate.
In one specific possible implementation, the computer device may use a Bidirectional Long Short-Term Memory (BiLSTM) model. The BiLSTM model is well suited to recognizing character sequences and can recognize the image features of the horizontal text based on those features and the bidirectional semantic relationships of the text in the horizontal text. Of course, other models may also be used, such as a Long Short-Term Memory (LSTM) model, to which embodiments of the present disclosure are not limited.
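As a rough illustration of the bidirectional idea only, the following replaces the BiLSTM with a plain tanh RNN run over the feature sequence in both directions, with the two hidden states concatenated per step; this captures the forward-plus-backward context but none of the LSTM gating, and the weights and sizes are arbitrary stand-ins:

```python
import numpy as np

def bidirectional_rnn(seq: np.ndarray, Wx: np.ndarray, Wh: np.ndarray) -> np.ndarray:
    """seq: T x D sequence of feature columns from the horizontal text image.
    Returns T x 2H features carrying both semantic directions."""
    Hdim = Wh.shape[0]

    def run(xs):
        h = np.zeros(Hdim)
        out = []
        for x in xs:
            h = np.tanh(Wx @ x + Wh @ h)  # simple recurrent update
            out.append(h)
        return np.array(out)

    fwd = run(seq)              # left-to-right (forward semantic) context
    bwd = run(seq[::-1])[::-1]  # right-to-left (backward semantic) context, re-aligned
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 16))     # e.g. 30 feature columns of width 16
Wx = rng.standard_normal((8, 16)) * 0.1
Wh = rng.standard_normal((8, 8)) * 0.1
out = bidirectional_rnn(seq, Wx, Wh)
print(out.shape)  # (30, 16): 8 forward + 8 backward features per step
```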
According to the method provided by the embodiments of the disclosure, the vertical image of the vertical text to be recognized is rotated to obtain the transverse image corresponding to the vertical text, the image features of the horizontal text corresponding to the vertical text are obtained based on the image features of the transverse image, and the image features of the horizontal text are recognized. Compared with directly recognizing the vertical image of the vertical text, adjusting the vertical text into image features of horizontal text makes recognition easier, greatly improves the probability of recognizing the text, and also improves recognition accuracy.
Fig. 4 is a block diagram illustrating a vertical text recognition device according to an exemplary embodiment. Referring to fig. 4, the apparatus includes a rotation unit 401, an acquisition unit 402, and an identification unit 403.
The rotation unit 401 is configured to perform rotation on the vertical image of the vertical text to be identified, so as to obtain a lateral image corresponding to the vertical text.
The acquiring unit 402 is configured to perform image feature acquisition of the horizontal text corresponding to the vertical text based on the image feature of the horizontal image.
The recognition unit 403 is configured to perform recognition on the image features of the horizontal text, so as to obtain a text recognition result of the vertical text.
In one possible implementation, the apparatus further includes:
an extraction unit configured to perform feature extraction on the lateral image, resulting in image features;
the acquisition unit is further configured to perform acquisition, according to the image features, of first transformation information for transforming from target image features to the image features, the target image features being used to represent the image features in which the vertical text in the transverse image is adjusted to horizontal text;
and the sampling unit is configured to sample the image characteristics of the transverse image based on the first transformation information to obtain the image characteristics of the transverse text corresponding to the vertical text.
In one possible implementation, the acquisition unit is further configured to perform:
acquiring target image characteristics based on the image characteristics and second transformation information, wherein the second transformation information is used for adjusting vertical characters in the transverse image into transverse characters;
based on the image feature and the target image feature, the first transformation information transformed from the target image feature to the image feature is obtained.
In one possible implementation, the apparatus further includes:
a generation unit configured to perform generation of a grid based on a resolution of the lateral image;
the acquisition unit is further configured to perform: acquiring the position of the image feature of the horizontal text corresponding to the vertical text in the image feature of the horizontal image based on the grid and the first transformation information;
the sampling unit is further configured to perform: and based on the position of the image feature of the horizontal text in the image feature of the horizontal image, sampling the image feature of the horizontal image to obtain the image feature of the horizontal text corresponding to the vertical text.
In one possible implementation, the rotation unit is further configured to perform any one of the following:
the vertical image is rotated anticlockwise to obtain a first transverse image, and the first transverse image is used as a transverse image corresponding to the vertical text;
Clockwise rotating the vertical image to obtain a second transverse image, and taking the second transverse image as a transverse image corresponding to the vertical text;
and rotating the vertical image anticlockwise to obtain a first transverse image, rotating the vertical image clockwise to obtain a second transverse image, acquiring a third transverse image based on the first transverse image and the second transverse image, and taking the third transverse image as a transverse image corresponding to the vertical text.
In one possible implementation, the apparatus further includes:
and the stitching unit is configured to perform channel stitching on the first transverse image and the second transverse image to obtain the third transverse image.
In one possible implementation, the identification unit is further configured to perform:
based on the image characteristics of the horizontal characters and the two-way semantic relation of the characters in the horizontal characters, the image characteristics of the horizontal characters are identified, and a character identification result corresponding to the vertical characters is obtained.
According to the device provided by the embodiment of the disclosure, the vertical image of the vertical text to be identified is rotated to obtain the transverse image corresponding to the vertical text, the image characteristic of the transverse text corresponding to the vertical text is obtained based on the image characteristic of the transverse image, the image characteristic of the transverse text is identified, compared with the method of directly identifying the vertical image of the vertical text, the method has the advantages that the vertical text is adjusted to the image characteristic of the transverse text, the identification is easy, the probability of identifying the text is greatly improved, and meanwhile, the identification accuracy is also improved.
It should be noted that: in the vertical text recognition device provided in the above embodiment, only the division of the above functional modules is used for illustration in the vertical text recognition, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the vertical text recognition device and the vertical text recognition method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments, which are not repeated herein.
Fig. 5 is a schematic structural diagram of a terminal according to an exemplary embodiment. The terminal 500 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 500 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal 500 includes: one or more processors 501 and one or more memories 502.
Processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ). The processor 501 may also include a main processor and a coprocessor, the main processor being a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 501 may integrate a GPU (Graphics Processing Unit, image processor) for rendering and drawing of content required to be displayed by the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence ) processor for processing computing operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the method of text recognition provided by the method embodiments in the present disclosure.
In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502, and peripheral interface 503 may be connected by buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface 503 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, a display 505, a camera assembly 506, audio circuitry 507, a positioning assembly 508, and a power supply 509.
Peripheral interface 503 may be used to connect at least one Input/Output (I/O) related peripheral to processor 501 and memory 502. In some embodiments, processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 501, memory 502, and peripheral interface 503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 504 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuitry 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 504 may also include NFC (Near Field Communication, short range wireless communication) related circuitry, which is not limited by the present disclosure.
The display 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 505 is a touch display, the display 505 also has the ability to collect touch signals at or above the surface of the display 505. The touch signal may be input as a control signal to the processor 501 for processing. At this time, the display 505 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 505 may be one, providing a front panel of the terminal 500; in other embodiments, the display 505 may be at least two, respectively disposed on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display 505 may be a flexible display disposed on a curved surface or a folded surface of the terminal 500. Even more, the display 505 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 505 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuitry 507 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 501 for processing, or inputting the electric signals to the radio frequency circuit 504 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 500. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuitry 507 may also include a headphone jack.
The positioning component 508 is used to locate the current geographic location of the terminal 500 to enable navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 509 is used to power the various components in the terminal 500. The power supply 509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 509 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 500 further includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: an acceleration sensor 511, a gyro sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect components of gravitational acceleration on three coordinate axes. The processor 501 may control the display 505 to display a user interface in a landscape view or a portrait view according to a gravitational acceleration signal acquired by the acceleration sensor 511. The acceleration sensor 511 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may collect a 3D motion of the user to the terminal 500 in cooperation with the acceleration sensor 511. The processor 501 may implement the following functions based on the data collected by the gyro sensor 512: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 513 may be disposed at a side frame of the terminal 500 and/or at a lower layer of the display 505. When the pressure sensor 513 is disposed at a side frame of the terminal 500, a grip signal of the user to the terminal 500 may be detected, and the processor 501 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed at the lower layer of the display screen 505, the processor 501 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 505. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 514 is used to collect the user's fingerprint; the processor 501 identifies the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 itself identifies the user from the collected fingerprint. Upon recognizing the user's identity as trusted, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500. When a physical key or a vendor logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical key or the vendor logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the display 505 based on the ambient light intensity collected by the optical sensor 515: when the ambient light intensity is high, the display brightness of the display 505 is turned up; when the ambient light intensity is low, it is turned down. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity collected by the optical sensor 515.
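The brightness adjustment above can be sketched as a clamped linear mapping; the function name and the lux thresholds below are illustrative assumptions, not values from the patent:

```python
def backlight_level(lux: float, lo: float = 10.0, hi: float = 1000.0,
                    min_b: float = 0.2, max_b: float = 1.0) -> float:
    """Map ambient illuminance (lux) to a display brightness in
    [min_b, max_b]; brighter surroundings give a brighter screen."""
    t = (lux - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))  # clamp to [0, 1] outside the thresholds
    return min_b + t * (max_b - min_b)

print(backlight_level(5))     # very dark room -> minimum brightness
print(backlight_level(2000))  # bright daylight -> maximum brightness
```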
The proximity sensor 516, also referred to as a distance sensor, is typically provided on the front panel of the terminal 500. The proximity sensor 516 collects the distance between the user and the front surface of the terminal 500. In one embodiment, when the proximity sensor 516 detects that this distance gradually decreases, the processor 501 controls the display 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance gradually increases, the processor 501 controls the display 505 to switch from the screen-off state to the screen-on state.
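The screen on/off switching can be sketched as a small state machine; the hysteresis thresholds `near` and `far` are illustrative assumptions (the patent only specifies that the distance decreases or increases):

```python
def next_screen_state(state: str, distance_cm: float,
                      near: float = 3.0, far: float = 6.0) -> str:
    """Toggle the display with hysteresis: turn it off once the user is
    closer than `near`, back on once farther than `far`."""
    if state == "on" and distance_cm < near:
        return "off"
    if state == "off" and distance_cm > far:
        return "on"
    return state  # in the dead band, keep the current state

print(next_screen_state("on", 2.0))   # phone raised to the ear -> off
print(next_screen_state("off", 7.0))  # phone lowered again     -> on
```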
Those skilled in the art will appreciate that the structure shown in fig. 5 is not limiting and that more or fewer components than shown may be included or certain components may be combined or a different arrangement of components may be employed.
Fig. 6 is a schematic structural diagram of a server according to an exemplary embodiment. The server 600 may vary considerably in configuration and performance, and may include one or more processors (central processing units, CPUs) 601 and one or more memories 602, where the one or more memories 602 store at least one instruction that is loaded and executed by the one or more processors 601 to implement the text recognition method provided in the foregoing method embodiments. Of course, the server 600 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an exemplary embodiment, a storage medium including instructions, such as a memory, is also provided; the instructions are executable by a processor to perform the text recognition method in the above embodiments. For example, the storage medium may be a read-only memory (ROM), a random-access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing is merely illustrative of the present disclosure and is not intended to limit it; any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present disclosure shall be included within its scope of protection.

Claims (12)

1. A method for recognizing vertical text, the method comprising:
rotating a vertical image of a vertical text to be identified to obtain a transverse image corresponding to the vertical text;
extracting features of the transverse image to obtain image features of the transverse image;
acquiring target image characteristics based on the image characteristics and second transformation information, wherein the second transformation information is used for adjusting vertical characters in the transverse image into horizontal characters, and the target image characteristics are used for representing the image characteristics of the transverse image after the vertical characters are adjusted into the horizontal characters;
acquiring first transformation information transformed from the target image feature to the image feature based on the image feature and the target image feature;
based on the first transformation information, sampling the image characteristics of the transverse image to obtain the image characteristics of the transverse text corresponding to the vertical text;
and recognizing the image characteristics of the horizontal characters to obtain character recognition results of the vertical characters.
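As an illustrative sketch only (not part of the claims), the claimed steps can be lined up as a pipeline; `extract_features` and `estimate_transform` are hypothetical stubs standing in for learned modules:

```python
import numpy as np

def extract_features(img: np.ndarray) -> np.ndarray:
    # Placeholder for a CNN backbone: here just a float view of the image.
    return img.astype(np.float32)

def estimate_transform(feats: np.ndarray, target: np.ndarray) -> np.ndarray:
    # Placeholder for the module that regresses the transform mapping the
    # target features back to the source features (identity in this sketch).
    return np.eye(2, 3, dtype=np.float32)

def recognize_vertical(vertical_img: np.ndarray) -> np.ndarray:
    lateral = np.rot90(vertical_img, k=1)      # rotate the vertical image
    feats = extract_features(lateral)          # extract image features
    target = feats                             # stub "adjusted" target features
    theta = estimate_transform(feats, target)  # first transformation information
    sampled = feats                            # stub sampling with theta
    return sampled                             # a decoder would read text here
```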
2. The method according to claim 1, wherein the step of sampling the image features of the lateral image based on the first transformation information to obtain the image features of the lateral text corresponding to the vertical text includes:
generating a grid based on the resolution of the lateral image;
acquiring the position of the image features of the horizontal characters corresponding to the vertical characters in the image features of the horizontal images based on the grids and the first transformation information;
and sampling the image features of the transverse image based on the positions of the image features of the transverse characters in the image features of the transverse image to obtain the image features of the transverse characters corresponding to the vertical characters.
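The grid generation and sampling of claim 2 resemble spatial-transformer-style warping; below is a minimal nearest-neighbour sketch, illustrative only, with grid coordinates assumed normalized to [-1, 1]:

```python
import numpy as np

def affine_grid_sample(feat: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Build a grid at the feature map's resolution, map each grid point
    through the 2x3 affine `theta` to find where to read in `feat`."""
    h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    out = np.zeros_like(feat)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            # position of this output cell in the source feature map
            sx = theta[0, 0] * x + theta[0, 1] * y + theta[0, 2]
            sy = theta[1, 0] * x + theta[1, 1] * y + theta[1, 2]
            # back to pixel indices, nearest neighbour, clamped to bounds
            si = min(max(int(round((sy + 1) / 2 * (h - 1))), 0), h - 1)
            sj = min(max(int(round((sx + 1) / 2 * (w - 1))), 0), w - 1)
            out[i, j] = feat[si, sj]
    return out
```

With the identity transform the features are copied through unchanged; a transform that mirrors the x coordinate flips the map horizontally.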
3. The method according to claim 1, wherein the rotating the vertical image of the vertical text to be identified to obtain the horizontal image corresponding to the vertical text comprises any one of the following:
rotating the vertical image anticlockwise to obtain a first transverse image, and taking the first transverse image as the transverse image corresponding to the vertical text;
clockwise rotating the vertical image to obtain a second transverse image, and taking the second transverse image as a transverse image corresponding to the vertical text;
and rotating the vertical image anticlockwise to obtain a first transverse image, rotating the vertical image clockwise to obtain a second transverse image, acquiring a third transverse image based on the first transverse image and the second transverse image, and taking the third transverse image as a transverse image corresponding to the vertical text.
4. The method according to claim 3, wherein the acquiring a third lateral image based on the first lateral image and the second lateral image comprises:
and performing channel stitching on the first transverse image and the second transverse image to obtain the third transverse image.
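The two rotations and the channel stitching of claims 3 and 4 can be sketched directly with array operations; the function name is hypothetical:

```python
import numpy as np

def lateral_candidates(vertical_img: np.ndarray) -> np.ndarray:
    """Rotate the vertical image 90 degrees counter-clockwise and
    90 degrees clockwise, then stack the two results along a new
    channel axis to form the third lateral image."""
    ccw = np.rot90(vertical_img, k=1)    # first lateral image
    cw = np.rot90(vertical_img, k=-1)    # second lateral image
    return np.stack([ccw, cw], axis=-1)  # channel-wise "stitching"
```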
5. The method of claim 1, wherein the recognizing the image features of the horizontal characters to obtain the character recognition result corresponding to the vertical characters comprises:
recognizing the image features of the horizontal characters based on the image features of the horizontal characters and the bidirectional semantic relation of the characters in the horizontal characters, to obtain the character recognition result corresponding to the vertical characters.
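The bidirectional semantic relation of claim 5 would in practice be modelled by a bidirectional recurrent decoder; the toy sketch below only illustrates combining left and right context per time step, and is not the patented method:

```python
import numpy as np

def bidirectional_context(seq_feats: np.ndarray) -> np.ndarray:
    """For each time step of a (T, D) feature sequence, concatenate a
    left-to-right running mean (left context) with a right-to-left
    running mean (right context), giving a (T, 2D) representation."""
    t = np.arange(1, len(seq_feats) + 1)[:, None]
    fwd = np.cumsum(seq_feats, axis=0) / t                    # left context
    bwd = np.cumsum(seq_feats[::-1], axis=0)[::-1] / t[::-1]  # right context
    return np.concatenate([fwd, bwd], axis=1)
```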
6. A vertical text recognition device, the device comprising:
a rotation unit configured to rotate the vertical image of a vertical text to be identified to obtain a transverse image corresponding to the vertical text;
an extraction unit configured to perform feature extraction on the lateral image, resulting in image features of the lateral image;
an acquisition unit configured to perform acquisition of a target image feature based on the image feature and second transformation information for adjusting the vertical text in the lateral image to horizontal text, the target image feature being used for representing an image feature after the vertical text in the lateral image is adjusted to horizontal text; acquiring first transformation information transformed from the target image feature to the image feature based on the image feature and the target image feature;
a sampling unit configured to sample the image characteristics of the transverse image based on the first transformation information to obtain the image characteristics of the transverse text corresponding to the vertical text;
and a recognition unit configured to perform recognition on the image characteristics of the horizontal characters to obtain character recognition results of the vertical characters.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a generation unit configured to perform generation of a grid based on a resolution of the lateral image;
the acquisition unit is further configured to perform: acquiring the position of the image features of the horizontal characters corresponding to the vertical characters in the image features of the horizontal images based on the grids and the first transformation information;
the sampling unit is further configured to perform: and sampling the image features of the transverse image based on the positions of the image features of the transverse characters in the image features of the transverse image to obtain the image features of the transverse characters corresponding to the vertical characters.
8. The apparatus of claim 6, wherein the rotation unit is further configured to perform any one of:
rotating the vertical image anticlockwise to obtain a first transverse image, and taking the first transverse image as the transverse image corresponding to the vertical text;
clockwise rotating the vertical image to obtain a second transverse image, and taking the second transverse image as a transverse image corresponding to the vertical text;
and rotating the vertical image anticlockwise to obtain a first transverse image, rotating the vertical image clockwise to obtain a second transverse image, acquiring a third transverse image based on the first transverse image and the second transverse image, and taking the third transverse image as a transverse image corresponding to the vertical text.
9. The apparatus of claim 8, wherein the apparatus further comprises:
and the splicing unit is configured to perform channel splicing on the first transverse image and the second transverse image to obtain the third transverse image.
10. The apparatus of claim 6, wherein the identification unit is further configured to perform:
and identifying the image features of the horizontal characters based on the image features of the horizontal characters and the two-way semantic relation of the characters in the horizontal characters to obtain character identification results corresponding to the vertical characters.
11. A computer device, comprising:
one or more processors;
one or more memories for storing the one or more processor-executable instructions;
wherein the one or more processors are configured to execute the instructions to implement the vertical text recognition method according to any one of claims 1 to 5.
12. A storage medium, wherein instructions in the storage medium, when executed by a processor of a computer device, enable the computer device to perform the vertical text recognition method according to any one of claims 1 to 5.
CN201911147784.XA 2019-11-21 2019-11-21 Vertical text recognition method, device, equipment and medium Active CN110991445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911147784.XA CN110991445B (en) 2019-11-21 2019-11-21 Vertical text recognition method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110991445A CN110991445A (en) 2020-04-10
CN110991445B true CN110991445B (en) 2023-09-29

Family

ID=70085589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911147784.XA Active CN110991445B (en) 2019-11-21 2019-11-21 Vertical text recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110991445B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476237A (en) * 2020-04-28 2020-07-31 有米科技股份有限公司 Character recognition method, device, server and storage medium
CN113128485A (en) * 2021-03-17 2021-07-16 北京达佳互联信息技术有限公司 Training method of text detection model, text detection method and device
CN113011132B (en) * 2021-04-22 2023-07-21 中国平安人寿保险股份有限公司 Vertical text recognition method, device, computer equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2015045679A1 (en) * 2013-09-25 2015-04-02 シャープ株式会社 Information device and control program
CN106407976A (en) * 2016-08-30 2017-02-15 百度在线网络技术(北京)有限公司 Image character identification model generation and vertical column character image identification method and device
CN109271910A (en) * 2018-09-04 2019-01-25 阿里巴巴集团控股有限公司 A kind of Text region, character translation method and apparatus
CN109766881A (en) * 2018-11-28 2019-05-17 北京捷通华声科技股份有限公司 A kind of character identifying method and device of vertical text image

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8139894B2 (en) * 2007-12-20 2012-03-20 Intel Corporation Automatic dominant orientation estimation in text images based on steerable filters

Non-Patent Citations (2)

Title
Yang Fei. A survey of text detection in natural scene images. Electronic Design Engineering, 2016, No. 24 (full text). *
Wang Linshui. A skew correction algorithm for Chinese layouts with mixed horizontal and vertical text. Computer Engineering and Applications, 2005, No. 24 (full text). *

Similar Documents

Publication Publication Date Title
CN110059652B (en) Face image processing method, device and storage medium
JP2021524957A (en) Image processing methods and their devices, terminals and computer programs
CN112581358B (en) Training method of image processing model, image processing method and device
CN109558837B (en) Face key point detection method, device and storage medium
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
CN108288032B (en) Action characteristic acquisition method, device and storage medium
CN112907725B (en) Image generation, training of image processing model and image processing method and device
CN111127509B (en) Target tracking method, apparatus and computer readable storage medium
CN110991445B (en) Vertical text recognition method, device, equipment and medium
CN111382624A (en) Action recognition method, device, equipment and readable storage medium
CN110991457B (en) Two-dimensional code processing method and device, electronic equipment and storage medium
CN111754386B (en) Image area shielding method, device, equipment and storage medium
CN110705614A (en) Model training method and device, electronic equipment and storage medium
CN113763228A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110737692A (en) data retrieval method, index database establishment method and device
CN111982293B (en) Body temperature measuring method and device, electronic equipment and storage medium
CN114817709A (en) Sorting method, device, equipment and computer readable storage medium
CN114283395A (en) Method, device and equipment for detecting lane line and computer readable storage medium
CN113298040A (en) Key point detection method and device, electronic equipment and computer-readable storage medium
CN109816047B (en) Method, device and equipment for providing label and readable storage medium
CN109388732B (en) Music map generating and displaying method, device and storage medium
CN115221888A (en) Entity mention identification method, device, equipment and storage medium
CN113407774A (en) Cover determining method and device, computer equipment and storage medium
CN112487162A (en) Method, device and equipment for determining text semantic information and storage medium
CN116681755B (en) Pose prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant