CN111626383B - Font identification method and device, electronic equipment and storage medium

Font identification method and device, electronic equipment and storage medium

Info

Publication number
CN111626383B
CN111626383B (application CN202010478196.0A)
Authority
CN
China
Prior art keywords
text
font
text position
font type
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010478196.0A
Other languages
Chinese (zh)
Other versions
CN111626383A (en)
Inventor
尚太章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010478196.0A priority Critical patent/CN111626383B/en
Publication of CN111626383A publication Critical patent/CN111626383A/en
Application granted granted Critical
Publication of CN111626383B publication Critical patent/CN111626383B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/245 Font recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The embodiment of the application discloses a font identification method, which comprises the following steps: acquiring an image to be identified; performing text position recognition and font recognition on the image to be recognized by utilizing a pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position; the pre-training recognition model is used for determining the text position in the image and the font type of the text at the text position; performing content recognition on the at least one text to obtain at least one piece of content information; and displaying the at least one piece of content information on a display interface of the electronic device based on the at least one font type and the at least one text position. The embodiment of the application also discloses a font identification device, an electronic device and a storage medium.

Description

Font identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to image processing technology in the field of computers, and in particular, to a font identification method and apparatus, an electronic device, and a storage medium.
Background
Fonts are the external form features of characters; they are a main carrier of information transmission and a tool of practical value. In practical applications, when real-time optical character recognition (Optical Character Recognition, OCR) is performed on an image, the obtained text recognition results are displayed on a display interface in a specific font; typically, the displayed fonts are not identical to the fonts in the original image. The original font information therefore cannot be reflected, which causes information loss. It is thus necessary for the electronic device to perform automated font recognition on the text in the image.
Disclosure of Invention
The embodiment of the application provides a font identification method and device, electronic equipment and storage medium, which can identify the font type in an image and improve the integrity of information display.
In a first aspect, a font identification method is provided and applied to an electronic device, and the method includes:
acquiring an image to be identified;
performing text position recognition and font recognition on the image to be recognized by utilizing a pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position; the pre-training recognition model is used for determining the text position in the image and the font type of the text at the text position;
performing content recognition on the at least one text to obtain at least one piece of content information;
and displaying the at least one content information on a display interface of the electronic device based on the at least one font type and the at least one text position.
Optionally, the training process of the pre-training recognition model includes:
acquiring a sample image and a sample label; the sample tag comprises a text position tag and a font type tag;
processing the sample image based on the recognition model to be trained to obtain a first output result; the first output result is used for representing a first text position in the sample image and a first font type corresponding to the first text position;
determining a first difference value between the sample tag and the first output result by a target loss function;
and training the recognition model to be trained based on the first difference value until the training ending condition is met, so as to obtain the pre-training recognition model.
Optionally, the target loss function includes a first loss function and a second loss function; the first loss function is used for calculating a difference value of the text position, and the second loss function is used for calculating a difference value of the font type;
the determining, by the target loss function, a first difference value between the sample tag and the first output result includes:
determining a text position difference value of the text position tag and the first text position based on the first loss function;
determining a font type difference value for the font type tag and the first font type based on the second loss function;
and carrying out weighting processing on the text position difference value and the font type difference value to obtain the first difference value.
Optionally, the content recognition of the at least one text to obtain at least one content information includes:
identifying at least one text corresponding to the at least one text position in the image to be identified by using a preset text content identification model to obtain at least one text content information; the text positions correspond to the text content information one by one; the text content recognition model is used for determining content information in the text image.
Optionally, the displaying the at least one content information on the display interface of the electronic device based on the at least one font type and the at least one text position includes:
acquiring a font file corresponding to the at least one font type;
and displaying at least one content information of the at least one text position on the display interface according to the font file.
Optionally, the obtaining a font file corresponding to the at least one font type includes:
matching the at least one font type according to a preset font library;
if a first target font type matched with a preset font library exists in the at least one font type, determining a target font file matched with the first target font type from the preset font library; the first target font type is any one of the at least one font type;
if a second target font type that does not match the preset font library exists in the at least one font type, acquiring a preset font file; the second target font type is any one font type except the first target font type in the at least one font type;
and when the at least one font type is matched, the acquired preset font file and the target font file are used as font files corresponding to the at least one font type.
Optionally, the acquiring the preset font file includes:
sending a font file request to a target server;
and responding to the font file request, and receiving the preset font file from the target server.
Optionally, the displaying at least one content information of the at least one text position on the display interface according to the font file includes:
displaying the at least one text content information at at least one target position on the display interface according to the font file; wherein the at least one target position corresponds one-to-one with the at least one text position.
Optionally, before displaying the at least one content information of the at least one text position on the display interface of the electronic device, the method includes:
acquiring at least one of font size information of a text at the at least one text position, line spacing information between text lines and word spacing information between fonts;
the displaying at least one content information of the at least one text position on the display interface includes:
at least one content information of the at least one text position is displayed on the display interface based on the font file and at least one of font size information of the text at the at least one text position, line spacing information between text lines, and word spacing information between fonts.
Optionally, after the content recognition is performed on the at least one text to obtain at least one content information, the method further includes:
translating the at least one piece of content information to obtain at least one piece of translation information;
the at least one translation information is displayed on a display interface based on the at least one font type and the at least one text position.
In a second aspect, there is provided a font recognition apparatus, applied to an electronic device, the apparatus comprising:
the acquisition unit is used for acquiring the image to be identified;
the first recognition unit is used for carrying out text position recognition and font recognition on the image to be recognized by utilizing the pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position; the pre-training recognition model is used for determining the text position in the image and the font type of the text at the text position;
the second identification unit is used for carrying out content identification on the at least one text to obtain at least one piece of content information;
and the display unit is used for displaying the at least one content information on a display interface of the electronic equipment based on the at least one font type and the at least one text position.
In a third aspect, an electronic device is provided, the electronic device comprising a processor and a memory for storing a computer program capable of running on the processor;
wherein the processor is configured to execute the steps of the font identification method according to the first aspect when running the computer program.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program for execution by a processor to perform the steps of the font identification method according to the first aspect.
The embodiment of the application provides a font identification method and device, an electronic device and a storage medium. Firstly, an image to be identified is obtained; then, text position recognition and font recognition are performed on the image to be recognized by utilizing a pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position, the pre-training recognition model being used for determining the text position in the image and the font type of the text at the text position; content recognition is performed on the at least one text to obtain at least one piece of content information; and the at least one piece of content information is displayed on a display interface of the electronic device based on the at least one font type and the at least one text position. In this way, the text position and the font type of the text at the text position are identified, and the identified content information is displayed according to the identified font type; therefore, all information of the text can be displayed according to the text style in the original image, and the integrity of information display is improved.
Drawings
Fig. 1 is a schematic flow chart of a font identification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an exemplary recognition model architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a training process of a pre-training recognition model according to an embodiment of the present application;
FIG. 4 is a schematic view of an exemplary sample image provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a training process of an exemplary pre-training recognition model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an exemplary application provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of a font recognition device according to an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
For a more complete understanding of the nature and the technical content of the embodiments of the present application, reference should be made to the following detailed description of embodiments of the application, taken in conjunction with the accompanying drawings, which are meant to be illustrative only and not limiting of the embodiments of the application.
Where a description such as "first/second" appears in this application, the terms "first" and "second" are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It should be understood that "first" and "second" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the embodiments of the application is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In practical applications, OCR refers to the process in which an electronic device detects the shapes of characters printed or written on paper through a character recognition method and translates the recognized shapes into machine characters. OCR technology is changing our lives: for example, parking lots and tollgates identify license plate information through OCR technology, and smartphones scan business cards and identity cards through OCR technology, converting the information in them into characters for storage. It can be seen that OCR technology has been widely used in our lives.
When performing real-time OCR recognition, for example in the process of recognizing and translating English text in a PPT, the obtained text recognition result is displayed in a specific font on the display interface. However, the fonts in the display interface are not identical to the fonts in the original image. For example, in OCR applications in electronic devices, the default display font may be the Song typeface, while the font in the actual image may be a regular script. In practical applications, the user may prefer the identified text content to be displayed in the text font of the original image.
In order to solve the problems in the related art, an embodiment of the present application provides a font identification method. The execution body of the font identification method may be a font identification device provided by an embodiment of the present application, or an electronic device integrated with the font identification device, where the font identification device may be implemented in hardware or software. The electronic device may be a smart phone, a tablet computer, a personal computer, a server, an industrial computer, or the like.
Referring to fig. 1, fig. 1 is a flow chart of a font identification method according to an embodiment of the present application, as shown in fig. 1, the font identification method includes the following steps:
step 110, an image to be identified is acquired.
In an embodiment provided by the application, the image to be identified may be an image containing text. Here, the image to be recognized may be an image captured by the electronic device via the capturing means, for example an image of an identity document captured by a user. The image to be identified may be a video frame taken from a video file; for example, images taken by a user from a live video screen. The image to be identified may also be an image containing text downloaded by the user from a website. The source of the image to be identified in the embodiment of the present application is not limited herein.
Step 120, performing text position recognition and font recognition on the image to be recognized by using the pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position; wherein the pre-trained recognition model is used to determine the text position in the image and the font type of the text at the text position.
In the implementation provided by the application, the pre-trained recognition model is a pre-trained image processing model. Here, the pre-training recognition model may be obtained by the electronic device through training on sample data, or may be obtained by the electronic device from another server that provides the model.
Here, the pre-trained recognition model is used to determine the text position in the image, and the font type of the text at the text position. It can be appreciated that the pre-training recognition model may be a multi-task processing model, i.e. the pre-training recognition model may process the same image to be recognized to obtain two different types of processing results in the image to be recognized, i.e. text position and font type at the text position.
Here, at least one text position corresponds to at least one font type one-to-one. That is, there is one font type for each text position.
Please refer to the exemplary pre-trained recognition model architecture shown in fig. 2. In fig. 2, an image 21 containing text is input into a pre-trained recognition model 22; here, the pre-trained recognition model is a neural network model. Further, through the pre-trained recognition model 22, two output results can be obtained: the text position 23 in the image and the font type 24 of the text.
It can be understood that, in the embodiment of the application, the pre-training recognition model is a multi-task processing model, and the electronic device performs one-time operation on one image to be recognized through the pre-training recognition model, namely, the text position in the image to be recognized and the font type at the text position can be determined, so that the calculated amount in the image processing process can be reduced in the font type recognition process, and the image processing speed can be improved.
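For illustration, the following is a minimal sketch of such a multi-task model in Python (PyTorch): one shared backbone feeds two heads, one for the text position and one for the font type. The layer sizes, the single-box position head, and the class count are assumptions made for brevity, not the architecture disclosed in this application.

```python
import torch
import torch.nn as nn

class FontRecognitionModel(nn.Module):
    """Minimal multi-task sketch: one shared backbone, two heads
    (text position regression and font type classification).
    All sizes here are illustrative assumptions."""

    def __init__(self, num_font_types: int = 10):
        super().__init__()
        # Shared convolutional backbone (assumed structure).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
        )
        feat_dim = 64 * 8 * 8
        # Head 1: text position as one bounding box (x1, y1, x2, y2);
        # a real model would predict multiple boxes per image.
        self.position_head = nn.Linear(feat_dim, 4)
        # Head 2: font type classification logits.
        self.font_head = nn.Linear(feat_dim, num_font_types)

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)
        return self.position_head(features), self.font_head(features)
```

One forward pass produces both outputs, which is exactly why a single run of the model suffices for both tasks.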
Step 130, performing content recognition on the at least one text to obtain at least one piece of content information.
In the embodiment provided by the application, the electronic equipment can perform content recognition on the text at each text position of at least one text position in the image to be recognized to obtain the content information at each text position.
Here, the electronic device may perform content recognition on the text at the text position using any recognition method. For example, a neural network model may be used to perform content recognition on the text at the text position, or a scanning recognition mode may be used; the embodiment of the application does not limit the content recognition mode of the text.
Here, at least one text corresponds to at least one piece of content information one by one, i.e., one text corresponds to one piece of content information.
Step 140, displaying at least one content information on a display interface of the electronic device based on the at least one font type and the at least one text position.
In the embodiment provided by the application, the electronic equipment can display the identified content information. Specifically, the electronic device may display at least one content information in the display interface, and display the at least one content information as a corresponding font according to a font type corresponding to the content information.
In a possible implementation manner, the electronic device may further display, at preset positions of the display interface, the corresponding content information at each text position according to the text positions. Here, the preset position corresponds to the position of the text in the image to be recognized. And, when the content information is displayed, it is displayed according to the recognized font type.
In the font identification method provided by the embodiment of the application, the text position and the font type of the text at the text position are identified, and the identified content information is displayed according to the identified font type; therefore, all information of the text can be displayed according to the text style in the original image, and the integrity of information display is improved.
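Putting steps 110 to 140 together, a hedged sketch of the overall flow might look as follows; `model`, `ocr_model`, and `renderer` are hypothetical stand-ins for the pre-training recognition model, the preset text content recognition model, and the display interface, and the (x1, y1, x2, y2) box format is an assumption.

```python
import numpy as np

def recognize_and_display(image: np.ndarray, model, ocr_model, renderer):
    # Step 120: one forward pass yields the text positions and font types.
    positions, font_types = model(image)
    results = []
    for box, font_type in zip(positions, font_types):
        x1, y1, x2, y2 = map(int, box)   # assumed (x1, y1, x2, y2) box format
        crop = image[y1:y2, x1:x2]       # local image at the text position
        content = ocr_model(crop)        # step 130: content recognition
        results.append((content, (x1, y1, x2, y2), font_type))
    # Step 140: display each piece of content at its position, in its font.
    for content, box, font_type in results:
        renderer.draw(content, position=box, font_type=font_type)
    return results
```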
The training process for pre-training the recognition model is described in detail below.
At present, when an artificial intelligence model is adopted to recognize the fonts of text in an image, the font recognition technology needs to detect the position of the text in the image first and then recognize the font of the text at that text position. In the prior art, the model for detecting the text position and the model for recognizing the text font need to be trained separately and independently; this requires the same training sample to be input into the two models for multiple calculations, which results in low model training efficiency and low training speed, and the recognition speed of font recognition by the models is also relatively slow.
On this basis, please refer to fig. 3, fig. 3 is a schematic diagram of a training process of a pre-training recognition model according to an embodiment of the present application, as shown in fig. 3, including the following steps:
step 310, acquiring a sample image and a sample label; the sample tags include text position tags and font type tags.
In the embodiment provided by the application, in order to realize the recognition of fonts in a document image or an image containing text, a model for detecting text positions in the image and recognizing text fonts at the text positions can be trained in advance. Sample data needs to be acquired before training of the model can begin.
Here, the sample data includes a sample image and a sample tag. The sample image is an image containing a plurality of text areas and containing texts with various word sizes, shapes, fonts and languages, so that the trained pre-training recognition model can detect the positions of the texts in the image and the font types of the texts at the positions of the texts.
Additionally, the sample tags may include tags for text locations in the sample image, as well as tags for font types of text; it will be appreciated that for a sample image, the sample label can include two types of different labels, a text position label and a font type label, respectively.
It should be noted that the sample image may include a plurality of different text regions, and the different text regions are located at different positions of the sample image. It will be appreciated that each text region in the sample image may correspond to a text position tag and font type tag. The text position label refers to the actual text position of the sample image; font type labels refer to the actual font type of text in a sample image.
In the embodiments provided herein, the sample tags may be manually labeled or otherwise obtained. Here, the coordinate information of the upper left corner and the lower right corner of the text in the sample image may be used as a text position tag, or the coordinate information of the lower left corner and the upper right corner of the text in the sample image may be used as a text position tag, or the coordinate set of the text in the sample image may be used as a text position tag.
In addition, if all the characters of the text at a certain text position adopt the same font, the font adopted by the text is used as the font type label of the text; if the fonts adopted by the characters contained in the text at a certain text position differ, the font adopted by most characters in the text is used as the font type label of the text.
Illustratively, referring to the exemplary sample image schematic diagram shown in fig. 4, in which two text lines are included: all the text in the first text line 41 is in the Song typeface, so the Song typeface is set as the font type label of the first text line 41; most of the text in the second text line 42 is in regular script and only a small part is in bold, so the regular script can be set as the font type label of the second text line 42.
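As an illustration of the labeling scheme just described, a sample label for the image in fig. 4 might be represented as below; the field names and coordinate values are assumptions for illustration only.

```python
# Hypothetical sample label for fig. 4: each text line carries a text
# position tag (top-left and bottom-right corner coordinates) and a
# font type tag (the majority font of the line).
sample_label = [
    {"position": (12, 20, 480, 56),  "font_type": "SongTi"},  # first text line 41
    {"position": (12, 70, 480, 106), "font_type": "KaiTi"},   # second text line 42
]
```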
Step 320, processing the sample image based on the recognition model to be trained to obtain a first output result; the first output result is used for representing a first text position in the sample image and a first font type corresponding to the first text position.
In the embodiment provided by the application, a deep learning technology also needs to be adopted to build the recognition model to be trained before training; the recognition model to be trained may be a neural network model built on a backbone network, or may be built in other ways, and the embodiment of the application is not limited herein. Here, the sample image obtained in step 310 may be input into the recognition model to be trained, and the sample image is processed by the recognition model to be trained to obtain a first output result of the sample image. Here, the first output result may include the position where the text line is located in the sample image and the font type of the text line in the sample image, i.e., the first text position and the first font type.
It can be understood that the recognition model to be trained is an initial model architecture built by deep learning technology for detecting text positions and recognizing font types of texts. For example, the recognition model to be trained may be a neural network model based on a plurality of convolutional layers.
Step 330, determining a first difference value between the sample tag and the first output result through the target loss function.
In the embodiment provided by the application, because the recognition model to be trained is the built initial model, the first text position and the first font type in the obtained first output result are not the actual text position and the actual font type in the sample image after the sample data are processed through the recognition model to be trained. And the target text position and target font type indicated by the sample tag are the actual text position and font type in the sample image. Thus, there is a difference between the sample tag and the first output result.
Here, the difference between the sample tag and the first output result may be quantified by a target loss function; that is, the objective loss function may be a function that measures the degree to which the predicted value of the recognition model to be trained does not agree with the true value. Generally, the larger the difference between the sample label and the first output result, the larger the calculated first difference value. The first difference value may characterize to some extent the recognition effect of the recognition model to be trained.
And 340, training the recognition model to be trained based on the first difference value until the training ending condition is met, so as to obtain the pre-training recognition model.
In the embodiment provided by the application, after the first difference value is determined, iterative training can be performed on the recognition model to be trained according to the first difference value and the sample image, and the pre-training recognition model is finally obtained through training.
Specifically, the electronic device may adjust the parameters in the recognition model to be trained according to the first difference value; when training the recognition model to be trained with the target loss function, the electronic device adjusts the model parameters of the recognition model to be trained by back-propagating the first difference value. Further, the electronic device processes the sample image again based on the adjusted recognition model to be trained, extracting the text position and the font type in the sample image to obtain a second output result; the second output result characterizes a second text position in the sample image and a second font type corresponding to the second text position. Here, the adjusted recognition model to be trained performs better in determining the text position and the font type at the text position than the unadjusted recognition model to be trained, so the second text position and the second font type in the second output result have higher accuracy than the first text position and the first font type.
Further, the electronic device continues to determine a second difference value between the sample tag and the second output result based on the target loss function; and training the adjusted recognition model to be trained based on the second difference value until the training end condition is met, and taking the trained adjusted recognition model to be trained as a final trained pre-training recognition model.
The pre-training recognition model is obtained through iterative training; the iterative process continuously adjusts the model parameters through the first difference value, the second difference value, and so on, until the adjusted recognition model to be trained converges or the obtained Nth difference value falls within a preset threshold range, at which point the training ending condition is determined to be met; here, N is an integer greater than or equal to 1. The adjusted recognition model to be trained obtained at this time is the final pre-training recognition model.
In short, the electronic device may initialize the model parameters of the recognition model to be trained, and then input the sample image into the recognition model to be trained, i.e., substitute the sample image and the model parameters into the recognition model to be trained for calculation to obtain an output result; the aim of training the recognition model to be trained is to make its output result approach the label data corresponding to the sample image as closely as possible. In the initial training, since the model parameters are obtained by artificial initialization, the output result differs greatly from the label data; after each output result is obtained, the output result and the label data can be substituted into a preset target loss function to calculate a difference value, and the difference value is then used to update the model parameters. After repeatedly iterating this process with a large amount of sample data, a group of model parameters that makes the output result of the recognition model very close to the label can finally be obtained.
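A minimal sketch of this iterative loop, assuming PyTorch and a `target_loss_fn` of the form described in the next subsection, could look as follows; the optimizer choice and hyperparameters are assumptions.

```python
import torch

def train_recognition_model(model, loader, target_loss_fn,
                            epochs: int = 10, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, pos_label, font_label in loader:
            pred_pos, pred_font = model(image)   # output result
            # Difference value between the sample labels and the output.
            loss = target_loss_fn(pred_pos, pos_label, pred_font, font_label)
            optimizer.zero_grad()
            loss.backward()    # back-propagate the difference value
            optimizer.step()   # adjust the model parameters
    return model
```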
In the embodiment provided by the application, the pre-training recognition model can process the input image, output the text position in the image and the font type of the text at the text position. That is, the pre-training recognition model obtained by training in the application is a multi-task processing model, and the pre-training recognition model can obtain the processing results of different processing tasks (namely, a text position detection processing task and a font recognition processing task).
It can be understood that in the training process of the recognition model to be trained, the embodiment of the application trains the parameters for realizing the text position detection and the font recognition task in the recognition model to be trained, thereby reducing the calculated amount of image processing and improving the training efficiency and speed.
In one possible implementation, the target loss function may include a first loss function and a second loss function; the first loss function is used to calculate a difference value for the text position and the second loss function is used to calculate a difference value for the font type.
It will be appreciated that the first loss function is used to measure the difference between the predicted text position and the true text position in the recognition model to be trained and the second loss function is used to measure the difference between the predicted font type and the true font type in the recognition model to be trained.
In the embodiments provided herein, the first and second loss functions include, but are not limited to, logarithmic, square, exponential, absolute value loss functions, and the like. The first loss function and the second loss function may be the same or different, and embodiments of the present application are not limited herein.
Further, in the embodiment provided by the present application, step 330 of determining, by the objective loss function, the first difference value between the sample tag and the first output result may be implemented through steps 3301 to 3303; wherein,
step 3301, determining a text position difference value between the text position tag and the first text position based on the first loss function;
step 3302, determining a font type difference value between the font type tag and the first font type based on the second loss function;
step 3303, weighting the text position difference value and the font type difference value to obtain the first difference value.
In the embodiment provided by the application, after a first output result is obtained, a difference value is calculated according to the first output result of the recognition model to be trained and each piece of labeling information of the sample image; that is, a text position difference value between the true text position (i.e., the text position tag) in the sample image and the first text position in the first output result is calculated based on the first loss function, and a font type difference value between the true font type (i.e., the font type tag) in the sample image and the first font type in the first output result is calculated based on the second loss function.
Further, the electronic device may determine the first difference value based on the text position difference value and the font type difference value; that is, the electronic device may integrate the text position difference value and the font type difference value to determine the first difference value. In this way, the recognition model to be trained is trained through the first difference value obtained by integrating the two different difference values, i.e., the model parameters in the recognition model to be trained are adjusted and optimized through the first difference value.
It can be understood that the embodiment of the application can integrate difference values of different dimensions and adjust the model parameters with the integrated difference value; the model parameters only need to be adjusted once rather than separately according to each difference value, which accelerates the convergence of the target loss function and thus the training of the pre-training recognition model.
In one possible implementation, the target loss function may be represented by formula (1):
$L = L_{location} + L_{font}$ (1)
wherein $L_{location}$ is the first loss function, and $L_{font}$ is the second loss function.
Correspondingly, the step 3303 of weighting the text position difference value and the font type difference value to obtain the first difference value may be implemented in the following manner:
calculating the sum of the text position difference value and the font type difference value, and taking the sum as the first difference value.
In the embodiment provided by the application, the text position difference value can be calculated through $L_{location}$, and the font type difference value can be calculated through $L_{font}$; the text position difference value and the font type difference value are then accumulated to obtain the first difference value.
In another possible implementation manner, the objective loss function further includes a weight value corresponding to the font type; the weight value is used to indicate the weight of the font type in the target loss function. Specifically, the target loss function can also be expressed by the formula (2):
$L = L_{location} + \lambda L_{font}$ (2)
wherein $L_{location}$ is the first loss function, $L_{font}$ is the second loss function, and $\lambda$ is the weight value. Here, the value of $\lambda$ may be any real number.
Correspondingly, the step 3303 of weighting the text position difference value and the font type difference value to obtain the first difference value may be implemented in the following manner:
a first product between the weight value and the font type difference value is calculated, and a sum between the first product and the text position difference value is taken as a first difference value.
In the embodiment provided by the application, the weight value $\lambda$ can be understood as a harmonic coefficient indicating the specific weight that the font type difference value occupies in the target loss function. It can be understood that the larger $\lambda$ is, the greater the adjusting influence of the font type difference value on the model parameters of the recognition model to be trained; likewise, the smaller $\lambda$ is, the smaller that adjusting influence.
It should be noted that the weight value $\lambda$ is a hyperparameter, i.e., a parameter set before training the recognition model to be trained, not parameter data obtained by training.
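To make equations (1) and (2) concrete, here is a hedged sketch of the target loss; the application only specifies the weighted-sum form, so the choice of Smooth L1 for the position term and cross-entropy for the font term is an assumption.

```python
import torch.nn as nn

l_location = nn.SmoothL1Loss()    # assumed first loss function (text position)
l_font = nn.CrossEntropyLoss()    # assumed second loss function (font type)
LAMBDA = 1.0                      # weight value λ, a hyperparameter set before training

def target_loss_fn(pred_pos, pos_label, pred_font, font_label):
    # Equation (2): L = L_location + λ * L_font (λ = 1 recovers equation (1)).
    return l_location(pred_pos, pos_label) + LAMBDA * l_font(pred_font, font_label)
```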
By way of example, refer to the training flow diagram of an exemplary pre-training recognition model shown in fig. 5. As shown in fig. 5, after the electronic device obtains the sample image 5-1, the sample image 5-1 is input into the recognition model 5-2 to be trained to obtain the output result 5-3 corresponding to the text position and the output result 5-4 corresponding to the font type. Further, the electronic device takes the text position label 5-5 and the output result 5-3 corresponding to the text position as the input of the first loss function 5-61 in the target loss function 5-6 to obtain the text position difference value 5-63; and the electronic device takes the font type label 5-7 and the output result 5-4 corresponding to the font type as the input of the second loss function 5-62 in the target loss function 5-6 to obtain the font type difference value 5-64. Then, the electronic device performs operation processing 5-65 on the text position difference value 5-63 and the font type difference value 5-64 through the target loss function 5-6 to obtain a difference value 5-7. Finally, the electronic device judges whether the difference value 5-7 is smaller than a preset threshold value; if so, the training ending condition is met, and the current recognition model to be trained is used as the final pre-training recognition model 5-8. If the difference value 5-7 is greater than the preset threshold, the training ending condition is not met; the model parameters in the recognition model 5-2 to be trained are adjusted according to the difference value, and the electronic device continues to process the sample image 5-1 with the adjusted model to be trained in the same manner as above until the difference value 5-7 is smaller than the preset threshold.
Therefore, the embodiment of the application can integrate difference values of different dimensions and adjust the model parameters with the integrated difference value; the model parameters only need to be adjusted once rather than separately according to each difference value, which accelerates the convergence of the target loss function and the training of the recognition model.
In a possible implementation manner, step 130 performs content recognition on the at least one text to obtain at least one content information, and specifically includes the following steps:
step 1301, performing content recognition on at least one text corresponding to at least one text position in an image to be recognized by using a preset text content recognition model to obtain at least one text content information; the text positions correspond to the text content information one by one; the text content recognition model is used to determine content information in the text image.
The preset text content recognition model here may be a pre-trained content recognition model. It may be obtained by the electronic device through training on sample data, or may be obtained by the electronic device from another server that provides the model.
In the embodiment provided by the application, the electronic equipment can input the local image corresponding to each text position in the image to be identified into the preset text content identification model, and identify the text content at each text position to obtain the text content information at each text position.
Specifically, step 140 of displaying at least one content information on a display interface of the electronic device based on at least one font type and at least one text position may be implemented by:
step 1401, acquiring a font file corresponding to at least one font type;
step 1402, displaying at least one text content information of at least one text position on a display interface according to the font file.
In the embodiment provided by the application, the electronic equipment can store the font files corresponding to different font types in the local storage space. Thus, after the electronic device obtains the font type of the text, the content information of the text can be displayed as the corresponding font on the display interface of the electronic device according to the obtained font file.
Therefore, the electronic equipment can display the text content according to the style in the original image, so that the integrity of the text information in the display process is ensured, and meanwhile, the user experience is improved.
In one possible implementation, step 1401 obtains a font file corresponding to at least one font type, which may be implemented by the following steps:
step 1401a, matching at least one font type according to a preset font library.
In practical applications, the electronic device may not store font files corresponding to all font types. Before the electronic device displays the content information of the identified text, it may also be detected whether a font file corresponding to the identified at least one font type is stored in a preset font library local to the electronic device.
Specifically, after identifying at least one font type, the electronic device may check whether a font file corresponding to the identified at least one font type is stored in its local preset font library by matching the font type against each font in the preset font library.
Step 1401b, if a first target font type matched with a preset font library exists in at least one font type, determining a target font file matched with the first target font type from the preset font library; the first target font type is any one of the at least one font type.
It can be understood that when the electronic device detects that the font file of the first target font type is stored in the local preset font library, the electronic device directly obtains the font file corresponding to the first target font type from the preset font library, and displays the content information of the text corresponding to the first target font type as the first target font according to the obtained font file.
Step 1401c, if a second target font type that does not match the preset font library exists in the at least one font type, acquiring a preset font file; the second target font type is any one of the at least one font type other than the first target font type.
Here, when the electronic device detects that the font file corresponding to the second target font type is not stored in the preset font library, it may acquire a preset font file and display the content information of the text corresponding to the second target font type in the preset font.
In one possible implementation, the preset font file may be a font file set in advance, which may be understood as a default font file.
In another possible implementation, the preset font file may be a font file corresponding to a font type whose similarity to the second target font type is higher than a preset similarity threshold.
Here, the electronic device may obtain the feature vector extracted for the second target font type during the font recognition process, calculate the Euclidean distance between this feature vector and the feature vector corresponding to each font in the preset font library, and thereby determine the similarity between the second target font type and each font in the preset font library; the preset font file is then determined according to the similarity.
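A minimal sketch of this similarity check, assuming the feature vectors are already available as NumPy arrays, might be:

```python
import numpy as np

def most_similar_font(target_vec, library_vecs):
    """Return the preset-library font whose feature vector has the
    smallest Euclidean distance to the second target font type's vector."""
    distances = {
        name: np.linalg.norm(np.asarray(target_vec) - np.asarray(vec))
        for name, vec in library_vecs.items()
    }
    return min(distances, key=distances.get)
```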
In yet another possible implementation, the electronic device may also send a font file request to the target server and receive the preset font file returned by the target server in response to the font file request.
It can be appreciated that the electronic device may also download, from the target server, the font file corresponding to the second target font type to the local font library when it is detected that the font file corresponding to the second target font type is not stored in the storage space.
Therefore, when the font file corresponding to a certain font type is not stored in the preset font library of the electronic equipment, the corresponding font file can be downloaded from the target server, so that the displayed text is consistent with the text style in the original image, the integrity of the text information is ensured, and the use experience of a user is greatly improved.
Step 1401d, when the at least one font type has been matched, using the acquired preset font file and the target font file as the font files corresponding to the at least one font type.
Here, the electronic device may find, for each of the at least one font type, a matching font file until a font file is found that matches a last font type of the at least one font type. In this way, the acquired preset font file and the target font file can be used as the font file corresponding to at least one font type.
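The matching-with-fallback logic of steps 1401a to 1401d can be sketched as follows; the directory layout, the file naming, and the download URL are assumptions, and the request to the target server is only illustrated with a plain HTTP fetch.

```python
import os
import urllib.request

PRESET_FONT_DIR = "fonts"                  # assumed local preset font library
DEFAULT_FONT_FILE = "fonts/default.ttf"    # assumed preset (default) font file
FONT_SERVER = "https://example.com/fonts"  # hypothetical target server

def get_font_files(font_types):
    os.makedirs(PRESET_FONT_DIR, exist_ok=True)
    font_files = {}
    for font_type in font_types:
        local_path = os.path.join(PRESET_FONT_DIR, font_type + ".ttf")
        if os.path.exists(local_path):
            # First target font type: matched in the preset font library.
            font_files[font_type] = local_path
        else:
            # Second target font type: fetch from the target server,
            # falling back to the default preset font file.
            try:
                urllib.request.urlretrieve(f"{FONT_SERVER}/{font_type}.ttf",
                                           local_path)
                font_files[font_type] = local_path
            except OSError:
                font_files[font_type] = DEFAULT_FONT_FILE
    return font_files
```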
In one possible implementation, step 1402 may display at least one content information of at least one text position on the display interface according to the font file by:
displaying the at least one text content information at at least one target position on the display interface according to the font file; wherein the at least one target position corresponds one-to-one with the at least one text position.
It can be understood that after determining the font type of the text at the text position and the content information of the text at the text position, the electronic device can display the content information at the corresponding position of the display interface according to the position of the text in the original image, and display it in the corresponding font. That is, the corresponding font is displayed at the corresponding position of the display interface according to the text position and the font type in the image to be recognized. Illustratively, if the text position is located at the upper right of the image to be recognized and the font type is the Song typeface, then after the image to be identified is recognized, the content information of the text is displayed at the upper right of the display interface in the Song typeface.
In another possible implementation, before displaying at least one content information of at least one text position on a display interface of the electronic device, the following steps may be further performed:
acquiring at least one of font size information of the text at the at least one text position, line spacing information between text lines and word spacing information between fonts;
accordingly, displaying at least one content information of at least one text position on the display interface may be achieved by:
at least one content information of the at least one text position is displayed on the display interface based on the font file and at least one of font size information of the text at the at least one text position, line spacing information between text lines, and word spacing information between fonts.
Here, the electronic device may acquire the font size information at each text position in the image to be recognized, the line spacing information between text lines, and the word spacing information between fonts through the above-described pre-training model. The electronic device may also obtain the font size information at each text position, the line spacing information between text lines, and the word spacing information between fonts by measuring the partial image at each text position in the image to be recognized. The embodiment of the present application does not limit the manner of acquiring the font size information at the text position, the line spacing information between text lines, and the word spacing information between fonts.
In the embodiment provided by the application, the electronic equipment can select at least one of the font size information, the line spacing information and the word spacing information according to the requirement of the user, and simultaneously display the content information at each text position by combining the font type of the text.
In this way, the content information displayed by the display interface is consistent with the font type of the text in the original image; meanwhile, the font size, the line spacing or the word spacing displayed in the display interface can also be consistent with the font size, the line spacing or the word spacing of the text in the original image. Therefore, the electronic equipment can display the text content according to the style in the original image, so that the integrity of the text information in the display process is ensured, and meanwhile, the user experience is improved.
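As an illustration of rendering with the original position, font, size, and line spacing, here is a small sketch using Pillow; treating line spacing via `multiline_text`'s `spacing` argument, and ignoring word spacing, are simplifying assumptions, as is the item format.

```python
from PIL import Image, ImageDraw, ImageFont

def render(canvas_size, items):
    """items: (content, (x1, y1), font_file, font_size, line_spacing) tuples,
    one per recognized text position (hypothetical format)."""
    canvas = Image.new("RGB", canvas_size, "white")
    draw = ImageDraw.Draw(canvas)
    for content, (x1, y1), font_file, font_size, line_spacing in items:
        font = ImageFont.truetype(font_file, font_size)
        # Draw the content at its original position, in the matched font.
        draw.multiline_text((x1, y1), content, font=font,
                            fill="black", spacing=line_spacing)
    return canvas
```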
In another possible implementation manner, step 130 may further include the following steps after performing content recognition on at least one text to obtain at least one content information:
step 131, translating at least one piece of content information to obtain at least one piece of translation information;
step 132, displaying at least one piece of translation information on the display interface based on the at least one font type and the at least one text position.
In the implementation provided by the application, the electronic equipment can also perform translation processing on the content information of the identified text to obtain the translation information corresponding to the content information of each text. Further, the translation information is also displayed in the original font.
In one application scenario, an electronic device may receive a translation instruction for a picture issued by a user, where the translation instruction may include a target translation language. In this way, the electronic equipment responds to the translation instruction to perform text position recognition and font recognition on the picture to obtain a text position and a font type, and in addition, the electronic equipment performs content recognition on the text at the recognized text position to obtain content information of the text. After the content information is obtained, the electronic equipment translates the content information of the text to obtain translation information. And finally, the electronic equipment displays the translation information on a display interface according to the recognized font type.
In this way, in the text translation scene, the electronic device can display the texts with the same fonts on the display interface in different languages in real time, so that the use experience of the user can be greatly improved.
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
Referring to fig. 6, fig. 6 is a schematic diagram of an exemplary application provided by an embodiment of the present application; as shown in fig. 6, the second electronic device starts a camera to shoot in response to a shooting instruction of a user, so as to obtain an original image 6-1; the second electronic device obtains an identification model from the first electronic device to process the original image 6-1, and obtains a text position 6-2 and a font type 6-3 at the text position in the original image 6-1; and recognizing the text at the text position in the original image 6-1 through a preset text content recognition model to obtain text content 6-4. Further, the second electronic device judges whether a font file corresponding to the font type 6-3 is stored in the local storage space, and if the font file corresponding to the font type 6-3 is stored in the local storage space, the text content 6-4 is displayed as a font in the original image according to the font file; if the font file corresponding to the font type 6-3 is not stored in the local storage space, the text content 6-4 is displayed as a default font.
Based on the foregoing embodiments, an embodiment of the present application provides a font recognition device, which may be applied to the electronic device described above; as shown in fig. 7, the device includes:
An acquisition unit 71 for acquiring an image to be recognized;
a first recognition unit 72, configured to perform text position recognition and font recognition on the image to be recognized by utilizing a pre-training recognition model, so as to obtain at least one font type corresponding to at least one text at at least one text position; the pre-training recognition model is used for determining the text position in the image and the font type of the text at the text position;
a second identifying unit 73, configured to identify content of the at least one text, so as to obtain at least one piece of content information;
a display unit 74 for displaying the at least one content information on a display interface of the electronic device based on the at least one font type and the at least one text position.
In an embodiment of the present application, the font recognition device further includes a training unit, wherein:
the training unit is used for acquiring a sample image and a sample label, where the sample label comprises a text position label and a font type label; processing the sample image based on the recognition model to be trained to obtain a first output result, where the first output result is used for representing a first text position in the sample image and a first font type corresponding to the first text position; determining a first difference value between the sample label and the first output result by a target loss function; and training the recognition model to be trained based on the first difference value until a training ending condition is met, so as to obtain the pre-training recognition model.
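For illustration, a multi-task model of this kind can be sketched as a shared backbone with two output heads, one for text position and one for font type, so that a single forward pass yields both outputs; the layer sizes below are arbitrary, since the embodiment does not fix an architecture:

```python
import torch.nn as nn

class FontDetector(nn.Module):
    """Toy two-head network: one shared backbone, one pass, two outputs
    (text-box regression and font-type classification)."""

    def __init__(self, num_fonts: int, num_boxes: int = 1):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.box_head = nn.Linear(32, num_boxes * 4)    # text position
        self.font_head = nn.Linear(32, num_fonts)       # font type

    def forward(self, x):
        feats = self.backbone(x)
        return self.box_head(feats), self.font_head(feats)
```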
In an embodiment provided by the present application, the target loss function includes a first loss function and a second loss function; the first loss function is used for calculating a difference value of the text position, and the second loss function is used for calculating a difference value of the font type;
the training unit is used for determining a text position difference value between the text position label and the first text position based on the first loss function; determining a font type difference value between the font type label and the first font type based on the second loss function; and weighting the text position difference value and the font type difference value to obtain the first difference value.
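A minimal sketch of this weighted target loss follows; smooth-L1 for the text position term and cross-entropy for the font type term are assumptions, as the embodiment only specifies two loss terms combined by weighting:

```python
import torch.nn.functional as F

def target_loss(pred_boxes, pred_fonts, gt_boxes, gt_fonts,
                w_position=1.0, w_font=1.0):
    # First loss function: difference value of the text position.
    position_loss = F.smooth_l1_loss(pred_boxes, gt_boxes)
    # Second loss function: difference value of the font type.
    font_loss = F.cross_entropy(pred_fonts, gt_fonts)
    # Weighting the two difference values gives the first difference value.
    return w_position * position_loss + w_font * font_loss
```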
In the embodiment provided by the present application, the second identifying unit 73 is configured to recognize, using a preset text content recognition model, the at least one text corresponding to the at least one text position in the image to be recognized, so as to obtain the at least one piece of text content information; the text positions correspond to the pieces of text content information one to one; the text content recognition model is used for determining the content information in a text image.
In the embodiment provided by the present application, the obtaining unit 71 is configured to obtain a font file corresponding to the at least one font type;
and the display unit 74 is configured to display the at least one piece of text content information of the at least one text position on the display interface according to the font file.
In the embodiment provided by the present application, the obtaining unit 71 is specifically configured to match the at least one font type against a preset font library. If a first target font type that matches the preset font library exists among the at least one font type, a target font file matched with the first target font type is determined from the preset font library, where the first target font type is any one of the at least one font type. If a second target font type that does not match the preset font library exists among the at least one font type, a preset font file is acquired, where the second target font type is any font type among the at least one font type other than the first target font type. When the matching of the at least one font type is completed, the acquired preset font file and the target font file are used as the font files corresponding to the at least one font type.
In the embodiment provided by the present application, the obtaining unit 71 is further configured to send a font file request to the target server, and to receive the preset font file returned by the target server in response to that request, as sketched below.
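A minimal sketch of this match-then-fetch logic follows; the library directory, server URL, and file-naming scheme are all hypothetical, with the preset font file standing in for whatever fallback the device uses:

```python
import os
import urllib.request

LIBRARY_DIR = "fonts"                        # hypothetical preset font library
SERVER_URL = "https://example.com/fonts/"    # hypothetical target server

def resolve_font_files(font_types):
    files = {}
    os.makedirs(LIBRARY_DIR, exist_ok=True)
    preset = os.path.join(LIBRARY_DIR, "preset.ttf")
    for ft in font_types:
        local = os.path.join(LIBRARY_DIR, ft + ".ttf")
        if os.path.exists(local):
            # First target font type: matched in the preset font library.
            files[ft] = local
        else:
            # Second target font type: not in the library, so request the
            # preset font file from the target server (downloaded once).
            if not os.path.exists(preset):
                urllib.request.urlretrieve(SERVER_URL + "preset.ttf", preset)
            files[ft] = preset
    return files
```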
In the embodiment provided by the present application, the display unit 74 is specifically configured to display, according to the font file, the at least one piece of text content information at at least one target position on the display interface, where the at least one target position corresponds one to one with the at least one text position.
In the embodiment provided by the present application, the obtaining unit 71 is configured to obtain at least one of font size information of the text at the at least one text position, line spacing information between text lines, and word spacing information between fonts;
the display unit 74 is further configured to display the at least one piece of content information of the at least one text position on the display interface based on the font file and at least one of the font size information of the text at the at least one text position, the line spacing information between text lines, and the word spacing information between fonts.
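As an illustration of this layout-preserving display step, the following sketch uses Pillow to draw the content at its original position with the matched font file, font size, and line spacing; the choice of Pillow and the function signature are assumptions, not part of the embodiment:

```python
from PIL import Image, ImageDraw, ImageFont

def draw_content(canvas: Image.Image, text: str, position, font_file: str,
                 font_size: int, line_spacing: int) -> None:
    # Draw the recognized content at its original text position, using the
    # matched font file and the font size / line spacing read from the image.
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_file, font_size)
    draw.multiline_text(position, text, font=font, fill="black",
                        spacing=line_spacing)
```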
In an embodiment of the present application, the font recognition device further includes a translation unit, where the translation unit is configured to translate the at least one piece of content information to obtain at least one piece of translation information;
the display unit 74 is configured to display the at least one piece of translation information on the display interface based on the at least one font type and the at least one text position.
It can be understood that the font recognition device provided by the embodiment of the application can determine the text positions in the image to be recognized and the font types at those positions in a single pass over the input image, which reduces the amount of computation in image processing and improves the image processing speed.
Based on the foregoing embodiments, the embodiments of the present application further provide an electronic device, corresponding to the font identification method of the foregoing embodiments. Fig. 8 is a schematic diagram of the hardware composition structure of an electronic device according to an embodiment of the present application; as shown in fig. 8, the electronic device includes a processor 81 and a memory 82 storing a computer program.
Wherein the processor 81 is configured to execute the method steps of the corresponding embodiment of fig. 1 described above when running the computer program.
In practice, as shown in fig. 8, the various components of the electronic device are coupled together by a bus system 83. It is understood that the bus system 83 is used to enable communication among these components. In addition to the data bus, the bus system 83 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled as the bus system 83 in fig. 8.
It will be appreciated that the memory in this embodiment may be volatile memory, nonvolatile memory, or both. The nonvolatile memory may be Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Ferroelectric Random Access Memory (FRAM), Flash Memory, magnetic surface memory, an optical disc, or Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk memory or tape memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory described in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed by the embodiments of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, the storage medium being located in the memory; the processor reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
The embodiment of the application also provides a computer storage medium, in particular a computer-readable storage medium, on which computer instructions are stored. When the computer storage medium is located in a terminal, the computer instructions, when executed by a processor, implement any of the steps of the font identification method described above in the embodiments of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated in one processing unit, each unit may separately serve as one unit, or at least two units may be integrated in one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
It should be noted that: the technical schemes described in the embodiments of the present application may be arbitrarily combined without any collision.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A font recognition method, applied to an electronic device, the method comprising:
acquiring an image to be identified;
performing text position recognition and font recognition on the image to be recognized by utilizing a pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position; the pre-training recognition model is used for determining the text position in the image and the font type of the text at the text position; the pre-training recognition model is a multi-task processing model and is used for processing the image to be recognized and outputting, in a single pass, the at least one text position of the image to be recognized and the font type of the corresponding text at the at least one text position;
performing content recognition on the at least one text to obtain at least one piece of content information;
displaying the at least one content information on a display interface of the electronic device based on the at least one font type and the at least one text position;
the displaying the at least one content information on a display interface of the electronic device based on the at least one font type and the at least one text position includes:
acquiring a font file corresponding to the at least one font type;
and displaying at least one content information of the at least one text position on the display interface according to the font file.
2. The method of claim 1, wherein the training process of the pre-trained recognition model comprises:
acquiring a sample image and a sample label; the sample label comprises a text position label and a font type label;
processing the sample image based on the recognition model to be trained to obtain a first output result; the first output result is used for representing a first text position in the sample image and a first font type corresponding to the first text position;
determining a first difference value between the sample tag and the first output result by a target loss function;
and training the recognition model to be trained based on the first difference value until a training ending condition is met, so as to obtain the pre-training recognition model.
3. The method of claim 2, wherein the target loss function comprises a first loss function and a second loss function; the first loss function is used for calculating a difference value of the text position, and the second loss function is used for calculating a difference value of the font type;
the determining, by the target loss function, a first difference value between the sample tag and the first output result includes:
determining a text position difference value between the text position label and the first text position based on the first loss function;
determining a font type difference value between the font type label and the first font type based on the second loss function;
and weighting the text position difference value and the font type difference value to obtain the first difference value.
4. The method according to any one of claims 1-3, wherein performing content recognition on the at least one text to obtain at least one piece of content information comprises:
identifying at least one text corresponding to the at least one text position in the image to be recognized by using a preset text content recognition model to obtain at least one piece of text content information; the text positions correspond to the pieces of text content information one to one; the text content recognition model is used for determining content information in a text image.
5. The method of claim 1, wherein displaying at least one content information of the at least one text position on the display interface according to the font file comprises:
displaying the at least one piece of text content information at at least one target position on the display interface according to the font file; wherein the at least one target position corresponds one to one with the at least one text position.
6. The method of claim 1, wherein, before displaying the at least one piece of content information of the at least one text position on the display interface of the electronic device, the method comprises:
acquiring at least one of font size information of a text at the at least one text position, line spacing information between text lines and word spacing information between fonts;
The displaying at least one content information of the at least one text position on the display interface includes:
displaying the at least one piece of content information of the at least one text position on the display interface based on the font file and at least one of the font size information of the text at the at least one text position, the line spacing information between text lines, and the word spacing information between fonts.
7. The method according to any one of claims 1-3, wherein, after performing content recognition on the at least one text to obtain the at least one piece of content information, the method further comprises:
translating the at least one piece of content information to obtain at least one piece of translation information;
displaying the at least one piece of translation information on a display interface based on the at least one font type and the at least one text position.
8. A font recognition apparatus, characterized by being applied to an electronic device, the apparatus comprising:
the acquisition unit is used for acquiring the image to be identified;
the first recognition unit is used for performing text position recognition and font recognition on the image to be recognized by utilizing the pre-training recognition model to obtain at least one font type corresponding to at least one text at at least one text position; the pre-training recognition model is used for determining the text position in the image and the font type of the text at the text position; the pre-training recognition model is a multi-task processing model and is used for processing the image to be recognized and outputting, in a single pass, the at least one text position of the image to be recognized and the font type of the corresponding text at the at least one text position;
The second identification unit is used for carrying out content identification on the at least one text to obtain at least one piece of content information;
a display unit configured to display the at least one content information on a display interface of the electronic device based on the at least one font type and the at least one text position;
the acquisition unit is further configured to obtain a font file corresponding to the at least one font type;
the display unit is further used for displaying at least one text content information of the at least one text position on a display interface according to the font file.
9. An electronic device comprising a processor and a memory for storing a computer program capable of running on the processor;
wherein the processor is adapted to perform the steps of the method of any of claims 1 to 7 when the computer program is run.
10. A computer readable storage medium, characterized in that a computer program is stored thereon, which computer program is executed by a processor to carry out the steps of the method according to any of claims 1 to 7.
CN202010478196.0A 2020-05-29 2020-05-29 Font identification method and device, electronic equipment and storage medium Active CN111626383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010478196.0A CN111626383B (en) 2020-05-29 2020-05-29 Font identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111626383A CN111626383A (en) 2020-09-04
CN111626383B (en) 2023-11-07

Family

ID=72271966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010478196.0A Active CN111626383B (en) 2020-05-29 2020-05-29 Font identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111626383B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784932A (en) * 2021-03-01 2021-05-11 北京百炼智能科技有限公司 Font identification method and device and storage medium
CN113674866A (en) * 2021-06-23 2021-11-19 江苏天瑞精准医疗科技有限公司 Medical text oriented pre-training method
CN113536771B (en) * 2021-09-17 2021-12-24 深圳前海环融联易信息科技服务有限公司 Element information extraction method, device, equipment and medium based on text recognition
CN114385849A (en) * 2022-03-24 2022-04-22 北京惠朗时代科技有限公司 Difference display method, device, equipment and storage medium
CN114782757A (en) * 2022-06-21 2022-07-22 北京远舢智能科技有限公司 Cigarette defect detection model training method and device, electronic equipment and storage medium
CN116206319B (en) * 2023-02-17 2023-09-29 北京中兴正远科技有限公司 Data processing system for clinical trials

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102802074A (en) * 2012-08-14 2012-11-28 海信集团有限公司 Method for extracting and displaying text messages from television signal and television
WO2017202232A1 (en) * 2016-05-24 2017-11-30 腾讯科技(深圳)有限公司 Business card content identification method, electronic device and storage medium
CN109271910A (en) * 2018-09-04 2019-01-25 阿里巴巴集团控股有限公司 A kind of Text region, character translation method and apparatus
CN109902678A (en) * 2019-02-12 2019-06-18 北京奇艺世纪科技有限公司 Model training method, character recognition method, device, electronic equipment and computer-readable medium
CN109919014A (en) * 2019-01-28 2019-06-21 平安科技(深圳)有限公司 OCR recognition methods and its electronic equipment
CN109961064A (en) * 2019-03-20 2019-07-02 深圳市华付信息技术有限公司 Identity card text positioning method, device, computer equipment and storage medium
CN110135423A (en) * 2019-05-23 2019-08-16 北京阿丘机器人科技有限公司 The training method and optical character recognition method of text identification network
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN110659646A (en) * 2019-08-21 2020-01-07 北京三快在线科技有限公司 Automatic multitask certificate image processing method, device, equipment and readable storage medium
CN110688949A (en) * 2019-09-26 2020-01-14 北大方正集团有限公司 Font identification method and apparatus
CN110942004A (en) * 2019-11-20 2020-03-31 深圳追一科技有限公司 Handwriting recognition method and device based on neural network model and electronic equipment
CN111027345A (en) * 2018-10-09 2020-04-17 北京金山办公软件股份有限公司 Font identification method and apparatus
CN111126394A (en) * 2019-12-25 2020-05-08 上海肇观电子科技有限公司 Character recognition method, reading aid, circuit and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515295B2 (en) * 2017-10-27 2019-12-24 Adobe Inc. Font recognition using triplet loss neural network training
US10592787B2 (en) * 2017-11-08 2020-03-17 Adobe Inc. Font recognition using adversarial neural network training
US10515296B2 (en) * 2017-11-14 2019-12-24 Adobe Inc. Font recognition by dynamically weighting multiple deep learning neural networks

Also Published As

Publication number Publication date
CN111626383A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111626383B (en) Font identification method and device, electronic equipment and storage medium
CN109117831B (en) Training method and device of object detection network
CN111488770A (en) Traffic sign recognition method, and training method and device of neural network model
US20180260479A1 (en) Method, apparatus, system and electronic device for picture book recognition
CN110232340B (en) Method and device for establishing video classification model and video classification
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN113762309B (en) Object matching method, device and equipment
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN112016502B (en) Safety belt detection method, safety belt detection device, computer equipment and storage medium
CN111340640A (en) Insurance claim settlement material auditing method, device and equipment
CN111582913B (en) Advertisement recommendation method and device
CN116311279A (en) Sample image generation, model training and character recognition methods, equipment and media
CN114299030A (en) Object detection model processing method, device, equipment and storage medium
CN113837257A (en) Target detection method and device
CN111967529B (en) Identification method, device, equipment and system
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN111507250A (en) Image recognition method, device and storage medium
CN114926437A (en) Image quality evaluation method and device
WO2022126917A1 (en) Deep learning-based face image evaluation method and apparatus, device, and medium
CN112395450B (en) Picture character detection method and device, computer equipment and storage medium
CN114299509A (en) Method, device, equipment and medium for acquiring information
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium
CN112542163B (en) Intelligent voice interaction method, device and storage medium
CN111914850B (en) Picture feature extraction method, device, server and medium
CN113591437A (en) Game text translation method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant