CN117292370A - Icon character recognition method and device - Google Patents

Icon character recognition method and device

Info

Publication number
CN117292370A
CN117292370A
Authority
CN
China
Prior art keywords
image
network model
icon
target area
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311566683.2A
Other languages
Chinese (zh)
Inventor
胡兴元
武建双
宋超
孙宝
刘露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Tianwei Information Security Technology Co., Ltd.
Original Assignee
Hefei Tianwei Information Security Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Tianwei Information Security Technology Co., Ltd.
Priority to CN202311566683.2A
Publication of CN117292370A
Legal status: Pending

Classifications

    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 30/147: Character recognition; image acquisition; determination of region of interest
    • G06V 30/16: Character recognition; image preprocessing
    • G06V 30/164: Image preprocessing; noise filtering
    • G06V 30/18057: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 30/19173: Recognition using electronic means; classification techniques
    • G06V 2201/02: Recognising information on displays, dials, clocks

Abstract

The application relates to an icon character recognition method and device, wherein the method comprises the following steps: acquiring a target image to be recognized; locating the icon characters in the target image through an optical character recognition network model to obtain a target area image; and classifying the target area image through an image classification network model and determining the recognition result of the icon characters in the target area image according to the classification result. The method and device improve the accuracy of icon character recognition and thereby solve the problem of low icon character recognition accuracy in the related art.

Description

Icon character recognition method and device
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to a method and an apparatus for recognizing icon characters.
Background
In recent years, computer vision technology has developed rapidly, and recognition of picture information has become a popular research direction: by extracting and recognizing the information in a picture, the content a user needs can be acquired.
For example, in the context of determining computer user status, different user names are typically displayed in the computer's interface, and the status of each user is described in the area corresponding to the user name by a small icon, picture or other character; for instance, different icons identify whether the user is in a disabled or an enabled state. In this scenario, recognition ultimately means determining the user name and the corresponding user state from the text information and the icon character information in the picture. However, some existing picture information recognition methods can only recognize pictures of a relatively large size effectively, while most icon characters used for such identification are very small. When existing picture information recognition technology processes such pictures, the information of the icon characters may therefore be distorted or even lost, and the recognition accuracy for icon characters is low.
No effective solution has yet been proposed for the problem of low icon character recognition accuracy in the related art.
Disclosure of Invention
The present embodiments provide an icon character recognition method and device to solve the problem of low icon character recognition accuracy in the related art.
In a first aspect, the present invention provides a method for identifying an icon character, including:
acquiring a target image to be identified;
positioning icon characters in the target image through an optical character recognition network model to obtain a target area image;
classifying the target area image through an image classification network model, and determining the identification result of the icon characters in the target area image according to the classification result.
In some of these embodiments, the acquiring the target image to be identified includes:
acquiring an initial image to be processed;
and denoising and enhancing the contrast of the initial image to obtain the target image.
In some of these embodiments, the optical character recognition network model is a PaddleOCR network model.
In some of these embodiments, the image classification network model is a PeleeNet network model.
In some embodiments, locating the icon characters in the target image through the optical character recognition network model to obtain the target area image includes:
recognizing the text characters in the target image through the optical character recognition network model to obtain text coordinates of the text characters;
determining a target area of the icon character corresponding to the text characters according to the text coordinates of the text characters;
and intercepting the target area of the icon character to determine the target area image.
In some of these embodiments, the size of the target area image is 32×32.
In some of these embodiments, the PeleeNet network model includes feature extraction layers, and the number of feature extraction layers is less than or equal to three.
In some of these embodiments, the number of feature extraction layers is one.
In some embodiments, the feature extraction layer comprises a dense layer and a convolution layer, the input of the feature extraction layer is the input of the dense layer, the output of the dense layer is the input of the convolution layer, and the output of the convolution layer is the output of the feature extraction layer.
In a second aspect, the present invention provides an apparatus for recognizing an icon character, comprising:
the acquisition module is used for acquiring a target image to be identified;
the positioning module is used for positioning the icon characters in the target image through the optical character recognition network model to obtain a target area image;
and the identification module is used for classifying the target area image through the image classification network model, and determining the identification result of the icon characters in the target area image according to the classification result.
Compared with the related art, the invention has the following technical effects:
1. In the icon character recognition method, the icon characters in the target image are located through the optical character recognition network model to obtain a target area image in which the icon characters occupy a relatively large area, and this target area image is then classified through the image classification network model. By combining the optical character recognition network model with the image classification network model, the information lost during recognition can be reduced, so the recognition accuracy for icon characters is improved and the problem of low icon character recognition accuracy in the related art is solved.
2. In the invention, the target image is obtained by denoising the initial image and enhancing its contrast, which improves the image quality of the target image and thereby the subsequent recognition effect.
3. In the invention, the model architecture of the PeleeNet network model is improved: the number of feature extraction layers in the PeleeNet network model is appropriately reduced, which lowers the downsampling depth applied to the target area image, reduces the information lost during recognition, and further improves the recognition accuracy for icon characters.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a terminal for performing the icon character recognition method provided in the present invention;
FIG. 2 is a flow chart of the icon character recognition method of the present invention;
FIG. 3 is a schematic diagram of the network structure of a conventional PeleeNet network model;
FIG. 4 is a schematic diagram of the network structure of a PeleeNet network model that includes only one feature extraction layer;
FIG. 5 shows the results of recognizing the enabled-state images of 200 users using the PeleeNet network model before improvement;
FIG. 6 shows the results of recognizing the enabled-state images of 200 users using the improved PeleeNet network model;
FIG. 7 shows the results of recognizing the disabled-state images of 200 users using the PeleeNet network model before improvement;
FIG. 8 shows the results of recognizing the disabled-state images of 200 users using the improved PeleeNet network model;
FIG. 9 is a block diagram of the structure of an icon character recognition apparatus according to the present invention.
Detailed Description
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.
Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application are not intended to be limiting in number, but rather are singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used in the present application, are intended to cover a non-exclusive inclusion; for example, a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this application, merely distinguish similar objects and do not represent a particular ordering of objects.
The method embodiments provided in the present invention may be performed on a terminal, a computer or a similar computing device. Taking execution on a terminal as an example, fig. 1 is a block diagram of the hardware structure of a terminal for performing the icon character recognition method provided in the present invention. As shown in fig. 1, the terminal may include one or more processors 120 (only one is shown in fig. 1) and a memory 140 for storing data, wherein the processors 120 may include, but are not limited to, processing devices such as a microprocessor (MCU) or a programmable logic device (FPGA). The terminal may further include a transmission device 160 for communication functions and an input-output device 180. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the terminal. For example, the terminal may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 140 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to an icon character recognition method in the present invention, and the processor 120 performs various functional applications and data processing by running the computer program stored in the memory 140, that is, implements the above-described method. Memory 140 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 140 may further include memory located remotely from processor 120, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 160 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 160 includes a network adapter (Network Interface Controller, simply referred to as NIC) that may be connected to other network devices via a base station to communicate with the internet. In one example, the transmission device 160 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
In the present invention, there is provided a method for recognizing an icon character, fig. 2 is a flowchart of the method for recognizing an icon character of the present invention, as shown in fig. 2, the flowchart including the steps of:
step S201, a target image to be identified is acquired.
Step S202, positioning icon characters in the target image through the optical character recognition network model to obtain a target area image.
Step S203, classifying the target area image through the image classification network model, and determining the recognition result of the icon characters in the target area image according to the classification result.
For example, in the user state recognition scenario of a Windows interface, a different icon character may be placed next to a user's user name to indicate whether that user is disabled. When identifying the state of a user, the interface image to be recognized is acquired first; this interface image is the target image to be recognized, and it generally contains two kinds of characters, namely text characters and icon characters. The icon characters in the target image are then located through the optical character recognition network model to obtain a target area image. Specifically, after the target image is input into the optical character recognition network model, the model can accurately recognize the text character content in the target image; although it cannot recognize the content of the icon characters directly, it can still locate them. Finally, the target area image is classified through the image classification network model, and the recognition result of the icon characters in the target area image is determined from the classification result. The recognition results at least include disabled and not disabled, corresponding to two classification results, from which it can be determined whether the user is disabled.
In the above method, the icon characters in the target image are located through the optical character recognition network model to obtain the target area image. Because the icon characters occupy a relatively large proportion of the target area image (that is, the target area image carries a high proportion of effective information) while occupying only a small proportion of the whole target image, classifying the target area image can effectively improve the recognition accuracy compared with classifying the target image directly through the image classification network model.
By combining the optical character recognition network model with the image classification network model, the information lost during recognition can be reduced, so the recognition accuracy for icon characters is improved and the problem of low icon character recognition accuracy in the related art is solved. Moreover, this combination improves the recognition accuracy for icon characters while still recognizing text characters, i.e. text characters and icon characters can be recognized at the same time, which makes the method well suited to scenarios such as user state recognition where text information and icon information must be processed together.
In some of these embodiments, the optical character recognition network model is a PaddleOCR network model. PaddleOCR is an optical character recognition (OCR) toolkit based on deep learning; OCR is a technique that converts text information in an image into readable text. PaddleOCR is built on the PaddlePaddle deep learning framework and detects and recognizes text areas by feeding images into deep convolutional neural networks (CNNs), providing a high-precision and robust OCR solution with advantages such as multi-scene support, high performance and accuracy, multi-language support, and open-source code.
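As a concrete illustration, the following minimal sketch shows how text regions and their coordinates can be obtained with the open-source paddleocr package; the image file name is hypothetical and exact argument names vary between package versions:

    # Minimal sketch: detect text characters and their coordinates with PaddleOCR.
    # Requires: pip install paddleocr paddlepaddle. API details vary by version.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang='ch')          # Chinese + English model
    result = ocr.ocr('interface_screenshot.png', cls=True)  # hypothetical file name
    for box, (text, confidence) in result[0]:
        # box holds the four corner points of the detected text region
        print(text, confidence, box)

Each returned box gives the text coordinates that are later used to locate the adjacent icon character.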
In some of these embodiments, the image classification network model is a peletenet network model. The PeleNet network model is a lightweight deep neural network model specifically designed for target detection tasks. The method adopts a series of innovative design and optimization strategies to remarkably reduce the size and the calculated amount of the model while maintaining high accuracy. The PeleNet network model is typically composed of Stem Block, two-Way Dense Layer, dynamic Number of Channels of Bottleneck Layer, transition Layer without Compression, composite Function, which introduces root blocks, bi-directional Dense layers, dynamic channel numbers in bottlenecks, translation Layer compression, and conventional activation functions to reduce computational cost and increase speed. The bi-directional dense layer helps to obtain different scales of receptive fields, making it easier to identify larger targets.
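To make the two-way dense layer concrete, the following PyTorch sketch renders the structure described in the PeleeNet paper (two parallel branches with different receptive fields, concatenated with the input); the channel sizes are illustrative assumptions, not the patent's configuration:

    import torch
    import torch.nn as nn

    def conv_bn_relu(cin, cout, k, pad=0):
        # PeleeNet's composite function: conventional conv + BN + ReLU
        return nn.Sequential(
            nn.Conv2d(cin, cout, k, padding=pad, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )

    class TwoWayDenseLayer(nn.Module):
        # Two branches with 3x3 and 5x5 receptive fields, densely
        # concatenated with the input. Channel counts are illustrative.
        def __init__(self, cin, growth=32, bottleneck=64):
            super().__init__()
            half = growth // 2
            self.branch1 = nn.Sequential(                 # 3x3 receptive field
                conv_bn_relu(cin, bottleneck, 1),
                conv_bn_relu(bottleneck, half, 3, pad=1),
            )
            self.branch2 = nn.Sequential(                 # 5x5 receptive field
                conv_bn_relu(cin, bottleneck, 1),
                conv_bn_relu(bottleneck, half, 3, pad=1),
                conv_bn_relu(half, half, 3, pad=1),
            )

        def forward(self, x):
            return torch.cat([x, self.branch1(x), self.branch2(x)], dim=1)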
To improve the image quality of the target image, in some embodiments, step S201, acquiring the target image to be recognized, includes: acquiring an initial image to be processed; and denoising the initial image and enhancing its contrast to obtain the target image. The target image obtained through the denoising and contrast enhancement operations has better image quality, which improves the subsequent recognition effect.
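The patent does not name the specific denoising or contrast-enhancement algorithms, so the OpenCV pipeline below is only one plausible realization (non-local-means denoising plus CLAHE); both choices are assumptions:

    import cv2

    def preprocess(path):
        # Illustrative preprocessing; the specific algorithms are assumptions.
        img = cv2.imread(path)
        # Denoise with non-local means (parameters are illustrative)
        img = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)
        # Enhance contrast with CLAHE on the luminance channel
        lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        lab = cv2.merge((clahe.apply(l), a, b))
        return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)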
In some embodiments, step S202, locating the icon characters in the target image through the optical character recognition network model to obtain the target area image, includes: recognizing the text characters in the target image through the optical character recognition network model to obtain text coordinates of the text characters; determining the target area of the icon character corresponding to the text characters according to the text coordinates; and intercepting the target area of the icon character to determine the target area image.
This embodiment provides a specific way of locating the icon characters with the optical character recognition network model. Taking the PaddleOCR network model as an example, in the user state recognition scenario of a Windows interface, the icon characters in the interface image represent the disabled state of a user and are therefore typically shown alongside the user's user name, which belongs to the text characters. When locating the icon characters, the user names are recognized through the PaddleOCR network model to obtain their coordinates, and the area where the icon character corresponding to each user name is located, i.e. the target area, is then determined from those coordinates. The target area is cropped and the result is saved using OpenCV, yielding a target area image in which the icon character occupies a relatively large area.
Further, the size of the target area image is 32×32. In practical application scenarios, the target image is generally about 100×100 and the icon character about 23×23, so the icon character occupies only a small proportion of the target image. Cropping the target area image to 32×32 therefore ensures that most of its content is icon character content.
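A minimal sketch of the cropping step follows; the assumption that the icon sits immediately to the right of the recognized user name, and the pixel offsets, are illustrative rather than taken from the patent:

    import cv2

    def crop_icon_region(img, text_box, size=32, gap=4):
        # text_box: four corner points of the user-name region from the OCR model.
        xs = [p[0] for p in text_box]
        ys = [p[1] for p in text_box]
        x0 = int(max(xs)) + gap                 # just right of the text (assumption)
        yc = int((min(ys) + max(ys)) / 2)       # vertically centred on the text
        y0 = max(yc - size // 2, 0)
        crop = img[y0:y0 + size, x0:x0 + size]  # 32x32 target area image
        cv2.imwrite('icon_region.png', crop)    # persist with OpenCV, as in the text
        return crop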
In the method, taking the PeleeNet network model as an example, a conventional PeleeNet network model can be used directly to classify the target area image. However, to further improve the PeleeNet network model's performance on smaller icon characters, in some of these embodiments the PeleeNet network model is improved. Specifically, the improved PeleeNet network model comprises feature extraction layers whose number is less than or equal to three; that is, the feature extraction part of the PeleeNet network model consists of one, two or three feature extraction layers, whereas a conventional PeleeNet network model generally has four, as shown in fig. 3, a schematic diagram of the network structure of the conventional PeleeNet network model. Because the icon characters are small, if downsampling is applied many times or too deeply, the resulting feature map is only about 1×1 in size; even though the proportion of effective information (icon character content) in the target area image is high, some of it may be lost after deep downsampling. Therefore, the PeleeNet network model adopted in this embodiment has fewer feature extraction layers than the conventional one, which appropriately reduces the number or depth of downsampling operations, reduces the information lost, and improves the recognition accuracy for icon characters. Specifically, the PeleeNet network model in the invention can be obtained by pruning the conventional PeleeNet network model; for example, to obtain a PeleeNet network model with only two feature extraction layers, the feature extraction layers corresponding to stage3 and stage4 can be pruned away, retaining only those corresponding to stage1 and stage2.
Further, the number of feature extraction layers may be one, as shown in fig. 4, a schematic diagram of the network structure of a PeleeNet network model containing only one feature extraction layer. In this embodiment, the feature extraction part of the PeleeNet network model consists of a single feature extraction layer comprising a dense layer and a convolution layer: the input of the feature extraction layer is the input of the dense layer, the output of the dense layer is the input of the convolution layer, and the output of the convolution layer is the output of the feature extraction layer. In other words, the feature extraction layers corresponding to stage2, stage3 and stage4, as well as the pooling layer in stage1, are all pruned away. Since most of the content in the target area image belongs to the icon character, the invalid information (content other than the icon character) in the target area image is small; reducing the PeleeNet network model to a single feature extraction layer, with the output of the convolution layer as the output of the feature extraction layer, therefore preserves the effective information in the target area image to a greater extent. This PeleeNet network model can likewise be obtained by pruning the conventional PeleeNet network model.
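A compact PyTorch sketch of this truncated classifier is given below: a stem block, a single feature-extraction stage (dense layer followed by a 1×1 convolution layer, with the stage's pooling removed), global average pooling and a two-class head. All layer sizes and the dense-layer depth are illustrative assumptions, not the patent's exact configuration:

    import torch
    import torch.nn as nn

    class TruncatedPeleeNet(nn.Module):
        # Sketch: stages 2-4 and stage 1's pooling removed, leaving one
        # feature extraction layer (dense layer + convolution layer).
        def __init__(self, num_classes=2, stem_out=32, growth=32, num_dense=3):
            super().__init__()
            self.stem = nn.Sequential(          # simplified stem block
                nn.Conv2d(3, stem_out, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(stem_out),
                nn.ReLU(inplace=True),
            )
            layers, ch = [], stem_out
            for _ in range(num_dense):          # dense layer: concatenative growth
                layers.append(nn.Sequential(
                    nn.Conv2d(ch, growth, 3, padding=1, bias=False),
                    nn.BatchNorm2d(growth),
                    nn.ReLU(inplace=True),
                ))
                ch += growth
            self.dense = nn.ModuleList(layers)
            self.transition = nn.Sequential(    # convolution layer closing the stage
                nn.Conv2d(ch, ch, 1, bias=False),
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
            )
            self.head = nn.Linear(ch, num_classes)

        def forward(self, x):                   # x: (N, 3, 32, 32) icon crops
            x = self.stem(x)                    # 32x32 -> 16x16, never down to 1x1
            for layer in self.dense:
                x = torch.cat([x, layer(x)], dim=1)
            x = self.transition(x)
            x = x.mean(dim=(2, 3))              # global average pooling
            return self.head(x)

    # e.g. logits = TruncatedPeleeNet()(torch.randn(1, 3, 32, 32))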
To verify the icon character recognition performance of the PeleeNet network model before improvement against the improved PeleeNet network model (containing only one feature extraction layer), 200 user disabled-state images and 200 user enabled-state images were recognized with each model in an ubuntu20.04, i5-11400, 3070/8G environment. Fig. 5 shows the results of enabled-state image recognition for 200 users using the PeleeNet network model before improvement: 143 images were recognized correctly and 57 incorrectly, so the recognition accuracy of the model before improvement is 71.5%, and the recognition took about 356ms. Fig. 6 shows the results of enabled-state image recognition for 200 users using the improved PeleeNet network model: 191 images were recognized correctly and 9 incorrectly, so the recognition accuracy of the improved model is 95.5%, and the recognition took about 155ms. Fig. 7 shows the results of disabled-state image recognition for 200 users using the PeleeNet network model before improvement: 156 images were recognized correctly and 44 incorrectly, so the recognition accuracy of the model before improvement is 78%, and the recognition took about 359ms. Fig. 8 shows the results of disabled-state image recognition for 200 users using the improved PeleeNet network model: 192 images were recognized correctly and 8 incorrectly, so the recognition accuracy of the improved model is 96%, and the recognition took about 173ms. Testing 400 user-state images with the PeleeNet models before and after improvement in the ubuntu20.04, i5-11400, 3070/8G environment, the overall recognition rate rose from 74.75% to 95.75%, and the total time consumed fell from 716ms to 328ms, a reduction of 54.19%.
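The combined figures follow directly from the per-figure counts; a quick check (illustrative Python, not part of the patent):

    # Verify the combined accuracy and time figures from Figs. 5-8.
    acc_before = (143 + 156) / 400   # 0.7475 -> 74.75%, PeleeNet before improvement
    acc_after  = (191 + 192) / 400   # 0.9575 -> 95.75%, improved PeleeNet
    time_cut   = (716 - 328) / 716   # ~0.5419 -> the reported 54.19% reduction
    print(acc_before, acc_after, round(time_cut, 4))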
The above data show that even without improving the PeleeNet network model, the method achieves relatively high recognition accuracy for icon characters, and that the improvement raises this accuracy further. Moreover, because the improved PeleeNet network model downsamples to a shallower depth, it also shortens the recognition time for icon characters.
In summary, in the invention, the optical character recognition network model locates the icon characters in the target image to obtain a target area image; this target area image, in which the icon characters are relatively large, is used as the input of the image classification network model, which classifies it to determine the recognition result of the icon characters in the target area image. This reduces the loss of icon character information and improves the recognition accuracy for icon characters.
The invention also provides a method for identifying the icon characters, which comprises the following steps:
acquiring a target area image to be identified, wherein the target area image comprises icon characters;
classifying the target area image through the image classification network model, and determining the recognition result of the icon characters in the target area image according to the classification result; the image classification network model is a PeleeNet network model comprising feature extraction layers, and the number of feature extraction layers is less than or equal to three.
That is, the feature extraction part of the PeleeNet network model consists of one, two or three feature extraction layers, whereas a conventional PeleeNet network model generally has four. Because the icon characters are small, if downsampling is applied many times or too deeply, the resulting feature map is only about 1×1 in size; even though the proportion of effective information (icon character content) in the target area image is high, some of it may be lost after deep downsampling. Therefore, the PeleeNet network model adopted in this embodiment has fewer feature extraction layers than the conventional one, which appropriately reduces the number or depth of downsampling operations, reduces the information lost, and improves the recognition accuracy for icon characters. This PeleeNet network model can be obtained by pruning the conventional PeleeNet network model.
Further, the number of feature extraction layers may be one. In this embodiment, the feature extraction part of the PeleeNet network model consists of a single feature extraction layer comprising a dense layer and a convolution layer: the input of the feature extraction layer is the input of the dense layer, the output of the dense layer is the input of the convolution layer, and the output of the convolution layer is the output of the feature extraction layer. Since most of the content in the target area image belongs to the icon character, the invalid information (content other than the icon character) in the target area image is small; reducing the PeleeNet network model to a single feature extraction layer, with the output of the convolution layer as the output of the feature extraction layer, therefore preserves the effective information in the target area image to a greater extent. This PeleeNet network model can likewise be obtained by pruning the conventional PeleeNet network model.
It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.
The invention also provides an icon character recognition device, which is used to implement the above embodiments and preferred implementations; what has already been described will not be repeated. The terms "module," "unit," "sub-unit," and the like used below may refer to a combination of software and/or hardware that performs a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 9 is a block diagram showing a configuration of an icon character recognition apparatus according to the present invention, as shown in fig. 9, the apparatus comprising:
an acquisition module 901, configured to acquire a target image to be identified;
the positioning module 902 is configured to position an icon character in the target image through the optical character recognition network model, so as to obtain a target area image;
the recognition module 903 is configured to classify the target area image through the image classification network model, and determine a recognition result of the icon character in the target area image according to the classification result.
In the device, the icon characters in the target image are located through the optical character recognition network model to obtain a target area image in which the icon characters occupy a relatively large area, and this target area image is classified through the image classification network model. By combining the optical character recognition network model with the image classification network model, the information lost during recognition can be reduced, so the recognition accuracy for icon characters is improved and the problem of low icon character recognition accuracy in the related art is solved.
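As an illustration only, the three modules might be composed as follows; the class and method names are hypothetical and merely mirror the module division above, reusing the sketches given earlier as the injected callables:

    class IconCharacterRecognizer:
        # Illustrative composition of the three modules; not the patent's code.
        def __init__(self, acquire_fn, locate_fn, classify_fn):
            self.acquire_fn = acquire_fn    # acquisition module
            self.locate_fn = locate_fn      # positioning module (OCR-based)
            self.classify_fn = classify_fn  # identification module

        def recognize(self, path):
            target_image = self.acquire_fn(path)           # target image
            region_images = self.locate_fn(target_image)   # 32x32 icon crops
            return [self.classify_fn(r) for r in region_images]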
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present application, are within the scope of the present application in light of the embodiments provided herein.
It is evident that the drawings are only examples or embodiments of the present application, based on which a person skilled in the art can adapt the present application to other similar situations without inventive effort. In addition, it should be appreciated that although the development effort might be complex and lengthy, it remains a routine undertaking of design, fabrication or manufacture for those of ordinary skill having the benefit of this disclosure, and should not be construed as insufficient disclosure.
The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.

Claims (10)

1. A method for recognizing an icon character, comprising:
acquiring a target image to be identified;
positioning icon characters in the target image through an optical character recognition network model to obtain a target area image;
classifying the target area image through an image classification network model, and determining the identification result of the icon characters in the target area image according to the classification result.
2. The method for recognizing an icon character according to claim 1, wherein the acquiring the target image to be recognized includes:
acquiring an initial image to be processed;
and denoising and enhancing the contrast of the initial image to obtain the target image.
3. The method of claim 1, wherein the optical character recognition network model is a PaddleOCR network model.
4. The method of claim 1, wherein the image classification network model is a PeleeNet network model.
5. The method for recognizing icon characters according to claim 1, wherein the positioning the icon characters in the target image through the optical character recognition network model, and obtaining a target area image comprises:
recognizing the text characters in the target image through the optical character recognition network model to obtain text coordinates of the text characters;
determining a target area of an icon character corresponding to the text characters according to the text coordinates of the text characters;
and intercepting the target area of the icon character to determine the target area image.
6. The method of claim 5, wherein the size of the target area image is 32 x 32.
7. The method of claim 4, wherein the PeleeNet network model includes feature extraction layers, the number of feature extraction layers being less than or equal to three.
8. The method of claim 7, wherein the number of feature extraction layers is one.
9. The method of claim 8, wherein the feature extraction layer comprises a dense layer and a convolution layer, the input of the feature extraction layer is the input of the dense layer, the output of the dense layer is the input of the convolution layer, and the output of the convolution layer is the output of the feature extraction layer.
10. An icon character recognition apparatus, comprising:
the acquisition module is used for acquiring a target image to be identified;
the positioning module is used for positioning the icon characters in the target image through the optical character recognition network model to obtain a target area image;
and the identification module is used for classifying the target area image through the image classification network model, and determining the identification result of the icon characters in the target area image according to the classification result.
CN202311566683.2A 2023-11-23 2023-11-23 Icon character recognition method and device Pending CN117292370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311566683.2A CN117292370A (en) 2023-11-23 2023-11-23 Icon character recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311566683.2A CN117292370A (en) 2023-11-23 2023-11-23 Icon character recognition method and device

Publications (1)

Publication Number Publication Date
CN117292370A true CN117292370A (en) 2023-12-26

Family

ID=89253850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311566683.2A Pending CN117292370A (en) 2023-11-23 2023-11-23 Icon character recognition method and device

Country Status (1)

Country Link
CN (1) CN117292370A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507325A (en) * 2020-03-16 2020-08-07 重庆大学 Industrial visual OCR recognition system and method based on deep learning
CN111582273A (en) * 2020-05-09 2020-08-25 中国工商银行股份有限公司 Image text recognition method and device
CN111832546A (en) * 2020-06-23 2020-10-27 南京航空航天大学 Lightweight natural scene text recognition method
CN112329851A (en) * 2020-11-05 2021-02-05 腾讯科技(深圳)有限公司 Icon detection method and device and computer readable storage medium
CN114937151A (en) * 2022-05-06 2022-08-23 西安电子科技大学 Lightweight target detection method based on multi-receptive-field and attention feature pyramid
CN116883981A (en) * 2023-06-15 2023-10-13 五邑大学 License plate positioning and identifying method, system, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALESSANDRO BISSACCO et al.: "PhotoOCR: Reading Text in Uncontrolled Conditions", IEEE International Conference on Computer Vision, pages 279-280 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination