CN116912856A - Image identification method and device of intelligent scanning pen and intelligent scanning pen - Google Patents


Info

Publication number: CN116912856A
Application number: CN202311182898.4A
Authority: CN (China)
Prior art keywords: image, text recognition, test, model, text
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 车建波
Current Assignee: Shenzhen Bepal Intelligent Technology Co ltd (the listed assignees may be inaccurate)
Original Assignee: Shenzhen Bepal Intelligent Technology Co ltd
Application filed by: Shenzhen Bepal Intelligent Technology Co ltd
Priority to: CN202311182898.4A
Publication of: CN116912856A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19127 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/142 Image acquisition using hand-held instruments; Constructional details of the instruments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1916 Validation; Performance evaluation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The application provides an image recognition method for an intelligent scanning pen, relating to the technical field of artificial intelligence. The method comprises the following steps: S0: constructing, training and testing an image text recognition model; S1: image acquisition; S2: image preprocessing; S4: performing text recognition on the preprocessed image with the image text recognition model, obtaining the language information and visual features in the image, and outputting the text content contained in the preprocessed image. The application also provides a corresponding device and an intelligent scanning pen. Compared with the prior art, the image recognition method requires no extra language model: it converts the original two-step recognition of image text content into one-step recognition, adaptively considers visual and language information in a unified structure, and shows stronger language capability while remaining efficient.

Description

Image identification method and device of intelligent scanning pen and intelligent scanning pen
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to intelligent scanning pens.
Background
As a portable electronic device, a scanning pen converts handwritten or printed text into a digital format after scanning, for further operations such as display, editing, TTS voice playback, translation and sharing.
Its working process mainly comprises image acquisition, image processing and image recognition. First, the user holds the scanning pen and aims it at the text to be scanned; an optical sensor arranged at the pen tip then collects images of the text under the tip. The collected images are processed inside the scanning pen and finally sent to an image recognition model for text recognition, yielding the text information in the image, which is then displayed, translated or otherwise processed according to the user's needs.
Image recognition algorithms applied to scanning pens in the prior art are generally two-step methods: after the processed image is fed into the image recognition model, the model first makes a visual prediction of the image based on a visual analysis method to obtain a prediction result, and then corrects that result with a text model to obtain the final text recognition result. With this approach, the image recognition model builds a visual prediction module and a language correction module separately, and the two modules consider the visual information and the text information of the input image independently. Chinese patent application No. 202310604317.5, for example, records a text recognition method and device for a dictionary pen: the word content in a text image acquired during scanning is recognized; when the word content cannot be matched with information in the dictionary pen's word stock, character information is compared and the character state is judged; if the characters are complete, a text error is determined; if they are incomplete, the recognized complete character content is matched against the word stock, the handling of the incomplete characters is decided (discarding them or determining their keywords), and the remaining characters are matched against the word stock based on the semantic association between the keywords and the remaining characters. Incomplete characters at the start and end of the recognized text are further discarded or completed, which better suits recognition of dense text and improves the accuracy and efficiency of text recognition.
When this approach, which separates the visual prediction module from the text language correction module and performs visual prediction on the input image before language correction, is applied to an actual scanning pen, the additional processing makes it slow. Moreover, because the visual prediction module and the text language correction module work independently, sketch marks, background textures, stain occlusions and the like in the text image captured at the pen tip easily distort the final text recognition result, so the accuracy of the scanning pen is not ideal.
Disclosure of Invention
To solve the above problems, the present application provides an image recognition method for an intelligent scanning pen that treats the visual and language models as a whole and obtains language information and visual features from an image jointly, requiring no additional language model. Applied to a scanning pen, it achieves higher accuracy and faster text recognition.
The application also provides an image recognition device of the intelligent scanning pen and the intelligent scanning pen so as to apply the method.
The technical scheme provided by the application is as follows:
an image recognition method of an intelligent scanning pen, the method comprising the following steps:
S0: constructing, training and testing to obtain an image text recognition model;
S1: image acquisition: setting up an image collector and scanning to obtain an original image of the target area;
S2: image preprocessing: preprocessing the original image to obtain a preprocessed image;
S4: performing text recognition on the preprocessed image with the image text recognition model, obtaining the language information and visual features in the image, and outputting the text content contained in the preprocessed image.
Further, S0: constructing, training and testing the image text recognition model comprises the following substeps:
S01: constructing the model;
S02: preparing a training data set;
S03: training the model;
S04: testing the model.
S01: the model construction method comprises the following substeps:
further, S01: the construction model specifically comprises the following steps: s011: respectively constructing a feature extraction network, a mask language perception module and a visual reasoning module;
s012: the feature extraction network, the mask language sensing module and the visual reasoning module are sequentially connected to construct the to-be-trained image text recognition model.
Further, S02 specifically comprises:
S021: selecting a plurality of training images;
S022: making a word-level label for each training image;
S023: for each word-labelled training image in the data set, randomly generating a character index according to the length of the word labelled on the current image;
S024: collecting all training images with word-level labels and character indices to form the training data set.
Further, S03: training the model comprises the following substeps:
S031: selecting an image from the training data set and inputting it into the image text recognition model to be trained;
S032: the feature extraction network performs feature extraction on the currently input image to obtain a two-dimensional feature vector of the current image;
S033: the mask language perception module performs character mask mapping on the two-dimensional feature vector of the current image, the word-level label carried by the current image and the character index carried by the current image to obtain an occlusion feature map;
S034: combining the current image, its two-dimensional feature vector and the occlusion feature map to obtain an image with occluded characters;
S035: inputting the image with occluded characters into the visual reasoning module, which captures long-distance information in the visual space and predicts the corresponding text line recognition content;
S036: repeating S031-S035 until all images in the training data set have been input into the image text recognition model to be trained, obtaining the image text recognition model to be tested.
Further, S04 includes the following substeps:
S041A: selecting a plurality of test images to assemble a first test data set;
S042A: removing the mask language perception module from the image text recognition model to be tested and connecting the feature extraction network directly to the visual reasoning module;
S043A: sequentially inputting each test image in the test data set into the feature extraction network, which extracts features of the currently input test image to obtain its two-dimensional feature vector; inputting this vector directly into the visual reasoning module, which predicts the corresponding text line recognition content to obtain the text recognition result of the current test image;
S044A: repeating S041A-S043A until all test images in the test data set have been input into the image text recognition model to be tested, obtaining a first image text recognition model to be evaluated.
Further, S04 further includes the following substeps:
S041B: selecting a plurality of test images, making a word-level label for each, and randomly generating a character index according to the length of the word labelled on the current test image; collecting all test images carrying word-level labels and character indices into a second test data set;
S042B: sequentially inputting each test image in the test data set into the image text recognition model to be tested to obtain the text recognition result of the current test image;
S043B: repeating S041B-S042B until all test images in the test data set have been input into the image text recognition model to be tested, obtaining a second image text recognition model to be evaluated.
Further, S04 further includes: evaluating the first and second image text recognition models to be evaluated and selecting the better one as the final image text recognition model.
The application also provides an image recognition device for an intelligent scanning pen, which stores a program implementing the above image recognition method and controls its execution.
The application also provides an intelligent scanning pen employing the above image recognition device.
The beneficial effects of the application are as follows: compared with the prior art, the image recognition method requires no extra language model, converts the original two-step recognition of image text content into one-step recognition, adaptively considers visual and language information in a unified structure, and shows stronger language capability while remaining efficient.
Drawings
Fig. 1 is a flowchart of an image recognition method of an intelligent scanning pen implemented in a specific embodiment.
Fig. 2 is a flowchart of the substeps of S03 (training the model) in the image recognition method of the intelligent scanning pen of the embodiment.
The realization, functional characteristics and advantages of the present application are further described with reference to the accompanying drawings in combination with the embodiments.
Description of the embodiments
In order to more clearly and completely describe the technical scheme of the application, the application is further described below with reference to the accompanying drawings.
Please refer to fig. 1-2.
In this embodiment, an image recognition method of an intelligent scanning pen is provided, and the method includes the following steps:
S0: constructing, training and testing to obtain an image text recognition model;
S1: image acquisition: setting up an image collector and scanning to obtain an original image of the target area;
S2: image preprocessing: preprocessing the original image to obtain a preprocessed image;
S4: performing text recognition on the preprocessed image with the image text recognition model, obtaining the language information and visual features in the image, and outputting the text content contained in the preprocessed image.
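The patent does not fix the concrete operations of the S2 preprocessing step. As an illustration only (not part of the disclosure), a minimal sketch assuming grayscale conversion followed by global binarization, two common choices for scan-pen text images, could look as follows; all names are illustrative.

```python
import numpy as np

def preprocess(rgb: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Toy S2 preprocessing: grayscale conversion followed by global
    binarization. The patent does not specify these steps; they are
    assumed here for illustration."""
    # Luma-weighted grayscale conversion of an (H, W, 3) image.
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Global threshold: dark text pixels -> 0, background -> 255.
    return np.where(gray < threshold, 0, 255).astype(np.uint8)
```

In practice the threshold would more likely be chosen adaptively per image, since pen-tip illumination varies from scan to scan.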
Further, in this embodiment, S0: constructing, training and testing the image text recognition model comprises the following substeps:
S01: constructing the model;
S02: preparing a training data set;
S03: training the model;
S04: testing the model.
Further, in this embodiment, S01: constructing the model specifically comprises:
S011: constructing a feature extraction network, a mask language perception module and a visual reasoning module respectively;
S012: connecting the feature extraction network, the mask language perception module and the visual reasoning module in sequence to construct the image text recognition model to be trained.
Further, in this embodiment, S02 specifically comprises:
S021: selecting a plurality of training images;
S022: making a word-level label for each training image;
S023: for each word-labelled training image in the data set, randomly generating a character index according to the length of the word labelled on the current image; the character index denotes the index of the character to be occluded, and for each input word image an index is drawn at random over the word's length;
S024: collecting all training images with word-level labels and character indices to form the training data set.
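Steps S021-S024 above can be sketched as follows; the record layout and field names are assumptions for illustration, not taken from the patent.

```python
import random

def make_training_record(image_id: str, label: str, rng: random.Random):
    """S023 sketch: pair a word-labelled training image with a randomly
    drawn character index (the position of the character the mask module
    will later occlude). Field names are illustrative."""
    char_index = rng.randrange(len(label))  # one random index per word length
    return {"image": image_id, "word_label": label, "char_index": char_index}

def build_dataset(samples, seed=0):
    """S024 sketch: collect all (image, word label, character index)
    records into the training data set."""
    rng = random.Random(seed)
    return [make_training_record(img, lab, rng) for img, lab in samples]
```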
Further, in this embodiment, S03: training the model comprises the following substeps:
S031: selecting an image from the training data set and inputting it into the image text recognition model to be trained;
S032: the feature extraction network performs feature extraction on the currently input image to obtain a two-dimensional feature vector of the current image;
S033: the mask language perception module performs character mask mapping on the two-dimensional feature vector of the current image, the word-level label carried by the current image and the character index carried by the current image to obtain an occlusion feature map. The mask language perception module takes the two-dimensional feature vector and the character index of the current image as input; in a specific implementation, a technician can build it from Transformer units and a sigmoid layer: the Transformer units improve the feature representation capability, and, combined with the character index information, the sigmoid layer produces the character mask mapping, from which the occlusion feature map is generated.
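A minimal stand-in for the last stage of the mask language perception module might look as follows. Two loud assumptions: characters are laid out left to right across the feature map, and a fixed positional prior replaces the scores that the Transformer units would actually produce; only the sigmoid step matches the construction described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def character_mask_map(feature_map: np.ndarray, char_index: int,
                       num_chars: int) -> np.ndarray:
    """Toy character mask mapping: a soft (h, w) occlusion map whose
    values lie in (0, 1), high over the band of the indexed character.
    The positional prior stands in for Transformer-produced scores."""
    h, w = feature_map.shape[:2]
    cols = np.arange(w)
    # Assume the i-th of num_chars characters occupies the i-th vertical
    # band of the map (an assumption, not the patent's construction).
    band = (cols * num_chars) // max(w, 1) == char_index
    scores = np.where(band, 4.0, -4.0)        # high score inside the band
    return np.tile(sigmoid(scores), (h, 1))   # broadcast to all rows
```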
S034: combining the current image, its two-dimensional feature vector and the occlusion feature map to obtain an image with occluded characters. To guide the learning of the occlusion feature map, two parallel branches based on weakly supervised complementary learning can be designed when building the mask language perception module.
Weakly supervised complementary learning aims to guide the occlusion feature map to cover more of the occluded character's region, thereby complementarily making the complementary feature map contain more of the other characters' regions. In the first branch, the element-wise product of the current image's two-dimensional feature vector and the occlusion feature map generates a feature map containing the visual semantics of the occluded character. Correspondingly, in the second branch, the element-wise product of the two-dimensional feature vector and the complement of the occlusion feature map generates a feature map containing the visual semantics of the other characters. In this way, the complementary learning of the two branches guides the occlusion feature map to cover only the character at the corresponding position, without overlapping other characters.
The weights of the Transformer units and the prediction layer are shared between the two parallel branches, which serves both feature representation enhancement and semantic guidance. The supervision for weakly supervised complementary learning is obtained automatically from the original word-level labels and the randomly generated character indices, so the mask language perception module can generate accurate character mask mappings without additional labels.
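The two complementary branches described above reduce to a pair of element-wise products that partition the feature map; a sketch, with illustrative names:

```python
import numpy as np

def complementary_branches(features: np.ndarray, occlusion_map: np.ndarray):
    """Weakly supervised complementary learning sketch: branch one keeps
    the occluded character's visual semantics, branch two keeps every
    other character's. Because the two products sum back to the original
    features, training pushes the occlusion map to cover exactly the
    indexed character and nothing else."""
    occluded_branch = features * occlusion_map            # indexed character
    remaining_branch = features * (1.0 - occlusion_map)   # all other characters
    return occluded_branch, remaining_branch
```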
S035: inputting the image with occluded characters into the visual reasoning module, which captures long-distance information in the visual space and predicts the corresponding text line recognition content.
The visual reasoning module comprises a visual semantic reasoning layer and a parallel prediction layer. The visual semantic reasoning layer consists of several Transformer units and can effectively model long-term dependencies regardless of word length.
Because the mask language perception module accurately occludes character information during the training stage, the visual semantic reasoning layer is guided to predict the dependencies between the visual features of characters and thereby infer the semantics of the occluded character. Under word-level supervision, the layer thus learns to actively model the language information in the visual context to assist recognition. Thanks to this design, in the subsequent model test stage it can adaptively draw on language information to enhance visual features whenever the current visual semantics are confused (e.g. by stain occlusion or sketch marks). Visualizing the feature maps produced by the visual semantic reasoning layer helps one understand how the learned language information improves recognition performance.
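How a Transformer unit models dependencies independent of word length can be illustrated with a weight-free, single-head self-attention over a (sequence length, dimension) feature matrix. This is a toy stand-in for one unit of the visual semantic reasoning layer, not the patent's construction: every output position attends to every input position, which is what makes the captured dependencies long-range.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Weight-free single-head self-attention over (seq_len, dim)
    character features; real Transformer units would add learned
    query/key/value projections, multiple heads and a feed-forward
    sublayer."""
    scores = x @ x.T / np.sqrt(x.shape[1])        # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over each row
    return weights @ x                            # context-mixed features
```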
S036: repeating S031-S035 until all images in the training data set have been input into the image text recognition model to be trained, obtaining the image text recognition model to be tested.
Further, in this embodiment, S04 includes the following substeps:
S041A: selecting a plurality of test images to assemble a first test data set;
S042A: removing the mask language perception module from the image text recognition model to be tested and connecting the feature extraction network directly to the visual reasoning module;
S043A: sequentially inputting each test image in the test data set into the feature extraction network, which extracts features of the currently input test image to obtain its two-dimensional feature vector; inputting this vector directly into the visual reasoning module, which predicts the corresponding text line recognition content to obtain the text recognition result of the current test image;
S044A: repeating S041A-S043A until all test images in the test data set have been input into the image text recognition model to be tested, obtaining a first image text recognition model to be evaluated.
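Test procedure A amounts to bypassing the mask language perception module at test time. Assuming the model stages are plain callables (an illustrative simplification of S042A), the surgery is:

```python
def build_test_pipeline(feature_extractor, visual_reasoner):
    """Procedure-A sketch: the mask language perception module, which is
    only needed during training, is dropped, and features flow straight
    from the extraction network to the visual reasoning module. Both
    arguments are arbitrary callables; the names are illustrative."""
    def pipeline(image):
        return visual_reasoner(feature_extractor(image))
    return pipeline
```

With toy stand-ins for the two stages, `build_test_pipeline(f, g)(x)` simply computes `g(f(x))`, i.e. recognition without any mask step in between.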
Further, in this embodiment, S04 further includes the following substeps:
S041B: selecting a plurality of test images, making a word-level label for each, and randomly generating a character index according to the length of the word labelled on the current test image; collecting all test images carrying word-level labels and character indices into a second test data set;
S042B: sequentially inputting each test image in the test data set into the image text recognition model to be tested to obtain the text recognition result of the current test image;
S043B: repeating S041B-S042B until all test images in the test data set have been input into the image text recognition model to be tested, obtaining a second image text recognition model to be evaluated.
Further, in this embodiment, S04 further includes: evaluating the first and second image text recognition models to be evaluated and selecting the better one as the final image text recognition model.
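The final selection step of S04 can be sketched as follows, assuming exact-match accuracy as the evaluation metric (the patent does not specify one):

```python
def select_final_model(candidates, test_set):
    """Score each candidate recognition model on (image, ground truth)
    pairs and keep the most accurate one. `candidates` maps a name to a
    callable model; names and metric are assumptions for illustration."""
    def accuracy(model):
        hits = sum(model(img) == truth for img, truth in test_set)
        return hits / len(test_set)
    return max(candidates, key=lambda name: accuracy(candidates[name]))
```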
In this embodiment, an image recognition device of an intelligent scanning pen is also provided; the device stores a program implementing the image recognition method described above and controls its execution.
In this embodiment, an intelligent scanning pen is also provided, which employs the image recognition device described above.
Of course, the present application can be implemented in various other embodiments; based on this embodiment, those skilled in the art can derive other embodiments without any inventive effort, all of which fall within the scope of the present application.

Claims (10)

1. An image recognition method of an intelligent scanning pen is characterized by comprising the following steps:
S0: constructing, training and testing to obtain an image text recognition model;
S1: image acquisition: setting up an image collector and scanning to obtain an original image of the target area;
S2: image preprocessing: preprocessing the original image to obtain a preprocessed image;
S4: performing text recognition on the preprocessed image with the image text recognition model, obtaining the language information and visual features in the image, and outputting the text content contained in the preprocessed image.
2. The image recognition method of an intelligent scanning pen according to claim 1, wherein S0: constructing, training and testing the image text recognition model comprises the following substeps:
S01: constructing the model;
S02: preparing a training data set;
S03: training the model;
S04: testing the model.
3. The image recognition method of an intelligent scanning pen according to claim 2, wherein S01: constructing the model specifically comprises:
S011: constructing a feature extraction network, a mask language perception module and a visual reasoning module respectively;
S012: connecting the feature extraction network, the mask language perception module and the visual reasoning module in sequence to construct the image text recognition model to be trained.
4. The image recognition method of an intelligent scanning pen according to claim 3, wherein S02 specifically comprises:
S021: selecting a plurality of training images;
S022: making a word-level label for each training image;
S023: for each word-labelled training image in the data set, randomly generating a character index according to the length of the word labelled on the current image;
S024: collecting all training images with word-level labels and character indices to form the training data set.
5. The image recognition method of an intelligent scanning pen according to claim 4, wherein S03: training the model comprises the following substeps:
S031: selecting an image from the training data set and inputting it into the image text recognition model to be trained;
S032: the feature extraction network performs feature extraction on the currently input image to obtain a two-dimensional feature vector of the current image;
S033: the mask language perception module performs character mask mapping on the two-dimensional feature vector of the current image, the word-level label carried by the current image and the character index carried by the current image to obtain an occlusion feature map;
S034: combining the current image, its two-dimensional feature vector and the occlusion feature map to obtain an image with occluded characters;
S035: inputting the image with occluded characters into the visual reasoning module, which captures long-distance information in the visual space and predicts the corresponding text line recognition content;
S036: repeating S031-S035 until all images in the training data set have been input into the image text recognition model to be trained, obtaining the image text recognition model to be tested.
6. The image recognition method of an intelligent scanning pen according to claim 5, wherein S04 comprises the following substeps:
S041A: selecting a plurality of test images to assemble a first test data set;
S042A: removing the mask language perception module from the image text recognition model to be tested and connecting the feature extraction network directly to the visual reasoning module;
S043A: sequentially inputting each test image in the test data set into the feature extraction network, which extracts features of the currently input test image to obtain its two-dimensional feature vector; inputting this vector directly into the visual reasoning module, which predicts the corresponding text line recognition content to obtain the text recognition result of the current test image;
S044A: repeating S041A-S043A until all test images in the test data set have been input into the image text recognition model to be tested, obtaining a first image text recognition model to be evaluated.
7. The image recognition method of an intelligent scanning pen according to claim 6, wherein S04 further comprises the sub-steps of:
S041B: selecting a plurality of test images, making word-level labels for each test image, and randomly generating character indexes according to the lengths of the words labeled on the current test image; collecting all the test images carrying word-level labels and character indexes into a second test data set;
S042B: sequentially inputting each test image in the test data set into a text recognition model of the image to be tested to obtain a text recognition result of the current test image;
S043B, circulating S041B-S042B until all the test images in the test data set are input into the image text recognition model to be trained, and obtaining a second image text recognition model to be evaluated.
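Building the second test data set (S041B) amounts to pairing each image with a word-level label and a character index drawn uniformly from the label's length. A minimal sketch, with an assumed record format:

```python
import random

def make_second_test_set(word_labels, seed=0):
    # For each labeled test image, randomly generate a character index
    # bounded by the length of its word-level label (S041B).
    rng = random.Random(seed)  # seeded for reproducibility
    return [{"label": lbl, "char_index": rng.randrange(len(lbl))}
            for lbl in word_labels]

second_test_set = make_second_test_set(["hello", "pen", "scanner"])
```

Each record would then be fed through the full model (masking stage included) in S042B, so this path exercises the masked-language perception module that the first path removes.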
8. The image recognition method of an intelligent scanning pen according to claim 6, wherein S04 further comprises: evaluating the first image text recognition model to be evaluated and the second image text recognition model to be evaluated, and selecting the better of the two as the final image text recognition model.
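The selection step in claim 8 reduces to scoring both candidate models on held-out samples and keeping the higher-scoring one. The metric below (exact-match accuracy) is an assumption for illustration; the patent does not specify the evaluation criterion.

```python
def accuracy(model, samples):
    # Fraction of (image, label) pairs the model recognizes exactly.
    return sum(model(image) == label for image, label in samples) / len(samples)

def select_final_model(model_a, model_b, samples):
    # Keep whichever candidate scores higher on the evaluation samples.
    return model_a if accuracy(model_a, samples) >= accuracy(model_b, samples) else model_b

# Toy stand-ins for the two candidate models:
eval_samples = [("abc", "abc"), ("def", "deg")]
model_first = lambda s: s           # returns input unchanged: 1/2 correct
model_second = lambda s: s.upper()  # returns uppercase: 0/2 correct
final_model = select_final_model(model_first, model_second, eval_samples)
```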
9. An image recognition apparatus of an intelligent scanning pen, characterized in that the apparatus stores, and is controlled to execute, the image recognition method of an intelligent scanning pen as claimed in any one of claims 1 to 8.
10. An intelligent scanning pen, characterized in that it employs the image recognition apparatus of an intelligent scanning pen as claimed in claim 9.
CN202311182898.4A 2023-09-14 2023-09-14 Image identification method and device of intelligent scanning pen and intelligent scanning pen Pending CN116912856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311182898.4A CN116912856A (en) 2023-09-14 2023-09-14 Image identification method and device of intelligent scanning pen and intelligent scanning pen

Publications (1)

Publication Number Publication Date
CN116912856A true CN116912856A (en) 2023-10-20

Family

ID=88360690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311182898.4A Pending CN116912856A (en) 2023-09-14 2023-09-14 Image identification method and device of intelligent scanning pen and intelligent scanning pen

Country Status (1)

Country Link
CN (1) CN116912856A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541501A (en) * 2020-12-18 2021-03-23 北京中科研究院 Scene character recognition method based on visual language modeling network
CN112733768A (en) * 2021-01-15 2021-04-30 中国科学技术大学 Natural scene text recognition method and device based on bidirectional characteristic language model
US20220415072A1 (en) * 2022-02-25 2022-12-29 Beijing Baidu Netcom Science Technology Co., Ltd. Image processing method, text recognition method and apparatus
CN116704519A (en) * 2023-06-07 2023-09-05 深圳如布科技有限公司 Character recognition method, character recognition device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deli Yu et al.: "Towards Accurate Scene Text Recognition with Semantic Reasoning Networks", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12110-12119 *
Li Lin et al.: "Classification of defaced and occluded license plates based on convolutional neural networks", Computer Science, no. 1, pages 213-219 *

Similar Documents

Publication Publication Date Title
KR102266529B1 (en) Method, apparatus, device and readable storage medium for image-based data processing
CN111027562B (en) Optical character recognition method based on multiscale CNN and RNN combined with attention mechanism
CN109670494B (en) Text detection method and system with recognition confidence
CN113254654B (en) Model training method, text recognition method, device, equipment and medium
CN110096987B (en) Dual-path 3DCNN model-based mute action recognition method
CN110796018A (en) Hand motion recognition method based on depth image and color image
CN111738169A (en) Handwriting formula recognition method based on end-to-end network model
CN112712069A (en) Question judging method and device, electronic equipment and storage medium
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN117746078B (en) Object detection method and system based on user-defined category
CN113592807A (en) Training method, image quality determination method and device, and electronic equipment
CN117010500A (en) Visual knowledge reasoning question-answering method based on multi-source heterogeneous knowledge joint enhancement
CN116564355A (en) Multi-mode emotion recognition method, system, equipment and medium based on self-attention mechanism fusion
CN117892140B (en) Visual question and answer and model training method and device thereof, electronic equipment and storage medium
CN115759293A (en) Model training method, image retrieval device and electronic equipment
CN114462418B (en) Event detection method, system, intelligent terminal and computer readable storage medium
CN111914822A (en) Text image labeling method and device, computer readable storage medium and equipment
Sanmitra et al. Machine Learning Based Real Time Sign Language Detection
KR20190121593A (en) Sign language recognition system
CN116912856A (en) Image identification method and device of intelligent scanning pen and intelligent scanning pen
CN116935411A (en) Radical-level ancient character recognition method based on character decomposition and reconstruction
CN116311269A (en) Formula picture identification question judging system
CN111753618A (en) Image recognition method and device, computer equipment and computer readable storage medium
CN113657092B (en) Method, device, equipment and medium for identifying tag
CN113761209B (en) Text splicing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination