CN115660952A - Image processing method, dictionary pen and storage medium


Info

Publication number
CN115660952A
Authority
CN
China
Prior art keywords
image
processed
area
last frame
character
Prior art date
Legal status
Pending
Application number
CN202211222183.2A
Other languages
Chinese (zh)
Inventor
林云峰 (Lin Yunfeng)
丁威 (Ding Wei)
Current Assignee
Zhejiang Maojing Artificial Intelligence Technology Co., Ltd.
Original Assignee
Zhejiang Maojing Artificial Intelligence Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Zhejiang Maojing Artificial Intelligence Technology Co., Ltd.
Priority to CN202211222183.2A
Publication of CN115660952A

Landscapes

  • Character Input (AREA)

Abstract

The embodiments of the present application provide an image processing method, a dictionary pen, and a storage medium, wherein the image processing method includes: acquiring consecutive multiple frames of text images to be recognized; determining, in the last frame image, an area that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area; determining, in the adjacent previous frame image, an area that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold as an area to be spliced; determining, from the area to be spliced, a splicing area with the same pixel distribution as that of the target area; splicing based on the target area and the splicing area to obtain a spliced image; obtaining a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame image; and deleting the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed.

Description

Image processing method, dictionary pen and storage medium
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to an image processing method, a dictionary pen, and a storage medium.
Background
Currently, electronic devices with text recognition functions are emerging in the field of educational hardware, such as dictionary pens and the like.
Generally, the content of a book is continuously captured by the electronic device to obtain multiple frames of images, each of which may contain a text segment of the book; the captured frames are then recognized by the electronic device to obtain the corresponding text content, and the corresponding original text, translation, paraphrase, and the like can be determined based on that text content.
However, this requires the electronic device to accurately recognize the text included in the captured images of the plurality of frames.
Disclosure of Invention
In view of the above, embodiments of the present application provide an image processing scheme to at least partially solve the above problem.
According to a first aspect of the embodiments of the present application, there is provided an image processing method, including: acquiring consecutive multiple frames of text images to be recognized; performing image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determining an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area; performing image quality analysis on the previous frame image adjacent to the last frame image, and determining an area in the adjacent previous frame image that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold; comparing the pixels in the target area with the pixels of the area to be spliced, and determining, from the area to be spliced, a splicing area with the same pixel distribution as that of the target area; splicing the last frame image and the adjacent previous frame image based on the target area and the splicing area to obtain a spliced image; obtaining a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame image; and deleting the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed.
According to a second aspect of the embodiments of the present application, there is provided a dictionary pen, including an image acquisition device, a processor, and an output device, wherein the image acquisition device is configured to acquire consecutive multiple frames of text images to be recognized; the processor is configured to perform image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determine an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area; perform image quality analysis on the previous frame image adjacent to the last frame image, and determine an area in the adjacent previous frame image that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold; compare the pixels in the target area with the pixels of the area to be spliced, and determine, from the area to be spliced, a splicing area with the same pixel distribution as that of the target area; splice the last frame image and the adjacent previous frame image based on the target area and the splicing area to obtain a spliced image; obtain a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame image; and delete the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed to obtain processed characters, and determine corresponding output content according to the processed characters; and the output device is configured to output the output content.
According to a third aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method as described above.
According to the scheme provided by the embodiments of the present application, consecutive multiple frames of text images to be recognized are acquired. Image quality analysis is performed on the last frame image, and an area in it that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold is determined as a target area; image quality analysis is performed on the previous frame image adjacent to the last frame image, and an area in it that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold is determined as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold. In this way, a target area with relatively high image quality can be determined on the side of the last frame image close to the acquisition starting side, and an area to be spliced with relatively high image quality can be determined in the adjacent previous frame image; and because the first image quality threshold is greater than the second image quality threshold, the area to be spliced is larger than the target area, so that when the pixels in the target area are compared with the pixels in the area to be spliced, a splicing area with the same pixel distribution as the target area can be determined from the area to be spliced. The last frame image and the adjacent previous frame image are spliced based on the target area and the splicing area to obtain a spliced image, which ensures that the last frame image is completely retained at the tail of the spliced image. Then, a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame image are obtained. Because the last frame image is completely retained at the tail of the spliced image, the first characters to be processed corresponding to the spliced image and the second character to be processed corresponding to the last frame image have a high degree of position matching, which reduces as much as possible the position errors caused by character deformation in the image, in particular the position errors of those first characters to be processed that correspond to the last frame image; as a result, the error when deleting among the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed is small, improving precision. In addition, because the scheme provided by the present application deletes among the plurality of first characters to be processed based on the position information, no error occurs even when the first characters to be processed span multiple lines.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present application, and other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1A is a schematic structural diagram of a dictionary pen according to an embodiment of the present application;
FIG. 1B is a schematic diagram of image recognition in the embodiment shown in FIG. 1A;
FIG. 2A is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 2B is a diagram of an example scenario in the embodiment shown in FIG. 2A;
FIG. 3A is a flowchart of the steps of an image processing method according to an embodiment of the present application;
FIG. 3B is a schematic diagram of image splicing in the embodiment shown in FIG. 3A;
FIG. 3C is a diagram of an example scenario in the embodiment shown in FIG. 3A;
FIG. 3D is a schematic diagram of another image splicing in the embodiment shown in FIG. 3A;
FIG. 4 is a schematic structural diagram of a dictionary pen according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings of the embodiments of the present application. It is obvious that the described embodiments are only a part, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
A usage scenario of the present application is described first. Referring to FIG. 1A, a schematic diagram of a dictionary pen scanning a book is shown. The dictionary pen includes a pen body and a pen head 10; a shielding plate 11 is disposed on the pen head 10, and a camera (not shown in the figure) is disposed inside the shielding plate 11 for capturing the content on the book (for example, the content shown in FIG. 1A).
Specifically, in use, the user sweeps the dictionary pen across the text in the order of the characters in the book; the camera in the pen head records or continuously shoots the text to obtain multiple frames of images. The frames are then spliced, character recognition is performed on the spliced image, the recognized text is played by text-to-speech, and the corresponding translation result (for example, English to Chinese) is output synchronously.
Specifically, when text is scanned with the dictionary pen, taking a left-to-right scan as an example, what lies to the left of the shielding plate is the content seen by the user, and what lies to the right of the shielding plate is what the camera captures. When the user lifts the pen, the camera still captures the content on the right side of the shielding plate, i.e., the last frame image captured by the camera; this is content after the pen-up position and needs to be discarded during processing.
Generally, the last frame image can be recognized, and the text corresponding to the recognition result of the last frame can be discarded from the recognized text corresponding to the multiple frames of images. However, if the last frame image contains multiple lines of characters, and especially if those lines contain words or phrases that also appear in the result recognized from the preceding images, erroneous discarding easily occurs. For example, referring to FIG. 1B, when there are multiple lines of text in the image, the dictionary pen typically recognizes only the middle line, i.e., "I love China" in the figure; the first line, which also contains the word "China", is not recognized because it is not in the middle. The part after the pen-up position shown in FIG. 1B is the last frame image, which contains only the "China" in the first line, so the recognition result corresponding to the last frame image is "China". If the "China" in the earlier recognition result is then discarded, the remaining content is "I love", i.e., a false discard occurs.
In addition, the multiple frames of images can be directly spliced and the last frame image cut out of the spliced image; however, because the accuracy of image splicing is limited, too much or too little may be deleted.
Therefore, embodiments of the present application provide an image processing method to solve or alleviate the above problems as much as possible.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Fig. 2A is a schematic flowchart of an image processing method according to an embodiment of the present application, and as shown in the drawing, the image processing method includes:
s201, acquiring continuous multi-frame text images to be identified;
in this embodiment, the electronic device that acquires the text images to be recognized may be a dictionary pen or another device capable of capturing images, which is not limited in this embodiment.
In specific acquisition, the user can hold the electronic device and sweep it across the book to be captured while the camera of the electronic device shoots continuously, obtaining the consecutive multiple frames of text images to be recognized. Of course, a book is only an example; any object bearing text can be the capture target.
S202, performing image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determining an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area;
by performing image quality analysis on the last frame image in the multiple frames of text images to be recognized, an area that is close to the acquisition starting side and has relatively high image quality can be selected as the target area, which improves the accuracy of the pixel comparison in subsequent step S204.
For example, the image quality analysis may include, but is not limited to, a brightness analysis, a contrast analysis, a text sharpness analysis, and the like, which is not limited by the embodiment.
S203, performing image quality analysis on the previous frame image adjacent to the last frame image, and determining an area in the adjacent previous frame image that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold as the area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold.
By performing image quality analysis on the previous frame image adjacent to the last frame image, an area that is close to the acquisition starting side and has relatively high image quality can likewise be selected as the area to be spliced, which also improves the accuracy of the pixel comparison in subsequent step S204.
In addition, in the embodiments of the present application, because the first image quality threshold is greater than the second image quality threshold, the determined area to be spliced is larger than the target area, so that when the pixels in the target area are compared with the pixels in the area to be spliced, a splicing area with the same pixel distribution as the target area can be determined from the area to be spliced.
The first image quality threshold and the second image quality threshold can be determined by those skilled in the art according to requirements, as long as it is ensured that a splicing area with the same pixel distribution as the target area exists in the area to be spliced.
S204, comparing the pixels in the target area with the pixels in the area to be spliced, and determining a splicing area with the same pixel distribution as that of the target area from the area to be spliced.
In addition, since both the target area and the splicing area are close to the acquisition starting side, the last frame image completely occupies the tail of the spliced image after being spliced with the adjacent previous frame image. For the specific method of pixel comparison, reference may be made to the related art, and details are not described here.
S205, splicing the last frame image and the adjacent previous frame image based on the target area and the splicing area to obtain a spliced image.
For the last frame image in the multiple frames of text images to be recognized, the pixels of the target area on the acquisition starting side of the last frame image can be matched against the area to be spliced of the adjacent previous frame image, the splicing area whose pixels match those of the target area is determined in the adjacent previous frame image, and image splicing is performed according to the splicing area.
Illustratively, taking a dictionary pen acquiring the multiple frames of text images to be recognized from left to right as an example, for the last frame image, the splicing area may be determined based on the correspondence between the pixels of the target area on the left side of the last frame image and the pixels of the area to be spliced of the adjacent previous frame image. Since the target area lies within the left half of the last frame image, everything from the target area to the right edge of the last frame image is carried into the splice, so that the last frame image is retained relatively completely on the right side of the resulting spliced image.
In addition, in this embodiment, the multiple frames of text images to be recognized may first be spliced in sequence, with step S202 executed when the last frame image is to be spliced. Alternatively, step S202 may be executed directly without splicing in sequence first; in that case, for any two adjacent images other than the last frame image, the pixels of either image may be compared with those of its adjacent previous image to determine the areas with the same pixel distribution in the two images, and after such areas have been determined for all pairs of adjacent images, the consecutive multiple frames can be spliced to obtain the spliced image.
The method for stitching images can refer to the related art, and this embodiment does not limit this.
S206, obtaining a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame of image;
in this embodiment, by performing text recognition on the spliced image, the plurality of first characters to be processed corresponding to the spliced image and the position information of each first character to be processed in the spliced image can be obtained; for example, the offset distance of each first character to be processed relative to the left edge of the spliced image can be obtained.
In this embodiment, the second character to be processed corresponding to the last frame image can be obtained by performing text recognition on the last frame image.
For a specific text recognition method, reference may be made to related technologies, which are not described herein again.
S207, deleting the plurality of first characters to be processed according to the position information and the relation between the second characters to be processed and the plurality of first characters to be processed.
Through the above steps S202 to S205, when the last frame image is spliced, it can be ensured that the last frame image is completely retained at the tail of the spliced image, so that the degree of position matching between the first characters to be processed corresponding to the spliced image and the second character to be processed corresponding to the last frame image is high. This reduces as much as possible the position errors caused by character deformation in the image and improves accuracy. In addition, because the scheme provided by the present application deletes among the plurality of first characters to be processed based on the position information, no error occurs even when the first characters to be processed span multiple lines.
For example, in this embodiment, if a second character to be processed corresponds to the last frame image, the first characters to be processed whose positions correspond to the last frame image can be deleted according to the position information of each first character to be processed in the spliced image, which avoids the errors that other matching methods would introduce and improves precision.
An exemplary implementation scenario of the present application is described below.
Referring to FIG. 2B, acquired consecutive multiple frames of text images to be recognized are shown; they may be acquired from left to right;
and the multiple frames of text images to be recognized are spliced to obtain a spliced image. In specific splicing, especially when the last frame image is spliced, the spliced image can be obtained by splicing the target area on the left side of the last frame image with the splicing area near the left side of the adjacent previous frame image, so that the last frame image completely occupies the tail of the spliced image; the spliced image and the last frame image correspond to the parts inside the wire frames in FIG. 2B.
Then, text recognition can be performed on the spliced image and on the last frame image to obtain the plurality of first characters to be processed corresponding to the spliced image, the position information of each first character to be processed in the spliced image, and the second character to be processed corresponding to the last frame image. The position information is specifically the position offset of each first character to be processed from the left edge of the spliced image; FIG. 2B exemplarily shows the position information of some of the first characters to be processed in the spliced image, for example, the position information corresponding to the characters "ina".
Character recognition is performed on the last frame image to determine whether a second character to be processed corresponds to it; if so, the first characters to be processed whose positions correspond to the last frame image are deleted according to the position information. After the deletion, the translation, paraphrase, and the like corresponding to the remaining first characters to be processed can be determined and displayed.
According to the scheme provided by this embodiment, consecutive multiple frames of text images to be recognized are acquired. Image quality analysis is performed on the last frame image, and an area in it that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold is determined as a target area; image quality analysis is performed on the previous frame image adjacent to the last frame image, and an area in it that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold is determined as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold. A target area with relatively high image quality can thus be determined on the side of the last frame image close to the acquisition starting side, and an area to be spliced with relatively high image quality can be determined in the adjacent previous frame image; because the first image quality threshold is greater than the second image quality threshold, the area to be spliced is larger than the target area, so that when the pixels in the target area are compared with the pixels in the area to be spliced, a splicing area with the same pixel distribution as the target area can be determined from the area to be spliced. The last frame image and the adjacent previous frame image are spliced based on the target area and the splicing area to obtain a spliced image, ensuring that the last frame image is completely retained at the tail of the spliced image. Then the plurality of first characters to be processed corresponding to the spliced image, the position information of each first character to be processed in the spliced image, and the second character to be processed corresponding to the last frame image are obtained. Because the last frame image is completely retained at the tail of the spliced image, the first characters to be processed and the second character to be processed have a high degree of position matching, which reduces as much as possible the position errors caused by character deformation in the image, in particular for those first characters to be processed that correspond to the last frame image; the error when deleting among the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed is therefore small, improving precision. In addition, because the scheme provided by the present application deletes among the plurality of first characters to be processed based on the position information, no error occurs even when the first characters to be processed span multiple lines.
The image processing method of this embodiment may be performed by any suitable electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone or a tablet), a PC, and the like.
Fig. 3A is a flowchart illustrating steps of an image processing method according to an embodiment of the present application, where as shown in the drawing, the method includes:
s301, obtaining continuous multi-frame text images to be recognized.
S302, sequentially determining each image from the second frame image to the penultimate frame image as the image to be spliced, and determining the target area in the image to be spliced and, in its adjacent previous frame image, the splicing area with the same pixel distribution as the target area.
Optionally, in the embodiments of the present application, for any image to be spliced, image quality analysis may be performed on it, and an area that is close to the acquisition starting side and whose image quality is greater than the first image quality threshold determined as the target area; image quality analysis may be performed on the adjacent previous frame image of the image to be spliced, and an area that is close to the acquisition starting side and whose image quality is greater than the second image quality threshold determined as the area to be spliced; then the pixels in the target area are compared with the pixels of the area to be spliced, and a splicing area with the same pixel distribution as that of the target area is determined from the area to be spliced.
For a specific method for determining the target area and the splicing area, reference may be made to the above embodiments, and details are not described herein again.
Referring to FIG. 3B, the area on the right of FIG. 3B is the image to be spliced, the area on the left is its adjacent previous frame image, and the box in the right area shows a schematic target area.
S303, performing image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determining an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than the first image quality threshold as the target area.
It should be noted that the method for determining the target area in step S302 is the same as the method for determining the target area in the last frame image.
Optionally, in this embodiment, step S303 may include: performing image quality analysis on at least half of the image area of the last frame image close to the acquisition starting side; and, according to the image quality analysis result, determining from that at least half of the image area a region whose image quality is greater than the first image quality threshold as the target area. In this way, a target area with high image quality can be selected for matching, which improves matching accuracy; and compared with analyzing the whole frame, analyzing only at least half of the image area close to the acquisition starting side reduces the resources consumed by the image quality analysis.
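As a concrete illustration of this optional step, the following Python sketch selects such a start-side, quality-thresholded region. The per-column mean-brightness quality score, the function name, and the left-to-right direction handling are assumptions made for the example only; the application does not prescribe code or any particular quality metric.

```python
import numpy as np

def select_start_side_region(gray, quality_threshold, direction="ltr"):
    """Pick a target area near the acquisition starting side whose image
    quality exceeds a threshold (cf. step S303). Quality here is the
    per-column mean brightness, a simplified stand-in for the analyses
    listed below."""
    _, w = gray.shape
    half = w // 2
    # Only the half of the image nearest the acquisition starting side
    # is analyzed, as described above.
    cols = gray[:, :half] if direction == "ltr" else gray[:, w - half:]
    quality = cols.mean(axis=0)            # one quality score per column
    ok = quality > quality_threshold
    if not ok.any():
        return None                        # no column qualifies
    start = int(np.argmax(ok))             # first qualifying column
    end = start
    while end < len(ok) and ok[end]:       # extend the contiguous run
        end += 1
    if direction != "ltr":                 # map back to full-image columns
        start, end = w - half + start, w - half + end
    return start, end                      # column range of the target area
```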
Optionally, in this embodiment, the image quality analysis includes at least one of: brightness analysis, deformation analysis, brightness uniformity analysis, contrast analysis, signal-to-noise ratio analysis, and sharpness analysis. Correspondingly, the first image quality threshold includes at least one of: a first brightness threshold, a first deformation threshold, a first brightness uniformity threshold, a first contrast threshold, a first signal-to-noise ratio threshold, and a first sharpness threshold; correspondingly, the second image quality threshold includes at least one of: a second brightness threshold, a second deformation threshold, a second brightness uniformity threshold, a second contrast threshold, a second signal-to-noise ratio threshold, and a second sharpness threshold. During acquisition with the dictionary pen, the camera is at an angle to the captured surface, for example 45 degrees, so the content in the captured image will be deformed; in this embodiment, it is therefore preferable to perform the image quality analysis at least according to the brightness analysis and the deformation analysis, which improves the quality of the determined target area.
Brightness analysis
Generally, the image captured by a dictionary pen is an image of black characters on a white background; therefore, brightness analysis mainly analyzes the brightness of the white background in an image area as the brightness of that area.
Deformation analysis
In deformation analysis, the degree of deformation perpendicular to the acquisition direction is mainly analyzed; of course, the degree of deformation parallel to the acquisition direction can also be analyzed, which also falls within the protection scope of the present application.
Brightness uniformity analysis
For a given image area, the brightness of each pixel in the area can be collected and the variance of the brightness calculated, with the variance taken as the brightness uniformity analysis result; alternatively, the deviation of each pixel's brightness from the highest or lowest brightness in the image area can be calculated, with the deviation taken as the brightness uniformity analysis result.
Contrast analysis
For a given image area, the brightness of each pixel in the area can be collected and the contrast of the area calculated from it, for example by taking the difference between the average luminance values of different colors.
Signal-to-noise ratio analysis
In signal-to-noise ratio analysis, the camera of the dictionary pen can be used to capture an image filled with a single color, and the signal-to-noise ratio of a given image area is calculated from the captured image. Moreover, images filled with different single colors can be scanned, and the signal-to-noise ratio of the same image area calculated from each, to obtain the final signal-to-noise ratio analysis result.
Sharpness analysis
In sharpness analysis, the camera of the dictionary pen can be used to capture an image of lines running in a single direction, and whether the lines can be distinguished is judged from the captured image. Moreover, images of single-direction lines with different thicknesses or spacings can be scanned, and whether the lines can be distinguished judged from each, to obtain the final sharpness analysis result.
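Purely for illustration, and assuming 2-D grayscale numpy arrays, the analyses above could be scored as follows. The exact formulas (for example, the Laplacian-variance sharpness proxy) are common choices assumed for this sketch; the application does not fix any particular formula.

```python
import numpy as np

def brightness(region):
    """Mean luminance; for black characters on a white background this
    approximates the white-background brightness discussed above."""
    return float(region.mean())

def brightness_uniformity(region):
    """Variance of per-pixel luminance, per the variance option above;
    lower variance means more uniform brightness."""
    return float(region.var())

def contrast(region):
    """Michelson-style contrast from the luminance extremes; one of
    several reasonable definitions, chosen here for simplicity."""
    lo, hi = float(region.min()), float(region.max())
    return (hi - lo) / (hi + lo + 1e-9)

def sharpness(region):
    """Variance of a 4-neighbor Laplacian response, a common sharpness
    proxy."""
    r = region.astype(np.float64)
    lap = (np.roll(r, 1, 0) + np.roll(r, -1, 0)
           + np.roll(r, 1, 1) + np.roll(r, -1, 1) - 4.0 * r)
    return float(lap.var())
```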
In addition, for the dictionary pen, a gray scale test and a convergence test can be performed.
Gray scale testing
In the gray scale test, the camera of the dictionary pen can be used to capture an image containing multiple lines of color frames, and whether the color frames can be distinguished is judged from the captured image to obtain the gray scale test result.
Convergence test
In the convergence test, the camera of the dictionary pen can be used to capture an image whose upper half is white characters on a black background and whose lower half is black characters on a white background; the number of frames required for the exposure to stabilize is determined from the captured images, and the convergence test result is obtained from that frame count.
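As a sketch of how the exposure-stable frame count might be measured (the application only states that a stable frame number is determined from the captured images), assume the frames are grayscale arrays and that stability means the mean-brightness change between consecutive frames drops below a hypothetical tolerance `eps`:

```python
def exposure_stable_frame(frames, eps=1.0):
    """Return the index of the first frame at which exposure has
    converged, i.e. the mean brightness changed by less than `eps`
    relative to the previous frame. `eps` is an assumed tolerance."""
    means = [float(f.mean()) for f in frames]
    for i in range(1, len(means)):
        if abs(means[i] - means[i - 1]) < eps:
            return i
    return len(frames) - 1  # never converged within the capture
```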
S304, performing image quality analysis on the previous frame image adjacent to the last frame image, and determining an area in the adjacent previous frame image that is close to the acquisition starting side and whose image quality is greater than the second image quality threshold as the area to be spliced.
It should be noted that the method for determining the area to be spliced in step S302 is the same as the method for determining the area to be spliced for the last frame image.
S305, comparing the pixels in the target area with the pixels in the area to be spliced, and determining a splicing area with the same pixel distribution as that of the target area from the area to be spliced.
Specifically, in this embodiment, the area to be spliced may be determined from the previous frame image, its area being larger than that of the target area; the area to be spliced is shown as the left box of FIG. 3B. Then, a sliding window of the same size as the target area is moved across the area to be spliced, and the pixels of the target area are compared with the pixels inside the sliding window, so that through the sliding window an area matching the target area is determined from the area to be spliced as the splicing area.
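The sliding-window comparison could look like the sketch below. The sum-of-absolute-differences cost and the equal-height assumption are simplifications made for the example; in practice, requiring "the same pixel distribution" amounts to finding the best match under capture noise.

```python
import numpy as np

def find_splicing_area(target, to_splice):
    """Slide a window the size of `target` across the area to be spliced
    and return the horizontal offset whose pixels best match it
    (minimum sum of absolute differences). Both inputs are 2-D
    grayscale arrays of equal height, with `to_splice` wider."""
    th, tw = target.shape
    sh, sw = to_splice.shape
    assert sh == th and sw >= tw, "area to be spliced must be larger"
    t = target.astype(np.int32)
    best_x, best_cost = 0, None
    for x in range(sw - tw + 1):                     # slide the window
        window = to_splice[:, x:x + tw].astype(np.int32)
        cost = int(np.abs(window - t).sum())         # pixel difference
        if best_cost is None or cost < best_cost:
            best_x, best_cost = x, cost
    return best_x   # left edge of the splicing area inside `to_splice`
```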
And S306, carrying out image splicing based on the target area and the splicing area to obtain the spliced image.
Optionally, in this embodiment, step S306 may include: splicing the last frame image and the adjacent previous frame image with the centerline of the target area and the centerline of the splicing area as the splicing position. Specifically, the centerlines may be seen as the vertical centerlines of the splicing area and the target area in FIG. 3B. Since deformation is generally smallest at the centerline position, using the centerline as the splicing position improves the image quality of the spliced image.
Of course, the above method may also be adopted when splicing two adjacent frames other than the last frame image; other methods may be adopted as well, which is not limited in this embodiment.
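A minimal sketch of such centerline splicing under the same left-to-right assumption follows; the splicing area (within the previous frame image) and the target area (within the last frame image) are given as horizontal pixel offsets and widths, and equal image heights are assumed:

```python
import numpy as np

def splice_at_centerlines(prev_img, last_img, splice_x, splice_w,
                          target_x, target_w):
    """Join the two frames at the vertical centerlines of the matched
    areas, where deformation is typically smallest. The result keeps
    everything right of the target-area centerline in `last_img`, so
    the last frame is fully retained at the tail."""
    prev_cut = splice_x + splice_w // 2   # centerline of splicing area
    last_cut = target_x + target_w // 2   # centerline of target area
    return np.hstack([prev_img[:, :prev_cut], last_img[:, last_cut:]])
```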
Optionally, before step S306, the method may further include: performing image brightness preprocessing and/or image jump preprocessing on the target area and the splicing area. This avoids brightness jumps, image content jumps, and the like in the spliced image and improves its quality. For the specific methods of image brightness preprocessing and/or image jump preprocessing, reference may be made to the related art, and details are not described here.
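The application leaves the specific preprocessing methods to the related art; purely as an assumed example, a simple gain-matching step such as the following would reduce a brightness jump at the seam:

```python
import numpy as np

def equalize_seam_brightness(last_img, splicing_area, target_area):
    """Scale the last frame so the two matched areas have equal mean
    brightness, one possible form of the image brightness preprocessing
    mentioned above. All inputs are 2-D grayscale uint8 arrays."""
    gain = (splicing_area.mean() + 1e-9) / (target_area.mean() + 1e-9)
    adjusted = np.clip(last_img.astype(np.float64) * gain, 0, 255)
    return adjusted.astype(np.uint8)
```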
S307, performing text recognition on the spliced image to obtain a plurality of first characters to be processed corresponding to the spliced image and position information of each first character to be processed in the spliced image.
S308, performing text recognition on the last frame image to obtain the second character to be processed corresponding to the last frame image.
S309, judging whether the second character to be processed is empty or not.
S310, if the second character to be processed is not a null character, deleting, according to the position information, the characters among the first characters to be processed that correspond to the position of the second character to be processed.
Optionally, in the embodiments of the present application, the position information of the first characters to be processed in the spliced image includes: the position offset of each first character to be processed relative to the edge on the acquisition starting side of the spliced image. Correspondingly, step S310 may include: determining a first image length of the spliced image in the acquisition direction and a second image length of the last frame image in the acquisition direction, and calculating the difference between the first image length and the second image length to obtain the distance threshold corresponding to the last frame image; and deleting the first characters to be processed whose position offset is greater than the distance threshold.
For example, referring to FIG. 3C, in this embodiment, when the acquisition direction is from left to right, the distance threshold = the total length of the spliced image in the horizontal direction (the first image length) - the length of the last frame image in the horizontal direction (the second image length); the lengths may specifically be pixel counts. Of course, if the acquisition direction is from top to bottom or oblique, the first image length and the second image length in the corresponding direction are used for the calculation, which also falls within the protection scope of the present application.
S311, if the second character to be processed is a null character, retaining all the first characters to be processed.
Referring to FIG. 3D, if the second character to be processed is a null character, the last frame image may be an image of blank paper. Such an image carries little effective information, and its position easily drifts during splicing, making the position of the last frame image within the spliced image inaccurate. The upper part of FIG. 3D shows the consecutive multiple frames of text images to be recognized before splicing, the middle part shows the expected spliced image, and the lower part shows the actual spliced image; the dotted lines in the middle and lower spliced images correspond to the last frame image. As shown in FIG. 3D, the position of the last frame image in the actual spliced image is inaccurate. Therefore, in this embodiment, first characters to be processed are deleted only when the second character to be processed is not a null character; when it is a null character, all the first characters to be processed are retained.
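Putting steps S309 to S311 together, a short Python sketch follows; the `(char, x_offset)` pair format for the OCR output is an assumption of the example, not something the application prescribes:

```python
def delete_last_frame_chars(first_chars, second_chars,
                            spliced_width, last_frame_width):
    """first_chars: [(char, x_offset_px), ...] recognized from the
    spliced image; second_chars: text recognized from the last frame
    image (empty string if the last frame is blank)."""
    if not second_chars:                 # S311: null second character,
        return first_chars               # keep all first characters
    # S310: distance threshold = first image length - second image length
    threshold = spliced_width - last_frame_width
    return [(c, x) for c, x in first_chars if x <= threshold]
```

For example, with a spliced image 500 pixels wide and a last frame 120 pixels wide, any first character whose offset exceeds 380 pixels would be deleted, provided the last frame recognized some text.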
Fig. 4 is a schematic structural diagram of a dictionary pen according to an embodiment of the present application; as shown in the figure, it includes: an image acquisition device 401, a processor 402, and an output device 403. The image acquisition device 401 may be any device with an image acquisition function, such as a camera; the output device may include, but is not limited to, at least one of: a display and a speaker.
The image acquisition device 401 is configured to acquire consecutive multiple frames of text images to be recognized.
The processor 402 is configured to perform image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determine an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area; perform image quality analysis on the previous frame image adjacent to the last frame image, and determine an area in the adjacent previous frame image that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold; compare the pixels in the target area with the pixels of the area to be spliced, and determine, from the area to be spliced, a splicing area with the same pixel distribution as that of the target area; splice the last frame image and the adjacent previous frame image based on the target area and the splicing area to obtain a spliced image; obtain a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame image; and delete the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed to obtain processed characters, and determine corresponding output content according to the processed characters.
The output device 403 is used for outputting the output content.
Optionally, in this embodiment, the dictionary pen further includes a memory configured to store a knowledge base, and the processor is specifically configured to query the knowledge base according to the processed characters and determine the corresponding output content according to the query result. The knowledge base may be any knowledge base, such as a bilingual dictionary or the Three Hundred Tang Poems, which is not limited in this embodiment.
Referring to fig. 5, a schematic structural diagram of an electronic device provided in an embodiment of the present application is shown, and a specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 5, the electronic device may include: a processor (processor) 502, a communication Interface (Communications Interface) 504, a memory (memory) 506, a communication bus 508, an image capture device 510, and an output device 512.
Wherein:
the processor 502, the communication interface 504, the memory 506, the image capture device 510, and the output device 512 communicate with one another via a communication bus 508.
The image acquisition device 510 is configured to acquire a plurality of continuous text images to be recognized. The image capturing device 510 may be a camera or the like.
A communication interface 504 for communicating with other electronic devices or servers.
The processor 502 is configured to execute the program 514, and may specifically execute relevant steps in the foregoing embodiment of the image processing method for multiple frames of text images to be recognized, and determine output content.
The output device 512 is used for outputting the output content. The output device 512 may be a display or a speaker, etc.
In particular, the program 514 may include program code comprising computer operating instructions.
The processor 502 may be a CPU (Central Processing Unit) or an ASIC (Application-Specific Integrated Circuit), or may be one or more integrated circuits configured to implement the embodiments of the present application. The electronic device may include one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
A memory 506 for storing a program 514. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
For specific implementation of each step in the program 514, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing embodiments of the image processing method, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
Embodiments of the present application also provide a computer storage medium, on which a computer program is stored, which when executed by a processor implements any of the image processing methods in the above-described method embodiments.
The embodiment of the present application further provides a computer program product, which includes computer instructions for instructing a computing device to execute an operation corresponding to any one of the image processing methods in the foregoing method embodiments.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or an FPGA. It is understood that a computer, a processor, a microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the image processing methods described herein. Further, when a general-purpose computer accesses code for implementing the image processing methods shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing those methods.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of the patent protection of the embodiments of the present application should be defined by the claims.

Claims (13)

1. An image processing method, comprising:
acquiring consecutive multiple frames of text images to be recognized;
performing image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determining an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area;
performing image quality analysis on the previous frame image adjacent to the last frame image, and determining an area in the adjacent previous frame image that is close to the acquisition starting side and whose image quality is greater than a second image quality threshold as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold;
comparing the pixels in the target area with the pixels of the area to be spliced, and determining a splicing area with the same pixel distribution as that of the target area from the area to be spliced;
splicing the last frame image and the adjacent previous frame image based on the target area and the splicing area to obtain a spliced image;
obtaining a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame of image;
and deleting the plurality of first characters to be processed according to the position information and the relationship between the second characters to be processed and the plurality of first characters to be processed.
2. The method of claim 1, wherein the splicing the last frame image and the adjacent previous frame image based on the target area and the splicing area to obtain the spliced image comprises:
splicing the last frame image and the adjacent previous frame image by taking the centerline of the target area and the centerline of the splicing area as the splicing position, to obtain the spliced image.
3. The method according to claim 1, wherein the performing image quality analysis on the last frame image in the multiple frames of text images to be recognized, and determining an area in the last frame image that is close to the acquisition starting side and whose image quality is greater than a first image quality threshold as a target area comprises:
performing image quality analysis on at least half of the image area of the last frame image close to the acquisition starting side;
and, according to the image quality analysis result, determining, from the at least half of the image area of the last frame image close to the acquisition starting side, a region whose image quality is greater than the first image quality threshold as the target area.
4. The method of claim 1, wherein the image quality analysis comprises at least one of: brightness analysis, deformation analysis, brightness uniformity analysis, contrast analysis, signal-to-noise ratio analysis, and sharpness analysis;
correspondingly, the first image quality threshold comprises at least one of: a first brightness threshold, a first deformation threshold, a first brightness uniformity threshold, a first contrast threshold, a first signal-to-noise ratio threshold, and a first sharpness threshold;
correspondingly, the second image quality threshold comprises at least one of: a second brightness threshold, a second deformation threshold, a second brightness uniformity threshold, a second contrast threshold, a second signal-to-noise ratio threshold, and a second sharpness threshold.
5. The method according to claim 1, wherein the deleting characters from the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed comprises:
if the second character to be processed is not a null character, deleting, from the plurality of first characters to be processed and according to the position information, the character whose position corresponds to the second character to be processed.
6. The method according to claim 5, wherein the position information of each first character to be processed in the spliced image comprises a position offset of the first character to be processed along the acquisition direction; correspondingly, the deleting, from the plurality of first characters to be processed and according to the position information, the character whose position corresponds to the second character to be processed comprises:
determining a first image length of the spliced image in the acquisition direction and a second image length of the last frame of image in the acquisition direction, and calculating a difference between the first image length and the second image length to obtain a distance threshold corresponding to the last frame of image; and
deleting each first character to be processed whose position offset is greater than the distance threshold.
7. The method according to claim 5, wherein the deleting characters from the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed further comprises:
if the second character to be processed is a null character, retaining all of the plurality of first characters to be processed.
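Claims 5 through 7 together describe one small filtering rule. The sketch below implements that reading; the names are illustrative, and `offsets` stands for each character's position offset along the acquisition direction in the spliced image.

```python
def dedup(first_chars, offsets, second_char, spliced_len, last_len):
    # Claim 7: a null second character means the last frame added no text,
    # so every character from the spliced image is retained.
    if not second_char:
        return list(first_chars)
    # Claim 6: the distance threshold is the length difference between the
    # spliced image and the last frame along the acquisition direction.
    distance_threshold = spliced_len - last_len
    # Claim 5: drop the characters that fall inside the last frame's span,
    # since the second character already covers that region.
    return [c for c, off in zip(first_chars, offsets) if off <= distance_threshold]

# Example: a 150-px spliced image whose last frame is 40 px wide.
# Threshold = 110, so the character at offset 120 is dropped.
print(dedup(list("hello"), [0, 30, 60, 90, 120], "o", 150, 40))  # ['h', 'e', 'l', 'l']
```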
8. The method according to any one of claims 1-7, wherein before the splicing the last frame of image and the adjacent previous frame of image based on the target area and the splicing area to obtain a spliced image, the method further comprises: performing image brightness preprocessing and/or image jump preprocessing on the target area and the splicing area.
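Claim 8's brightness preprocessing admits many implementations; a minimal gain-matching version, which is an assumption rather than the patent's method, is:

```python
import numpy as np

def match_brightness(target: np.ndarray, splice_area: np.ndarray) -> np.ndarray:
    # Scale the splicing area so its mean brightness matches the target area,
    # so the subsequent pixel comparison is not skewed by exposure differences.
    t, s = target.astype(float), splice_area.astype(float)
    gain = t.mean() / (s.mean() + 1e-6)
    return np.clip(s * gain, 0, 255).astype(np.uint8)
```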
9. A dictionary pen, comprising: an image acquisition device, a processor and an output device,
the image acquisition device is used for acquiring continuous multiple frames of text images to be recognized;
the processor is used for: performing image quality analysis on the last frame of image in the multiple frames of text images to be recognized, and determining an area in the last frame of image that is close to the acquisition starting side and has an image quality greater than a first image quality threshold as a target area; performing image quality analysis on the previous frame of image adjacent to the last frame of image, and determining an area in the adjacent previous frame of image that is close to the acquisition starting side and has an image quality greater than a second image quality threshold as an area to be spliced, wherein the first image quality threshold is greater than the second image quality threshold; comparing the pixels in the target area with the pixels in the area to be spliced, and determining, from the area to be spliced, a splicing area having the same pixel distribution as the target area; splicing the last frame of image and the adjacent previous frame of image based on the target area and the splicing area to obtain a spliced image; obtaining a plurality of first characters to be processed corresponding to the spliced image, position information of each first character to be processed in the spliced image, and a second character to be processed corresponding to the last frame of image; deleting characters from the plurality of first characters to be processed according to the position information and the relationship between the second character to be processed and the plurality of first characters to be processed to obtain processed characters; and determining corresponding output content according to the processed characters; and
the output device is used for outputting the output content.
10. The dictionary pen according to claim 9, further comprising a memory for storing a knowledge base, wherein the processor is specifically configured to perform a query in the knowledge base according to the processed characters and determine the corresponding output content according to a result of the query.
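Claim 10's query step amounts to a keyed lookup. A toy version with a hypothetical in-memory knowledge base (structure and field names are invented for illustration):

```python
# Hypothetical knowledge base; a real dictionary pen would hold this in memory.
KNOWLEDGE_BASE = {
    "apple": {"original": "apple", "translation": "苹果", "paraphrase": "a round fruit"},
}

def lookup(processed_chars: str) -> str:
    entry = KNOWLEDGE_BASE.get(processed_chars.strip().lower())
    return entry["translation"] if entry else "(no entry found)"

print(lookup("Apple"))  # 苹果
```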
11. The dictionary pen according to claim 9, wherein the processor is specifically configured to splice the last frame of image and the adjacent previous frame of image by taking the center line of the target area and the center line of the splicing area as splicing positions to obtain the spliced image.
12. The dictionary pen according to claim 9, wherein the processor is specifically configured to: if the second character to be processed is not a null character, delete, from the plurality of first characters to be processed and according to the position information, the character whose position corresponds to the second character to be processed; or, if the second character to be processed is a null character, retain all of the plurality of first characters to be processed.
13. A computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 8.
CN202211222183.2A 2022-10-08 2022-10-08 Image processing method, dictionary pen and storage medium Pending CN115660952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211222183.2A CN115660952A (en) 2022-10-08 2022-10-08 Image processing method, dictionary pen and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211222183.2A CN115660952A (en) 2022-10-08 2022-10-08 Image processing method, dictionary pen and storage medium

Publications (1)

Publication Number Publication Date
CN115660952A true CN115660952A (en) 2023-01-31

Family

ID=84985227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211222183.2A Pending CN115660952A (en) 2022-10-08 2022-10-08 Image processing method, dictionary pen and storage medium

Country Status (1)

Country Link
CN (1) CN115660952A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580402A (en) * 2023-05-26 2023-08-11 读书郎教育科技有限公司 Text recognition method and device for dictionary pen

Similar Documents

Publication Publication Date Title
CN110046529B (en) Two-dimensional code identification method, device and equipment
US8218890B2 (en) Method and apparatus for cropping images
WO2019085971A1 (en) Method and apparatus for positioning text over image, electronic device, and storage medium
CN109711407B (en) License plate recognition method and related device
KR101050866B1 (en) Character recognition devices, character recognition programs, and character recognition methods
CN111985465A (en) Text recognition method, device, equipment and storage medium
CN111626941A (en) Document correction method based on deep learning semantic segmentation
CN110909640A (en) Method and device for determining water level line, storage medium and electronic device
US10395090B2 (en) Symbol detection for desired image reconstruction
CN114120307A (en) Display content identification method, device, equipment and storage medium
CN113888756A (en) Method for determining effective area parameters, image acquisition method and test system
JP2017521011A (en) Symbol optical detection method
CN113283439A (en) Intelligent counting method, device and system based on image recognition
CN113920520A (en) Image text recognition method, system, storage medium and electronic equipment
CN113177397A (en) Table adjusting method, device, equipment and storage medium
CN110310341B (en) Method, device, equipment and storage medium for generating default parameters in color algorithm
CN111008635A (en) OCR-based multi-bill automatic identification method and system
CN109635798B (en) Information extraction method and device
CN115660952A (en) Image processing method, dictionary pen and storage medium
CN111738264A (en) Intelligent acquisition method for data of display panel of machine room equipment
CN113689378B (en) Determination method and device for accurate positioning of test strip, storage medium and terminal
CN115457585A (en) Processing method and device for homework correction, computer equipment and readable storage medium
CN116993654A (en) Camera module defect detection method, device, equipment, storage medium and product
US10134163B2 (en) Dynamic detection of an object framework in a mobile device captured image
CN115115596A (en) Electronic component detection method and device and automatic quality inspection equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 311121 room 801, building 2, No. 2699, yuhangtang Road, Cangqian street, Yuhang District, Hangzhou, Zhejiang Province

Applicant after: Zhejiang Aikesi Elf Artificial Intelligence Technology Co.,Ltd.

Address before: 311121 room 801, building 2, No. 2699, yuhangtang Road, Cangqian street, Yuhang District, Hangzhou, Zhejiang Province

Applicant before: Zhejiang Maojing Artificial Intelligence Technology Co.,Ltd.
