WO2022121842A1 - Text image correction method and apparatus, device, and medium - Google Patents

Text image correction method and apparatus, device, and medium

Info

Publication number
WO2022121842A1
WO2022121842A1 (PCT/CN2021/135748)
Authority
WO
WIPO (PCT)
Prior art keywords
camera
image
recognized
initial image
pixel
Prior art date
Application number
PCT/CN2021/135748
Other languages
English (en)
French (fr)
Inventor
高敬乾
王欢
周骥
冯歆鹏
Original Assignee
上海肇观电子科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海肇观电子科技有限公司
Publication of WO2022121842A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33: Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G06V10/247: Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a text image correction method and device, equipment and medium.
  • before text recognition is performed, the image may be corrected for curvature, so as to overcome the problem that the accuracy of the text recognition is affected by the curvature of the reading material.
  • however, due to the limitations of the correction algorithm itself, the text recognition effect on curved reading materials still needs to be improved.
  • according to an aspect of the present disclosure, a method for correcting a text image is provided, comprising: acquiring initial images including an object to be recognized obtained by oblique shooting with a binocular camera, the binocular camera including a first camera and a second camera, the optical axes of both the first camera and the second camera being non-perpendicular to the placement surface of the object to be recognized, and the initial images including a first initial image including the object to be recognized obliquely captured by the first camera and a second initial image including the object to be recognized obliquely captured by the second camera;
  • determining a rotation matrix corresponding to rotating the object to be recognized, around a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera; determining, based on the first initial image and the second initial image, a 3D image including the object to be recognized;
  • using the rotation matrix to obtain a rotation-corrected image in which the 3D image is rotated around the set point until it is perpendicular to the optical axis of the first camera; and flattening and correcting the rotation-corrected image to obtain a final corrected image.
  • an electronic circuit is provided, comprising a circuit configured to perform the steps of the correction method described above.
  • a text image correction device is provided, comprising: a binocular camera configured to obliquely capture initial images including an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being non-perpendicular to the placement surface of the object to be recognized, the initial images including a first initial image including the object to be recognized and a second initial image including the object to be recognized, the first camera being configured to obliquely capture the first initial image and the second camera being configured to obliquely capture the second initial image; and the electronic circuit described above.
  • an electronic device is provided, comprising: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform the correction method described above.
  • a non-transitory computer-readable storage medium storing a program is provided, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the correction method described above.
  • FIG. 1 is a flowchart illustrating a method for correcting a text image according to an exemplary embodiment of the present disclosure
  • FIG. 2 is a schematic diagram illustrating the operation of a text image correction device according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram illustrating a first pixel band and a 3D directrix in a first initial image according to an exemplary embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram illustrating the principle of epipolar geometry according to an exemplary embodiment of the present disclosure
  • FIG. 5 is a schematic diagram illustrating the principle of binocular vision according to an exemplary embodiment of the present disclosure
  • FIG. 6 is a schematic diagram showing the positions and geometric relationships of the 3D directrices, the binocular camera, and the object to be recognized according to an exemplary embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram illustrating a plurality of straight generatrices of the curved surface determined according to an exemplary embodiment of the present disclosure;
  • FIG. 8 is a schematic diagram illustrating a plurality of first surface sampling points of an object to be recognized determined according to an exemplary embodiment of the present disclosure;
  • FIG. 9 is a schematic diagram illustrating a final corrected image according to an exemplary embodiment of the present disclosure.
  • FIG. 10 is a block diagram illustrating an example computing device that can be applied to example embodiments.
  • the use of the terms first, second, etc. to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of these elements; such terms are only used to distinguish one element from another.
  • in some cases, the first element and the second element may refer to the same instance of the element, while in other cases they may refer to different instances based on the context of the description.
  • Readings such as books or magazines usually have a certain layout, for example, the content is divided into different paragraphs (for example, including upper and lower paragraphs and left and right columns, etc.).
  • people visually capture images in their field of vision, and use their brains to divide the text in the images into paragraphs.
  • Such paragraph division may be used, for example, in an application that converts a paper book into an electronic book, or an application that converts text in an image into a sound signal and outputs the sound signal.
  • paragraph division refers to dividing the text in an image into different paragraphs; dividing paragraphs in the vertical direction may also be referred to as segmenting, and dividing paragraphs in the horizontal direction may also be referred to as dividing into columns.
  • a text line refers to a sequence of characters whose adjacent character spacing is less than a threshold spacing, that is, a continuous line of text.
  • the spacing between adjacent characters refers to the distance between the coordinates of the corresponding positions of adjacent characters, such as the distance between the coordinates of the upper left corner, the coordinates of the lower right corner, or the coordinates of the centroid of adjacent characters. If the spacing between adjacent characters is not greater than the threshold spacing, the adjacent characters may be considered to be continuous, so that they are divided into the same text line. If the spacing between adjacent characters is greater than the threshold spacing, the adjacent characters may be considered to be discontinuous (for example, they may belong to different paragraphs or to the left and right columns respectively), so that they are divided into different text lines.
  • the threshold spacing can be set according to the text size; for example, the threshold spacing set for adjacent characters whose font size is larger than size 4 (such as size 3 and size 2) is greater than the threshold spacing set for adjacent characters whose font size is smaller than size 4 (such as small size 4 and size 5).
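  • as an illustration of the text line division described above, the following is a minimal sketch (assuming hypothetical character boxes given as (x, y, width, height) tuples, already sorted left to right) that groups adjacent characters into the same text line when their spacing does not exceed the threshold spacing:

      def group_into_text_lines(char_boxes, threshold_spacing):
          """Group character boxes (x, y, w, h) into text lines: adjacent
          characters whose horizontal gap is within the threshold spacing
          are considered continuous and belong to the same line."""
          lines, current = [], [char_boxes[0]]
          for prev, cur in zip(char_boxes, char_boxes[1:]):
              gap = cur[0] - (prev[0] + prev[2])  # spacing between adjacent characters
              if gap <= threshold_spacing:
                  current.append(cur)             # continuous: same text line
              else:
                  lines.append(current)           # spacing too large: new text line
                  current = [cur]
          lines.append(current)
          return lines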
  • before text recognition is performed on a reading material such as a book or a magazine, the image may be corrected for curvature to overcome the problem that the accuracy of text recognition is affected by the curvature of the reading material.
  • the specific process of image bending correction may be as follows: a correction algorithm is used to flatten and interpolate the curved surface to solve the image bending problem. This method has a good flattening effect for vertical shooting scenes.
  • however, due to the limitation of the correction algorithm itself, in an oblique shooting scene the curved surface cannot be flattened because of the oblique perspective, and even more distorted processing results may be obtained.
  • the present disclosure provides a correction method for text images in oblique shooting scenes.
  • a rotation matrix corresponding to rotating the object to be recognized until it is perpendicular to the optical axis of the first camera of the binocular camera is determined, and a 3D image of the object to be recognized is determined.
  • using the rotation matrix and the 3D image, a 3D rotation-corrected image corresponding to rotating the object to be recognized until it is perpendicular to the optical axis of the first camera can be obtained.
  • by flattening and correcting the 3D rotation-corrected image, the final corrected image can be obtained.
  • since there is no oblique perspective problem in the 3D rotation-corrected image, the text lines in the 3D rotation-corrected image are straight lines. Therefore, flattening and correcting the 3D rotation-corrected image can achieve a good flattening effect, thereby ensuring the accuracy of text recognition for the object to be recognized.
  • the object to be recognized may refer to, for example, a picture or a current to-be-recognized page of a reading material, or the like.
  • FIG. 1 is a flowchart illustrating a correction method of a text image according to an exemplary embodiment of the present disclosure.
  • the correction method may include: step S101, acquiring initial images including the object to be recognized obtained by oblique shooting with a binocular camera, where the binocular camera includes a first camera and a second camera, the optical axes of both cameras are not perpendicular to the placement surface of the object to be recognized, and the initial images include a first initial image including the object to be recognized obliquely captured by the first camera and a second initial image including the object to be recognized obliquely captured by the second camera;
  • step S102, determining a rotation matrix corresponding to rotating the object to be recognized, around a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera;
  • step S103, determining, based on the first initial image and the second initial image, a 3D image including the object to be recognized;
  • step S104, using the rotation matrix, obtaining a rotation-corrected image in which the 3D image is rotated around the set point until it is perpendicular to the optical axis of the first camera; and
  • step S105, flattening and correcting the rotation-corrected image to obtain the final corrected image. Since there is no oblique perspective problem in the 3D rotation-corrected image, the text lines in it are straight lines; therefore, flattening and correcting it achieves a good flattening effect, which ensures the accuracy of text recognition for the object to be recognized.
  • the binocular camera may be a stand-alone device (eg, a binocular camera assembly, a binocular video camera, etc.), or may be included in various electronic devices (eg, mobile phones, computers, personal digital assistants, reading aids, tablets, wearable devices, etc.).
  • the binocular camera can be set on the user's wearable device or glasses, etc., so that the first initial image and the second initial image of the reading material held in the user's hand can be captured by the binocular camera.
  • the object to be recognized may include words (including words, numbers, characters, punctuation marks, etc. of various countries), pictures, and the like.
  • the object to be recognized may be, for example, the current page of the text to be recognized of a reading such as a passport, a driver's license, a book, a magazine, etc., including a text area.
  • the text area corresponds to the area where the text is located.
  • the placement surface of the object to be recognized is the placement surface of the reading material.
  • the optical axes of the first camera and the second camera in the binocular camera may be arranged in parallel.
  • Both the first initial image and the second initial image captured by the first camera and the second camera may include a complete object to be recognized, so that the entire object to be recognized can be flattened to facilitate subsequent processing, such as character recognition.
  • the first initial image and the second initial image obtained by the binocular camera may also be images that have undergone some preprocessing.
  • the preprocessing may include, but is not limited to, at least one of the following processes: distortion correction, binocular correction, grayscale processing, and blur removal.
  • image distortion can include radial distortion and tangential distortion, where radial distortion occurs because light rays are bent more strongly far from the center of the lens than near the center, and tangential distortion is caused by manufacturing defects that make the lens not parallel to the image plane. According to some embodiments, distortion correction can be performed on the first initial image and the second initial image, so that the distortion caused by the camera lens can be eliminated.
  • the distortion correction performed on the first initial image and the second initial image respectively may be, performing distortion correction on each pixel in the first initial image and the second initial image.
  • with normalized image coordinates (x, y) and r^2 = x^2 + y^2 (where r balances the radial distortion and tangential distortion terms), the distortion correction formula may, for example, take the standard radial-tangential form: x' = x(1 + k1*r^2 + k2*r^4 + k3*r^6) + 2*p1*x*y + p2*(r^2 + 2*x^2) and y' = y(1 + k1*r^2 + k2*r^4 + k3*r^6) + p1*(r^2 + 2*y^2) + 2*p2*x*y;
  • k1, k2 and k3 are the radial distortion parameters of the camera, and p1 and p2 are its tangential distortion parameters.
  • the respective distortion parameters of the first camera and the second camera of the binocular camera in the present disclosure may be different, which reduces the requirement for device accuracy; by performing correction with the respective distortion parameters of the first camera and the second camera, the distortion caused by the camera lenses of the first camera and the second camera can be eliminated.
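  • for illustration, a minimal OpenCV sketch of per-camera distortion correction (the camera matrices, distortion parameters, and file names below are placeholder calibration results, not values from the original disclosure):

      import cv2
      import numpy as np

      # placeholder intrinsics and distortion parameters (k1, k2, p1, p2, k3);
      # each camera of the binocular camera has its own parameters
      K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
      dist1 = np.array([-0.12, 0.05, 0.001, -0.0005, 0.0])
      K2 = np.array([[805.0, 0.0, 318.0], [0.0, 805.0, 242.0], [0.0, 0.0, 1.0]])
      dist2 = np.array([-0.10, 0.04, -0.0008, 0.0004, 0.0])

      img1 = cv2.imread("first_initial.png")   # first initial image
      img2 = cv2.imread("second_initial.png")  # second initial image

      # undistort each initial image with its own camera's parameters
      img1_corr = cv2.undistort(img1, K1, dist1)
      img2_corr = cv2.undistort(img2, K2, dist2)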
  • binocular correction may be performed on the first initial image and the second initial image.
  • after binocular correction, the same point in three-dimensional space is projected onto correspondingly positioned points on the same horizontal scan line in the first initial image and the second initial image, which facilitates the subsequent matching of corresponding pixel points between the first initial image and the second initial image.
  • distortion correction may be performed on the first initial image and the second initial image respectively, and binocular correction may then be performed on the distortion-corrected first initial image and second initial image, so that the curved surface correction effect can be further improved.
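  • continuing the sketch above, the binocular correction step may, for example, use OpenCV's stereo rectification (the stereo extrinsics R and T and the 640x480 image size are assumptions):

      # placeholder extrinsics between the two cameras from stereo calibration
      R = np.eye(3)                    # relative rotation
      T = np.array([60.0, 0.0, 0.0])   # baseline along x, in mm

      # after remapping, a point in 3D space projects onto the same
      # horizontal scan line in both rectified images
      R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
          K1, dist1, K2, dist2, (640, 480), R, T)
      m1x, m1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, (640, 480), cv2.CV_32FC1)
      m2x, m2y = cv2.initUndistortRectifyMap(K2, dist2, R2, P2, (640, 480), cv2.CV_32FC1)
      rect1 = cv2.remap(img1, m1x, m1y, cv2.INTER_LINEAR)
      rect2 = cv2.remap(img2, m2x, m2y, cv2.INTER_LINEAR)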
  • the curved shape of the object to be recognized may be substantially the same from one side of the object to be recognized to the opposite side. It can be understood that the technical solutions of the present disclosure are also applicable to scenes where the curved shapes of the objects to be recognized are different from one side of the object to be recognized to the opposite side.
  • the first camera 101 and the second camera 102 included in the binocular camera can be disposed on the side where one edge of the object to be recognized 100 is located, thereby facilitating the determination of 3D directrices that can characterize the curved shape of the object to be recognized.
  • the specific principle will be described below. It should be noted that the object to be recognized 100 is shown in FIG. 2 as a plane only for convenience of illustration; in fact, the object to be recognized 100 is a curved surface.
  • a 3D image including the object to be recognized can be determined based on binocular vision, and when the tilt angle of the binocular camera is unknown, the rotation matrix corresponding to rotating the object to be recognized, around a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera can also be determined based on binocular vision.
  • the principle of binocular vision may be as follows: the distance along the x-axis between the first optical center O_l of the first camera and the second optical center O_r of the second camera is T.
  • the two line segments with lengths L_l and L_r in FIG. 5 represent the image plane of the first camera and the image plane of the second camera, respectively, and the shortest distances from the first optical center O_l and the second optical center O_r to the corresponding image planes are the focal lengths f_l and f_r, respectively.
  • the first camera captures a first initial image including the object to be recognized, and the second camera captures a second initial image including the object to be recognized.
  • for a point P in three-dimensional space, its imaging point on the first camera (which can correspond to one pixel) is P_L, and its imaging point on the second camera (which can correspond to one pixel) is P_R.
  • the distances of P_L and P_R from the left edges of their respective image planes are x_l and x_r, respectively.
  • the imaging parallax (disparity) of the point P between the first camera and the second camera can be defined as x_l - x_r or x_r - x_l.
  • the vertical distance Z_c (ie, the depth) between the point P and the straight line determined by the first optical center O_l and the second optical center O_r can then be calculated; for a rectified pair with a common focal length f, by similar triangles, Z_c = f*T/(x_l - x_r).
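  • as a sketch of this depth calculation (a rectified camera pair with a common focal length f, expressed in pixels, and baseline T is assumed):

      def depth_from_disparity(x_l, x_r, f, T):
          """Depth of point P from the x-coordinates of its imaging points
          in the rectified first and second images: Z_c = f * T / (x_l - x_r)."""
          return f * T / (x_l - x_r)

      # example: f = 800 px, baseline T = 60 mm, disparity 8 px -> Z_c = 6000 mm
      Zc = depth_from_disparity(412.0, 404.0, 800.0, 60.0)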
  • determining the rotation matrix may include: determining at least two 3D directrices of the object to be recognized based on the first initial image and the second initial image as well as the internal parameters of the binocular camera; and determining the rotation matrix based on the at least two 3D directrices and a set point. Therefore, when the tilt angle of the first camera is unknown, the rotation matrix can be determined based on the 3D directrices, and it is not necessary to determine the depth of the entire first initial image, which effectively reduces the amount of calculation.
  • the following content describes how to determine the at least two 3D directrices of the object to be recognized based on the first initial image and the second initial image and the internal parameters of the binocular camera.
  • determining the at least two 3D directrices may include: determining at least two first pixel bands in the first initial image; determining, in the second initial image, at least two second pixel bands corresponding to the respective positions of the at least two first pixel bands; determining, based on each position-corresponding pair of first and second pixel bands and the internal parameters of the binocular camera, the depth information corresponding to that pair; and determining the 3D directrices based on the position-corresponding first and second pixel bands and the corresponding depth information. Therefore, the 3D directrices can be determined from the position-corresponding first and second pixel bands, which avoids pixel point matching over the entire image range of the first initial image and the second initial image and reduces the amount of calculation.
  • for each pixel in the first pixel band, its matching pixel in the second initial image may be determined, and the pixel band formed by the plurality of pixel points in the second initial image that respectively match all the pixel points in the first pixel band is the second pixel band corresponding to the position of the first pixel band.
  • the matching efficiency of the corresponding pixel bands in the first initial image and the second initial image can be improved based on the principle of epipolar geometry.
  • the principle of epipolar geometry can be understood as follows: the first optical center of the first camera is O_l and the second optical center of the second camera is O_r; for a point M in three-dimensional space, its projected pixel points in the first initial image and the second initial image must be located on the epipolar plane MO_lO_r determined by the point M, the first optical center O_l, and the second optical center O_r.
  • the imaging point of the point M on the first initial image is M_l, and the imaging point on the second initial image is M_r.
  • the epipolar plane MO_lO_r intersects the first initial image at the first epipolar line L_l passing through the point M_l, and intersects the second initial image at the second epipolar line L_r passing through the point M_r.
  • when the pixel point M_l on the first initial image is known, the corresponding pixel point M_r on the second initial image can be determined based on the principle of epipolar geometry: the second epipolar line L_r is determined based on the pixel point M_l and the fundamental matrix F, and M_r is then searched for along L_r.
  • the specific calculation formula is: L_r = F*M_l, where M_l is expressed in homogeneous coordinates and the corresponding imaging points satisfy the epipolar constraint M_r^T * F * M_l = 0;
  • F represents the fundamental matrix.
  • the fundamental matrix F can be determined according to the internal parameters of the first camera, the internal parameters of the second camera, and the external parameters between the first camera and the second camera.
  • in this way, the constraint relationship between the corresponding imaging points in the first initial image and the second initial image for the same point in three-dimensional space is established. By simplifying the matching of corresponding pixels in the first initial image and the second initial image from a search over a two-dimensional image space to a one-dimensional search along the corresponding epipolar line, the efficiency and accuracy of matching can be improved.
  • one such epipolar line corresponds to the position of one long side of the first pixel band in the second initial image.
  • another epipolar line corresponding to the position of the other long side of the first pixel band in the second initial image can also be determined, so that the second pixel band corresponding to the position of the first pixel band in the second initial image can be determined based on the two epipolar lines.
  • the above-mentioned pixel point matching based on the principle of epipolar geometry may be implemented by OpenCV, Matlab or other software products, which is not limited herein.
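  • for example, the epipolar lines that restrict the one-dimensional search can be obtained with OpenCV as follows (the fundamental-matrix file name and the pixel coordinates are placeholders):

      import cv2
      import numpy as np

      F = np.load("fundamental_matrix.npy")  # fundamental matrix from the calibrated parameters

      # pixel points M_l in the first initial image, shape (N, 1, 2), float32
      pts_l = np.array([[[420.0, 310.0]], [[421.0, 310.0]]], dtype=np.float32)

      # each row (a, b, c) is the epipolar line a*x + b*y + c = 0 in the second
      # initial image on which the matching point M_r must lie
      lines_r = cv2.computeCorrespondEpilines(pts_l, 1, F).reshape(-1, 3)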
  • determining the at least two second pixel bands corresponding to the respective positions of the at least two first pixel bands in the second initial image is not limited to the above-mentioned one manner, for example, a neural network can also be used.
  • the matching pixel points corresponding to the positions of all the pixels of the first pixel band in the second initial image can be determined through a neural network, and the pixel band determined by all the matching pixel points is the second pixel band corresponding to the position of the first pixel band.
  • the corresponding pixel point matching method using the neural network can be understood as follows: the features of a known first pixel point in the first initial image and the features of the second pixel points in the second initial image that may match the first pixel point are input to the neural network.
  • the second pixel point that may match the first pixel point may be determined within a limited range in the second initial image, or may be determined within the entire range in the second initial image.
  • the neural network may output an output result for determining the degree of matching between the first pixel point and the second pixel point. By comparing the matching degree of each second pixel point that may match the first pixel point with the first pixel point, the corresponding pixel point matching the first pixel point in the second initial image can be determined.
  • the pixel points can be fitted to obtain a line corresponding to the edge position in the second initial image.
  • another line in the second initial image corresponding to the position of the other edge (eg, the other long edge) of the first pixel band can be determined, so that the second pixel band corresponding to the position of the first pixel band can be determined in the second initial image based on these two lines.
  • the corresponding point matching method using a neural network can be implemented by a binocular matching neural network obtained by training, wherein the binocular matching neural network can include at least one of the following networks: CNN (Convolutional Neural Networks, Convolutional Neural Network), DNN (Deep Neural Network, Deep Neural Network) or RNN (Recurrent Neural Network, Recurrent Neural Network), etc.
  • the binocular matching network may include one of the networks such as the CNN, DNN, and RNN, or at least two of the networks such as the CNN, DNN, and RNN.
  • the method for determining the second pixel band by using the neural network is not limited to the above method, and is not limited here.
  • the method for determining the at least two second pixel strips corresponding to the respective positions of the at least two first pixel strips in the second initial image is not limited to the above two methods, and other methods may also be used, which are not limited here.
  • the at least two first pixel bands determined in the first initial image may be parallel to each other, thereby reducing the amount of computation. According to other embodiments, the at least two first pixel bands determined in the first initial image may also be non-parallel, which is not limited herein.
  • depth information corresponding to the first pixel band and the second pixel band corresponding to the position may be determined according to the principle of binocular vision.
  • the specific principle of binocular vision has been described above, and will not be repeated here.
  • at least two 3D directrices of the object to be recognized can then be further determined.
  • the determined at least two 3D directrices may be parallel to each other. According to other embodiments, two or more of the determined at least two 3D directrices may be non-parallel.
  • two first pixel bands in the first initial image may be determined, namely a first pixel band 301 and another first pixel band 302 .
  • two second pixel bands corresponding to the respective positions of the two first pixel bands are determined in the second initial image.
  • the depth information corresponding to each position-corresponding pair of first and second pixel bands can then be determined.
  • the 3D directrix 1020 and the 3D directrix 1030 may be determined in the first pixel band 301 and the first pixel band 302, respectively. Thereby, two 3D directrices can be quickly determined based on the two first pixel bands and the corresponding two second pixel bands.
  • the 3D directrix 1020 and the 3D directrix 1030 shown in FIG. 3 include depth information.
  • the 3D directrix 1020 and the 3D directrix 1030 are drawn as straight lines in FIG. 3 only for convenience of illustration; in fact, the 3D directrix 1020 and the 3D directrix 1030 are curves.
  • the two 3D directrices 1020 and 1030 illustrated in FIG. 3 are parallel to each other. It should be noted that the two 3D directrices may also be non-parallel. It can be understood that it is also possible to determine three or more first pixel bands in the first initial image and, in the second initial image, three or more second pixel bands corresponding to their respective positions, so that three or more 3D directrices can be determined; this is not limited here.
  • the 3D directrix may be determined based on the central axis of the first pixel band and the central axis of the position-corresponding second pixel band, together with their depth information. Thereby, the 3D directrix can be determined simply and quickly.
  • the method for determining the 3D directrix may be: determining the coordinate values of the pixel points in the position-corresponding first pixel band and second pixel band, sampling a plurality of points, and determining the result obtained after fitting the three-dimensional coordinates of the plurality of sampling points as the 3D directrix.
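  • a sketch of such a fit (assuming the 3D coordinates of the sampling points along a pixel band's central axis have already been recovered from the depth information; the file name is a placeholder):

      import numpy as np

      pts = np.load("band_axis_points.npy")  # sampled 3D points, shape (n, 3)

      # fit each coordinate against a curve parameter t with a low-order
      # polynomial; the fitted curve (carrying depth) serves as the 3D directrix
      t = np.linspace(0.0, 1.0, len(pts))
      cx = np.polyfit(t, pts[:, 0], 3)
      cy = np.polyfit(t, pts[:, 1], 3)
      cz = np.polyfit(t, pts[:, 2], 3)  # z holds the depth information

      def directrix(ti):
          """Evaluate the fitted 3D directrix at parameter ti."""
          return np.array([np.polyval(cx, ti), np.polyval(cy, ti), np.polyval(cz, ti)])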
  • the width of the first pixel band and the width of the second pixel band may each be less than half the width of the first initial image. In this way, the at least two first pixel bands can be kept from overlapping, thereby ensuring that the 3D directrices determined in different first pixel bands do not cross each other. In addition, the determined at least two 3D directrices can then represent the curved shape of the object to be recognized, thereby improving the subsequent curvature correction effect.
  • the at least two first pixel bands may be distributed on both sides of the central axis of the first initial image. Therefore, by determining the 3D directrices in the regions on both sides of the central axis of the first initial image, the determined at least two 3D directrices can represent the curved shape of the regions on both sides of the central axis of the object to be recognized, which can further improve the subsequent bending correction effect.
  • one 3D directrix may be determined in each first pixel band in the first initial image.
  • two or more 3D directrices may also be determined in each first pixel band in the first initial image.
  • the at least two 3D directrices may extend along the bending direction of the object to be recognized. Therefore, the curved shape of the object to be recognized can be characterized based on the 3D directrices, and the subsequent curvature correction effect can be improved.
  • the at least two 3D directrices may be substantially parallel to the long sides of the first pixel bands.
  • the present disclosure does not limit the number, distribution, shape, or mutual relationship of the determined at least two 3D directrices; as long as the determined 3D directrices can represent the curved shape of the object to be recognized, the technical solutions of the present disclosure are not limited here.
  • the pixel points in the second initial image corresponding to the positions of the pixels in the first initial image can also be determined based on the entire first initial image and the entire second initial image;
  • the depth information of the position-corresponding pixel points is determined based on the corresponding pixel points and the internal parameters of the binocular camera, and at least two 3D directrices are determined based on the position-corresponding pixel points and the corresponding depth information.
  • the position-corresponding pixel points in the first initial image and the second initial image can be determined based on the principle of epipolar geometry, a neural network, or other methods.
  • the depth information of the position-corresponding pixels in the first initial image and the second initial image can be determined based on the principle of binocular vision; the specific implementation can refer to the above content and is not repeated here.
  • the depth information of the entire object to be recognized can also be calculated first, and a line containing the depth information is then determined based on the entire first initial image and second initial image as the 3D directrix.
  • after the at least two 3D directrices are determined, in the case where the inclination angle of the binocular camera is unknown, the rotation matrix corresponding to rotating the object to be recognized, around a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera may be determined based on the at least two 3D directrices.
  • determining the rotation matrix based on the at least two 3D directrices and the set point includes: calculating an average depth of each of the at least two 3D directrices; and determining the rotation matrix based on the average depths of the at least two 3D directrices and the set point. Thereby, the calculation can be simplified.
  • the set point may be the intersection of the object to be recognized with a straight line that is parallel to the optical axis of the first camera and passes through the midpoint of the line connecting the optical centers of the first camera and the second camera (for convenience of description, this straight line is defined as the midpoint straight line), so that the calculation for determining the rotation matrix can be simplified.
  • the set point can also be another specific point on the object to be recognized, which is not limited herein.
  • the specific principle of determining the rotation matrix may be as follows: h_1 is the average depth of the first directrix 1020, and h_2 is the average depth of the second directrix 1030;
  • d_1 is the distance in the depth direction between the first directrix 1020 and the midpoint straight line, and d_2 is the distance in the depth direction between the second directrix 1030 and the midpoint straight line;
  • d_1 and d_2 may be determined based on the determined positional relationship between the 3D directrices and the binocular camera;
  • h_0 is the depth of the set point.
  • the rotation matrix R can then be determined from h_1, h_2, d_1, d_2 and h_0; for example, the inclination angle θ of the object to be recognized may satisfy tan θ = (h_2 - h_1)/(d_2 - d_1), and R may be taken as the rotation by -θ around the axis that passes through the set point and is parallel to the straight generatrices.
  • in this way, the rotation matrix corresponding to rotating the object to be recognized, around the set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera can be obtained.
  • when the tilt angle of the first camera is known, the rotation matrix R can be directly calculated.
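  • under the reconstruction above (the tilt-angle formula is an assumption for illustration, not quoted from the original claims), the rotation matrix could be built as follows, taking the x-axis parallel to the straight generatrices:

      import numpy as np

      def rotation_matrix_from_directrices(h1, h2, d1, d2):
          """Estimate the page tilt from the average depths (h1, h2) of the two
          directrices and their offsets (d1, d2) from the midpoint straight line,
          and return the rotation about the x-axis that removes the tilt."""
          theta = np.arctan2(h2 - h1, d2 - d1)  # assumed tilt-angle formula
          c, s = np.cos(-theta), np.sin(-theta)
          return np.array([[1.0, 0.0, 0.0],
                           [0.0, c,  -s],
                           [0.0, s,   c]])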
  • the depth of the intersection point Q in the camera coordinate system can be calculated according to the principle of binocular vision, and the specific method has been described in the above content.
  • step S103 may be performed to determine a 3D image including the object to be recognized based on the first initial image and the second initial image. It should be noted that the present disclosure does not limit the execution order of step S102 and step S103, and step S102 and step S103 may also be executed synchronously.
  • step S103 may include: determining a plurality of straight generatrices of the curved surface based on the at least two 3D directrices; and determining, based at least on the plurality of straight generatrices and the at least two 3D directrices, a plurality of first surface sampling points and the three-dimensional coordinates of each first surface sampling point, wherein the 3D image is represented by the plurality of first surface sampling points.
  • in this way, the plurality of first surface sampling points of the 3D image of the object to be recognized can be determined based on the straight generatrices and the at least two 3D directrices, and the 3D image of the object to be recognized can be represented by the plurality of first surface sampling points; therefore, the steps of determining the 3D image can be simplified and the amount of calculation reduced.
  • FIG. 7 illustrates a plurality of determined straight generatrices 201 of the curved surface.
  • FIG. 8 illustrates a plurality of determined first surface sampling points 202.
  • the 3D directrix 1020 and the 3D directrix 1030 shown in FIG. 7 and FIG. 8 contain depth information and are drawn as straight lines only for convenience of illustration; in fact, the 3D directrix 1020 and the 3D directrix 1030 are curves.
  • the curved surface can be reconstructed from the trajectory swept by its straight generatrix, where the straight generatrix lies on the curved surface and is a straight line. Therefore, in the above step S103, a 3D image of the object to be recognized can be obtained by fitting the plurality of first surface sampling points.
  • in step S103, the 3D directrices determined in step S102 can be directly used to reduce the amount of calculation. It can be understood that, when step S103 is performed first, step S102 can also directly use the 3D directrices determined in step S103.
  • step S104 may then be performed: using the rotation matrix, a rotation-corrected image is obtained by rotating the 3D image around the set point until it is perpendicular to the optical axis of the first camera.
  • step S104 may include: using the rotation matrix to determine, based on the depth of each of the plurality of first surface sampling points and the depth of the set point, a plurality of second surface sampling points obtained by rotating each first surface sampling point relative to the set point.
  • the rotation-corrected image may be represented by a plurality of second surface sample points.
  • a 3D rotation-corrected image of the object to be recognized can be obtained by fitting through a plurality of second curved surface sampling points.
  • the depth of the set point can be obtained using the method described above.
  • the first surface sampling point is determined based on the straight generatrix of the surface and the 3D directrix, and its depth can also be obtained based on the straight generatrix of the surface and the 3D directrix.
  • in this way, the 3D image of the object to be recognized is determined by determining a plurality of straight generatrices of the object to be recognized and using the determined straight generatrices together with at least two 3D directrices; the rotation matrix and the 3D image can then be used to determine the 3D rotation-corrected image obtained by rotating the object to be recognized until it is perpendicular to the optical axis of the first camera, so as to overcome the oblique perspective problem existing in oblique shooting scenes.
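  • a sketch of the rotation of step S104 around the set point (the rotation matrix R comes from step S102 and the sampling points from step S103; both are placeholders here):

      import numpy as np

      def rotate_about_set_point(first_samples, R, set_point):
          """Rotate the first surface sampling points (shape (n, 3)) around the
          set point, p' = R @ (p - p0) + p0, giving the second surface sampling
          points of the rotation-corrected image."""
          return (first_samples - set_point) @ R.T + set_point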
  • the at least two 3D directrices may include a first 3D directrix and a second 3D directrix.
  • the following takes the first 3D directrix and the second 3D directrix as an example to specifically describe how to determine the straight generatrices of the curved surface based on two 3D directrices. It can be understood that, for any two of the at least two 3D directrices, the following method can be used to determine the corresponding straight generatrices of the curved surface.
  • determining the plurality of straight generatrices may include: sampling the first 3D directrix to obtain n first discrete points; sampling the second 3D directrix to obtain N second discrete points, where n and N are positive integers and N > n; for each of the n first discrete points, determining one of the second discrete points from the N second discrete points according to a preset rule as the optimal corresponding discrete point of the first discrete point, wherein the normal vector of the first 3D directrix passing through the first discrete point is consistent with the normal vector of the second 3D directrix passing through the optimal corresponding discrete point; and determining the plurality of straight generatrices based on the n first discrete points and the respective optimal corresponding discrete points.
  • determining one of the second discrete points from the N second discrete points according to a preset rule as the optimal corresponding discrete point of the first discrete point may include: determining, for the first discrete point, an optimal corresponding range on the second 3D directrix, the optimal corresponding range including at least one of the N second discrete points; calculating a cost function between the first discrete point and each second discrete point in the corresponding optimal corresponding range; and determining, based on the cost function, one of the second discrete points from the corresponding optimal corresponding range as the optimal corresponding discrete point of the first discrete point.
  • both the first discrete points and the second discrete points can be, but are not limited to, discrete points parameterized by arc length, so that both the first 3D directrix and the second 3D directrix can be expressed as sets of 3D discrete points with arc length as the parameter.
  • an arc length parameter value corresponds to a unique 3D coordinate on the 3D directrix.
  • the arc length parameter formula C_0(t) of the first 3D directrix may be determined based on the pixel coordinates of the first 3D directrix.
  • the specific conversion method belongs to the prior art and is not described in detail here.
  • similarly, the arc length parameter formula C_1(s) of the second 3D directrix can be determined based on the pixel coordinates of the second 3D directrix.
  • the cost function used may, for example, measure the consistency between the normal vector of the first 3D directrix at the first discrete point and the normal vector of the second 3D directrix at the candidate second discrete point.
  • the method of determining the straight generatrix of the curved surface is not limited to the aforementioned one.
  • the tangent plane normal vector and the tangent vector corresponding to each of the n first discrete points of the first 3D directrix, and the tangent plane normal vector and the tangent vector corresponding to each of the N second discrete points of the second 3D directrix, may also be calculated separately, and the straight generatrix equation of the curved surface may be determined by using the principle that the normal vectors along the same straight generatrix are consistent.
  • two discrete points on the two 3D directrices with equal normal vectors are candidate optimal corresponding points, and the line connecting these two discrete points is a candidate straight generatrix. Therefore, the straight generatrices of the surface can be calculated from the similarity of the normal vectors between the discrete points and the change speed of the discrete points.
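  • a sketch of the windowed correspondence search (normal-vector mismatch is used as a stand-in for the unspecified cost function, and the window size is an assumption):

      import numpy as np

      def curve_normals(pts):
          """Approximate unit normal vectors of a discrete 3D curve (n, 3)
          as the normalized derivative of its unit tangent vectors."""
          tan = np.gradient(pts, axis=0)
          tan /= np.linalg.norm(tan, axis=1, keepdims=True)
          nrm = np.gradient(tan, axis=0)
          return nrm / (np.linalg.norm(nrm, axis=1, keepdims=True) + 1e-12)

      def match_generatrices(first_pts, second_pts, window=10):
          """For each first discrete point, pick the second discrete point within
          a local range whose normal vector agrees best; each matched pair of
          points defines a candidate straight generatrix."""
          n1, n2 = curve_normals(first_pts), curve_normals(second_pts)
          ratio = len(second_pts) / len(first_pts)
          pairs = []
          for i in range(len(first_pts)):
              center = int(i * ratio)  # center of the optimal corresponding range
              lo, hi = max(0, center - window), min(len(second_pts), center + window)
              cost = np.linalg.norm(n2[lo:hi] - n1[i], axis=1)  # normal-vector mismatch
              pairs.append((first_pts[i], second_pts[lo + int(np.argmin(cost))]))
          return pairs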
  • in step S105, flattening and correcting the rotation-corrected image to obtain the final corrected image may include: orthographically projecting the rotation-corrected image onto the image plane of the first camera to obtain a mapped image; and interpolating the mapped image to obtain the final corrected image. Because there is no oblique perspective problem in the 3D rotation-corrected image, the text lines are straight lines; therefore, by performing orthographic projection and adjusting the pixel spacing of the 3D rotation-corrected image, the purpose of flattening the curved surface can be achieved, thereby ensuring the accuracy of text recognition for the object to be recognized.
  • interpolating the mapped image to obtain the final corrected image may include: for the mapped image, calculating the 3D distance between two adjacent pixels along a preset direction; and interpolating the mapped image along the preset direction based on the 3D distances to obtain the final corrected image. Owing to the rotation and orthographic projection, the uniformly curved surface only has bulges or depressions in the XcOcZc plane; therefore, interpolation can be performed along the Xc coordinate axis to adjust the distance between pixels and thereby perform flattening correction, which is easy to implement and requires a small amount of calculation.
  • the 3D coordinates of a line in the middle of the image parallel to the Xc coordinate axis may be obtained, and the 3D distance between adjacent pixel points may be calculated as the new spacing between the two pixels.
  • 2D grid interpolation can then be performed with the new spacing, resulting in an interpolated image (ie, the final corrected image).
  • the 2D grid interpolation can be done using linear interpolation: (1-a)*P1 + a*P2.
  • P1 and P2 represent the 2D discrete coordinates and pixel values of two adjacent pixels, and a is the proportion that the distance from the pixel point to be inserted (an integer grid point) to P1 accounts for of the distance between P1 and P2.
  • other interpolation methods can also be used to perform image interpolation, for example, nearest neighbor interpolation, bi-square interpolation, bi-cubic interpolation, etc., which are not limited herein.
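  • the spacing adjustment with linear interpolation may be sketched as follows (per-pixel 3D coordinates of one row of the mapped image are assumed to be available):

      import numpy as np

      def flatten_row(values, coords3d):
          """Resample one image row so that pixel spacing equals the 3D distance
          between adjacent points; np.interp realizes (1-a)*P1 + a*P2."""
          seg = np.linalg.norm(np.diff(coords3d, axis=0), axis=1)  # 3D neighbor distances
          arclen = np.concatenate([[0.0], np.cumsum(seg)])         # new pixel positions
          grid = np.arange(0.0, arclen[-1])                        # integer grid points
          return np.interp(grid, arclen, values)                   # linear interpolation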
  • the present disclosure can be used to solve the problem of uniform bending of book pages under oblique shooting.
  • this uniform bending is a typical scene in the shooting of text carriers such as books.
  • existing flattening algorithms cannot flatten effectively under oblique perspective shooting.
  • the reason why the present disclosure can flatten effectively under oblique perspective shooting is as follows: when looking down from directly above each character of an unfolded book, the text lines are straight lines; therefore, by rotating the object to be recognized until it is perpendicular to the optical axis of the first camera, orthographically projecting it onto the image plane of the first camera, and then adjusting the spacing between pixels, the purpose of flattening can be achieved.
  • when an existing flattening algorithm determines the straight generatrices of the surface, it is necessary to set the slope interval of the function corresponding to the arc length; on the shortest path, the slope range of the function corresponding to the arc length is very large, so the amount of calculation is very large.
  • in the present disclosure, the arc length correspondence is converted into a subscript correspondence, and the search range is adaptively set according to the current state, so the amount of calculation is small.
  • in addition, because of the oblique perspective problem, the existing curved surface flattening algorithms cannot flatten the curved surface, whereas the present disclosure rotates the plurality of surface sampling points of the object to be recognized around the set point until they are perpendicular to the optical axis of the first camera and then projects them onto the image plane, which requires less computation and can solve the oblique perspective problem.
  • an electronic circuit is provided, comprising: a circuit configured to perform the steps of the text image correction method described above.
  • a text image correction device is provided, comprising: a binocular camera configured to obliquely capture initial images including an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being non-perpendicular to the placement surface of the object to be recognized, the initial images including a first initial image including the object to be recognized and a second initial image including the object to be recognized, the first camera being configured to obliquely capture the first initial image and the second camera being configured to obliquely capture the second initial image; and the electronic circuit described above.
  • the correction device may further include: a bracket 200 and a flat plate 300.
  • the object to be recognized 100 is placed on the flat plate 300, and both the first camera 101 and the second camera 102 of the binocular camera are fixedly assembled on the bracket 200.
  • the curved shape of the object to be recognized may be substantially the same from one side of the object to be recognized to the opposite side.
  • the binocular camera may be disposed on the side where one edge of the object to be recognized is located, so that the at least two 3D directrices determined from the first initial image and the second initial image can represent the curved shape of the object to be recognized.
  • an electronic device is provided, comprising: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform the correction method described above.
  • a non-transitory computer-readable storage medium storing a program is provided, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the correction method described above.
  • FIG. 10 is a block diagram illustrating an example of an electronic device according to an exemplary embodiment of the present disclosure. It should be noted that the structure shown in FIG. 10 is only an example, and according to specific implementations, the electronic device of the present disclosure may only include one or more of the components shown in FIG. 10 .
  • the electronic device 2000 may be, for example, a general-purpose computer (eg, various computers such as a laptop computer or a tablet computer), a mobile phone, or a personal digital assistant. According to some embodiments, the electronic device 2000 may be a visually impaired assistive device.
  • the electronic device 2000 may be configured to capture images, process the captured images, and provide audio prompts in response to data obtained from the processing.
  • the electronic device 2000 may be configured to capture an image, perform text detection and/or recognition on the image to obtain text data, convert the text data to sound data, and output the sound data for listening by a user.
  • the electronic device 2000 may be configured to include an eyeglass frame or be configured to be removably mountable to an eyeglass frame (eg, the rim of an eyeglass frame, a bridge connecting two rims, a temple, or any other portion), so that an image that approximately includes the user's field of view can be captured.
  • the electronic device 2000 can also be installed on other wearable devices, or integrated with other wearable devices.
  • the wearable device may be, for example, a head-mounted device (such as a helmet or a hat, etc.), a device that can be worn on the ear, and the like.
  • the electronic device may be implemented as an accessory attachable to a wearable device, eg as an accessory attachable to a helmet or hat, or the like.
  • the electronic device 2000 may also have other forms.
  • electronic device 2000 may be a mobile phone, a general purpose computing device (eg, a laptop computer, tablet computer, etc.), a personal digital assistant, or the like.
  • the electronic device 2000 may also have a base so that it can be placed on a desktop.
  • the electronic device 2000 may be used to assist reading as a visually impaired aid, in which case the electronic device 2000 is also sometimes referred to as an "electronic reader” or a “reading aid".
  • with the aid of the electronic device 2000, users who cannot read independently (eg, visually impaired persons, persons with dyslexia, etc.) can "read" regular reading materials (eg, books, magazines, etc.) by adopting a posture similar to a reading posture.
  • the electronic device 2000 may capture an image to obtain an initial image including the object to be recognized.
  • the electronic device 2000 may also perform curvature correction on the initial image to obtain a final corrected image, and then perform layout analysis, text detection, and text recognition on the text in the text area of the final corrected image (for example, using an optical character recognition (OCR) method) to obtain text data, overcoming the influence of text bending on the recognition of text in the object to be recognized and improving text recognition efficiency and accuracy. The text data can then be converted into sound data, and the sound data can be output through a sound output device such as a speaker or an earphone for the user to listen to.
  • the electronic device 2000 may include a first camera 101 and a second camera 102 for acquiring images.
  • the first camera 101 and the second camera 102 may include, but are not limited to, cameras or video cameras, etc., and are configured to acquire initial images including the object to be recognized.
  • the electronic device 2000 may also include an electronic circuit 2100 that includes circuitry configured to perform the steps of the method as previously described (eg, the method steps shown in the flowcharts of FIGS. 1 and 3 ).
  • the electronic circuit 2100 may further include a character recognition circuit 2005, which is configured to perform character detection and/or recognition (eg, OCR processing) on the characters in the text area of the object to be recognized in the initial image, so as to obtain text data.
  • the character recognition circuit 2005 can be implemented by, for example, a dedicated chip.
  • the electronic device 2000 may also include a sound conversion circuit 2006 configured to convert the text data into sound data.
  • the sound conversion circuit 2006 can be implemented by, for example, a dedicated chip.
  • the electronic device 2000 may further include a sound output circuit 2007 configured to output the sound data.
  • the sound output circuit 2007 may include, but is not limited to, an earphone, a speaker, or a vibrator, etc., and a corresponding driving circuit thereof.
  • the electronic device 2000 may also include an image processing circuit 2008, which may include circuits configured to perform various image processing on images.
  • the image processing circuitry 2008 may include, for example, but is not limited to, one or more of the following: circuitry configured to denoise an image, circuitry configured to deblur an image, circuitry configured to geometrically correct an image, circuitry configured to perform feature extraction on an image, circuitry configured to perform object detection and/or recognition on objects in an image, circuitry configured to perform text detection on text contained in an image, circuitry configured to extract text lines from an image, circuitry configured to extract text coordinates from an image, circuitry configured to extract object boxes from an image, circuitry configured to extract text boxes from an image, and circuitry configured to perform layout analysis (eg, paragraph division) based on an image.
  • the electronic circuit 2100 may also include a word processing circuit 2009, which may be configured to perform various processing based on the extracted text-related information (eg, text data, text boxes, paragraph coordinates, text line coordinates, text coordinates, etc.) to obtain processing results such as paragraph sorting, text semantic analysis, and layout analysis results.
  • a word processing circuit 2009 may be configured to be based on the extracted text-related information (eg, text data, text boxes, paragraph coordinates, text line coordinates, Various processing are performed to obtain processing results such as paragraph sorting, text semantic analysis, and layout analysis results.
  • One or more of the various circuits described above may use custom hardware, and/or may Realize with hardware, software, firmware, middleware, microcode, hardware description language or any combination thereof.
  • one or more of the various circuits described above can be implemented in assembly language by using logic and algorithms according to the present disclosure or hardware programming languages (such as VERILOG, VHDL, C++) to program hardware (eg, programmable logic circuits including Field Programmable Gate Arrays (FPGA) and/or Programmable Logic Arrays (PLA)).
  • FPGA Field Programmable Gate Arrays
  • PDA Programmable Logic Arrays
  • electronic device 2000 may also include communication circuitry 2010, which may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, modems, network cards , infrared communication devices, wireless communication devices and/or chipsets such as Bluetooth devices, 1302.11 devices, WiFi devices, WiMax devices, cellular communication devices and/or the like.
  • communication circuitry 2010, may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, modems, network cards , infrared communication devices, wireless communication devices and/or chipsets such as Bluetooth devices, 1302.11 devices, WiFi devices, WiMax devices, cellular communication devices and/or the like.
  • the electronic device 2000 may further include an input device 2011, which may be any type of device capable of inputting information to the electronic device 2000, and may include, but is not limited to, various sensors, a mouse, a keyboard, a touch screen , buttons, joysticks, microphones and/or remote controls, etc.
  • the electronic device 2000 may also include an output device 2012, which may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer, etc. .
  • an output device 2012 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator, and/or a printer, etc.
  • a vision-based output device may facilitate the user's family members or maintenance workers, etc. to obtain output information from the electronic device 2000 .
  • the electronic device 2000 may also include a processor 2001 .
  • the processor 2001 may be any type of processor, and may include, but is not limited to, one or more general-purpose processors and/or one or more special-purpose processors (eg, special processing chips).
  • the processor 2001 may be, for example, but not limited to, a central processing unit CPU or a microprocessor MPU, or the like.
  • the electronic device 2000 may also include a working memory 2002, which may store programs (including instructions) and/or data (eg, images, text, sounds, and other intermediate data, etc.) useful for the operation of the processor 2001.
  • memory and may include, but is not limited to, random access memory and/or read only memory devices.
  • the electronic device 2000 may also include a storage device 2003, which may include any non-transitory storage device, which may be any storage device that is non-transitory and that enables data storage, and may include, but is not limited to Disk drives, optical storage devices, solid-state memory, floppy disks, flexible disks, hard disks, magnetic tapes or any other magnetic media, optical discs or any other optical media, ROM (read only memory), RAM (random access memory), cache memory and /or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions and/or code.
  • the working memory 2002 and the storage device 2003 may be collectively referred to as "memory" and may be used concurrently with each other in some cases.
  • the processor 2001 can communicate with the first camera 101 and the second camera 102, the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, the communication circuit 2010, the electronic The circuit 2100 and at least one of the other various devices and circuits included in the electronic device 2000 are controlled and scheduled.
  • at least some of the various components described in FIG. 10 may be interconnected and/or in communication via bus 2013 .
  • Software elements may reside in the working memory 2002, including, but not limited to, an operating system 2002a, one or more application programs 2002b, drivers, and/or other data and code.
  • instructions for performing the aforementioned control and scheduling may be included in the operating system 2002a or one or more application programs 2002b.
  • instructions to perform the method steps described in the present disclosure may be included in one or more application programs 2002b and the various modules of the electronic device 2000 described above. This may be accomplished by reading and executing instructions of one or more applications 2002b by the processor 2001 .
  • the electronic device 2000 may include a processor 2001 and a memory (eg, working memory 2002 and/or storage device 2003 ) that stores programs including instructions that, when executed by the processor 2001 cause the processing
  • the controller 2001 performs methods as described in various embodiments of the present disclosure.
  • some or all of the operations performed by at least one of the character recognition circuit 2005 , the sound conversion circuit 2006 , the image processing circuit 2008 , the word processing circuit 2009 , and the electronic circuit 2100 may be read and performed by the processor 2001 .
  • One or more instructions of the application 2002 are implemented.
  • the executable or source code of the instructions of the software element (program) may be stored in a non-transitory computer-readable storage medium (such as the storage device 2003), and may be stored in the working memory 2001 (possibly by the storage device 2003) when executed. compile and/or install). Accordingly, the present disclosure provides a computer-readable storage medium storing a program comprising instructions that, when executed by a processor of an electronic device (eg, a visually impaired assistive device), cause the electronic device to perform various functions of the present disclosure. method described in the examples. According to another embodiment, the executable code or source code of the instructions of the software element (program) may also be downloaded from a remote location.
  • circuits, units, modules, or elements may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
  • some or all of the circuits, units, modules or elements encompassed by the disclosed methods and apparatus can be programmed in assembly language or hardware programming languages (such as VERILOG, VHDL, C++) using logic and algorithms according to the present disclosure.
  • Hardware eg, programmable logic circuits including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs) are programmed to implement.
  • the processor 2001 in the electronic device 2000 may be distributed over a network. For example, some processing may be performed using one processor, while other processing may be performed by another processor remote from the one processor. Other modules of electronic device 2000 may be similarly distributed. As such, electronic device 2000 may be interpreted as a distributed computing system that performs processing in multiple locations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method for correcting a text image, comprising: acquiring initial images, captured obliquely by a binocular camera, that include an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both cameras being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image including the object to be recognized captured obliquely by the first camera and a second initial image including the object to be recognized captured obliquely by the second camera; determining a rotation matrix corresponding to rotating the object to be recognized, about a set point on the object, until it is perpendicular to the optical axis of the first camera; determining, based on the first initial image and the second initial image, a 3D image including the object to be recognized; using the rotation matrix, obtaining a rotation-corrected image resulting from rotating the 3D image about the set point until it is perpendicular to the optical axis of the first camera; and performing flattening correction on the rotation-corrected image to obtain a final corrected image.

Description

Text image correction method and apparatus, device, and medium

Technical Field

The present application relates to the technical field of artificial intelligence, and in particular to a text image correction method and apparatus, a device, and a medium.

Background Art

In the related art, before character recognition is performed on reading material such as books or magazines, curvature correction may be applied to the image to overcome the problem that the curvature of the material impairs recognition accuracy. However, limited by the correction algorithm itself, the character recognition performance on curved reading material still needs improvement.

The approaches described in this section are not necessarily approaches that have previously been conceived or pursued. Unless otherwise indicated, it should not be assumed that any approach described in this section qualifies as prior art merely by its inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be assumed to have been recognized in any prior art.

Summary

According to one aspect of the present disclosure, there is provided a method for correcting a text image, comprising: acquiring initial images, captured obliquely by a binocular camera, that include an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image including the object to be recognized captured obliquely by the first camera and a second initial image including the object to be recognized captured obliquely by the second camera; determining a rotation matrix corresponding to rotating the object to be recognized, about a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera; determining, based on the first initial image and the second initial image, a 3D image including the object to be recognized; using the rotation matrix, obtaining a rotation-corrected image resulting from rotating the 3D image about the set point until it is perpendicular to the optical axis of the first camera; and performing flattening correction on the rotation-corrected image to obtain a final corrected image.

According to another aspect of the present disclosure, there is provided an electronic circuit comprising circuitry configured to perform the steps of the above correction method.

According to another aspect of the present disclosure, there is provided an apparatus for correcting text images, comprising: a binocular camera configured to obliquely capture initial images including an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both cameras being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image of the object to be recognized and a second initial image including the object to be recognized, the first camera being configured to obliquely capture the first initial image and the second camera being configured to obliquely capture the second initial image; and the above electronic circuit.

According to another aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory storing a program, the program comprising instructions that, when executed by the processor, cause the processor to perform the above correction method.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the above correction method.
Brief Description of the Drawings

The accompanying drawings exemplarily show embodiments and form a part of the specification, and together with the written description serve to explain exemplary implementations of the embodiments. The embodiments shown are for illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference signs denote similar but not necessarily identical elements.

Fig. 1 is a flowchart showing a text image correction method according to an exemplary embodiment of the present disclosure;

Fig. 2 is a schematic diagram showing the operation of a text image correction apparatus according to an exemplary embodiment of the present disclosure;

Fig. 3 is a schematic diagram showing first pixel strips and 3D directrices in the first initial image according to an exemplary embodiment of the present disclosure;

Fig. 4 is a schematic diagram showing the principle of epipolar geometry according to an exemplary embodiment of the present disclosure;

Fig. 5 is a schematic diagram showing the principle of binocular vision according to an exemplary embodiment of the present disclosure;

Fig. 6 is a schematic diagram showing the positional and geometric relationship of the 3D directrices, the binocular camera, and the object to be recognized according to an exemplary embodiment of the present disclosure;

Fig. 7 is a schematic diagram showing multiple determined surface straight generatrices according to an exemplary embodiment of the present disclosure;

Fig. 8 is a schematic diagram showing multiple determined first surface sampling points of the object to be recognized according to an exemplary embodiment of the present disclosure;

Fig. 9 is a schematic diagram showing a final corrected image according to an exemplary embodiment of the present disclosure;

Fig. 10 is a structural block diagram showing an exemplary computing device applicable to the exemplary embodiments.
Detailed Description

In the present disclosure, unless otherwise stated, the use of the terms "first", "second", etc. to describe various elements is not intended to define a positional, temporal, or importance relationship between these elements; such terms are only used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of that element, while in some cases, based on the context, they may refer to different instances.

The terminology used in the description of the various examples in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, that element may be one or more. Furthermore, the term "and/or" as used in the present disclosure covers any one of the listed items and all possible combinations thereof.

Reading material such as books or magazines usually has a certain layout; for example, the content may be divided into different paragraphs (including, for example, vertical segments and left and right columns). When reading such material, people capture the image in their field of view visually and divide the text in the image into paragraphs with their brain. However, if a machine is to "read" the material, it must not only perform character recognition on the text in the image but also divide the text into paragraphs so that the text can be "read" in the correct paragraph order. Such paragraph division may be used, for example, in applications that convert a paper book into an e-book, or that convert the text in an image into a sound signal and output that signal. In the present disclosure, "paragraph division" refers to dividing the text in an image into different paragraphs. Vertical paragraph division may also be called segmentation, and horizontal paragraph division may also be called column division.

In the present disclosure, a text line refers to a sequence of characters whose adjacent-character spacing is smaller than a threshold spacing, i.e., a continuous line of characters. Adjacent-character spacing refers to the distance between the coordinates of corresponding positions of adjacent characters, for example the distance between their top-left coordinates, their bottom-right coordinates, or their centroid coordinates. If the adjacent-character spacing is not greater than the threshold spacing, the adjacent characters may be considered continuous and are assigned to the same text line. If the adjacent-character spacing is greater than the threshold spacing, the adjacent characters may be considered discontinuous (for example, they may belong to different paragraphs, or to left and right columns, respectively) and are assigned to different text lines. The threshold spacing may be set according to the character size; for example, the threshold spacing set for adjacent characters whose font size is larger than size four (such as size three or size two) is larger than that set for adjacent characters whose font size is below size four (such as small four or size five).

In the related art, before character recognition is performed on reading material such as books or magazines, curvature correction may be applied to the image to overcome the problem that the curvature of the material impairs recognition accuracy. A specific procedure may be: a correction algorithm flattens and interpolates the curved surface to solve the image curvature problem. This approach flattens well in vertical shooting scenarios. However, limited by the correction algorithm itself, in oblique shooting scenarios the curved surface cannot be flattened because of the oblique perspective, and an even more distorted result may be obtained.

To solve the above technical problem, the present disclosure provides a text image correction method for oblique shooting scenarios. The correction method determines the rotation matrix corresponding to rotating the object to be recognized until it is perpendicular to the optical axis of the first camera of the binocular camera, as well as a 3D image of the object to be recognized. Using the determined rotation matrix and the 3D image of the object, a 3D rotation-corrected image corresponding to the object rotated until perpendicular to the optical axis of the first camera can be obtained. Flattening correction is then applied to the 3D rotation-corrected image to obtain a final corrected image. Since the 3D rotation-corrected image has no oblique perspective problem, the text lines in it are straight lines; flattening it therefore achieves a good flattening effect, which in turn ensures the accuracy of character recognition on the object to be recognized.

In the present disclosure, the object to be recognized may refer to a picture, the current page of reading material containing text to be recognized, or the like.

The text image correction method according to embodiments of the present disclosure will be further described below with reference to the accompanying drawings.

Fig. 1 is a flowchart showing a text image correction method according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the correction method may include: step S101, acquiring initial images, captured obliquely by a binocular camera, that include the object to be recognized, where the binocular camera includes a first camera and a second camera, the optical axes of both cameras are not perpendicular to the placement surface of the object, and the initial images include a first initial image containing the object captured obliquely by the first camera and a second initial image containing the object captured obliquely by the second camera; step S102, determining the rotation matrix corresponding to rotating the object, about a set point on the object, until perpendicular to the optical axis of the first camera; step S103, determining, based on the first and second initial images, a 3D image including the object; step S104, using the rotation matrix, obtaining the rotation-corrected image resulting from rotating the 3D image about the set point until perpendicular to the optical axis of the first camera; and step S105, performing flattening correction on the rotation-corrected image to obtain the final corrected image. Since the 3D rotation-corrected image has no oblique perspective problem and its text lines are straight, flattening achieves a good result and ensures the accuracy of character recognition on the object.
According to some embodiments, the binocular camera may be a standalone device (such as a binocular still camera, a binocular video camera, or a binocular camera module), or may be included in various electronic devices (such as mobile phones, computers, personal digital assistants, reading assistance devices, tablet computers, wearable devices, etc.).

According to some embodiments, the binocular camera may be arranged on a device such as a user's wearable device or glasses, so that the first and second initial images may be images, captured by the binocular camera, of reading material held in the user's hand. The object to be recognized may thus contain text (including characters of various countries, numerals, symbols, punctuation marks, etc.), pictures, and other content. The object to be recognized may be, for example, the current page to be recognized of a passport, driver's license, book, magazine, or other reading material, and includes a text area. The text area corresponds to the area where the text is located. In this case, the placement surface of the object is the surface on which the reading material is placed.

According to some embodiments, the optical axes of the first and second cameras of the binocular camera may be arranged in parallel.

The first and second initial images captured by the first and second cameras may each include the complete object to be recognized, so that the entire object can be flattened, facilitating subsequent processing such as character recognition.

According to some embodiments, the first and second initial images captured by the binocular camera may also be images that have undergone some preprocessing, which may include, for example but without limitation, at least one of the following: distortion correction, binocular rectification, grayscale processing, and deblurring.

Image distortion may include radial distortion and tangential distortion. Radial distortion arises because light rays bend more far from the center of the lens than near it; tangential distortion arises from manufacturing defects that make the lens not parallel to the image plane. According to some embodiments, distortion correction may be applied to the first and second initial images to eliminate the distortion caused by the camera lens.

The distortion correction performed separately on the first and second initial images may be distortion correction of every pixel in each of the two images.
According to some embodiments, a distortion correction formula may be used that maps the pixel coordinates (x, y) before correction to the distortion-corrected pixel coordinates (x̂, ŷ); the formula itself is rendered only as an embedded image in the source. In it, (c_x, c_y) are the coordinates of the image center, α is a balance factor between radial distortion and tangential distortion, and k1, k2, k3, p1, and p2 are the distortion parameters of the camera.
It will be appreciated that the distortion parameters of the first and second cameras of the binocular camera in the present disclosure may differ, which lowers the precision requirements on the equipment; by correcting the distortion parameters of the first and second cameras separately, the distortion each camera's lens introduces can be eliminated.
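Since the source renders the correction equation only as an image, a minimal sketch of the standard radial-tangential model suggested by the named parameters might look as follows (the function name and the treatment of α as a weight on the tangential terms are assumptions, not the patent's exact formula):

```python
def undistort_pixel(x, y, cx, cy, k1, k2, k3, p1, p2, alpha=1.0):
    """Map one uncorrected pixel to its corrected position.

    A sketch of the standard radial-tangential (Brown-Conrady) model;
    the patent's own equation is an embedded image, so treating `alpha`
    as a weight on the tangential terms is an assumption.
    """
    xn, yn = x - cx, y - cy                  # coordinates relative to the image center
    r2 = xn * xn + yn * yn                   # squared radius
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    tan_x = 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
    tan_y = p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
    return cx + xn * radial + alpha * tan_x, cy + yn * radial + alpha * tan_y
```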
According to some embodiments, binocular rectification may be applied to the first and second initial images. As a result, the same point in 3D space is projected onto the same, positionally corresponding horizontal scanline in the first and second initial images, which facilitates the subsequent matching of corresponding pixels between the two images.

In an exemplary embodiment, distortion correction may first be applied to the first and second initial images separately, and binocular rectification may then be applied to the distortion-corrected first and second initial images, further improving the effect of the surface correction.
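With calibrated intrinsics and extrinsics, both preprocessing steps can be performed in one remapping pass using OpenCV's rectification utilities; a sketch, assuming `K1, D1, K2, D2, R, T` come from a prior stereo calibration:

```python
import cv2

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    """Distortion-correct and row-align a stereo pair in a single remap pass."""
    h, w = img_l.shape[:2]
    # Rectification rotations and new projection matrices for both cameras.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    # After remapping, a 3D point projects onto the same scanline in both images.
    return (cv2.remap(img_l, map1x, map1y, cv2.INTER_LINEAR),
            cv2.remap(img_r, map2x, map2y, cv2.INTER_LINEAR))
```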
According to some embodiments, the curved shape of the object to be recognized may be substantially the same from one side edge of the object to the opposite side edge. It will be appreciated that the technical solution of the present disclosure is also applicable to scenarios where the curved shape of the object differs from one side edge to the opposite side edge.

According to some embodiments, as shown in Fig. 2, in the case where the curved shape of the object is substantially the same from one side edge to the opposite side edge, the first camera 101 and the second camera 102 of the binocular camera may be arranged on the side where that side edge of the object 100 is located, which facilitates determining 3D directrices that characterize the curved shape of the object; the specific principle is described below. It should be noted that the object 100 is drawn flat in Fig. 2 only for ease of illustration; in reality the object 100 is a curved surface.

By providing a binocular camera, the present disclosure can determine the 3D image including the object based on binocular vision and, even when the tilt angle of the binocular camera is unknown, can also determine, based on binocular vision, the rotation matrix corresponding to rotating the object about a set point on the object until perpendicular to the optical axis of the first camera.

The principle of binocular vision is first described below.
As shown in Fig. 3, the principle of binocular vision may be understood as follows: the first optical center O_l of the first camera and the second optical center O_r of the second camera are separated by a distance T along the x axis. The two line segments of lengths L_l and L_r in Fig. 3 represent the image planes of the first and second cameras, and the shortest distances from O_l and O_r to their respective image planes are the focal lengths f_l and f_r. The first camera captures the first initial image including the object to be recognized, and the second camera captures the second initial image including the object. For a point P in 3D space, its imaging point on the first camera (which may correspond to one pixel) is PL, and its imaging point on the second camera (which may correspond to one pixel) is PR. The distances of PL and PR from the left edges of their respective image planes are x_l and x_r. The imaging disparity of point P between the two cameras may be defined as x_l − x_r or x_r − x_l. After calibration and matching of the binocular camera, once the intrinsics f_l and f_r, the structural parameter T, and x_l and x_r are all available, the depth can be obtained. In the case f_l = f_r = f, it is given by

Z_c = f · T / (x_l − x_r).

From this formula, the perpendicular distance Z_c (i.e., the depth) between point P and the straight line determined by the first optical center O_l and the second optical center O_r can be computed.
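In the rectified, equal-focal-length case this reduces to one division per matched point; a minimal sketch:

```python
import numpy as np

def depth_from_disparity(x_l, x_r, f, T):
    """Z_c = f * T / (x_l - x_r) for a rectified pair with focal length f
    and baseline T; x_l and x_r are the matched column coordinates."""
    d = np.asarray(x_l, dtype=float) - np.asarray(x_r, dtype=float)
    return f * T / d
```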
According to some embodiments, when the tilt angle of the camera (the angle between the optical axis of the camera and the perpendicular to the placement surface of the object) is unknown, determining the rotation matrix in step S102 may include: determining at least two 3D directrices of the object based on the first and second initial images and on the intrinsics of the binocular camera; and determining the rotation matrix based on the at least two 3D directrices and the set point. Thus, with the tilt angle of the first camera unknown, the rotation matrix can be determined from the 3D directrices without computing the depth of the entire first initial image, effectively reducing the amount of computation.

The following describes how the at least two 3D directrices of the object are determined based on the first and second initial images and on the intrinsics of the binocular camera.

According to some embodiments, determining the at least two 3D directrices may include: determining at least two first pixel strips in the first initial image; determining, in the second initial image, at least two second pixel strips positionally corresponding to the respective first pixel strips; determining, based on each positionally corresponding pair of first and second pixel strips and on the intrinsics of the binocular camera, the depth information corresponding to that pair; and determining the 3D directrix based on each positionally corresponding pair of pixel strips and the corresponding depth information. A 3D directrix can thus be determined from a positionally corresponding pair of pixel strips, avoiding full-image pixel matching between the first and second initial images and reducing computation.

How the positionally corresponding first and second pixel strips are determined is described below.

According to some embodiments, for each pixel in a first pixel strip, its matching pixel in the second initial image may be determined. The pixel strip determined by the multiple pixels in the second initial image that respectively match all the pixels in the first pixel strip may then be taken as the second pixel strip positionally corresponding to that first pixel strip.

According to some embodiments, epipolar geometry may be used to improve the efficiency of matching corresponding pixel strips between the first and second initial images.
As shown in Fig. 4, the principle of epipolar geometry may be understood as follows: the first optical center of the first camera is O_l and the second optical center of the second camera is O_r. For a point M in 3D space, its projected pixels in the first and second initial images necessarily lie in the epipolar plane MO_lO_r determined by the point M, the first optical center O_l, and the second optical center O_r. As shown in Fig. 4, the imaging point of M in the first initial image is M_l and that in the second initial image is M_r. The epipolar plane MO_lO_r intersects the first initial image in the first epipolar line L_l passing through M_l, and intersects the second initial image in the second epipolar line L_r passing through M_r. When M is unknown but the pixel M_l in the first initial image is known, epipolar geometry tells us that the corresponding pixel M_r lies on the second epipolar line L_r of the second initial image, and L_r can be determined from the pixel M_l and the fundamental matrix F. The specific formula is:

L_r = F M_l,

where F denotes the fundamental matrix, which can be determined from the intrinsics of the first camera, the intrinsics of the second camera, and the extrinsics between the first and second cameras.

Epipolar geometry thus establishes the constraint between the corresponding imaging points, in the first and second initial images, of the same point in 3D space. By reducing the matching of corresponding pixels in the two images from a search of the 2D image space to a 1D search along the corresponding epipolar line, the efficiency and accuracy of matching can be improved.
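A minimal sketch of computing the epipolar line for one pixel with OpenCV (the fundamental matrix F is assumed to be known from calibration):

```python
import numpy as np
import cv2

def epipolar_line_in_second_image(m_l, F):
    """Return (a, b, c) with a*x + b*y + c = 0: the epipolar line
    L_r = F @ m_l in the second image on which the match of pixel m_l lies."""
    pts = np.asarray(m_l, dtype=np.float32).reshape(-1, 1, 2)
    lines = cv2.computeCorrespondEpilines(pts, 1, F)  # whichImage=1: m_l is in image 1
    return lines.reshape(-1, 3)[0]
```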
According to some embodiments, in the case where the lengthwise direction of the first pixel strip is parallel to the orthographic projection, onto the plane of the image plane, of the line connecting the optical centers of the first and second cameras, the method of determining the second pixel strip positionally corresponding to the first pixel strip may specifically be: from one pixel on one long edge of the first pixel strip, an epipolar line in the second initial image can be determined, and this epipolar line can be taken as positionally corresponding to that edge of the first pixel strip. Similarly, another epipolar line in the second initial image positionally corresponding to the other long edge of the first pixel strip can be determined, so that the second pixel strip positionally corresponding to the first pixel strip can be determined from these two epipolar lines.

According to some embodiments, the above epipolar-geometry-based pixel matching may be implemented with OpenCV, Matlab, or other software products, which is not limited here.

According to some embodiments, determining the at least two second pixel strips positionally corresponding to the at least two first pixel strips is not limited to the above approach; it may also be implemented, for example, with a neural network. The matching pixels in the second initial image positionally corresponding to all pixels in a first pixel strip may be determined by a neural network, and the pixel strip determined by all the matching pixels is the second pixel strip positionally corresponding to that first pixel strip.

In an exemplary embodiment, the neural-network-based corresponding-pixel matching may be understood as follows: the features of a known first pixel in the first initial image and the features of a second pixel in the second initial image that may match the first pixel are input into the neural network. The candidate second pixel may be taken from a limited range in the second initial image, or from the entire second initial image. In response to the input features of the first pixel and of the second pixel, the network outputs a result used to determine the degree of match between the two pixels. By comparing the degree of match, with the first pixel, of every second pixel that may match it, the corresponding pixel in the second initial image that matches the first pixel can be determined.

According to some embodiments, multiple second pixels in the second initial image respectively matching multiple first pixels on one edge (e.g., one long edge) of the first pixel strip may be determined, and a line in the second initial image positionally corresponding to that edge may be obtained by fitting the multiple second pixels. Similarly, another line in the second initial image positionally corresponding to the other edge (e.g., the other long edge) of the first pixel strip may be determined, so that the second pixel strip positionally corresponding to the first pixel strip can be determined from these two lines.

According to some embodiments, the neural-network-based corresponding-point matching may be implemented by a trained binocular matching neural network, which may include at least one of the following networks: CNN (Convolutional Neural Networks), DNN (Deep Neural Network), or RNN (Recurrent Neural Network). The binocular matching network may include one of the CNN, DNN, and RNN networks, or at least two of them.

It will be appreciated that the method of determining the second pixel strip with a neural network is not limited to the above method and is not limited here.

It will be appreciated that the method of determining the at least two second pixel strips positionally corresponding to the at least two first pixel strips is not limited to the above two methods; other methods may also be used, which are not limited here.

According to some embodiments, the at least two pixel strips determined in the first initial image may be parallel to one another, which reduces computation. According to other embodiments, they may also be non-parallel, which is not limited here.

After the positionally corresponding first and second pixel strips are determined, according to some embodiments, the depth information corresponding to them may be determined according to the binocular vision principle. The specific binocular vision principle has been described above and is not repeated here.
After the depth information corresponding to the positionally corresponding first and second pixel strips is determined, two 3D directrices of the object to be recognized can further be determined.

According to some embodiments, the determined at least two 3D directrices may be parallel to one another. According to other embodiments, two or more of the at least two 3D directrices may be non-parallel.

In the exemplary embodiment illustrated in Fig. 5, two first pixel strips in the first initial image may be determined, namely the first pixel strip 301 and another first pixel strip 302. Based on the two first pixel strips determined in the first initial image, two second pixel strips positionally corresponding to them are determined in the second initial image. Based on each positionally corresponding pair of first and second pixel strips and on the intrinsics of the binocular camera, the corresponding depth information can be determined. Based on the positionally corresponding pixel strips and the corresponding depth information, the 3D directrix 1020 and the 3D directrix 1030 can be determined in the first pixel strip 301 and the first pixel strip 302, respectively. Two 3D directrices can thus be determined quickly from two first pixel strips and the corresponding two second pixel strips.

It should be noted that the 3D directrices 1020 and 1030 shown in Fig. 5 contain depth information; they are drawn as straight lines in Fig. 5 only for ease of illustration and are in fact curves. The two 3D directrices 1020 and 1030 illustrated in Fig. 5 are parallel to each other; they may also be non-parallel. It will be appreciated that three or more first pixel strips in the first initial image, and three or more second pixel strips in the second initial image positionally corresponding to them, may also be determined, so that three or more 3D directrices can be determined; this is not limited here.
According to some embodiments, the 3D directrix may be determined based on the central axis of the first pixel strip and its depth information and the central axis of the positionally corresponding second pixel strip and its depth information, allowing the 3D directrix to be determined simply and quickly. Specifically, the method of determining the 3D directrix may be: determine the coordinate values of the pixels in the positionally corresponding first and second pixel strips; sample multiple positionally corresponding points on the central axis of the first pixel strip and multiple positionally corresponding points on the central axis of the second pixel strip; compute the depth values of these sampling points using the binocular vision principle above, so as to obtain the 3D coordinates of each sampling point; and take the result obtained by fitting the 3D coordinates of the multiple sampling points as the 3D directrix.
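A minimal sketch of turning matched samples on a strip's central axis into 3D directrix points (the sample arrays and parameter names are illustrative assumptions):

```python
import numpy as np

def directrix_from_center_axis(rows, cols_l, cols_r, f, T, cx, cy):
    """Build 3D directrix points from matched central-axis samples.

    rows: pixel row of each sample on the first strip's central axis;
    cols_l / cols_r: matched columns in the rectified first/second images;
    f, T: focal length and baseline; (cx, cy): principal point.
    Returns an (n, 3) array of camera-frame points approximating the directrix.
    """
    rows = np.asarray(rows, dtype=float)
    cols_l = np.asarray(cols_l, dtype=float)
    Z = f * T / (cols_l - np.asarray(cols_r, dtype=float))  # depth per sample
    X = (cols_l - cx) * Z / f                                # pinhole back-projection
    Y = (rows - cy) * Z / f
    return np.stack([X, Y, Z], axis=1)
```

A smoothing fit over these points (for example a low-order polynomial per coordinate) then yields the directrix as a curve.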
According to some embodiments, the width of the first pixel strip and the width of the second pixel strip may both be less than half the width of the first initial image. This ensures that the at least two first pixel strips do not overlap, and hence that the 3D directrices determined in different first pixel strips do not cross one another. It also allows the determined at least two 3D directrices to characterize the curved shape of the object, improving the subsequent curvature correction.

According to some embodiments, the at least two first pixel strips may be distributed on both sides of the central axis of the first initial image. By determining 3D directrices in the regions on both sides of the central axis of the first initial image, the determined at least two 3D directrices can characterize the curved shape of the object in the regions on both sides of the central axis, improving the subsequent curvature correction.

According to some embodiments, one 3D directrix may be determined in each first pixel strip of the first initial image. According to other embodiments, two or more 3D directrices may be determined in each first pixel strip of the first initial image.

According to some embodiments, the at least two 3D directrices may extend along the bending direction of the object to be recognized. The 3D directrices can then characterize the curved shape of the object, improving the subsequent curvature correction.

According to some embodiments, the at least two 3D directrices may be roughly parallel to the long edges of the first pixel strips.

It should be noted that the present disclosure places no limitation on the number, distribution, shape, or mutual relationship of the determined at least two 3D directrices; as long as the determined 3D directrices can characterize the curved shape of the object, the technical solution of the present disclosure can be realized, and this is not limited here.

It will be appreciated that the pixels in the second initial image positionally corresponding to pixels in the first initial image may also be determined based on the entire first initial image and the entire second initial image; the depth information of the positionally corresponding pixels may be determined based on those pixels and on the intrinsics of the binocular camera; and the at least two 3D directrices may be determined based on the positionally corresponding pixels and the corresponding depth information. In this case, the positionally corresponding pixels in the first and second initial images may be determined with one of the methods such as epipolar geometry or neural networks; the specific implementation can be found above and is not repeated here. The depth information of the positionally corresponding pixels may be determined based on the binocular vision principle; the specific implementation can likewise be found above and is not repeated here.

It will be appreciated that the depth information of the entire object may also be computed first, and a line containing depth information may then be determined, based on the entire first and second initial images, as the 3D directrix.
After the at least two 3D directrices are determined, and with the tilt angle of the binocular camera unknown, the rotation matrix corresponding to rotating the object about a set point on the object until perpendicular to the optical axis of the first camera can be determined based on the at least two 3D directrices.

According to some embodiments, determining the rotation matrix from the at least two 3D directrices and the set point includes: computing the average depth of each of the at least two 3D directrices; and determining the rotation matrix based on the average depths of the at least two 3D directrices and the set point. This simplifies the computation.

According to some embodiments, the set point may be the intersection of the object to be recognized with a straight line parallel to the optical axis of the first camera, where the midpoint of the line connecting the optical centers of the first and second cameras lies on that straight line (for ease of description, this line is called the mid-perpendicular line). This simplifies the computation of the rotation matrix.

It will be appreciated that the set point may also be another specific point on the object to be recognized, which is not limited here.
Referring to Fig. 6, take the intersection Q of the object to be recognized and the mid-perpendicular line parallel to the optical axis of the first camera as the set point, with the midpoint of the line connecting the optical centers of the first and second cameras lying on that mid-perpendicular line. The specific principle of determining the rotation matrix based on the at least two 3D directrices and the set point may then be as follows.

Fig. 6 shows only two directrices, the first directrix 1020 and the second directrix 1030. From the geometric relationship, a system of equations can be written down and solved for the tilt angle α and the set-point depth h_0 (the system of equations, its solution, and the resulting rotation matrix R are rendered as embedded images in the source), where h_1 is the average depth of the first directrix 1020, h_2 is the average depth of the second directrix 1030, d_1 is the distance between the depth direction of the first directrix 1020 and the midpoint line, and d_2 is the distance between the depth direction of the second directrix 1030 and the midpoint line. d_1 and d_2 can be determined from the positional relationship between the determined 3D directrices and the binocular camera, and h_0 is the depth of the set point.
With the above method, the rotation matrix corresponding to rotating the object about a set point on the object until perpendicular to the optical axis of the first camera can be computed even when the tilt angle α of the first camera is unknown.

It will be appreciated that if the tilt angle α of the first camera is known, the rotation matrix R can be computed directly. In this case, the depth of the intersection Q in the camera coordinate system can be computed according to the binocular vision principle; the specific method has been described above.
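Since the patent's own equations for α and R are given only as images, the following sketch encodes one geometry consistent with Fig. 6; both the angle relation and the choice of the camera x-axis as rotation axis are assumptions, not the patent's exact formulas:

```python
import numpy as np

def rotation_from_directrices(h1, h2, d1, d2):
    """Estimate the tilt angle and rotation matrix from two 3D directrices.

    h1, h2: average depths of the two directrices; d1, d2: their distances
    from the midpoint line. The relation for alpha and the rotation axis
    are assumptions consistent with the Fig. 6 geometry.
    """
    alpha = np.arctan2(h2 - h1, d1 + d2)   # assumed tilt-angle relation
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,  -s],
                  [0.0,   s,   c]])        # rotation about the camera x-axis
    return alpha, R
```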
After the rotation matrix is determined, step S103 may be performed: determining, based on the first and second initial images, the 3D image including the object to be recognized. It should be noted that the present disclosure does not limit the execution order of steps S102 and S103; steps S102 and S103 may also be executed in parallel.

According to some embodiments, step S103 may include: determining multiple surface straight generatrices based on the at least two 3D directrices; and determining, based at least on the multiple surface straight generatrices and the at least two 3D directrices, multiple first surface sampling points and the three-dimensional coordinates of each first surface sampling point, where the 3D image is represented by the multiple first surface sampling points. The multiple first surface sampling points of the 3D image of the object can thus be determined from the surface straight generatrices and the at least two 3D directrices, and the 3D image of the object can be represented by the multiple first surface sampling points, simplifying the determination of the 3D image and reducing computation. The example shown in Fig. 7 illustrates multiple determined surface straight generatrices 201; the example shown in Fig. 8 illustrates multiple determined first surface sampling points 202.

It should be noted that the 3D directrices 1020 and 1030 shown in Figs. 7 and 8 contain depth information; they are drawn as straight lines only for ease of illustration and are in fact curves.

According to the related art, a curved surface can be reconstructed from the trajectory swept by a moving straight generatrix; the generatrix lies on the surface and is a straight line. Therefore, in the above step S103 the 3D image of the object can be obtained by fitting the multiple first surface sampling points.

Step S103 may directly use the 3D directrices determined in step S102 so as to reduce computation. It will be appreciated that if step S103 is executed first, step S102 may likewise directly use the 3D directrices determined in step S103.

It will be appreciated that the 3D image of the object to be recognized may also be determined by other methods in the present disclosure, and is not limited to the above approach.
After the 3D image of the object and the rotation matrix corresponding to rotating the object about a set point on the object until perpendicular to the optical axis of the first camera are determined, step S104 may be performed: using the rotation matrix, obtaining the rotation-corrected image resulting from rotating the 3D image about the set point until perpendicular to the optical axis of the first camera.

According to some embodiments, step S104 may include: based on the depth of each of the multiple first surface sampling points and the depth of the set point, using the rotation matrix to determine the multiple second surface sampling points obtained by rotating each of the multiple first surface sampling points relative to the set point. In this case, the rotation-corrected image may be represented by the multiple second surface sampling points, and the 3D rotation-corrected image of the object can be obtained by fitting the multiple second surface sampling points.

The depth of the set point may be obtained by the method described above. The first surface sampling points are determined from the surface straight generatrices and the 3D directrices, and their depths may likewise be obtained from the surface straight generatrices and the 3D directrices.
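Applying the rotation itself is a single affine step per sampling point; a minimal sketch:

```python
import numpy as np

def rotate_about_set_point(points, R, q):
    """Rotate (n, 3) surface sampling points about the set point q:
    P' = R (P - q) + q, so the set point itself stays fixed."""
    points = np.asarray(points, dtype=float)
    q = np.asarray(q, dtype=float)
    return (points - q) @ R.T + q
```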
In the above technical solution, multiple surface straight generatrices of the object are determined, the 3D image of the object is determined based on the determined generatrices and the at least two 3D directrices, and the 3D rotation-corrected image obtained by rotating the object until perpendicular to the optical axis of the first camera can then be determined from the rotation matrix and the 3D image, overcoming the oblique perspective problem present in oblique shooting scenarios.

According to some embodiments, the at least two 3D directrices may include a first 3D directrix and a second 3D directrix. Taking the first and second 3D directrices as an example, the following specifically describes how the surface straight generatrices of the surface are determined from two 3D directrices. It will be appreciated that any two of the at least two 3D directrices may use the following method to determine the corresponding surface straight generatrices.

According to some embodiments, determining the multiple surface straight generatrices may include: sampling the first 3D directrix to obtain n first discrete points; sampling the second 3D directrix to obtain N second discrete points, where n and N are positive integers and N > n; for each of the n first discrete points, determining, according to a preset rule, one of the N second discrete points as the optimal corresponding discrete point of that first discrete point, where the normal vector of the first 3D directrix through that first discrete point is consistent with the normal vector of the second 3D directrix through the optimal corresponding discrete point; and determining the multiple surface straight generatrices based on the n first discrete points and their corresponding optimal corresponding discrete points.

According to some embodiments, for each of the n first discrete points, determining, according to a preset rule, one of the N second discrete points as the optimal corresponding discrete point of that first discrete point may include: determining the optimal corresponding range of the second 3D directrix corresponding to that first discrete point, the optimal corresponding range including at least one of the N second discrete points; computing a cost function between that first discrete point and each second discrete point in the corresponding optimal corresponding range; and determining, based on the cost function, one second discrete point in the corresponding optimal corresponding range as the optimal corresponding discrete point of that first discrete point. By using an adaptive search range, what is searched is the optimal corresponding discrete point on the second 3D directrix (i.e., the subscript index of the discretized second directrix) rather than a discrete arc-length value, so no search gradient range needs to be set, which increases computation speed.
In an example, both the first discrete points and the second discrete points may be, but are not limited to, discrete points parameterized by arc length, so that both the first and second 3D directrices can be represented as sets of 3D discrete points parameterized by arc length. One arc-length parameter value corresponds to a unique 3D coordinate on the directrix.

According to some embodiments, the arc-length parameterization C_0(t) of the first 3D directrix may be determined based on its pixel coordinates; the specific conversion method is prior art and is not detailed here. Likewise, the arc-length parameterization C_1(s) of the second 3D directrix may be determined based on its pixel coordinates.

According to some embodiments, the first 3D directrix C_0(t) and the second 3D directrix C_1(s) may first be discretized into the same number (denoted N) of discrete points. C_0(t) is then downsampled by a factor of K, and its sampling position indices are recorded as U_i (about N/K elements). The correspondence of optimal discrete subscripts (i.e., position indices of the discretized C_1(s)) is then sought, expressed as j = f(U_i).
According to some embodiments, the cost function used to determine the optimal corresponding discrete point in C_1(s) for a given point of C_0(t) is rendered as an embedded image in the source. In that expression, a term of the form (a, b, c) denotes the mixed product of three vectors, i.e., (a, b, c) = a × b · c; one symbol denotes the U_i-th arc-length parameter value of the first 3D directrix after discretization and K-fold downsampling, and another denotes the f(U_i)-th arc-length parameter value of the discretized second 3D directrix C_1(s). The corresponding curve symbols denote the U_i-th curve coordinate of the discretized, downsampled first directrix and the f(U_i)-th 3D curve coordinate of the discretized second directrix.
According to some embodiments, the candidate center-point subscript j = f(U_{i-1}) + U_i − U_{i-1} may be computed first, and the searched subscript range (i.e., the optimal corresponding range) may then be [f(U_{i-1}) + 1, f(U_{i-1}) + 2(U_i − U_{i-1})]. By setting the search step in this way, the number of search paths is reduced while the range is guaranteed, with essentially unchanged performance.
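A minimal sketch of this index bookkeeping, with the cost function abstracted into a callable since its exact form is an embedded image in the source (the starting index f(U_0) = U_0 is also an assumption):

```python
def match_directrix_indices(U, cost, N):
    """Adaptive search for the optimal corresponding indices j = f(U_i).

    U: strictly increasing downsampled sample indices on the first directrix;
    cost(i, j): pairing cost of first-directrix sample U_i with
    second-directrix index j; N: number of discrete points on the
    second directrix.
    """
    f = [U[0]]                              # assumed starting correspondence
    for i in range(1, len(U)):
        step = U[i] - U[i - 1]
        lo = f[-1] + 1                      # window derived from the current state,
        hi = min(f[-1] + 2 * step, N - 1)   # not from a fixed slope interval
        f.append(min(range(lo, hi + 1), key=lambda j: cost(i, j)))
    return f
```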
It should be noted that the way of determining the surface straight generatrices is not limited to the above. For example, the tangent-plane normal vectors and tangent vectors corresponding to each of the n first discrete points of the first 3D directrix, and those corresponding to each of the N second discrete points of the second 3D directrix, may also be computed separately, and the generatrix equations may be determined using the principle that normal vectors along the same straight generatrix are consistent. In other words, two discrete points on the two 3D directrices with equal normal vectors are candidate optimal corresponding points, and the line connecting these two discrete points is a candidate straight generatrix. The surface straight generatrices can therefore be computed from the similarity of the normal vectors of the discrete points and from how quickly the discrete points change.
According to some embodiments, step S105 of performing flattening correction on the rotation-corrected image to obtain the final corrected image may include: orthographically projecting the rotation-corrected image onto the image plane of the first camera to obtain a mapped image; and interpolating the mapped image to obtain the final corrected image. Since the 3D rotation-corrected image has no oblique perspective problem and its text lines are straight, orthographic projection of the 3D rotation-corrected image followed by spacing adjustment achieves the goal of flattening the curved surface, thereby ensuring the accuracy of character recognition on the object. The example shown in Fig. 9 illustrates the resulting final corrected image; it can be seen in the figure that the text lines of the final corrected image lie on straight lines. The black-and-white striped pattern in the lower right corner of Fig. 9 is a boundary-value interpolation result and has no practical meaning.

According to some embodiments, interpolating the mapped image to obtain the final corrected image may include: computing, for the mapped image, the 3D distance between two pixels adjacent along a preset direction; and interpolating the mapped image along the preset direction based on the 3D distance to obtain the final corrected image. After rotation and orthographic projection, a consistently curved surface bulges or dips only within the X_cO_cZ_c plane. Interpolation can therefore be performed along the X_c axis, adjusting the distance between pixels to carry out the flattening correction, which is easy to implement and computationally cheap.

In an exemplary embodiment, the 3D coordinates of a middle straight line parallel to the X_c axis may be obtained, and the 3D distance between adjacent pixels may be computed as the new spacing of the two pixels. 2D grid-point interpolation may be performed using the new spacing to obtain the interpolated image (i.e., the final corrected image). In an exemplary embodiment, the 2D grid points may be filled with linear interpolation: (1 − a) * P1 + a * P2, where P1 and P2 denote the coordinates and pixel values of two adjacent 2D discrete points, and a is the distance from the pixel to be inserted (an integer grid point) to P1, as a fraction of the distance between P1 and P2. It will be appreciated that other interpolation methods, such as nearest-neighbor, biquadratic, or bicubic interpolation, may also be used for image interpolation; this is not limited here.
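A minimal sketch of the per-row respacing along X_c (the array names are illustrative); np.interp applies exactly the linear rule (1 − a) * P1 + a * P2 quoted above at every integer grid position:

```python
import numpy as np

def resample_row(arc_x, pixels):
    """Respace one projected row so pixel spacing matches 3D arc length.

    arc_x: monotonically increasing cumulative 3D distance of each source
    pixel along the row; pixels: their values. Returns the row linearly
    re-interpolated onto integer grid positions.
    """
    arc_x = np.asarray(arc_x, dtype=float)
    pixels = np.asarray(pixels, dtype=float)
    grid = np.arange(0.0, arc_x[-1])        # integer grid points to fill
    return np.interp(grid, arc_x, pixels)   # (1 - a) * P1 + a * P2 per point
```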
The technical solution of the present disclosure has the following advantages.

The present disclosure can be used to solve the problem of consistently curved book pages under oblique shooting. Such consistent curvature is a typical scenario when photographing text carriers such as books, and existing flattening algorithms cannot flatten effectively under oblique perspective shooting. The reason the present disclosure can flatten effectively under oblique perspective is: looking straight down from directly above each character of an opened book, a text line is a straight line. Hence, by rotating the object to be recognized until perpendicular to the optical axis of the first camera, orthographically projecting it onto the image plane of the first camera, and then adjusting the spacing between pixels, flattening can be achieved.

When existing flattening algorithms determine the surface straight generatrices, they need to set the slope interval of the arc-length correspondence function. Under an oblique viewing angle, however, the slope range of the arc-length correspondence function along the shortest path is very large, and guaranteeing coverage of that range requires a very large amount of computation. The present disclosure converts the arc-length correspondence into a subscript correspondence and sets the search range adaptively according to the current state, requiring little computation.

The present disclosure does not need to flatten the curved surface in 3D; moreover, because of the oblique perspective problem, existing surface flattening algorithms cannot achieve surface flattening. The present disclosure instead rotates multiple surface sampling points of the object about the set point until perpendicular to the optical axis of the first camera and then projects them onto the image plane, which requires little computation and solves the oblique perspective problem.
According to another aspect of the present disclosure, there is provided an electronic circuit comprising circuitry configured to perform the steps of the text image correction method described above.

According to another aspect of the present disclosure, there is provided an apparatus for correcting text images, comprising: a binocular camera configured to obliquely capture initial images including an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image of the object to be recognized and a second initial image including the object to be recognized, the first camera being configured to obliquely capture the first initial image and the second camera being configured to obliquely capture the second initial image; and the electronic circuit as described above.

According to some embodiments, as shown in Fig. 2, the correction apparatus may further include a bracket 200 and a flat plate 300, where the object 100 to be recognized is placed on the plate 300 and both the first camera 101 and the second camera 102 of the binocular camera are fixedly mounted on the bracket 200.

In an exemplary embodiment, the curved shape of the object to be recognized may be substantially the same from one side edge of the object to the opposite side edge. The binocular camera may be arranged on the side where one of the side edges of the object is located, so that the at least two 3D directrices determined from the first and second initial images can characterize the curved shape of the object.

According to another aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory storing a program, the program comprising instructions that, when executed by the processor, cause the processor to perform the correction method described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the correction method described above.
Fig. 10 is a block diagram showing an example of an electronic device according to an exemplary embodiment of the present disclosure. It should be noted that the structure shown in Fig. 10 is only one example; depending on the specific implementation, the electronic device of the present disclosure may include only one or more of the components shown in Fig. 10.

The electronic device 2000 may be, for example, a general-purpose computer (such as a laptop computer, a tablet computer, or various other computers), a mobile phone, or a personal digital assistant. According to some embodiments, the electronic device 2000 may be a visually impaired assistive device.

The electronic device 2000 may be configured to capture images, process the captured images, and provide audible prompts in response to the data obtained by the processing. For example, the electronic device 2000 may be configured to capture an image, perform character detection and/or recognition on the image to obtain character data, convert the character data into sound data, and output the sound data for the user to listen to.

According to some implementations, the electronic device 2000 may be configured to include a spectacle frame, or to be detachably mountable to a spectacle frame (for example, the rim of the frame, the connector connecting two rims, a temple, or any other part), so that an image approximately covering the user's field of view can be captured.

According to some implementations, the electronic device 2000 may also be mounted on, or integrated with, another wearable device. The wearable device may be, for example, a head-mounted device (such as a helmet or a hat) or a device wearable on the ear. According to some embodiments, the electronic device may be implemented as an accessory attachable to a wearable device, for example as an accessory attachable to a helmet or hat.

According to some implementations, the electronic device 2000 may also take other forms. For example, the electronic device 2000 may be a mobile phone, a general-purpose computing device (such as a laptop or tablet computer), a personal digital assistant, and so on. The electronic device 2000 may also have a base so that it can be placed on a desktop.
According to some implementations, the electronic device 2000 may be used as a visually impaired assistive device to assist reading, in which case the electronic device 2000 is sometimes also called an "electronic reader" or "reading assistive device". With the electronic device 2000, users who cannot read autonomously (such as visually impaired persons or persons with reading disorders) can "read" conventional reading material (such as books or magazines) while adopting a posture similar to a reading posture. During the "reading" process, the electronic device 2000 may capture images and acquire an initial image including the object to be recognized. The electronic device 2000 may also perform curvature correction on the initial image to obtain a final corrected image, and then perform layout analysis, character detection, and character recognition (for example, using an optical character recognition (OCR) method) on the text in the text area of the final corrected image to obtain character data, overcoming the influence of text curvature on recognizing the characters in the object to be recognized and improving recognition efficiency and accuracy. The character data may then be converted into sound data, and the sound data may be output through a sound output device such as a speaker or earphones for the user to listen to.

The electronic device 2000 may include a first camera 101 and a second camera 102 for acquiring images. The first camera 101 and the second camera 102 may include, but are not limited to, webcams or cameras, and are configured to acquire an initial image including the object to be recognized. The electronic device 2000 may also include an electronic circuit 2100 comprising circuitry configured to perform the steps of the method described above (for example, the method steps shown in the flowcharts of Figs. 1 and 3). The electronic device 2000 may further include a character recognition circuit 2005 configured to perform character detection and/or recognition (such as OCR processing) on the text in the text area of the object to be recognized in the initial image, so as to obtain character data; the character recognition circuit 2005 may be implemented, for example, by a dedicated chip. The electronic device 2000 may also include a sound conversion circuit 2006 configured to convert the character data into sound data; the sound conversion circuit 2006 may be implemented, for example, by a dedicated chip. The electronic device 2000 may also include a sound output circuit 2007 configured to output the sound data; the sound output circuit 2007 may include, but is not limited to, earphones, a speaker, or a vibrator, together with its corresponding drive circuit.

According to some implementations, the electronic device 2000 may further include an image processing circuit 2008, which may include circuitry configured to perform various image processing on images. The image processing circuit 2008 may include, for example but without limitation, one or more of the following: circuitry configured to denoise an image, circuitry configured to deblur an image, circuitry configured to geometrically correct an image, circuitry configured to perform feature extraction on an image, circuitry configured to perform object detection and/or recognition of objects in an image, circuitry configured to perform character detection on text contained in an image, circuitry configured to extract text lines from an image, circuitry configured to extract character coordinates from an image, circuitry configured to extract object boxes from an image, circuitry configured to extract text boxes from an image, circuitry configured to perform layout analysis (such as paragraph division) based on an image, and so on.

According to some implementations, the electronic circuit 2100 may further include a word processing circuit 2009, which may be configured to perform various processing based on the extracted text-related information (such as character data, text boxes, paragraph coordinates, text line coordinates, and character coordinates) so as to obtain processing results such as paragraph ordering, text semantic analysis, and layout analysis results.
One or more of the various circuits described above (for example the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, or the electronic circuit 2100) may use custom hardware, and/or may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, one or more of the circuits described above may be implemented by programming hardware (for example, programmable logic circuits including field-programmable gate arrays (FPGA) and/or programmable logic arrays (PLA)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, or C++), using the logic and algorithms according to the present disclosure.

According to some implementations, the electronic device 2000 may further include a communication circuit 2010, which may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device and/or the like.

According to some implementations, the electronic device 2000 may further include an input device 2011, which may be any type of device capable of inputting information to the electronic device 2000, and may include, but is not limited to, various sensors, a mouse, a keyboard, a touch screen, buttons, a joystick, a microphone and/or a remote control.

According to some implementations, the electronic device 2000 may further include an output device 2012, which may be any type of device capable of presenting information, and may include, but is not limited to, a display, a visual output terminal, a vibrator and/or a printer. Although the electronic device 2000 is used as a visually impaired assistive device according to some embodiments, a vision-based output device can make it convenient for the user's family members, maintenance personnel, and the like to obtain output information from the electronic device 2000.

According to some implementations, the electronic device 2000 may further include a processor 2001. The processor 2001 may be any type of processor and may include, but is not limited to, one or more general-purpose processors and/or one or more dedicated processors (such as special-purpose processing chips). The processor 2001 may be, for example but without limitation, a central processing unit (CPU) or a microprocessor (MPU). The electronic device 2000 may further include a working memory 2002, which may store programs (including instructions) and/or data (such as images, text, sound, and other intermediate data) useful for the work of the processor 2001, and which may include, but is not limited to, a random access memory and/or read-only memory device. The electronic device 2000 may further include a storage device 2003, which may include any non-transitory storage device, i.e., any storage device that is non-transitory and capable of storing data, and which may include, but is not limited to, a disk drive, an optical storage device, solid-state memory, a floppy disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, an optical disc or any other optical medium, a ROM (read-only memory), a RAM (random access memory), a cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions and/or code. The working memory 2002 and the storage device 2003 may be collectively referred to as "memory" and may in some cases be used interchangeably.
According to some implementations, the processor 2001 may control and schedule at least one of the first camera 101 and the second camera 102, the character recognition circuit 2005, the sound conversion circuit 2006, the sound output circuit 2007, the image processing circuit 2008, the word processing circuit 2009, the communication circuit 2010, the electronic circuit 2100, and the other various devices and circuits included in the electronic device 2000. According to some implementations, at least some of the components described in Fig. 10 may be interconnected and/or communicate with one another via a bus 2013.

Software elements (programs) may reside in the working memory 2002, including but not limited to an operating system 2002a, one or more application programs 2002b, drivers, and/or other data and code.

According to some implementations, instructions for performing the aforementioned control and scheduling may be included in the operating system 2002a or in the one or more application programs 2002b.

According to some implementations, instructions for performing the method steps described in the present disclosure (for example, the method steps shown in the flowchart of Fig. 1) may be included in the one or more application programs 2002b, and the above modules of the electronic device 2000 may be implemented by the processor 2001 reading and executing the instructions of the one or more application programs 2002b. In other words, the electronic device 2000 may include the processor 2001 and a memory (for example the working memory 2002 and/or the storage device 2003) storing a program, the program comprising instructions that, when executed by the processor 2001, cause the processor 2001 to perform the method according to the various embodiments of the present disclosure.

According to some implementations, some or all of the operations performed by at least one of the character recognition circuit 2005, the sound conversion circuit 2006, the image processing circuit 2008, the word processing circuit 2009, and the electronic circuit 2100 may be implemented by the processor 2001 reading and executing the instructions of the one or more application programs 2002b.

The executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (such as the storage device 2003) and, when executed, may be loaded into the working memory 2002 (and possibly compiled and/or installed). Accordingly, the present disclosure provides a computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device (such as a visually impaired assistive device), cause the electronic device to perform the method according to the various embodiments of the present disclosure. According to another implementation, the executable code or source code of the instructions of the software elements (programs) may also be downloaded from a remote location.

It should also be understood that various modifications may be made according to specific requirements. For example, custom hardware may also be used, and/or the various circuits, units, modules, or elements may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the circuits, units, modules, or elements included in the disclosed methods and devices may be implemented by programming hardware (for example, programmable logic circuits including field-programmable gate arrays (FPGA) and/or programmable logic arrays (PLA)) in an assembly language or a hardware programming language (such as VERILOG, VHDL, or C++), using the logic and algorithms according to the present disclosure.

According to some implementations, the processor 2001 in the electronic device 2000 may be distributed over a network. For example, some processing may be performed with one processor while other processing may be performed by another processor remote from that one processor. Other modules of the electronic device 2000 may be similarly distributed. In this way, the electronic device 2000 may be interpreted as a distributed computing system performing processing at multiple locations.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it should be understood that the above methods, systems, and devices are merely exemplary embodiments or examples, and the scope of the present invention is not limited by these embodiments or examples but is defined only by the granted claims and their equivalent scope. Various elements in the embodiments or examples may be omitted or replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many elements described herein may be replaced by equivalent elements appearing after the present disclosure.
Some exemplary aspects of the present disclosure are described below.

Aspect 1. A method for correcting a text image, comprising:
acquiring initial images, captured obliquely by a binocular camera, that include an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image including the object to be recognized captured obliquely by the first camera and a second initial image including the object to be recognized captured obliquely by the second camera;
determining a rotation matrix corresponding to rotating the object to be recognized, about a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera;
determining, based on the first initial image and the second initial image, a 3D image including the object to be recognized;
using the rotation matrix, obtaining a rotation-corrected image resulting from rotating the 3D image about the set point until it is perpendicular to the optical axis of the first camera; and
performing flattening correction on the rotation-corrected image to obtain a final corrected image.

Aspect 2. The correction method of Aspect 1, wherein determining the rotation matrix comprises:
determining at least two 3D directrices of the object to be recognized based on the first initial image and the second initial image and on the intrinsic parameters of the binocular camera; and
determining the rotation matrix based on the at least two 3D directrices and the set point.

Aspect 3. The correction method of Aspect 2, wherein determining the at least two 3D directrices comprises:
determining at least two first pixel strips in the first initial image;
determining, in the second initial image, at least two second pixel strips positionally corresponding to the respective at least two first pixel strips;
determining, based on each positionally corresponding pair of first and second pixel strips and on the intrinsic parameters of the binocular camera, the depth information corresponding to that pair; and
determining the 3D directrix based on each positionally corresponding pair of first and second pixel strips and the corresponding depth information.

Aspect 4. The correction method of Aspect 3, wherein the 3D directrix is determined based on the central axis of the first pixel strip and its depth information and the central axis of the positionally corresponding second pixel strip and its depth information.

Aspect 5. The correction method of Aspect 3, wherein the width of the first pixel strip and the width of the second pixel strip are both less than half the width of the first initial image.

Aspect 6. The correction method of Aspect 5, wherein the at least two first pixel strips are distributed on both sides of the central axis of the first initial image.

Aspect 7. The correction method of Aspect 2, wherein determining the rotation matrix based on the at least two 3D directrices and the set point comprises:
computing the average depth of each of the at least two 3D directrices; and
determining the rotation matrix based on the average depths of the at least two 3D directrices and the set point.

Aspect 8. The correction method of Aspect 7, wherein the set point is the intersection of the object to be recognized with a straight line parallel to the optical axis of the first camera, and the midpoint of the line connecting the optical centers of the first camera and the second camera lies on the straight line parallel to the optical axis of the first camera.

Aspect 9. The correction method of any one of Aspects 2-6, wherein determining the 3D image including the object to be recognized comprises:
determining multiple surface straight generatrices based on the at least two 3D directrices; and
determining, based at least on the multiple surface straight generatrices and the at least two 3D directrices, multiple first surface sampling points and the three-dimensional coordinates of each first surface sampling point,
wherein the 3D image is represented by the multiple first surface sampling points.

Aspect 10. The correction method of Aspect 9, wherein obtaining the rotation-corrected image comprises:
determining, based on the depth of each of the multiple first surface sampling points and the depth of the set point, and using the rotation matrix, the multiple second surface sampling points obtained by rotating each of the multiple first surface sampling points relative to the set point,
wherein the rotation-corrected image is represented by the multiple second surface sampling points.

Aspect 11. The correction method of Aspect 9, wherein the at least two 3D directrices include a first 3D directrix and a second 3D directrix,
wherein determining the multiple surface straight generatrices comprises:
sampling the first 3D directrix to obtain n first discrete points;
sampling the second 3D directrix to obtain N second discrete points, where n and N are positive integers and N > n;
for each of the n first discrete points, determining, according to a preset rule, one of the N second discrete points as the optimal corresponding discrete point of that first discrete point, wherein the normal vector of the first 3D directrix through that first discrete point is consistent with the normal vector of the second 3D directrix through the optimal corresponding discrete point; and
determining the multiple surface straight generatrices based on the n first discrete points and the corresponding optimal corresponding discrete points.

Aspect 12. The correction method of Aspect 11, wherein determining, according to a preset rule, one of the N second discrete points as the optimal corresponding discrete point of that first discrete point comprises:
determining the optimal corresponding range of the second 3D directrix corresponding to that first discrete point, the optimal corresponding range including at least one of the N second discrete points;
computing a cost function between that first discrete point and each second discrete point in the corresponding optimal corresponding range; and
determining, based on the cost function, one second discrete point in the corresponding optimal corresponding range as the optimal corresponding discrete point of that first discrete point.

Aspect 13. The correction method of Aspect 1, wherein performing flattening correction on the rotation-corrected image to obtain the final corrected image comprises:
orthographically projecting the rotation-corrected image onto the image plane of the first camera to obtain a mapped image; and
interpolating the mapped image to obtain the final corrected image.

Aspect 14. The correction method of Aspect 13, wherein interpolating the mapped image to obtain the final corrected image comprises:
computing, for the mapped image, the 3D distance between two pixels adjacent along a preset direction; and
interpolating the mapped image along the preset direction based on the 3D distance to obtain the final corrected image.

Aspect 15. The correction method of Aspect 2, wherein, from one side edge of the object to be recognized to the opposite side edge, the curved shape of the object to be recognized is substantially the same.

Aspect 16. The correction method of Aspect 15, wherein the binocular camera is arranged on the side where said one side edge of the object to be recognized is located.

Aspect 17. The correction method of Aspect 2, wherein the at least two 3D directrices extend along the bending direction of the object to be recognized.

Aspect 18. The correction method of Aspect 1, wherein the object to be recognized includes a text area.

Aspect 19. The correction method of Aspect 1, wherein the optical axes of the first camera and the second camera are arranged in parallel.

Aspect 20. An electronic circuit, comprising:
circuitry configured to perform the steps of the correction method of any one of Aspects 1-19.

Aspect 21. An apparatus for correcting text images, comprising:
a binocular camera configured to obliquely capture initial images including an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image of the object to be recognized and a second initial image including the object to be recognized, the first camera being configured to obliquely capture the first initial image and the second camera being configured to obliquely capture the second initial image; and
the electronic circuit of Aspect 20.

Aspect 22. The correction apparatus of Aspect 21, further comprising:
a flat plate configured for placing the object to be recognized; and
a bracket fixedly mounted on the flat plate,
wherein the binocular camera is fixedly mounted on the bracket.

Aspect 23. An electronic device, comprising:
a processor; and
a memory storing a program, the program comprising instructions that, when executed by the processor, cause the processor to perform the correction method of any one of Aspects 1-19.

Aspect 24. A non-transitory computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the correction method of any one of Aspects 1-19.

Claims (25)

  1. A method for correcting a text image, comprising:
    acquiring initial images, captured obliquely by a binocular camera, that include an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image including the object to be recognized captured obliquely by the first camera and a second initial image including the object to be recognized captured obliquely by the second camera;
    determining a rotation matrix corresponding to rotating the object to be recognized, about a set point on the object to be recognized, until it is perpendicular to the optical axis of the first camera;
    determining, based on the first initial image and the second initial image, a 3D image including the object to be recognized;
    using the rotation matrix, obtaining a rotation-corrected image resulting from rotating the 3D image about the set point until it is perpendicular to the optical axis of the first camera; and
    performing flattening correction on the rotation-corrected image to obtain a final corrected image.
  2. The correction method of claim 1, wherein determining the rotation matrix comprises:
    determining at least two 3D directrices of the object to be recognized based on the first initial image and the second initial image and on the intrinsic parameters of the binocular camera; and
    determining the rotation matrix based on the at least two 3D directrices and the set point.
  3. The correction method of claim 2, wherein determining the at least two 3D directrices comprises:
    determining at least two first pixel strips in the first initial image;
    determining, in the second initial image, at least two second pixel strips positionally corresponding to the respective at least two first pixel strips;
    determining, based on each positionally corresponding pair of first and second pixel strips and on the intrinsic parameters of the binocular camera, the depth information corresponding to that pair; and
    determining the 3D directrix based on each positionally corresponding pair of first and second pixel strips and the corresponding depth information.
  4. The correction method of claim 3, wherein the 3D directrix is determined based on the central axis of the first pixel strip and its depth information and the central axis of the positionally corresponding second pixel strip and its depth information.
  5. The correction method of claim 3, wherein the width of the first pixel strip and the width of the second pixel strip are both less than half the width of the first initial image.
  6. The correction method of claim 5, wherein the at least two first pixel strips are distributed on both sides of the central axis of the first initial image.
  7. The correction method of claim 2, wherein determining the rotation matrix based on the at least two 3D directrices and the set point comprises:
    computing the average depth of each of the at least two 3D directrices; and
    determining the rotation matrix based on the average depths of the at least two 3D directrices and the set point.
  8. The correction method of claim 7, wherein the set point is the intersection of the object to be recognized with a straight line parallel to the optical axis of the first camera, and the midpoint of the line connecting the optical centers of the first camera and the second camera lies on the straight line parallel to the optical axis of the first camera.
  9. The correction method of any one of claims 2-6, wherein determining the 3D image including the object to be recognized comprises:
    determining multiple surface straight generatrices based on the at least two 3D directrices; and
    determining, based at least on the multiple surface straight generatrices and the at least two 3D directrices, multiple first surface sampling points and the three-dimensional coordinates of each first surface sampling point,
    wherein the 3D image is represented by the multiple first surface sampling points.
  10. The correction method of claim 9, wherein obtaining the rotation-corrected image comprises:
    determining, based on the depth of each of the multiple first surface sampling points and the depth of the set point, and using the rotation matrix, the multiple second surface sampling points obtained by rotating each of the multiple first surface sampling points relative to the set point,
    wherein the rotation-corrected image is represented by the multiple second surface sampling points.
  11. The correction method of claim 9, wherein the at least two 3D directrices include a first 3D directrix and a second 3D directrix,
    wherein determining the multiple surface straight generatrices comprises:
    sampling the first 3D directrix to obtain n first discrete points;
    sampling the second 3D directrix to obtain N second discrete points, where n and N are positive integers and N > n;
    for each of the n first discrete points, determining, according to a preset rule, one of the N second discrete points as the optimal corresponding discrete point of that first discrete point, wherein the normal vector of the first 3D directrix through that first discrete point is consistent with the normal vector of the second 3D directrix through the optimal corresponding discrete point; and
    determining the multiple surface straight generatrices based on the n first discrete points and the corresponding optimal corresponding discrete points.
  12. The correction method of claim 11, wherein determining, according to a preset rule, one of the N second discrete points as the optimal corresponding discrete point of that first discrete point comprises:
    determining the optimal corresponding range of the second 3D directrix corresponding to that first discrete point, the optimal corresponding range including at least one of the N second discrete points;
    computing a cost function between that first discrete point and each second discrete point in the corresponding optimal corresponding range; and
    determining, based on the cost function, one second discrete point in the corresponding optimal corresponding range as the optimal corresponding discrete point of that first discrete point.
  13. The correction method of claim 1, wherein performing flattening correction on the rotation-corrected image to obtain the final corrected image comprises:
    orthographically projecting the rotation-corrected image onto the image plane of the first camera to obtain a mapped image; and
    interpolating the mapped image to obtain the final corrected image.
  14. The correction method of claim 13, wherein interpolating the mapped image to obtain the final corrected image comprises:
    computing, for the mapped image, the 3D distance between two pixels adjacent along a preset direction; and
    interpolating the mapped image along the preset direction based on the 3D distance to obtain the final corrected image.
  15. The correction method of claim 2, wherein, from one side edge of the object to be recognized to the opposite side edge, the curved shape of the object to be recognized is substantially the same.
  16. The correction method of claim 15, wherein the binocular camera is arranged on the side where said one side edge of the object to be recognized is located.
  17. The correction method of claim 2, wherein the at least two 3D directrices extend along the bending direction of the object to be recognized.
  18. The correction method of claim 1, wherein the object to be recognized includes a text area.
  19. The correction method of claim 1, wherein the optical axes of the first camera and the second camera are arranged in parallel.
  20. An electronic circuit, comprising:
    circuitry configured to perform the steps of the correction method of any one of claims 1-19.
  21. An apparatus for correcting text images, comprising:
    a binocular camera configured to obliquely capture initial images including an object to be recognized, the binocular camera comprising a first camera and a second camera, the optical axes of both the first camera and the second camera being not perpendicular to the placement surface of the object to be recognized, the initial images comprising a first initial image of the object to be recognized and a second initial image including the object to be recognized, the first camera being configured to obliquely capture the first initial image and the second camera being configured to obliquely capture the second initial image; and
    the electronic circuit of claim 20.
  22. The correction apparatus of claim 21, further comprising:
    a flat plate configured for placing the object to be recognized; and
    a bracket fixedly mounted on the flat plate,
    wherein the binocular camera is fixedly mounted on the bracket.
  23. An electronic device, comprising:
    a processor; and
    a memory storing a program, the program comprising instructions that, when executed by the processor, cause the processor to perform the correction method of any one of claims 1-19.
  24. A non-transitory computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the correction method of any one of claims 1-19.
  25. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the correction method of any one of claims 1-19.
PCT/CN2021/135748 2020-12-09 2021-12-06 Text image correction method and apparatus, device, and medium WO2022121842A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011451692.3 2020-12-09
CN202011451692.3A CN112560867B (zh) 2020-12-09 2023-11-21 Text image correction method and apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2022121842A1 (zh)

Family

ID=75061707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135748 WO2022121842A1 (zh) Text image correction method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN112560867B (zh)
WO (1) WO2022121842A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560867B (zh) * 2020-12-09 2023-11-21 上海肇观电子科技有限公司 文本图像的矫正方法及装置、设备和介质


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340737B (zh) * 2020-03-23 2023-08-18 北京迈格威科技有限公司 Image correction method, apparatus, and electronic system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110193860A1 (en) * 2010-02-09 2011-08-11 Samsung Electronics Co., Ltd. Method and Apparatus for Converting an Overlay Area into a 3D Image
CN102592124A (zh) * 2011-01-13 2012-07-18 汉王科技股份有限公司 Geometric correction method and apparatus for text images, and binocular stereo vision system
CN102801894A (zh) * 2012-07-18 2012-11-28 天津大学 Method for flattening deformed book pages
CN107560543A (zh) * 2017-09-04 2018-01-09 华南理工大学 Device and method for correcting camera optical-axis offset based on binocular stereo vision
CN111353961A (zh) * 2020-03-12 2020-06-30 上海合合信息科技发展有限公司 Document surface correction method and apparatus
CN112560867A (zh) * 2020-12-09 2021-03-26 上海肇观电子科技有限公司 Text image correction method and apparatus, device, and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115760620A (zh) * 2022-11-18 2023-03-07 荣耀终端有限公司 Document correction method and apparatus, and electronic device
CN115760620B (zh) * 2022-11-18 2023-10-20 荣耀终端有限公司 Document correction method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN112560867A (zh) 2021-03-26
CN112560867B (zh) 2023-11-21

Similar Documents

Publication Publication Date Title
  • WO2020233378A1 (zh) Layout analysis method, reading assistance device, circuit, and medium
  • CN109816011B (zh) Video key frame extraction method
  • WO2022121842A1 (zh) Text image correction method and apparatus, device, and medium
  • US9589333B2 (en) Image correction apparatus for correcting distortion of an image
  • JP7132654B2 (ja) Layout analysis method, reading assistance device, circuit, and medium
  • US11475546B2 (en) Method for optimal body or face protection with adaptive dewarping based on context segmentation layers
  • JP7441917B2 (ja) Projection distortion correction for faces
  • Jung et al. Robust upright adjustment of 360 spherical panoramas
  • JP6311372B2 (ja) Image processing apparatus and image processing method
  • WO2022121843A1 (zh) Text image correction method and apparatus, device, and medium
  • JP5256974B2 (ja) Image processing apparatus, image processing method, and program
  • CN111145153B (zh) Image processing method, circuit, visually impaired assistive device, electronic device, and medium
  • JP7110899B2 (ja) Image processing apparatus, image processing method, and image processing program
  • US11367296B2 (en) Layout analysis
  • WO2021096503A1 (en) Foreshortening-distortion correction on faces
  • CN112861735A (zh) Text image recognition method and apparatus, device, and medium
  • JP2022137198A (ja) Image processing apparatus, image processing method, and image processing program
  • JP2022077221A (ja) Image processing apparatus, image processing system, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21902547; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21902547; Country of ref document: EP; Kind code of ref document: A1)