WO2022012121A1 - Layout analysis method, reading aid, circuit and medium - Google Patents
- Publication number: WO2022012121A1 (application PCT/CN2021/092338)
- Authority: WO (WIPO, PCT)
- Prior art keywords: layout, rectangular blocks, text, connected regions, response
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
Definitions
- the present disclosure relates to the field of data processing, and in particular, to a layout analysis method, a chip circuit, a reading aid, an electronic device, and a computer-readable storage medium.
- a layout analysis method comprising: acquiring coordinate information of a plurality of text lines in an image; creating a layout model of the image according to the coordinate information; analyzing the layout structure of the text lines based on the layout model; and determining the order of the text lines relative to each other based on the layout structure.
- a chip circuit comprising: a circuit unit configured to perform the method according to embodiments of the present disclosure.
- a reading aid comprising: a chip circuit as previously described; and an image sensor configured to acquire the image.
- an electronic device comprising: a processor; and a memory storing a program including instructions that, when executed by the processor, cause the processor to perform the methods described in the present disclosure.
- a computer-readable storage medium storing a program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the methods described in the present disclosure.
- FIG. 1 is a schematic diagram illustrating an exemplary application scenario to which various methods described herein may be applied, according to an exemplary embodiment;
- FIG. 2 is a flowchart illustrating an exemplary method that can be used in the application scenario of FIG. 1 to recognize text in an image and read the recognized text aloud;
- FIG. 3 is a flowchart illustrating a layout analysis method according to an exemplary embodiment;
- FIG. 4 is a schematic diagram illustrating an image including a text area according to an exemplary embodiment;
- FIG. 5 is a schematic diagram illustrating a layout model created for the image shown in FIG. 4 according to an exemplary embodiment;
- FIG. 6 is a flowchart illustrating a method of analyzing the layout structure of text lines according to an exemplary embodiment;
- FIG. 7 is a schematic diagram illustrating a layout model obtained by adjusting the widths of the rectangular blocks in FIG. 5 to form a plurality of connected regions, according to an exemplary embodiment;
- FIG. 8 is a flowchart illustrating an example process of analyzing the spatial layout of a plurality of connected regions in the method of FIG. 6;
- FIG. 9 is a flowchart illustrating an example process of selectively correcting the orientation of a plurality of connected regions in the method of FIG. 8;
- FIG. 10 is a schematic diagram illustrating a layout model obtained by performing angle correction on the layout model shown in FIG. 7 according to an exemplary embodiment;
- FIG. 11 is a flowchart illustrating an example process of selectively removing connected regions directly adjacent to either side of a layout model in the method of FIG. 8;
- FIG. 12 is a schematic diagram illustrating a vertical projection of the layout model shown in FIG. 10 according to an exemplary embodiment;
- FIG. 13 is a schematic diagram illustrating a layout model obtained by removing connected regions representing incomplete pages from the layout model shown in FIG. 10 according to the projection result of FIG. 12;
- FIGS. 14-17 are schematic diagrams each illustrating projection segmentation of the layout model shown in FIG. 13 according to an exemplary embodiment;
- FIG. 18 is a schematic diagram illustrating a layout model including a resulting set of segmentation zones according to an exemplary embodiment;
- FIG. 19 is a schematic diagram illustrating the layout model shown in FIG. 18 restored to its original tilted state, with the segmentation zones sorted in reading order;
- FIG. 20 is a schematic diagram illustrating the matching and sorting of a plurality of connected regions and segmentation zones according to an exemplary embodiment;
- FIG. 21 is a schematic diagram illustrating the sorting of text lines in the image of FIG. 4 according to layout analysis results, according to an exemplary embodiment;
- FIG. 22 is a flowchart illustrating an example process of determining the main layout type of the text lines according to an exemplary embodiment;
- FIG. 23 is a flowchart illustrating an example process of selectively discarding text of a sub-layout type according to an exemplary embodiment;
- FIG. 24 is a block diagram illustrating the structure of a reading aid according to an exemplary embodiment; and
- FIG. 25 is a block diagram illustrating an example computing device that can be applied to example embodiments.
- the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are used only to distinguish one element from another.
- in some examples, the first element and the second element may refer to the same instance of the element, while in other cases they may refer to different instances, depending on the context of the description.
- the spatially relative terms “horizontal” and “vertical” are used in conjunction with the layout model.
- the horizontal direction refers to the row direction of the layout model, and the vertical direction refers to the column direction of the layout model.
- the spatially relative terms “top”, “bottom”, “left”, and “right” are also used in conjunction with the layout model.
- “up”, “down”, “left”, and “right” indicate orientations on the image of the reading material (or, equivalently, on the layout model of the image) as viewed from the viewpoint of the image sensor (e.g., one worn or held by the reader) when the reading material (e.g., a book or magazine) is properly oriented for reading relative to the reader. Therefore, the “up-down direction” basically corresponds to the column direction of the layout model, and the “left-right direction” basically corresponds to the row direction of the layout model.
- the following description of the present disclosure is mainly based on the case where the text lines extend in a substantially left-right direction relative to the reader (i.e., horizontal reading materials), but the technical solutions of the present disclosure are not limited thereto.
- the technical solutions of the present disclosure are also applicable to the case where the text lines extend in a substantially up-down direction relative to the reader (i.e., vertical reading materials); that is, the method of the present disclosure is also applicable to vertical reading materials.
- for horizontal reading materials, a text line is a sequence of characters extending substantially in the left-right (horizontal) direction.
- for vertical reading materials, a text line is a sequence of characters extending substantially in the up-down (vertical) direction.
- when reading materials such as books or magazines, people with normal vision capture images within their field of view, identify the text areas in those images, and read the text in those areas in the correct reading order. The visually impaired, however, may need to rely on a reading aid to recognize the text of the reading material and read it aloud. In this case, the reading aid must not only recognize the text in the image but also determine the order of the text lines in the text area using an algorithm, so that it can "read" the text aloud in the correct reading order.
- FIG. 1 is a schematic diagram illustrating an exemplary application scenario 100 to which various methods described herein may be applied, according to an exemplary embodiment.
- the exemplary scenario 100 may include, but is not limited to, applications such as reading assistance for the blind, intelligent reading aloud, and the like.
- a reading aid (text recognition device) such as smart glasses 110 recognizes the text within its capture range 112 and reads it aloud using its built-in chip and algorithm.
- FIG. 2 is a flowchart illustrating an exemplary method 200 that may be used in application scenario 100 to recognize text in an image and to voice the recognized text.
- the method 200 includes the following steps: capturing an image and detecting text line regions in the image (step 210); performing layout analysis on the text lines in the image (step 220); and recognizing the text in the text lines and reading the recognized text aloud according to the result of the layout analysis (step 230).
- the detection of text regions (step 210) and the recognition of text (step 230) can be accomplished by various methods, including, for example, traditional image processing algorithms (e.g., MSER, maximally stable extremal regions) and/or deep learning methods.
- FIG. 3 is a flowchart illustrating a layout analysis method 300 according to an exemplary embodiment of the present disclosure.
- the layout analysis method 300 may be used to implement step 220 in FIG. 2.
- the layout analysis method 300 includes the following steps: acquiring coordinate information of a plurality of text lines in an image (step 310), creating a layout model of the image according to the coordinate information (step 320), analyzing the layout structure of the text lines based on the layout model (step 330), and determining the order of the text lines relative to each other based on the layout structure (step 340).
- the layout analysis method 300 neither operates directly on the original image nor requires semantic analysis; instead, it converts the image regions containing text into a layout model that simulates the distribution of text in the image with a simpler structure, and then analyzes the spatial layout of the data in that layout model.
- in step 310, coordinate information of a plurality of text lines in the image is acquired.
- the image may be electronic image data acquired by an image sensor.
- the image sensor may be disposed on a user's wearable device or an item such as glasses, for example, in the application scenario 100 shown in FIG. 1 .
- FIG. 4 is a schematic diagram illustrating an image 400 including a text area, according to an exemplary embodiment.
- the image 400 may include text (which may include letters, digits, characters of various countries and regions, punctuation marks, etc.), pictures, and the like; FIG. 4 shows text lines 410 containing text.
- image 400 may be a preprocessed image; the preprocessing may include, but is not limited to, color correction, deblurring, and the like.
- the detection of text regions can be achieved by various methods, e.g., image processing algorithms such as MSER, or deep learning methods.
- the coordinate information of each text line in the image 400 can be obtained.
- the coordinate information of the text line can be obtained, for example, from other machines (such as a remote server or a cloud computing device), or can be obtained through a local detection algorithm.
- the obtained coordinate information of the text line may be stored in a local storage device or storage medium for subsequent use.
- a text line refers to a continuous line of text, e.g., a sequence of characters in which the spacing between adjacent characters in the left-right direction is less than a threshold spacing, or a sequence of characters in which the spacing between adjacent characters in the up-down direction is less than a threshold spacing.
- the coordinate information of a text line may be the coordinate information of a rectangle containing the text line (e.g., the smallest circumscribed rectangle of the text line, or the rectangle obtained by expanding that smallest circumscribed rectangle upward, downward, leftward, and/or rightward by a certain multiple).
- the coordinate information of the text line may include, for example, the coordinates of the four vertices of the rectangle; alternatively, it may include the coordinates of any one vertex of the rectangle together with the height and length of the rectangle.
- the definition of the coordinate information of the text line is not limited to this, as long as it can represent the spatial position and size occupied by the text line.
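The two coordinate forms above carry the same information. The following is a minimal sketch, assuming axis-aligned rectangles; the `TextLineBox` class and its methods are hypothetical names, not from the source. It converts between the four-vertex form and the vertex-plus-size form, and applies a symmetric expansion by a multiple (the source also allows expanding only some of the sides).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLineBox:
    """Hypothetical axis-aligned rectangle for one text line."""
    x: float       # left edge (row direction)
    y: float       # top edge (column direction)
    length: float  # extent along the text line
    height: float  # extent across the text line

    @classmethod
    def from_vertices(cls, vertices):
        """Build from the four (x, y) vertices of an axis-aligned rectangle."""
        xs = [vx for vx, _ in vertices]
        ys = [vy for _, vy in vertices]
        return cls(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

    def vertices(self):
        """Four vertices, clockwise from the top-left corner."""
        x0, y0 = self.x, self.y
        x1, y1 = x0 + self.length, y0 + self.height
        return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

    def expanded(self, factor):
        """Grow symmetrically so both length and height are `factor` times larger."""
        dl = self.length * (factor - 1) / 2
        dh = self.height * (factor - 1) / 2
        return TextLineBox(self.x - dl, self.y - dh,
                           self.length * factor, self.height * factor)
```

For example, `TextLineBox.from_vertices([(10, 20), (110, 20), (110, 40), (10, 40)]).expanded(2)` yields a box at (-40, 10) with length 200 and height 40.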
- in step 320, a layout model of the image is created according to the coordinate information.
- the term “layout model” refers to a data structure into which an image containing text is transformed, and which simulates the distribution of text in the image but has a simpler structure.
- the layout model is obtained by filling the data elements of the data structure that correspond to the obtained coordinate information with data values.
- the data structure may include a plurality of data elements, the data elements populated with data values forming a plurality of rectangular blocks corresponding to respective lines of text of the plurality of lines of text.
- the data structure may be a file in storage (e.g., memory or cache), an image expressed in pixels, or a table or data array.
- the data structure is not limited to any specific form, as long as the data therein can simulate the text lines in the image.
- the size of the data structure can be the same as the size of the image, or it can be scaled relative to the size of the image. For example, if the image has a pixel size of 3840x2160, the data structure (and the corresponding layout model) may have the same size as the image (i.e., 3840x2160 matrix elements).
- alternatively, the data structure can be scaled only in the horizontal direction (e.g., 1920x2160 matrix elements), only in the vertical direction (e.g., 3840x1080 matrix elements), or in both the horizontal and vertical directions (e.g., 1920x1080 or 1280x1080 matrix elements), and so on.
- the data elements in the data structure can establish a correspondence or mapping relationship with pixels in the image.
- FIG. 5 is a schematic diagram illustrating a layout model 500 created for the image 400 in FIG. 4 according to an exemplary embodiment.
- the data structure is populated with corresponding data values such that the data elements populated with the data values form rectangular blocks 510 corresponding to the text lines 410 in FIG. 4.
- the size of the layout model 500 is the same as the size of the image 400.
- a rectangular block formed by a data element filled with a data value indicates the presence of text in its corresponding image area, regardless of the semantics or content of the text.
- the data structure may comprise a two-dimensional matrix, eg, a two-dimensional blank matrix.
- a two-dimensional blank matrix refers to a two-dimensional matrix in which the data values of the matrix elements are all "0" by default.
- the matrix elements of the two-dimensional matrix corresponding to the text line coordinate information in the image 400 may be filled with a data value of "1".
- the data value is not limited to this, as long as it can indicate whether text (a text line) is present in the corresponding region.
- in other examples, the data elements corresponding to the text line coordinate information in image 400 may be filled with a data value of "255".
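Filling a blank matrix from the text line rectangles can be sketched as follows. This is an illustration under stated assumptions: the hypothetical `build_layout_model` takes each text line as an `(x, y, length, height)` tuple in image pixels, and the optional scale factors `sx` and `sy` (also assumptions) map pixels to matrix elements, so `sx = sy = 0.5` turns a 3840x2160 image into a 1920x1080 model.

```python
import math


def build_layout_model(image_w, image_h, boxes, sx=1.0, sy=1.0, fill=1):
    """Return a layout model as a list of rows (a 2-D matrix).

    boxes: (x, y, length, height) rectangles in image pixels.
    sx, sy: scale factors from image pixels to matrix elements.
    fill: data value meaning "text present" (e.g. 1 or 255).
    """
    cols = max(1, round(image_w * sx))
    rows = max(1, round(image_h * sy))
    model = [[0] * cols for _ in range(rows)]  # two-dimensional blank matrix
    for x, y, length, height in boxes:
        # Map the rectangle to matrix indices, clipped to the matrix bounds
        c0 = max(0, math.floor(x * sx))
        r0 = max(0, math.floor(y * sy))
        c1 = min(cols, math.ceil((x + length) * sx))
        r1 = min(rows, math.ceil((y + height) * sy))
        for r in range(r0, r1):
            for c in range(c0, c1):
                model[r][c] = fill
    return model
```

Each filled rectangle only marks "text is here"; the characters themselves never enter the model, which is what lets the later steps work without semantic analysis.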
- in step 330, the layout structure of the text lines is analyzed based on the layout model.
- FIG. 6 is a flowchart illustrating a process for implementing step 330, according to an exemplary embodiment. As shown in FIG. 6, the process includes: selectively adjusting the widths of the plurality of rectangular blocks (step 610); and analyzing the spatial layout of the plurality of connected regions (step 620).
- in step 610, the widths of the plurality of rectangular blocks are selectively adjusted so that the plurality of rectangular blocks are merged into a plurality of connected regions separated from each other.
- FIG. 7 is a schematic diagram illustrating a layout model 700 obtained by adjusting the width of the rectangular block 510 in FIG. 5 to form a plurality of connected regions 710, according to an exemplary embodiment.
- the resulting plurality of connected regions 710 correspond to the paragraphs formed by the text lines. Therefore, the operation of step 610 may be referred to as paragraph division.
- the widths of the plurality of rectangular blocks are selectively adjusted as follows. For each rectangular block: if its width is less than or equal to the representative width of the plurality of rectangular blocks, the width is increased by a first amount; if its width is greater than the representative width and less than or equal to a first multiple of the representative width, the width is increased by a second amount; if its width is greater than the first multiple and less than or equal to a second multiple of the representative width, the width is not adjusted; and if its width is greater than the second multiple of the representative width, the width is reduced by a third amount.
- the representative width may be the average width of a subset of the plurality of rectangular blocks, where the subset consists of the rectangular blocks other than those whose width is greater than a threshold width percentile.
- that is, the wider rectangular blocks are filtered out of the plurality of rectangular blocks in the layout model, and the average width of the remaining rectangular blocks is calculated as the representative width.
- the rectangular blocks whose width is greater than the threshold width percentile are merely excluded from the average-width calculation; they are not removed from the layout model.
- the threshold width percentile may be set to, e.g., 90% or 95%; the specific value can be set according to the actual application and is not specifically limited here. This prevents overly wide rectangular blocks from degrading the accuracy of paragraph division, e.g., causing text that should be divided into two paragraphs to be merged into a single paragraph.
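The representative width can be sketched as below. The function name `representative_width` is hypothetical, and computing the percentile by linear interpolation is an assumption, since the source does not say how the percentile is obtained.

```python
def representative_width(widths, percentile=90.0):
    """Average width of the blocks at or below the given width percentile.

    Wider blocks are only left out of the average; they are not removed from
    the layout model. Linear-interpolation percentiles are an assumption.
    """
    if not widths:
        raise ValueError("no rectangular blocks")
    ordered = sorted(widths)
    pos = (len(ordered) - 1) * percentile / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    threshold = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    kept = [w for w in widths if w <= threshold]  # never empty: min <= threshold
    return sum(kept) / len(kept)
```

With nine body lines of width 20 and one heading of width 60, the 90th-percentile version returns 20.0, whereas the plain average over all blocks would be 24.0.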
- for the case where the text lines extend in a substantially left-right direction relative to the reader (i.e., horizontal reading materials), the length direction of the rectangular block corresponding to a text line is the substantially left-right extension direction, and the width direction of the rectangular block is the direction substantially perpendicular to it (i.e., the up-down direction).
- for the case where the text lines extend in a substantially up-down direction relative to the reader (i.e., vertical reading materials), the length direction of the rectangular block corresponding to a text line is the substantially up-down extension direction, and the width direction of the rectangular block is the direction substantially perpendicular to it (i.e., the left-right direction).
- for horizontal reading materials, the representative width corresponds to the height of the characters in the up-down direction (i.e., the line height); for vertical reading materials, it corresponds to the size of the characters in the left-right direction (i.e., the column width).
- alternatively, the representative width may be the average width of all of the plurality of rectangular blocks. This simplifies the paragraph-division calculation and may be appropriate in some cases (e.g., when the heading text lines are close in size to the body text lines).
- the first amount may be 0.5 times the width of the rectangular block; that is, the rectangular block is extended at each of its two ends in the width direction by 0.5 times its own width.
- for example, the coordinates of the four vertices of the rectangular block are each moved outward in the width direction by 0.5 times the width of the rectangular block. It should be understood that the specific value of the first amount can be set according to the actual application and is not specifically limited herein.
- the first multiple may be 1.5. Increasing the width of the rectangular block by the second amount may include extending the rectangular block at each of its two ends in the width direction by 0.5 times the representative width. It should be understood that the specific values of the first multiple and the second amount can be set according to the actual application and are not specifically limited herein.
- the second multiple may be 2. Decreasing the width of the rectangular block by the third amount may include shrinking the rectangular block at each of its two ends in the width direction by 0.5 times the representative width. It should be understood that the specific values of the second multiple and the third amount can be set according to the actual application and are not specifically limited herein.
- in step 620, the spatial layout of the plurality of connected regions is analyzed.
- FIG. 8 is a flowchart illustrating an example process for implementing step 620.
- analyzing the spatial layout of the plurality of connected regions may include: selectively correcting the orientation of the plurality of connected regions in the layout model (step 810); selectively removing, from the plurality of connected regions, connected regions directly adjacent to either of the two sides of the layout model in the row direction, to obtain the selected connected regions (step 820); and performing projection segmentation on the selected connected regions to obtain a set of segmentation zones and the order of the segmentation zones relative to each other (step 830).
- in step 810, the orientation of the plurality of connected regions in the layout model is selectively corrected.
- selectively correcting the orientation of the plurality of connected regions in the layout model may include: determining whether the plurality of connected regions are in a tilted state relative to either the row direction or the column direction of the layout model; and, if it is determined that they are in a tilted state, rotating the plurality of connected regions by a correction angle so that they are no longer in a tilted state.
- the operation of correcting the tilt state is particularly advantageous for applications such as the application scenario 100 shown in FIG. 1 . In these applications, the reader usually holds a book or other reading material, and the text area in the image acquired by the image sensor is often inclined.
- by correcting the tilt state, the accuracy of layout analysis can be greatly improved.
- in some conventional layout analysis techniques, by contrast, the analysis object is usually a flat image (e.g., one scanned by a scanner) in which the text area is not tilted. Therefore, such conventional techniques may not be suitable for reading-assistance scenarios.
- determining whether the plurality of connected regions are in an inclined state with respect to any one of the row direction and the column direction of the layout model may be implemented through the following process. First, a specific connected region is searched among the plurality of connected regions, wherein the smallest circumscribed rectangle of the specific connected region has the largest area among the smallest circumscribed rectangles of the plurality of connected regions. Then, it is determined whether one side of the smallest circumscribed rectangle of the specific connected region is parallel to any one of the row direction and the column direction. If it is determined that the side of the smallest circumscribed rectangle of the specific connected region is not parallel to any one of the row direction and the column direction, it is determined that the plurality of connected regions are in an inclined state. If it is determined that the side of the smallest circumscribed rectangle of the specific connected region is parallel to any one of the row direction and the column direction, it is determined that the plurality of connected regions are not in an inclined state.
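The tilt test above can be sketched as follows. This is a hedged illustration, not the source's method. Each connected region is assumed to be given as a list of point coordinates. The smallest circumscribed rectangle is found by brute force over convex-hull edge directions, which is swapped in here for the Hough-transform angle detection mentioned later in the text. The tolerance `tol_deg` for "parallel" and all function names are assumptions.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (area, side_angle_deg) of the smallest circumscribed rectangle.

    The optimal rectangle has a side collinear with a hull edge, so every
    edge direction is tried (quadratic, fine for a sketch).
    """
    hull = convex_hull(points)
    best = (float("inf"), 0.0)
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        us = [x * c + y * s for x, y in hull]   # coordinate along the edge
        vs = [-x * s + y * c for x, y in hull]  # coordinate across the edge
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best[0]:
            best = (area, math.degrees(theta))
    return best


def tilt_of_regions(regions, tol_deg=0.5):
    """Judge tilt from the region whose smallest circumscribed rectangle is largest.

    Returns (is_tilted, signed deviation in degrees from the nearest axis).
    """
    _, angle = max(min_area_rect(pts) for pts in regions)
    deviation = (angle + 45.0) % 90.0 - 45.0
    return abs(deviation) > tol_deg, deviation
```

Only the largest region decides the outcome, matching the text. In practice, a library routine such as OpenCV's `minAreaRect` would likely replace the hand-written rectangle search.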
- FIG. 9 is a flowchart illustrating an example process of selectively correcting the orientation of a plurality of connected regions in the method of FIG. 8.
- in step 910, among the plurality of connected regions obtained after selectively adjusting the widths of the plurality of rectangular blocks, the specific connected region whose smallest circumscribed rectangle has the largest area is determined. If the sides of the smallest circumscribed rectangle of that connected region are not parallel to the row direction or the column direction (step 920, "No"), the plurality of connected regions are rotated by a correction angle so that a side of that rectangle becomes parallel to the row direction or the column direction (step 930); otherwise (step 920, "Yes"), no correction is performed.
- FIG. 10 is a schematic diagram illustrating a layout model obtained by angularly correcting the layout model 700 shown in FIG. 7 according to an exemplary embodiment.
- the smallest circumscribed rectangles of the plurality of connected regions are all rotated about their respective centroids (i.e., center points) by the same angle and in the same direction, so that one side of the smallest circumscribed rectangle of the specific connected region is parallel to the row or column direction.
- the Hough transform may be applied to the smallest circumscribed rectangle of the specific connected region to detect its tilt angle; when the tilt angle is greater than or equal to a preset tilt angle threshold (e.g., 5°), tilt correction is performed on the plurality of connected regions, and the rotation direction and angle of the connected regions during the correction are recorded.
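The correction step can be sketched as below. The function `correct_tilt` is hypothetical. It assumes each smallest circumscribed rectangle is given as four corner points and that `tilt_deg` was measured in the same coordinate system, so rotating by its negative undoes the tilt. The 5° threshold is the example value above. Returning the applied rotation mirrors the recording of the rotation direction and angle.

```python
import math


def correct_tilt(rects, tilt_deg, threshold_deg=5.0):
    """Rotate every rectangle about its own centre by -tilt_deg if the tilt is large enough.

    rects: list of 4-corner lists [(x, y), ...].
    Returns (corrected_rects, applied_rotation_deg); the applied rotation is
    kept so the layout can later be restored to its original tilted state.
    """
    if abs(tilt_deg) < threshold_deg:
        return [list(r) for r in rects], 0.0  # below threshold: no correction
    a = math.radians(-tilt_deg)
    c, s = math.cos(a), math.sin(a)
    out = []
    for corners in rects:
        cx = sum(x for x, _ in corners) / len(corners)  # centroid (center point)
        cy = sum(y for _, y in corners) / len(corners)
        out.append([(cx + (x - cx) * c - (y - cy) * s,
                     cy + (x - cx) * s + (y - cy) * c) for x, y in corners])
    return out, -tilt_deg
```

Because each rectangle turns about its own center, the regions keep their positions in the layout and only their orientation changes.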
- tilt correction method described above is merely exemplary, and in other embodiments, any other suitable correction method may be employed.
- in step 820, connected regions directly adjacent to either of the two sides of the layout model in the row direction are selectively removed, to obtain the selected connected regions.
- when the plurality of connected regions are not in a tilted state, vertical projection segmentation is performed on the layout model. Then, depending on the result of the vertical projection segmentation, connected regions directly adjacent to either of the two sides of the layout model in the row direction are selectively removed from the plurality of connected regions, yielding the selected connected regions.
- the phrase "a connected region is directly adjacent to a side of the layout model" means that there are no other connected regions between that connected region and that side of the layout model.
- FIG. 11 is a flowchart illustrating an example process of selectively removing connected regions directly adjacent to either side of the layout model in the method of FIG. 8.
- connected regions representing paragraphs in incomplete pages are filtered out of the layout model.
- vertical projection segmentation is first performed on the layout model (step 1110). It is then determined whether the vertical projection segmentation divides the layout model into at least two zones (step 1120), where the at least two zones contain the plurality of connected regions. If not (step 1120, "No"), no removal is performed (step 1180).
- if so, the respective effective sizes of the at least two zones in the row direction are determined (step 1130), and the following is performed for each side zone, i.e., each of the at least two zones that is directly adjacent to either of the two sides of the layout model in the row direction: if exactly two zones are segmented from the layout model (step 1140, "Yes"), and the effective size of the side zone in the row direction is less than a first threshold percentage of the largest of the respective effective sizes and less than a second threshold percentage of the effective size in the row direction of the other of the two zones (step 1150, "Yes"), the connected regions in the side zone are removed (step 1170); otherwise, the connected regions in the side zone are not removed (step 1180). If more than two zones are segmented from the layout model (step 1140, "No"), and the effective size of the side zone in the row direction is less than a third threshold percentage of the largest of the respective effective sizes and less than the fourth threshold percentage of the effective size of the …
- the phrase "a zone is directly adjacent to a side of the layout model" means that there are no other zones between that zone and that side of the layout model.
- the effective size of a zone in the row direction refers to the row direction size of the connected regions in the zone, eg, the row direction size of the smallest circumscribed rectangle of these connected regions. In some embodiments, the effective size of a zone in the row direction may be the average of the row direction dimensions of all connected regions in the zone.
- in this way, folded or incomplete pages of magazines, books, and other reading materials can be filtered out, which prevents the text lines in incomplete pages from being recognized and read aloud in subsequent processing and confusing the reading content.
- This can greatly improve the accuracy of the layout analysis, thereby improving the user experience.
- the first threshold percentage is less than the second threshold percentage, and the third threshold percentage is equal to the fourth threshold percentage.
- the first threshold percentage is 60%
- the second threshold percentage is 70%
- the third threshold percentage is 70%
- the fourth threshold percentage is 70%. It should be understood that the specific values of the first threshold percentage, the second threshold percentage, the third threshold percentage and the fourth threshold percentage can be specifically set according to actual applications, and are not specifically limited herein.
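The two-zone branch of FIG. 11 (steps 1140, 1150, 1170, and 1180) can be sketched with the example thresholds above. Because the text describing the more-than-two-zones condition is truncated, that branch is deliberately not sketched. The effective size is taken here as the average row-direction size of a zone's connected regions, which is one of the options in the text. Both function names are hypothetical.

```python
def effective_size(region_widths):
    """Average row-direction size of the connected regions in one zone."""
    return sum(region_widths) / len(region_widths)


def two_zone_side_removal(sizes, t1=0.60, t2=0.70):
    """Return indices of the zones whose connected regions are removed.

    sizes: effective row-direction sizes of exactly two zones, left to right.
    With two zones, both are side zones.
    """
    if len(sizes) != 2:
        raise ValueError("this sketch covers only the two-zone case")
    largest = max(sizes)
    remove = set()
    for i, size in enumerate(sizes):
        other = sizes[1 - i]
        if size < t1 * largest and size < t2 * other:
            remove.add(i)  # step 1170: treat as a page fragment
    return remove
```

A narrow right-hand zone (300 versus 800) is removed as a page fragment, while a zone of 500 next to one of 800 is kept, since 500 is not below 60% of 800.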
- "Horizontal projection segmentation" and "vertical projection segmentation" are known text segmentation techniques in themselves.
- Horizontal projection segmentation involves searching a two-dimensional image for pixel rows that satisfy a predetermined condition, to serve as horizontal dividing lines.
- for example, such pixel rows may be pixel rows whose pixel values sum to zero.
- Vertical projection segmentation involves searching a two-dimensional image for pixel columns that satisfy a predetermined condition, to serve as vertical dividing lines.
- for example, such a pixel column may be a pixel column whose pixel values sum to zero.
- the data structure of the layout model may be in the form of a two-dimensional matrix, and the pixel values are data values of matrix elements of the two-dimensional matrix.
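This is a minimal sketch that assumes the layout model is stored as a plain list-of-lists matrix of integer data values. It computes both projections and the zero-sum candidate dividing lines. The helper names are illustrative only.

```python
from typing import List

Matrix = List[List[int]]  # layout model as a 2-D matrix of data values


def horizontal_projection(m: Matrix) -> List[int]:
    """Sum of the data values in each data row."""
    return [sum(row) for row in m]


def vertical_projection(m: Matrix) -> List[int]:
    """Sum of the data values in each data column."""
    return [sum(col) for col in zip(*m)]


def zero_sum_lines(profile: List[int]) -> List[int]:
    """Indices whose sum is zero: candidate dividing lines."""
    return [i for i, s in enumerate(profile) if s == 0]


model = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(zero_sum_lines(horizontal_projection(model)))  # [2]
print(zero_sum_lines(vertical_projection(model)))    # [2]
```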
- FIG. 12 is a schematic diagram illustrating vertical projection of the layout model shown in FIG. 10 according to an exemplary embodiment.
- FIG. 12 shows a waveform 1210 indicating the sum of the data values of the data elements in each data column, a connecting line 1220 connecting the peaks and troughs of the waveform 1210, and a vertical dividing line 1230.
- if the data elements of a data column sum to the minimum value (e.g., zero), that data column can be selected as the vertical dividing line.
- where several data columns each have data elements summing to the minimum value, any of these data columns can be selected as the vertical dividing line.
- FIG. 13 is a schematic diagram illustrating a layout model obtained by removing the connected regions representing incomplete pages from the layout model shown in FIG. 10 according to the projection result of FIG. 12. As shown in FIG. 13, the rightmost connected regions in FIG. 12, which represent paragraphs on an incomplete page, have been removed.
- before vertical projection segmentation is performed on the layout model, the text lines may be appropriately resized in the left-right direction to improve the accuracy of removing incomplete pages.
- for each rectangular block corresponding to a text line determined to be of the horizontal type, the length of the rectangular block may be increased by several data elements at both ends in the length direction.
- for each rectangular block corresponding to a text line determined to be of the vertical type, the width of the rectangular block may be increased by several data elements at both ends in the width direction.
- the above-mentioned several data elements are, for example, 0.5 times the representative width. It will be appreciated that for rectangular blocks, the length is generally greater than the width.
- the layout type may be a default type (e.g., landscape).
- the user can also set the layout type by switching manually. For example, the user can change the default layout type to portrait.
- appropriately resizing the text lines in the left-right direction improves the accuracy of removing incomplete pages. The resizing makes connected regions representing paragraphs on the same page less likely to be separated from one another by vertical projection segmentation, which reduces the chance that they are removed by mistake.
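Here is a sketch of the left-right resizing. It assumes each rectangular block is stored as a half-open `(x0, y0, x1, y1)` box in data-element coordinates. It also assumes the padding is 0.5 times a representative width supplied by the caller. `widen_left_right` and its parameters are hypothetical. For a horizontal-type line the length direction runs left-right, and for a vertical-type line the width direction does, so both cases reduce to widening along x.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, in data elements


def widen_left_right(rect: Rect, representative_width: float, model_width: int,
                     factor: float = 0.5) -> Rect:
    """Extend a rectangular block at both ends in the left-right direction.

    Hypothetical helper: the padding is factor * representative_width,
    rounded to whole data elements and clipped to the layout model bounds.
    """
    x0, y0, x1, y1 = rect
    pad = int(round(factor * representative_width))
    return (max(0, x0 - pad), y0, min(model_width, x1 + pad), y1)
```

Clipping keeps a widened block inside the layout model. A block that already touches an edge only grows on the other side.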
- each selected connected region is subjected to projection segmentation to obtain a set of segmented zones and the order of the segmented zones relative to each other.
- horizontal projection segmentation and vertical projection segmentation are performed recursively and alternately on each selected connected region of the layout model, so as to segment a set of segmented zones from the layout model, and the order of the segmented zones in the set relative to each other is determined based on reading order rules.
- recursively and alternately performing horizontal projection segmentation and vertical projection segmentation on each selected connected region may include cyclically performing the following operations: performing vertical projection segmentation on each horizontal segmented zone obtained by horizontal projection segmentation, and performing horizontal projection segmentation on each vertical segmented zone obtained by vertical projection segmentation, until no segmented zone can be further divided by either horizontal or vertical projection segmentation.
- the segmented zones that cannot be further divided by horizontal or vertical projection segmentation form the set of segmented zones.
- the first projection segmentation may be either a horizontal projection segmentation or a vertical projection segmentation.
- the present disclosure is not limited in this regard.
- recursion refers to a strategy of decomposing a large, complex problem layer by layer into smaller problems similar to the original one.
- a recursive strategy can describe the repeated computations required to solve a problem with only a small amount of code, which greatly reduces the size of the program.
- performing vertical projection segmentation on each horizontal segmented zone obtained by horizontal projection segmentation comprises: searching the horizontal segmented zone for a set of data columns, where for each data column in the set, the sum of the data values of its data elements is in the range from zero to a first threshold.
- the first threshold is greater than zero, for example, one times the representative width. If such a set of data columns is found, a vertical dividing line for dividing the horizontal segmented zone is selected from the set, and the horizontal segmented zone is divided along the selected vertical dividing line to obtain vertical segmented zones.
- the sum of the data values of the data column indicating the vertical dividing line is allowed to lie anywhere in the range from zero to the first threshold, rather than being required to equal zero. This is because the horizontal gap between paragraphs on the same page is small, and allowing a larger sum for the data column indicating the vertical dividing line helps the vertical projection segmentation to be performed correctly.
- performing horizontal projection segmentation on each vertical segmented zone obtained by vertical projection segmentation comprises: searching the vertical segmented zone for a set of data rows, where for each data row in the set, the sum of the data values of its matrix elements is in the range from zero to a second threshold.
- the second threshold is greater than zero, for example, one times the representative width. If such a set of data rows is found, a horizontal dividing line for dividing the vertical segmented zone is selected from the set, and the vertical segmented zone is divided along the selected horizontal dividing line to obtain horizontal segmented zones.
- the sum of the data values of the data row indicating the horizontal dividing line is allowed to lie anywhere in the range from zero to the second threshold. This is because the vertical gap between paragraphs on the same page is small, and allowing a larger sum for the data row indicating the horizontal dividing line helps the horizontal projection segmentation to be performed correctly.
- the set of segmented zones is obtained by segmenting the layout model according to the above-mentioned horizontal and vertical dividing lines for segmenting the layout model.
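The recursive, alternating procedure can be sketched as a compact XY-cut. The sketch makes several assumptions that the text does not state:

- every row or column whose sum falls within the tolerance is treated as a gap, and the content between gaps becomes the child zones, instead of selecting a single dividing line;
- segmentation starts with a horizontal cut;
- `Zone`, `xy_cut` and `leaves` are hypothetical names.

With a tolerance above zero, sparse rows or columns are absorbed into the gaps.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Matrix = List[List[int]]


@dataclass
class Zone:
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive
    cut: Optional[str] = None  # "h" or "v" if this zone was split further
    children: List["Zone"] = field(default_factory=list)


def content_runs(profile: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Split a projection profile into [start, end) runs of content.

    A row/column whose sum lies in [0, threshold] is treated as a gap,
    i.e. as a candidate dividing line."""
    runs, start = [], None
    for i, s in enumerate(profile):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def xy_cut(m: Matrix, zone: Zone, direction: str, h_thr: int, v_thr: int,
           other_failed: bool = False) -> Zone:
    """Recursively alternate horizontal ("h") and vertical ("v") projection
    segmentation until a zone cannot be split in either direction."""
    sub = [row[zone.left:zone.right] for row in m[zone.top:zone.bottom]]
    if direction == "h":
        runs = content_runs([sum(r) for r in sub], h_thr)        # data-row sums
    else:
        runs = content_runs([sum(c) for c in zip(*sub)], v_thr)  # data-column sums
    nxt = "v" if direction == "h" else "h"
    if len(runs) <= 1:
        # No split in this direction: try the other one once, else it is a leaf.
        if other_failed:
            return zone
        return xy_cut(m, zone, nxt, h_thr, v_thr, other_failed=True)
    zone.cut = direction
    for a, b in runs:
        if direction == "h":
            child = Zone(zone.top + a, zone.left, zone.top + b, zone.right)
        else:
            child = Zone(zone.top, zone.left + a, zone.bottom, zone.left + b)
        zone.children.append(xy_cut(m, child, nxt, h_thr, v_thr))
    return zone


def leaves(zone: Zone) -> List[Zone]:
    """Leaf zones in tree order (top-to-bottom, left-to-right)."""
    if not zone.children:
        return [zone]
    return [leaf for c in zone.children for leaf in leaves(c)]
```

The returned tree also records the hierarchy between horizontal and vertical segmented zones. That record is what a later traversal in reading order needs.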
- FIGS. 14-17 are schematic diagrams each illustrating the segmentation of the layout model shown in FIG. 13, according to an exemplary embodiment.
- a horizontal projection segmentation is performed, and the corresponding zone is not divided by this projection.
- vertical projection segmentation is performed, and this segmentation process separates the rightmost zone of the layout model from other parts of the layout model.
- in this horizontal projection segmentation, the remaining connected regions in FIG. 15 are divided into a plurality of segmented zones at the upper left.
- determining the order of the segmented zones in the set relative to each other includes two steps. First, during the cyclically performed operations, the hierarchical relationships between horizontal segmented zones, between vertical segmented zones, and between horizontal and vertical segmented zones are recorded in a hierarchical tree data structure, whose leaf nodes represent the set of segmented zones. Second, these leaf nodes are traversed according to the reading order rules, and the order in which they are traversed represents the order of the segmented zones relative to each other.
- the leaf node may record the coordinate information of the corresponding zone, for example, the coordinates of the dividing lines between zones or the coordinates of the rectangle formed by those dividing lines. This coordinate information reflects the positional relationships between different zones, so that the order of the zones can be determined according to the reading order rules while the leaf nodes are traversed. The reading order rules are described later.
- each segmented zone is recorded in the hierarchical tree data structure in reading order. When a segmented zone can be further divided by horizontal or vertical projection segmentation, the zones obtained from its next division are recorded in the tree as its child nodes. This continues until no segmented zone can be divided further by horizontal or vertical projection segmentation, at which point the hierarchical tree data structure is complete.
- the reading order rules depend on the layout type of the plurality of text lines.
  - landscape type: the vertical segmented zones are sorted from left to right and the horizontal segmented zones from top to bottom, each according to their positional relationships.
  - portrait type: the vertical segmented zones are sorted from right to left and the horizontal segmented zones from top to bottom, each according to their positional relationships.
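The leaf traversal with these reading order rules might look like the sketch below. It assumes each internal node records how it was split: horizontally (children stacked top to bottom) or vertically (children side by side). It also assumes each node stores its own top and left coordinates. `Node` and `reading_order` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    left: int = 0
    top: int = 0
    cut: Optional[str] = None  # "h": children stacked vertically; "v": side by side
    children: List["Node"] = field(default_factory=list)


def reading_order(node: Node, layout: str = "landscape") -> List[str]:
    """Depth-first traversal that emits leaf names in reading order."""
    if not node.children:
        return [node.name]
    if node.cut == "h":
        # Horizontal segmented zones: always top to bottom.
        kids = sorted(node.children, key=lambda n: n.top)
    else:
        # Vertical segmented zones: left to right (landscape) or right to left (portrait).
        kids = sorted(node.children, key=lambda n: n.left,
                      reverse=(layout == "portrait"))
    out: List[str] = []
    for k in kids:
        out.extend(reading_order(k, layout))
    return out
```

Sorting the children at each node, rather than relying on insertion order, means the same tree can be read under either layout type.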
- the layout analysis method according to the embodiments of the present disclosure can thus adapt to both horizontal and vertical layouts, which improves its generality.
- analyzing the spatial layout of the plurality of connected regions may further include, after performing the projection segmentation on each selected connected region: determining whether each selected connected region has been rotated by a correction angle; and if so, rotating the set of segmented zones back by the correction angle.
- FIG. 19 is a schematic diagram showing the layout model of FIG. 18 restored to its original inclined state with the segmented zones sorted in reading order, where the numbers 0 to 8 indicate the index and reading order of the segmented zones.
- in step 340, the order of the text lines relative to each other is determined based on the layout structure.
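Rotating the segmented zones back by the correction angle amounts to applying the inverse rotation to their corner points. This is a minimal sketch. It assumes the correction was a rotation by `correction_deg` about a known center, using the same sign convention as below. In image coordinates with y pointing down, a positive angle appears clockwise. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_points(points: List[Point], angle_deg: float, center: Point) -> List[Point]:
    """Rotate points by angle_deg about center (standard 2-D rotation matrix)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]


def unrotate_zone(corners: List[Point], correction_deg: float, center: Point) -> List[Point]:
    """Rotate a segmented zone's corners back by the correction angle."""
    return rotate_points(corners, -correction_deg, center)
```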
- determining the order of the text lines relative to each other based on the layout structure may include four steps:
  - determining the correspondence between the selected connected regions and the segmented zones according to the position of each selected connected region relative to each segmented zone in the set, where each segmented zone contains a corresponding set of selected connected regions;
  - sorting the selected connected regions within each corresponding set according to their positional relationships;
  - sorting the rectangular blocks within each selected connected region according to their positional relationships;
  - matching the plurality of text lines with the rectangular blocks in each selected connected region according to the correspondence between the text lines and the rectangular blocks.
- which segmented zone each selected connected region lies in may be determined from the position of the center or centroid of that region relative to each segmented zone in the set. For example, a selected connected region can be determined to lie within a segmented zone if its center or centroid falls within that zone. In these examples, the selected connected regions within a segmented zone may be ordered based on the positions of their centers or centroids.
- sorting the selected connected regions in the corresponding set of selected connected regions may include: if the plurality of text lines are determined to be of the landscape type, sorting the selected connected regions in the corresponding set from top to bottom; and if the plurality of text lines are determined to be of the vertical type, sorting the selected connected regions in the corresponding set from right to left.
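This sketch matches connected regions to segmented zones by center point and sorts them within each zone. It assumes both are axis-aligned `(x0, y0, x1, y1)` boxes and that the zones are already listed in reading order. `assign_and_sort` is a hypothetical helper. A region whose center falls in no zone is simply left out here.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def assign_and_sort(regions: List[Box], zones: List[Box],
                    layout: str = "landscape") -> Dict[int, List[int]]:
    """Map zone index -> indices of the regions whose center lies in that zone,
    sorted top to bottom (landscape) or right to left (portrait)."""
    groups: Dict[int, List[int]] = {i: [] for i in range(len(zones))}
    for r_idx, r in enumerate(regions):
        cx, cy = center(r)
        for z_idx, (x0, y0, x1, y1) in enumerate(zones):
            if x0 <= cx < x1 and y0 <= cy < y1:
                groups[z_idx].append(r_idx)
                break
    for members in groups.values():
        if layout == "landscape":
            members.sort(key=lambda i: center(regions[i])[1])   # top to bottom
        else:
            members.sort(key=lambda i: -center(regions[i])[0])  # right to left
    return groups
```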
- FIG. 20 is a schematic diagram illustrating the matching and sorting of multiple connected regions with segmentation zones according to an exemplary embodiment.
- connected regions 0-5 are matched with the corresponding segmented zones 0-5 shown in FIG. 19, respectively.
- connected regions 6-8 are matched with segmented zone 6 shown in FIG. 19, and connected region 9 is matched with segmented zone 7 shown in FIG. 19.
- connected regions 10-11 are matched with segmented zone 8 shown in FIG. 19.
- the rectangular blocks in each connected region can be sorted.
- ordering the rectangular blocks in each selected connected region includes: if the plurality of text lines are determined to be of the landscape type, ordering the rectangular blocks in each selected connected region from top to bottom; and if they are determined to be of the vertical type, ordering the rectangular blocks in each selected connected region from right to left.
- the coordinate information of a text line in the image matches the coordinate information of the corresponding rectangular block in the layout model.
- the coordinate information of a text line in the image may also correspond to the coordinate information of its rectangular block in the layout model after inverse scaling.
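This sketch orders the rectangular blocks within one connected region and maps them back to image coordinates. It assumes the layout model was built from the image with a single uniform scale factor `scale`. It sorts by the top edge for landscape and by the right edge for portrait. These choices and the function names are illustrative assumptions.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in layout-model units


def order_blocks(blocks: List[Rect], layout: str = "landscape") -> List[int]:
    """Indices of the rectangular blocks of one connected region in reading order."""
    if layout == "landscape":
        return sorted(range(len(blocks)), key=lambda i: blocks[i][1])   # top edge, top to bottom
    return sorted(range(len(blocks)), key=lambda i: -blocks[i][2])      # right edge, right to left


def to_image_coords(rect: Rect, scale: float) -> Rect:
    """Undo an assumed uniform scale applied when building the layout model."""
    x0, y0, x1, y1 = rect
    return (x0 / scale, y0 / scale, x1 / scale, y1 / scale)
```

Because each rectangular block corresponds to exactly one text line, the ordered block indices can be read directly as the order of the matching text lines.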
- FIG. 21 is a schematic diagram illustrating sorting of text lines in an image 400 according to layout analysis results, according to an exemplary embodiment.
- text lines 0-5 are in the corresponding connected regions 0-5 shown in FIG. 20, respectively.
- text lines 6-26 are in connected region 6 shown in FIG. 20.
- text lines 27-35 are in
- text line 36 is in connected region 8 shown in FIG. 20.
- text lines 37-66 are in connected region 9 shown in FIG.
- text lines 93-105 are in connected region 11 shown in FIG.
- the step 620 of analyzing the spatial layout of the plurality of connected regions in FIG. 6 may further include a resizing operation, performed before horizontal and vertical projection segmentation are applied recursively and alternately to each selected connected region:
  - landscape type: the length of each rectangular block in the selected connected regions is reduced by several data elements at both ends in the length direction;
  - vertical type: the width of each rectangular block in the selected connected regions is reduced by several data elements at both ends in the width direction.
- by resizing the rectangular blocks corresponding to the text lines in the left-right direction in this way, interference such as the image background color between paragraphs can be eliminated, which improves the accuracy of segmentation.
- in the embodiments described so far, the layout type of the text lines is set to horizontal or vertical by default (and can be switched manually) during the layout analysis.
- some additional embodiments of the present disclosure will be described in which the layout type of a line of text is automatically identified.
- Automatic recognition of layout types can provide several advantages. For example, the order of the lines of text relative to each other can be correctly determined based on the automatically recognized layout type without manual switching by the user. This further allows for some useful functionality where the image includes both lines of text of a major layout type (eg, landscape) and lines of text of a secondary layout type (eg, portrait).
- layout analysis can be performed first on the text lines of the main layout type and then on the text lines of the secondary layout type, so that the text lines of the main layout type are recognized and read aloud first. This can improve the experience for users of reading aids, as the text of the main layout type is often what users want to know first.
- the main layout type of the plurality of text lines is one selected from the group consisting of a landscape type and a portrait type.
- identifying the main layout type of the plurality of text lines may include two steps. First, respective geometric parameters of the plurality of rectangular blocks are determined according to the coordinate information of the plurality of text lines in the image. Second, the main layout type of the plurality of text lines is determined based on those geometric parameters.
- the coordinates of the rectangular block 510 in the layout model 500 are the same as the coordinates of the corresponding text line 410 in the image 400, so the geometric parameters of the rectangular block 510 can be determined directly from the coordinates of the text line 410 (e.g., its four vertex coordinates).
- the geometric parameter includes at least one of a length direction, a length, a width direction, and a width of each of the plurality of rectangular blocks 510 .
- the length and width directions depend on how the text lines run relative to the reader:
  - text lines extending substantially left-right (i.e., horizontal reading): the length direction extends substantially left-right, and the width direction is substantially perpendicular to it (i.e., substantially up-down);
  - text lines extending substantially up-down (i.e., vertical reading): the length direction extends substantially up-down, and the width direction is substantially perpendicular to it (i.e., substantially left-right).
- the text arrangement direction of the text row 410 corresponding to the rectangular block 510 is determined, so as to determine whether the layout type of the text row 410 is landscape or portrait.
- the layout type of the text line 410 can be obtained by determining the length direction of the rectangular block 510 corresponding to the text line 410 . For example, if the rectangular block 510 extends in the left-right direction, the corresponding text line 410 is a horizontal version, and if the rectangular block 510 extends in the vertical direction, the corresponding text line 410 is a vertical version. In the text area of the entire image 400, if the proportion of text lines 410 of a certain layout type (horizontal or vertical) exceeds a predetermined threshold, the layout type is the main layout type.
- the judgment rule for the main layout type is that if the ratio of the total area of the rectangular blocks corresponding to the vertical text line to the total area of all rectangular blocks is greater than or equal to a predetermined threshold, the main layout type is vertical, otherwise The main layout type is horizontal.
- a subset of the plurality of rectangular blocks is determined. It consists of those rectangular blocks whose length direction makes an angle smaller than a threshold angle with the column direction of the layout model.
- the threshold angle may be, for example, 10°, 20°, or 30°, etc., but is not limited to these examples, and can be specifically set according to practical applications.
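This sketch shows the angle test that decides whether a rectangular block belongs to the subset. It assumes each block is given as its four vertices in boundary order and uses 20° as the example threshold angle. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def length_direction_deg(quad: Sequence[Point]) -> float:
    """Angle in [0, 180) of the longer side of a quadrilateral whose four
    vertices are given in order around its boundary."""
    (x0, y0), (x1, y1), (x2, y2) = quad[0], quad[1], quad[2]
    a = (x1 - x0, y1 - y0)
    b = (x2 - x1, y2 - y1)
    dx, dy = a if math.hypot(*a) >= math.hypot(*b) else b
    return math.degrees(math.atan2(dy, dx)) % 180.0


def is_column_aligned(quad: Sequence[Point], threshold_deg: float = 20.0) -> bool:
    """True if the length direction is within threshold_deg of the column
    (vertical) direction of the layout model."""
    return abs(length_direction_deg(quad) - 90.0) < threshold_deg
```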
- a subset of a plurality of elements may include some or all of the plurality of elements, ie, a subset may be a "complete set,” a "proper subset,” or an "empty set.”
- in the case of the "complete set", all of the plurality of rectangular blocks satisfy the above condition.
- in the case of a "proper subset", some of the plurality of rectangular blocks satisfy the above condition.
- in the case of the "empty set", none of the plurality of rectangular blocks satisfies the above condition.
- in step 2220, the total area of the subset of the plurality of rectangular blocks and the total area of the plurality of rectangular blocks are determined.
- in step 2230, it is determined whether the ratio of the total area of the subset of the plurality of rectangular blocks to the total area of the plurality of rectangular blocks is less than a first threshold ratio. If the ratio is less than the first threshold ratio (step 2230, "Yes"), the main layout type is determined to be the landscape type (step 2240); otherwise (step 2230, "No"), the main layout type is determined to be the portrait type (step 2250).
- the first threshold ratio may be 80%, but is not limited thereto, and may be specifically set according to practical applications.
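The decision of steps 2220 to 2250 reduces to an area ratio. This is a minimal sketch that assumes each block has already been reduced to its area and the result of the angle test. The fallback to landscape for an empty layout model is an assumption not covered by the text.

```python
from typing import List, Tuple

FIRST_THRESHOLD_RATIO = 0.80  # example value from the embodiment


def main_layout_type(blocks: List[Tuple[float, bool]]) -> str:
    """blocks: (area, column_aligned) pairs for all rectangular blocks."""
    total = sum(area for area, _ in blocks)
    if total == 0:
        return "landscape"  # assumption: default for an empty layout model
    subset = sum(area for area, aligned in blocks if aligned)
    return "landscape" if subset / total < FIRST_THRESHOLD_RATIO else "portrait"
```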
- determination rules for the main layout type are merely exemplary, and in other embodiments, other determination rules may be adopted.
- analyzing the layout structure of the text line based on the layout model may further include analyzing the layout structure of the text line of the main layout type.
- rectangular blocks corresponding to unimportant text in the image may be selectively discarded prior to analyzing the layout structure of the text line based on the layout model.
- rectangular blocks of the secondary layout type may be selectively removed or retained among the plurality of rectangular blocks before the layout structure of the text lines of the main layout type is analyzed, where the secondary layout type is the other one selected from the group consisting of the landscape type and the portrait type.
- lines of text of the sub-layout type with a small area ratio may be considered unimportant text.
- the secondary layout type of the plurality of text lines may be determined based on the respective geometric parameters of the plurality of rectangular blocks.
- the primary layout type may be one of landscape and portrait (e.g., landscape), and the secondary layout type may be the other (e.g., portrait).
- the rectangular blocks of the sub-layout type are selectively removed or not removed from the plurality of rectangular blocks, so as to obtain the selected rectangular blocks.
- the term "remove” may refer to modifying the data value of a data element of a layout model to a default value (eg, zero). By discarding some unimportant text, it is possible to avoid interrupting the reading order of text on the main page during text recognition and broadcast, and improve user experience.
- FIG. 23 is a flowchart illustrating an example process for selectively discarding rectangular blocks corresponding to unimportant text in an image.
- whether to remove the rectangular blocks of the sub-layout type may be determined by calculating the ratio of the rectangular blocks of the sub-layout type to the total area of the plurality of rectangular blocks.
- in step 2310, the total area of the rectangular blocks of the secondary layout type and the total area of the plurality of rectangular blocks (that is, of all rectangular blocks in the layout model) are determined.
- in step 2320, it is determined whether the ratio of the total area of the rectangular blocks of the secondary layout type to the total area of the plurality of rectangular blocks is less than a second threshold ratio. If so (step 2320, "Yes"), the rectangular blocks of the secondary layout type are removed from the plurality of rectangular blocks (step 2330).
- the second threshold ratio can be set according to practical applications, for example, 3%, 5%, 7%, etc. The present disclosure is not limited in this respect.
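Steps 2310 to 2330 can be sketched on a similar list of blocks, using 5% as the example second threshold ratio. In the layout model itself, removal would mean setting the blocks' data values to a default such as zero. As a simplification, this sketch just drops the blocks from a list. `filter_secondary` is a hypothetical name.

```python
from typing import List, Tuple

SECOND_THRESHOLD_RATIO = 0.05  # example value; the text also mentions 3% or 7%


def filter_secondary(blocks: List[Tuple[float, str]], main_type: str) -> List[Tuple[float, str]]:
    """blocks: (area, layout_type) pairs. Drops blocks of the secondary layout
    type when their share of the total area is below the threshold ratio."""
    total = sum(area for area, _ in blocks)
    secondary = sum(area for area, t in blocks if t != main_type)
    if total > 0 and secondary / total < SECOND_THRESHOLD_RATIO:
        return [b for b in blocks if b[1] == main_type]
    return list(blocks)
```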
- the layout structure of the text lines of the secondary layout type may subsequently be analyzed.
- the analysis of the layout structure of the text line of the sub-layout type is similar to the method of analysis described above with respect to FIGS. 6 to 21 and is not repeated here for the sake of brevity.
- the text data recognized in each text line can be converted into sound data according to the text line sorting results in combination with the text recognition results, which can be used, for example, for audiobook-related applications and visually impaired assistive applications.
- suppose the text lines of the image include both horizontal and vertical layouts and the secondary layout type is not removed during layout analysis. In that case, when subsequent processing reads the text aloud in combination with the text recognition results, the text in the text lines of the main layout type can be recognized and read aloud first. After that text has been read aloud, the text in the text lines of the secondary layout type is recognized and read aloud.
- FIG. 24 is a block diagram illustrating a structure of a reading aid according to an exemplary embodiment of the present disclosure.
- the reading aid device 2400 includes two main components:
  - an image sensor 2410 (which may be implemented, for example, as a camera), configured to acquire the aforementioned image. The image may be a still image or a video image and may contain text.
  - a chip circuit 2420, configured as a circuit unit that performs the steps of any of the foregoing methods.
- circuitry may refer to, be part of, or include any of the following components that provide the described functionality:
  - an application specific integrated circuit (ASIC);
  - an electronic circuit;
  - a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that executes one or more software or firmware programs;
  - combinational logic;
  - other suitable hardware components.
- a circuit or functionality associated with a circuit may be implemented by one or more software or firmware modules.
- a circuit may include logic that is at least partially operable in hardware. The embodiments described herein may be implemented as a system using any suitably configured hardware and/or software.
- the chip circuit may further include a circuit unit configured to perform text recognition on the image to obtain text data, and a circuit unit configured to convert the text data in each text line into sound data according to the text line sorting result.
- the circuit unit configured to perform text recognition on the image to obtain text data may, for example, use any text recognition (such as optical character recognition, OCR) software or circuit.
- the circuit unit configured to convert the text data in each text line into sound data according to the text line sorting result may, for example, use any text-to-speech software or circuit. These circuit units can be implemented, for example, by ASIC chips or FPGA chips.
- the reading aid 2400 may also include a sound output device 2430 (eg, speakers, headphones, etc.) configured to output the sound data (ie, voice data).
- One aspect of the present disclosure may include an electronic device that may include a processor and a memory. The memory stores a program including instructions that, when executed by the processor, cause the processor to perform any of the foregoing methods.
- the program may further include instructions that, when executed by the processor, convert the text data in each text line into sound data according to the text line sorting results.
- such an electronic device may be, for example, a reading aid.
- such an electronic device may be another device (eg, a cell phone, computer, server, etc.) in communication with the reading aid.
- when this electronic device is another device that communicates with the reading aid, the workflow may be split as follows:
  - the reading aid sends the captured image to the other device;
  - the other device performs any of the aforementioned methods and returns the processing results (such as layout analysis results, text recognition results, and/or voice data converted from text data) to the reading aid;
  - the reading aid performs subsequent processing (for example, playing the voice data to the user).
- the reading aid may be implemented as a wearable device, such as a device worn in the form of glasses, a head-mounted device (e.g., a helmet or hat), an ear-worn device, an accessory attachable to eyeglasses (e.g., to the frame or temples), an accessory attachable to a hat, and the like.
- with the aid of the reading aid device, visually impaired users can "read" regular reading materials (such as books and magazines) using a reading posture similar to that of normally sighted readers.
- the reading aid device automatically performs layout analysis on the captured page image according to the method in the foregoing embodiments to sort the text lines. It then converts the text in the text lines into sound in the order of the text lines and outputs the sound through devices such as speakers or headphones for the user to listen to.
- One aspect of the present disclosure may include a computer-readable storage medium storing a program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform any of the aforementioned methods.
- a computing device 2500 will now be described, which is an example of a hardware device that may be applied to various aspects of the present disclosure.
- Computing device 2500 may be any machine configured to perform processing and/or computation, and may be, but is not limited to, a workstation, server, desktop computer, laptop computer, tablet computer, personal digital assistant, smartphone, in-vehicle computer, wearable device, or any combination thereof.
- the reading aids or electronic devices described above may also be implemented in whole or at least in part by computing device 2500 or a similar device or system.
- Computing device 2500 may include elements connected to or in communication with bus 2502 (possibly via one or more interfaces).
- computing device 2500 may include a bus 2502, one or more processors 2504 (which may be used to implement the processors or chip circuits included in the aforementioned reading aids), one or more input devices 2506, and one or more output devices 2508.
- the one or more processors 2504 may be any type of processor, and may include, but are not limited to, one or more general-purpose processors and/or one or more special-purpose processors (e.g., special processing chips).
- Input device 2506 may be any type of device capable of inputting information to computing device 2500, and may include, but is not limited to, sensors (e.g., the aforementioned sensors that capture images), mice, keyboards, touch screens, microphones, and/or remote controls.
- Output device 2508 may be any type of device capable of presenting information, and may include, but is not limited to, displays, speakers (e.g., output devices that may be used to output the sound data described above), video/audio output terminals, vibrators, and/or printers.
- Computing device 2500 may also include or be connected to storage device 2510, which (e.g., usable to implement the computer-readable storage medium described above) may be any non-transitory storage device capable of storing data, and may include, but is not limited to, magnetic disk drives, optical storage devices, solid-state memory, floppy disks, flexible disks, hard disks, magnetic tapes or any other magnetic media, optical disks or any other optical media, ROM (read-only memory), RAM (random-access memory), cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code.
- Storage device 2510 may be detachable from the interface.
- the storage device 2510 may have data/programs (including instructions)/code for implementing the above-described methods and steps.
- Computing device 2500 may also include communication device 2512.
- Communication device 2512 may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication devices, and/or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
- Computing device 2500 may also include working memory 2514 (which may be used to implement the memory contained in the aforementioned reading aids), which may be any type of working memory that can store programs (including instructions) and/or data useful for the operation of processor 2504, and may include, but is not limited to, random-access memory and/or read-only memory devices.
- Software elements may be located in working memory 2514, including, but not limited to, operating system 2516, one or more applications (i.e., application programs) 2518, drivers, and/or other data and code. Instructions for performing the methods and steps described above may be included in one or more applications 2518.
- the executable or source code of the instructions for the software elements (programs) may be stored in a non-transitory computer-readable storage medium (such as the storage device 2510 described above), and may be loaded into working memory 2514 when executed (possibly after being compiled and/or installed).
- the executable code or source code of the instructions for the software elements (programs) may also be downloaded from remote locations.
- the working memory 2514 may store program code for executing the flowcharts of the present disclosure and/or images containing textual content to be recognized, and the applications 2518 may include optical character recognition applications provided by third parties (e.g., Adobe), speech conversion applications, editable word processing applications, and the like.
- Input device 2506 may be a sensor for acquiring images containing textual content. The stored image containing text content or the acquired image can be processed by the OCR application into an output result containing text.
- the output device 2508 is, for example, a speaker or earphones for voice announcement, and the processor 2504 performs method steps according to aspects of the present disclosure according to the program code in the working memory 2514.
- custom hardware may also be used, and/or particular elements (e.g., the chip circuits described above) may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
- some or all of the disclosed methods and apparatus (e.g., the individual circuit units in the chip circuits described above) may be implemented by programming hardware (e.g., programmable logic circuits including field-programmable gate arrays (FPGAs) and/or programmable logic arrays (PLAs)) in an assembly language or a hardware programming language such as VERILOG, VHDL, or C++, using logic and algorithms according to the present disclosure.
- components of computing device 2500 may be distributed over a network. For example, some processing may be performed by one processor, while other processing may be performed by another processor remote from it. Other components of computing device 2500 may be similarly distributed. As such, computing device 2500 may be interpreted as a distributed computing system that performs processing in multiple locations.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Character Input (AREA)
Abstract
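Provided are a layout analysis method, a chip circuit, a reading aid, an electronic device, and a computer-readable storage medium. The layout analysis method includes: acquiring coordinate information of a plurality of text lines in an image; creating a layout model of the image according to the coordinate information; analyzing the layout structure of the text lines based on the layout model; and determining the order of the text lines relative to each other based on the layout structure.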
Description
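The present disclosure relates to the field of data processing, and in particular to a layout analysis method, a chip circuit, a reading aid, an electronic device, and a computer-readable storage medium.
The related art includes techniques for performing layout analysis on images, which use image processing, artificial intelligence, and similar techniques to classify and recognize the text fields in image files of printed matter, so as to facilitate subsequent applications such as generating e-books and audiobooks. Known techniques usually perform layout analysis on the original image of the printed matter, which makes processing relatively slow.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely because they are included in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered to have been recognized in any prior art.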
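Summary
According to some embodiments of the present disclosure, a layout analysis method is provided, including: acquiring coordinate information of a plurality of text lines in an image; creating a layout model of the image according to the coordinate information; analyzing the layout structure of the text lines based on the layout model; and determining the order of the text lines relative to each other based on the layout structure.
According to some embodiments of the present disclosure, a chip circuit is provided, including circuit units configured to perform the method described in the embodiments of the present disclosure.
According to some embodiments of the present disclosure, a reading aid is provided, including: the chip circuit described above; and an image sensor configured to acquire the image.
According to some embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform the method described in the present disclosure.
According to some embodiments of the present disclosure, a computer-readable storage medium storing a program is provided, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method described in the present disclosure.
These and other aspects of the present disclosure will be apparent from, and elucidated with reference to, the embodiments described hereinafter.
The accompanying drawings illustrate the embodiments by way of example, form part of the specification, and together with the written description serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals denote similar but not necessarily identical elements.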
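FIG. 1 is a schematic diagram illustrating an exemplary application scenario in which the various methods described herein may be applied, according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating an exemplary method that may be used in the application scenario of FIG. 1 to recognize text in an image and announce the recognized text by voice;
FIG. 3 is a flowchart illustrating a layout analysis method according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating an image containing a text region according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating a layout model created for the image shown in FIG. 4 according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating a method of analyzing the layout structure of text lines according to an exemplary embodiment;
FIG. 7 is a schematic diagram illustrating a layout model obtained by adjusting the widths of the rectangular blocks in FIG. 5 to form a plurality of connected regions according to an exemplary embodiment;
FIG. 8 is a flowchart illustrating an example process of analyzing the spatial layout of the plurality of connected regions in the method of FIG. 6;
FIG. 9 is a flowchart illustrating an example process of selectively correcting the orientation of the plurality of connected regions in the method of FIG. 8;
FIG. 10 is a schematic diagram illustrating a layout model obtained by angle-correcting the layout model shown in FIG. 7 according to an exemplary embodiment;
FIG. 11 is a flowchart illustrating an example process of selectively removing the connected regions directly adjacent to either of the two sides of the layout model in the method of FIG. 8;
FIG. 12 is a schematic diagram illustrating vertical projection of the layout model shown in FIG. 10 according to an exemplary embodiment;
FIG. 13 is a schematic diagram illustrating the layout model obtained after removing, according to the projection result of FIG. 12, the connected regions representing an incomplete page from the layout model shown in FIG. 10;
FIGS. 14-17 are schematic diagrams each illustrating projection segmentation of the layout model shown in FIG. 13 according to an exemplary embodiment;
FIG. 18 is a schematic diagram illustrating a layout model including the finally obtained set of segmented bands according to an exemplary embodiment;
FIG. 19 is a schematic diagram illustrating the layout model shown in FIG. 18 after it has been returned to its original tilted state and the segmented bands have been sorted in reading order;
FIG. 20 is a schematic diagram illustrating a plurality of connected regions after they have been matched with the segmented bands and sorted according to an exemplary embodiment;
FIG. 21 is a schematic diagram illustrating the sorting of the text lines in the image of FIG. 4 according to the layout analysis result according to an exemplary embodiment;
FIG. 22 is a flowchart illustrating an example process of determining the main layout type of the text lines according to an exemplary embodiment;
FIG. 23 is a flowchart illustrating an example process of selectively discarding text of the secondary layout type according to an exemplary embodiment;
FIG. 24 is a structural block diagram illustrating a reading aid according to an exemplary embodiment; and
FIG. 25 is a structural block diagram illustrating an exemplary computing device applicable to the exemplary embodiments.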
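In the present disclosure, unless otherwise stated, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional, temporal, or importance relationships of these elements; such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in other cases, depending on the context, they may refer to different instances.
The terminology used in the description of the various examples in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. The term "based on" means "based at least in part on". In addition, the term "and/or" as used in the present disclosure covers any one of the listed items and all possible combinations thereof.
In the present disclosure, the spatially relative terms "horizontal" and "vertical" are used in connection with the layout model. In this context, the "horizontal direction" refers to the row direction of the layout model, and the "vertical direction" refers to the column direction of the layout model. The spatially relative terms "up", "down", "left", and "right" are also used in connection with the layout model. They indicate directions on the image of a reading material (for example, a book or magazine), or equivalently on the layout model of that image, as observed from the viewpoint of the image sensor (for example, one worn or held by the reader) when the reading material is correctly oriented relative to the reader for reading. Thus, the "up-down direction" substantially corresponds to the column direction of the layout model, and the "left-right direction" substantially corresponds to the row direction of the layout model.
The following description is mainly based on the case in which text lines extend substantially in the left-right direction relative to the reader (i.e., horizontally typeset reading material), but the technical solutions of the present disclosure are not limited thereto. They also apply to the case in which text lines extend substantially in the up-down direction relative to the reader (i.e., vertically typeset reading material); that is, the methods of the present disclosure also apply to vertically typeset reading material. In the horizontal case, a text line is a sequence of characters extending substantially in the left-right (horizontal) direction; in the vertical case, a text line is a sequence of characters extending substantially in the up-down (vertical) direction.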
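When reading materials such as books or magazines, a sighted person visually captures images in the field of view, identifies the text regions in those images with the brain, and reads the text in those regions in reading order. Visually impaired people, however, may need to rely on a reading aid to recognize and announce the text of the reading material. In this case, the reading aid must not only perform text recognition on the image, but also determine the order of the text lines in the text region with some algorithm, so that it can "read" the text of the reading material in the correct reading order.
FIG. 1 is a schematic diagram illustrating an exemplary application scenario 100 in which the various methods described herein may be applied, according to an exemplary embodiment. As shown in FIG. 1, the exemplary scenario 100 may include, but is not limited to, applications such as reading assistance for the blind and intelligent reading aloud. A reading aid such as smart glasses 110 is equipped with a text recognition apparatus, which captures a text region of a reading material 116 containing one or more text lines 114. The text recognition apparatus recognizes and announces the text within its capture range 112 by means of its built-in chip and algorithms.
FIG. 2 is a flowchart illustrating an exemplary method 200 that may be used in the application scenario 100 to recognize text in an image and announce the recognized text by voice. As shown in FIG. 2, the method 200 includes the following steps: capturing an image and detecting the text line regions in the image (step 210); performing layout analysis on the text lines in the image (step 220); and recognizing the text in the text lines and announcing the recognized text by voice according to the result of the layout analysis (step 230). Text region detection (step 210) and text recognition (step 230) may be implemented by various methods, including, for example, traditional image processing algorithms (such as MSER) and/or deep learning methods.
To make the subject matter of the present disclosure clearer, the following describes in detail how layout analysis is performed on the text lines in an image (step 220). It will be understood that the application scenario 100 and the method 200 described above with respect to FIG. 1 and FIG. 2 are merely exemplary, meaning that the layout analysis method according to embodiments of the present disclosure is not limited to the applications described above.
FIG. 3 is a flowchart illustrating a layout analysis method 300 according to an exemplary embodiment of the present disclosure. The layout analysis method 300 may be used to implement step 220 in FIG. 2. As shown in FIG. 3, the layout analysis method 300 includes the following steps: acquiring coordinate information of a plurality of text lines in an image (step 310); creating a layout model of the image according to the coordinate information (step 320); analyzing the layout structure of the text lines based on the layout model (step 330); and determining the order of the text lines relative to each other based on the layout structure (step 340).
As will become clearer from the description below, the layout analysis method 300 neither operates on the original image nor requires semantic analysis; instead, it converts the image region containing text into a layout model that simulates the distribution of text in the image but has a simpler structure, and then performs spatial layout analysis on the data in that layout model.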
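In step 310, coordinate information of a plurality of text lines in the image is acquired.
Because the exemplary method of the present disclosure performs layout analysis mainly on the coordinate information of the text rather than on the original image of the text itself, this step acquires the coordinate information of the plurality of text lines in the image for use in subsequent processing. The image may be electronic image data acquired by an image sensor. According to some embodiments, the image sensor may be arranged on an item such as a user's wearable device or glasses, for example in the application scenario 100 shown in FIG. 1.
FIG. 4 is a schematic diagram illustrating an image 400 containing a text region according to an exemplary embodiment. As shown in FIG. 4, the image 400 may contain text (which may include the characters, digits, symbols, punctuation marks, etc. of various countries and regions), pictures, and other content; text lines 410 containing text are shown. According to some embodiments, the image 400 may be a preprocessed image, and the preprocessing may include, but is not limited to, color correction, deblurring, and the like.
As described above, text region detection may be implemented by various methods, for example image processing algorithms (such as MSER) or deep learning methods. By detecting the text regions in the image 400, the coordinate information of each text line in the image 400 can be obtained. The coordinate information of the text lines may be obtained from another machine (for example, a remote server or a cloud computing device) or by a local detection algorithm. According to some embodiments, the obtained coordinate information may be stored in a local storage device or storage medium for later use. As used herein, the term "text line" refers to a continuous line of text, which may be, for example, a sequence of characters whose spacing between adjacent characters in the left-right direction is smaller than a threshold spacing, or a sequence of characters whose spacing between adjacent characters in the up-down direction is smaller than a threshold spacing.
According to some embodiments, the coordinate information of a text line may be the coordinate information of a rectangle containing the text line (for example, the minimum bounding rectangle of the text line, or a rectangle obtained by expanding that minimum bounding rectangle upward, downward, leftward, and/or rightward by a certain factor). The coordinate information may include, for example, the coordinates of the four vertices of the rectangle, or the coordinates of any one vertex of the rectangle together with the height and length of the rectangle. However, the definition of the coordinate information of a text line is not limited thereto, as long as it represents the spatial position and size occupied by the text line.
Returning to FIG. 3, in step 320, a layout model of the image is created according to the coordinate information. As used herein, the term "layout model" refers to a data structure into which an image containing text is converted, which simulates the distribution of text in the image but has a simpler structure.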
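According to some embodiments, the layout model is obtained by filling data values into the data elements of a data structure that correspond to the acquired coordinate information. The data structure may include a plurality of data elements, and the data elements filled with data values form a plurality of rectangular blocks corresponding to respective ones of the plurality of text lines.
According to some embodiments, the data structure may be a file in a memory (for example, main memory or cache), an image expressed in pixels, or a table or data array. The data structure is not limited to any specific form, as long as its data can simulate the text lines in the image. It may have the same size as the image, or a size scaled proportionally relative to the image size. For example, if the image has a pixel size of 3840×2160, the data structure (and the corresponding layout model) may have the same size as the image (i.e., 3840×2160 matrix elements). Alternatively, the data structure may be scaled only in the horizontal direction (for example, 1920×2160 matrix elements), only in the vertical direction (for example, 3840×1080 matrix elements), or in both directions (for example, 1920×1080 or 1280×1080 matrix elements), and so on. Whether the data structure has the same size as the image or a proportionally scaled size, its data elements can be placed in correspondence with (mapped to) the pixels of the image.
FIG. 5 is a schematic diagram illustrating a layout model 500 created for the image 400 in FIG. 4 according to an exemplary embodiment. As shown in FIG. 5, corresponding data values are filled into the data structure so that the filled data elements form rectangular blocks 510 corresponding to the text lines 410 in FIG. 4. In this example, the layout model 500 has the same size as the image 400.
A rectangular block formed by filled data elements indicates that text is present in the corresponding image region, independent of the semantics or content of that text. According to some embodiments, the data structure may include a two-dimensional matrix, for example a two-dimensional blank matrix, i.e., a two-dimensional matrix whose matrix elements all default to the data value "0". When the layout model 500 of the image 400 is created, the matrix elements corresponding to the coordinate information of the text lines in the image 400 may be filled with the data value "1". The data value is not limited to this, however, as long as it distinguishes whether text or a text line is present in a region. For example, for a data structure using 8-bit data elements, the data elements corresponding to the coordinate information of the text lines in the image 400 may be filled with the data value "255".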
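The fill step can be illustrated with a short Python sketch. It assumes, purely for illustration, that each text line is given as an axis-aligned rectangle `(left, top, right, bottom)` in image pixels and that the model may be scaled by factors `(sx, sy)` relative to the image; the function name `build_layout_model` is hypothetical and not part of the disclosure.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (left, top, right, bottom) in image pixels, right/bottom exclusive.
Box = Tuple[int, int, int, int]


def build_layout_model(
    image_size: Tuple[int, int],
    line_boxes: Sequence[Box],
    scale: Tuple[float, float] = (1.0, 1.0),
    fill_value: int = 1,
) -> List[List[int]]:
    """Return a blank (all-zero) matrix with every text-line rectangle filled with fill_value.

    image_size is (width, height); scale is (sx, sy) = model size / image size, so
    (0.5, 1.0) halves the model only in the horizontal direction.
    """
    width, height = image_size
    sx, sy = scale
    cols, rows = max(1, round(width * sx)), max(1, round(height * sy))
    model = [[0] * cols for _ in range(rows)]
    for left, top, right, bottom in line_boxes:
        # Map image coordinates into model coordinates and clip to the matrix bounds.
        c0, c1 = max(0, int(left * sx)), min(cols, round(right * sx))
        r0, r1 = max(0, int(top * sy)), min(rows, round(bottom * sy))
        for r in range(r0, r1):
            for c in range(c0, c1):
                model[r][c] = fill_value
    return model


# Example: a 10x6 "image" with two text lines.
model = build_layout_model((10, 6), [(1, 1, 5, 2), (1, 3, 8, 4)])
for row in model:
    print("".join(str(v) for v in row))
```

Filling with 255 instead of 1 for an 8-bit model only requires passing a different `fill_value`.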
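Returning to FIG. 3, in step 330, the layout structure of the text lines is analyzed based on the layout model. Because the analysis is performed on the layout model, there is no need to operate on the original image, so the layout structure of the text lines can be analyzed quickly, which improves the efficiency of layout analysis.
FIG. 6 is a flowchart illustrating a process for implementing step 330 according to an exemplary embodiment. As shown in FIG. 6, the process includes: selectively adjusting the widths of the plurality of rectangular blocks (step 610); and analyzing the spatial layout of the plurality of connected regions (step 620).
In step 610, the widths of the plurality of rectangular blocks are selectively adjusted so that the rectangular blocks are merged into a plurality of connected regions separated from each other.
FIG. 7 is a schematic diagram illustrating a layout model 700 obtained by adjusting the widths of the rectangular blocks 510 in FIG. 5 to form a plurality of connected regions 710 according to an exemplary embodiment. The resulting connected regions 710 correspond to paragraphs of text lines; the operation of step 610 may therefore be called paragraph division.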
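The disclosure does not specify how the merged blocks are grouped into connected regions. One conventional option, shown here as an assumption, is 4-connected component labelling of the nonzero elements of the layout model with a breadth-first flood fill; `label_connected_regions` is a hypothetical helper name.

```python
from collections import deque
from typing import List, Tuple


def label_connected_regions(model: List[List[int]]) -> Tuple[List[List[int]], int]:
    """4-connected labelling of nonzero elements; returns (labels, count), labels start at 1."""
    rows = len(model)
    cols = len(model[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if model[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one connected region
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and model[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Example: two separate blobs of filled elements.
labels, count = label_connected_regions([[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]])
print(count)
```

With this labelling, widened blocks whose filled elements touch vertically end up in the same region, which is what turns lines into paragraphs.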
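According to some embodiments, the widths of the plurality of rectangular blocks are selectively adjusted as follows. For each rectangular block: if the width of the block is less than or equal to a representative width of the plurality of rectangular blocks, the width of the block is increased by a first amount; if the width is greater than the representative width and less than or equal to a first multiple of the representative width, the width is increased by a second amount; if the width is greater than the first multiple and less than or equal to a second multiple of the representative width, the width is not adjusted; and if the width is greater than the second multiple of the representative width, the width is decreased by a third amount.
According to some embodiments, in the above step, the representative width may be the average width of a subset of the rectangular blocks, the subset consisting of the blocks other than those whose widths are greater than a threshold width percentile. In such embodiments, blocks with large widths (corresponding to text lines in larger fonts, such as title lines) are first filtered out, and the average width of the remaining blocks is then computed as the representative width. Note that the blocks whose widths exceed the threshold width percentile are only excluded from the average; they are not removed from the layout model. The threshold width percentile may be set to, for example, 90% or 95%; its specific value may be set according to the actual application and is not specifically limited here. This prevents overly wide blocks from impairing the accuracy of paragraph division, for example by merging two paragraphs that should be separate into a single paragraph.
It will be understood that, in this context, when text lines extend substantially in the left-right direction relative to the reader (i.e., horizontally typeset reading material), the length direction of the rectangular block corresponding to a text line is the substantially left-right direction, and the width direction of the block is the direction substantially perpendicular to it (i.e., the substantially up-down direction). When text lines extend substantially in the up-down direction relative to the reader (i.e., vertically typeset reading material), the length direction of the block is the substantially up-down direction, and the width direction is the direction substantially perpendicular to it (i.e., the substantially left-right direction). Therefore, for horizontally typeset reading material the representative width is the up-down height of the characters (i.e., the line height), and for vertically typeset reading material it is the left-right extent of the characters (i.e., the column width).
According to some embodiments, the representative width may also be the average width of all the rectangular blocks. This reduces the computation needed for paragraph division and may be suitable in some cases (for example, when title lines are close in size to body text lines).
According to some embodiments, the first amount may be 0.5 times the block's own width: the width of the rectangular block is increased at both ends in the width direction by 0.5 times the block's width. In some embodiments, the coordinates of the four vertices of the block are each increased or decreased in the width direction by 0.5 times the block width. It should be understood that the specific value of the first amount may be set according to the actual application and is not specifically limited here.
According to some embodiments, the first multiple may be 1.5, and increasing the width of the block by the second amount includes increasing the width at both ends in the width direction by 0.5 times the representative width. It should be understood that the specific values of the first multiple and the second amount may be set according to the actual application and are not specifically limited here.
According to some embodiments, the second multiple may be 2, and decreasing the width of the block by the third amount includes decreasing the width at both ends in the width direction by 0.5 times the representative width. It should be understood that the specific values of the second multiple and the third amount may be set according to the actual application and are not specifically limited here.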
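The width rule above, with the example values (first amount 0.5 times the block's own width per end, first multiple 1.5, second multiple 2, second and third amounts 0.5 times the representative width per end), can be sketched as follows for horizontal text, where a block's width is its vertical extent. The nearest-rank percentile used to exclude wide blocks from the average is an assumption, since the disclosure does not fix a percentile definition; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical block format: (left, top, right, bottom); for horizontal text, width = bottom - top.
Box = Tuple[float, float, float, float]


def representative_width(widths: Sequence[float], percentile: float = 90.0) -> float:
    """Mean width after ignoring widths above the (nearest-rank) percentile."""
    ordered = sorted(widths)
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    cutoff = ordered[rank - 1]
    kept = [w for w in widths if w <= cutoff]  # wide blocks are only left out of the mean
    return sum(kept) / len(kept)


def adjust_widths(boxes: Sequence[Box], first_multiple: float = 1.5,
                  second_multiple: float = 2.0, percentile: float = 90.0) -> List[Box]:
    """Selectively widen or narrow each block at both ends of its width direction."""
    rep = representative_width([b[3] - b[1] for b in boxes], percentile)
    adjusted = []
    for left, top, right, bottom in boxes:
        w = bottom - top
        if w <= rep:
            delta = 0.5 * w        # first amount: 0.5x the block's own width per end
        elif w <= first_multiple * rep:
            delta = 0.5 * rep      # second amount: 0.5x the representative width per end
        elif w <= second_multiple * rep:
            delta = 0.0            # leave the block unchanged
        else:
            delta = -0.5 * rep     # third amount: shrink by 0.5x the representative width per end
        adjusted.append((left, top - delta, right, bottom + delta))
    return adjusted


# Widths 10, 10, 10, 10, 14, 18, 40; a low percentile is used so this tiny example
# leaves the three wider blocks out of the representative width (which becomes 10).
boxes = [(0, 0, 50, 10), (0, 20, 50, 30), (0, 40, 50, 50), (0, 60, 50, 70),
         (0, 80, 50, 94), (0, 100, 50, 118), (0, 130, 50, 170)]
adjusted = adjust_widths(boxes, percentile=50)
print(adjusted)
```

For vertical text the same logic would apply to the left-right extent of each block instead.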
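Returning to FIG. 6, in step 620, the spatial layout of the plurality of connected regions is analyzed.
FIG. 8 is a flowchart illustrating an example process for implementing step 620. According to some embodiments, analyzing the spatial layout of the connected regions may include: selectively correcting or not correcting the orientation of the connected regions in the layout model (step 810); selectively removing or not removing the connected regions in the layout model that are directly adjacent, in the row direction, to either of the two sides of the layout model, so as to obtain selected connected regions (step 820); and performing projection segmentation on the selected connected regions to obtain a set of segmented bands and the order of the segmented bands relative to each other (step 830).
In step 810, the orientation of the connected regions in the layout model is selectively corrected or not corrected.
According to some embodiments, selectively correcting or not correcting the orientation may include: determining whether the connected regions are in a tilted state relative to either the row direction or the column direction of the layout model; and, if they are, rotating the connected regions by a correction angle so that they are no longer tilted. Tilt correction is particularly advantageous for applications such as the application scenario 100 shown in FIG. 1, where the reader usually holds the book or other reading material by hand, so the text region in the image acquired by the image sensor is often tilted. Rotating the connected regions by the correction angle so that they are no longer tilted can greatly improve the accuracy of layout analysis. This is an advantage over conventional layout analysis techniques, whose input is usually a flat image, for example one produced by a scanner, in which the text region is not tilted; such techniques may therefore be unsuitable for assisted reading scenarios.
According to some embodiments, whether the connected regions are tilted relative to either the row direction or the column direction may be determined as follows. First, a specific connected region is searched for among the connected regions, namely the one whose minimum bounding rectangle has the largest area among the minimum bounding rectangles of all the connected regions. Then it is determined whether a side of the minimum bounding rectangle of this specific region is parallel to either the row direction or the column direction. If that side is parallel to neither, the connected regions are determined to be tilted; if it is parallel to either one, they are determined not to be tilted.
FIG. 9 is a flowchart illustrating an example process of selectively correcting the orientation of the connected regions in the method of FIG. 8. As shown in FIG. 9, in step 910, among the connected regions obtained after selectively adjusting the widths of the rectangular blocks, the specific connected region whose minimum bounding rectangle has the largest area is determined. If a side of that minimum bounding rectangle is not parallel to the row direction or the column direction (step 920, "No"), the connected regions are rotated by a correction angle so that a side of the minimum bounding rectangle of the specific region becomes parallel to the row direction or the column direction (step 930); otherwise (step 920, "Yes"), no correction is performed.
FIG. 10 is a schematic diagram illustrating a layout model obtained by angle-correcting the layout model 700 shown in FIG. 7 according to an exemplary embodiment. According to some embodiments, when the connected regions are in the tilted state, the minimum bounding rectangles of all the connected regions are rotated about their centroids (i.e., center points) by the same angle and in the same direction, so that one side of the minimum bounding rectangle of the specific connected region becomes parallel to the row direction or the column direction.
According to some embodiments, the tilt angle of the specific connected region may be detected by applying, for example, the Hough transform to its minimum bounding rectangle. Tilt correction is performed on the connected regions only when the tilt angle is greater than or equal to a preset first tilt angle threshold (for example, 5°), and the rotation direction and angle of the connected regions during correction are recorded.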
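A minimal sketch of the tilt decision and rotation follows, assuming the corners of the largest minimum bounding rectangle are already available (obtaining them, for example with a rotating-calipers or Hough-based method, is outside this sketch). The angle of one edge is folded into (-45°, 45°], so an edge parallel to either axis yields 0; correction is applied only when the tilt reaches the example threshold of 5°, and the recorded angle can be reused later to rotate the bands back. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def edge_tilt_deg(p0: Point, p1: Point) -> float:
    """Angle of edge p0->p1 relative to the nearest model axis, folded into (-45, 45] degrees."""
    angle = math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0]))
    folded = (angle + 45.0) % 90.0 - 45.0  # 0 when parallel to the row or column direction
    return 45.0 if folded == -45.0 else folded


def correction_angle(tilt_deg: float, threshold_deg: float = 5.0) -> float:
    """Rotate back by the tilt only when it reaches the threshold; otherwise do nothing."""
    return -tilt_deg if abs(tilt_deg) >= threshold_deg else 0.0


def rotate_points(points: Sequence[Point], angle_deg: float, center: Point) -> List[Point]:
    """Rotate points about center. Rotating by a adds a to every atan2-measured angle in the
    same (x, y) frame, so rotating by -tilt undoes edge_tilt_deg whatever the y-axis direction."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a) for x, y in points]


rect = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
tilted = rotate_points(rect, 12.0, (5.0, 2.0))        # simulate a page held at 12 degrees
tilt = edge_tilt_deg(tilted[0], tilted[1])            # approximately 12.0
angle = correction_angle(tilt)                        # approximately -12.0, recorded for later
restored = rotate_points(tilted, angle, (5.0, 2.0))   # corrected, axis-parallel rectangle
undone = rotate_points(restored, -angle, (5.0, 2.0))  # reverse rotation after segmentation
print(round(tilt, 6), round(angle, 6))
```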
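It will be understood that the tilt correction method described above is merely exemplary; in other embodiments, any other suitable correction method may be used.
Returning to FIG. 8, in step 820, the connected regions in the layout model that are directly adjacent, in the row direction, to either of the two sides of the layout model are selectively removed or not removed, so as to obtain the selected connected regions.
In some cases, it is also necessary to filter out incomplete pages of magazines, books, and other reading materials, i.e., pages that are folded or not fully captured, so that the text lines on such pages are not recognized and announced in subsequent processing. For these cases, according to some embodiments, if the connected regions are not in the tilted state, vertical projection segmentation is performed on the layout model. Then, depending on the result of the vertical projection segmentation, the connected regions that are directly adjacent, in the row direction, to either side of the layout model are selectively removed or not removed, yielding the selected connected regions. As used herein, the phrase "a connected region is directly adjacent to a side of the layout model" means that there is no other connected region between that connected region and that side of the layout model.
FIG. 11 is a flowchart illustrating an example process of selectively removing the connected regions directly adjacent to either side of the layout model in the method of FIG. 8. Through this process, connected regions representing paragraphs on an incomplete page are filtered out of the layout model. As shown in FIG. 11, vertical projection segmentation is first performed on the layout model (step 1110). It is then determined whether the vertical projection segmentation splits the layout model into at least two bands containing the connected regions (step 1120). If not (step 1120, "No"), no removal is performed (step 1180). If so (step 1120, "Yes"), the respective effective sizes of the bands in the row direction are determined (step 1130), and the following is performed for each side band, i.e., each band directly adjacent in the row direction to either side of the layout model. If exactly two bands were obtained (step 1140, "Yes") and the effective size of the side band in the row direction is smaller than a first threshold percentage of the largest of the effective sizes and smaller than a second threshold percentage of the effective size of the other band (step 1150, "Yes"), the connected regions in the side band are removed (step 1170); otherwise they are not removed (step 1180). If more than two bands were obtained (step 1140, "No") and the effective size of the side band is smaller than a third threshold percentage of the largest effective size and smaller than a fourth threshold percentage of the effective size of the band directly adjacent to the side band (step 1160, "Yes"), the connected regions in the side band are removed (step 1170); otherwise they are not removed (step 1180).
As used herein, the phrase "a band is directly adjacent to a side of the layout model" means that there is no other band between that band and that side of the layout model.
As used herein, the effective size of a band in the row direction refers to the size, in the row direction, of the connected regions in the band, for example the row-direction size of their minimum bounding rectangles. In some embodiments, the effective size of a band may be the average of the row-direction sizes of all the connected regions in the band.
Through the above steps, folded or partially captured incomplete pages of magazines, books, and other reading materials can be filtered out, so that their text lines are not recognized and announced in subsequent processing, which would otherwise jumble the content being read. This can greatly improve the accuracy of layout analysis and thus the user experience.
According to some embodiments, the first threshold percentage is smaller than the second threshold percentage, and the third threshold percentage is equal to the fourth threshold percentage.
According to some embodiments, for example, the first threshold percentage is 60% and the second is 70%; the third threshold percentage is 70% and the fourth is 70%. It should be understood that the specific values of the four threshold percentages may be set according to the actual application and are not specifically limited here.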
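The side-band test of FIG. 11 reduces to a few comparisons once the effective sizes of the bands are known. The sketch below assumes the bands are listed left to right and uses the example percentages (60%, 70%, 70%, 70%); taking a band's effective size as the mean row-direction size of its connected regions follows one of the embodiments above. Function names are illustrative only.

```python
from statistics import mean
from typing import List, Sequence


def effective_size(region_widths: Sequence[float]) -> float:
    """Effective row-direction size of a band: mean row-direction size of its regions."""
    return mean(region_widths)


def side_bands_to_remove(band_sizes: Sequence[float], p1: float = 0.6, p2: float = 0.7,
                         p3: float = 0.7, p4: float = 0.7) -> List[int]:
    """Indices of side bands (first/last, in left-to-right order) judged to be incomplete pages."""
    n = len(band_sizes)
    if n < 2:
        return []  # the vertical projection did not split the model: remove nothing
    largest = max(band_sizes)
    removed = []
    for side, neighbour in ((0, 1), (n - 1, n - 2)):
        size = band_sizes[side]
        if n == 2:  # compare with the largest band and with the other band
            drop = size < p1 * largest and size < p2 * band_sizes[neighbour]
        else:       # compare with the largest band and with the directly adjacent band
            drop = size < p3 * largest and size < p4 * band_sizes[neighbour]
        if drop:
            removed.append(side)
    return removed


# Two full columns and a narrow strip of the facing page on the right.
print(side_bands_to_remove([effective_size([300, 290]), 280, 90]))
```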
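It should be understood that "horizontal projection segmentation" and "vertical projection segmentation" are text segmentation techniques known per se. Horizontal projection segmentation searches a two-dimensional image for pixel rows satisfying a predetermined condition to serve as horizontal dividing lines; in a binarized image, such a pixel row may be one whose pixel values sum to zero. Vertical projection segmentation searches a two-dimensional image for pixel columns satisfying a predetermined condition to serve as vertical dividing lines; in a binarized image, such a pixel column may be one whose pixel values sum to zero. In some embodiments of the present disclosure, the data structure of the layout model may take the form of a two-dimensional matrix, and the pixel values are the data values of its matrix elements.
FIG. 12 is a schematic diagram illustrating vertical projection of the layout model shown in FIG. 10 according to an exemplary embodiment. For ease of understanding, FIG. 12 shows a waveform 1210 indicating the sum of the data values of the data elements in each data column, a line 1220 connecting the peaks and troughs of the waveform 1210, and a vertical dividing line 1230. As shown in FIG. 12, for the data column corresponding to the vertical dividing line 1230, the sum of the data values is the minimum (for example, zero), so that data column can be selected as a vertical dividing line. Similarly, each of several data columns to the right of the vertical dividing line 1230 also has the minimum sum, so any of them could also be selected as the vertical dividing line.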
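Projection profiles on a binary layout model are plain row and column sums. The sketch below computes them and lists candidate dividing lines as runs of rows or columns whose sum does not exceed a threshold (zero in the strict binarized case). Requiring content on both sides of a run is an assumption made here so that page margins are not reported as dividing lines; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def column_profile(model: Sequence[Sequence[int]]) -> List[int]:
    """Vertical projection: sum of the data values in each data column (the waveform)."""
    return [sum(column) for column in zip(*model)]


def row_profile(model: Sequence[Sequence[int]]) -> List[int]:
    """Horizontal projection: sum of the data values in each data row."""
    return [sum(row) for row in model]


def separator_runs(profile: Sequence[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Runs [start, end) of low-sum lines that have content on both sides.

    Any index inside a run may serve as the dividing line.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value <= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    # A run still open at the end touches the far border and is never appended;
    # a run starting at index 0 touches the near border and is filtered out here.
    return [(s, e) for s, e in runs if s > 0]


model = [[1, 1, 0, 0, 1], [1, 0, 0, 0, 1]]
print(column_profile(model), separator_runs(column_profile(model)))
```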
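FIG. 13 is a schematic diagram illustrating the layout model obtained after removing, according to the projection result of FIG. 12, the connected regions representing an incomplete page from the layout model shown in FIG. 10. As shown in FIG. 13, the connected regions at the far right of FIG. 12, which represent paragraphs on an incomplete page, have been removed.
According to some embodiments, before vertical projection segmentation is performed on the layout model, the text lines may be appropriately resized in the left-right direction to improve the accuracy of removing incomplete pages. For each rectangular block corresponding to a text line determined to be of the horizontal type, the length of the block may be increased by several data elements at both ends in the length direction. For each rectangular block corresponding to a text line determined to be of the vertical type, the width of the block may be increased by several data elements at both ends in the width direction. The number of data elements may be, for example, 0.5 times the representative width. It will be understood that the length of a rectangular block is usually greater than its width. In some examples, the layout type may be a default type (for example, horizontal by default). In other examples, the user may set the layout type by manual switching, for example changing the default layout type to vertical. Resizing the text lines in the left-right direction before performing vertical projection segmentation to remove incomplete pages improves the accuracy of the removal, because the resizing makes it difficult for vertical projection segmentation to split off connected regions representing paragraphs on the same page, which reduces the chance that they are removed by mistake.
Returning to FIG. 8, in step 830, projection segmentation is performed on the selected connected regions to obtain a set of segmented bands and the order of the segmented bands relative to each other.
According to some embodiments, after the incomplete pages of the layout model have been filtered out, horizontal projection segmentation and vertical projection segmentation are performed recursively and alternately on the selected connected regions of the layout model, so as to segment a set of segmented bands from the layout model, and the order of the segmented bands relative to each other is determined based on a reading order rule.
According to some embodiments, performing horizontal and vertical projection segmentation recursively and alternately on the selected connected regions may include cyclically performing the following operations: performing vertical projection segmentation on each horizontal band obtained by horizontal projection segmentation, and performing horizontal projection segmentation on each vertical band obtained by vertical projection segmentation, until no band can be further segmented by either horizontal or vertical projection segmentation. The bands that cannot be further segmented form the set of segmented bands.
It will be understood that the order of horizontal and vertical projection segmentation may be swapped; that is, in the above cyclic operation, the first projection segmentation may be either horizontal or vertical. The present disclosure is not limited in this respect. It will also be understood that recursion refers to a strategy of solving a large, complex problem by reducing it, level by level, to smaller problems similar to the original. In computer programming, a recursive strategy can describe the many repeated computations needed to solve a problem with only a small amount of code, which can greatly reduce program size.
According to some embodiments, performing vertical projection segmentation on each horizontal band obtained by horizontal projection segmentation includes: searching the horizontal band for a set of data columns such that, for each data column in the set, the sum of the data values of its data elements lies in the range from zero to a first threshold. The first threshold is greater than zero, for example one times the representative width. If such a set of data columns is found, a vertical dividing line for splitting the horizontal band is selected from the set, and the horizontal band is split using the selected vertical dividing line to obtain vertical bands. Here, the sum of the data values of a data column indicating a vertical dividing line is allowed to lie in the range from zero to the first threshold rather than being required to equal zero. This is because the horizontal spacing between paragraphs on the same page is small, and allowing a larger sum for a data column indicating a vertical dividing line helps vertical projection segmentation to be performed correctly.
According to some embodiments, performing horizontal projection segmentation on each vertical band obtained by vertical projection segmentation includes: searching the vertical band for a set of data rows such that, for each data row in the set, the sum of the data values of its matrix elements lies in the range from zero to a second threshold. The second threshold is greater than zero, for example one times the representative width. If such a set of data rows is found, a horizontal dividing line for splitting the vertical band is selected from the set, and the vertical band is split using the selected horizontal dividing line to obtain horizontal bands. Here, the sum of the data values of a data row indicating a horizontal dividing line is allowed to lie in the range from zero to the second threshold rather than being required to equal zero. This is because the vertical spacing between paragraphs on the same page is small, and allowing a larger sum for a data row indicating a horizontal dividing line helps horizontal projection segmentation to be performed correctly.
According to some embodiments, the set of segmented bands is obtained from the layout model according to the horizontal and vertical dividing lines used to split it.
FIGS. 14-17 are schematic diagrams each illustrating segmentation of the layout model shown in FIG. 13 according to an exemplary embodiment. In this example, as shown in FIG. 14, one horizontal projection segmentation is performed, and this projection does not split off any band. Then, as shown in FIG. 15, vertical projection segmentation is performed, which separates the rightmost band of the layout model from the rest of the model. Horizontal projection segmentation is then performed separately on the band split off in FIG. 15 and on the remaining connected regions; the band split off in the previous step cannot be split further. As shown in FIG. 16, this horizontal projection segmentation splits several bands at the upper left out of the remaining connected regions of FIG. 15. As shown in FIG. 17, vertical segmentation continues on the bands split off in the previous step, until no band can be split by horizontal or vertical projection segmentation. Finally, all the bands split off form the set of segmented bands of the layout model shown in FIG. 18.
According to some embodiments, determining the order of the segmented bands relative to each other includes: during the cyclic operations, recording in a hierarchical tree data structure the hierarchical relationships among horizontal bands, among vertical bands, and between horizontal and vertical bands, where the leaf nodes of the tree represent the set of segmented bands; and traversing the leaf nodes according to the reading order rule, where the order in which the leaf nodes are traversed represents the order of the segmented bands relative to each other.
In some examples, a leaf node may record the coordinate information of the corresponding band, for example the coordinates of the dividing lines between bands or of the rectangle formed by the dividing lines. This coordinate information reflects the positional relationships between bands, so that the order of the bands can be determined according to the reading order rule while the leaf nodes are traversed. The reading order rule is described later.
In some examples, during the cyclic operations, each band split off is marked in the hierarchical tree in reading order. For a band that can be further split by horizontal or vertical projection segmentation, after it is split the next time, the bands split from it are marked in the tree, in reading order, as child nodes of that band; this continues until no band can be split by horizontal or vertical projection segmentation, at which point the whole tree has been marked.
According to some embodiments, the reading order rule includes: if the text lines are determined to be of the horizontal type, sorting vertical bands from left to right according to their positional relationships, and sorting horizontal bands from top to bottom according to their positional relationships. Alternatively, if the text lines are determined to be of the vertical type, sorting vertical bands from right to left according to their positional relationships, and sorting horizontal bands from top to bottom according to their positional relationships.
By using a tree structure to store the hierarchical relationships between bands and determining the order of the leaf nodes with the reading order rule, the layout analysis method according to embodiments of the present disclosure can adapt to both horizontal and vertical layouts, which improves its generality.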
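The recursive, alternating segmentation together with the hierarchy tree and the reading order rule resembles the classic recursive XY-cut. The following sketch is one possible reading, not the disclosed implementation: it splits at the middle of each low-sum run (the text above only says that a dividing line is selected from the qualifying columns or rows), tries horizontal projection first, and falls back to the other direction before declaring a band a leaf. `threshold` plays the role of the first and second thresholds (for example, one representative width); `Node`, `build_tree`, and `reading_order` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


@dataclass
class Node:
    region: Region
    split_axis: Optional[str] = None  # "rows", "cols", or None for a leaf band
    children: List["Node"] = field(default_factory=list)


def _other(axis: str) -> str:
    return "cols" if axis == "rows" else "rows"


def _cut_positions(sums: Sequence[int], threshold: int) -> List[int]:
    """Middle index of every low-sum run (sum <= threshold) with content on both sides."""
    cuts, start, seen_content = [], None, False
    for i, s in enumerate(sums):
        if s > threshold:
            if start is not None:
                cuts.append((start + i) // 2)
                start = None
            seen_content = True
        elif seen_content and start is None:
            start = i
    return cuts


def _split(model, region: Region, axis: str, threshold: int) -> List[Region]:
    top, left, bottom, right = region
    if axis == "rows":  # horizontal projection segmentation
        sums = [sum(model[r][left:right]) for r in range(top, bottom)]
        offset = top
    else:               # vertical projection segmentation
        sums = [sum(model[r][c] for r in range(top, bottom)) for c in range(left, right)]
        offset = left
    bounds = [offset] + [offset + c for c in _cut_positions(sums, threshold)] + [offset + len(sums)]
    if axis == "rows":
        return [(a, left, b, right) for a, b in zip(bounds, bounds[1:])]
    return [(top, a, bottom, b) for a, b in zip(bounds, bounds[1:])]


def build_tree(model, region: Region, axis: str = "rows", threshold: int = 0) -> Node:
    """Alternate horizontal/vertical projection cuts until no band can be split further."""
    for current in (axis, _other(axis)):
        parts = _split(model, region, current, threshold)
        if len(parts) > 1:
            children = [build_tree(model, p, _other(current), threshold) for p in parts]
            return Node(region, current, children)
    return Node(region)  # leaf: one of the final segmented bands


def reading_order(node: Node, vertical_text: bool = False) -> List[Region]:
    """Traverse leaves: rows top to bottom; columns left to right (right to left for vertical text)."""
    if not node.children:
        return [node.region]
    kids = node.children[::-1] if (node.split_axis == "cols" and vertical_text) else node.children
    return [leaf for kid in kids for leaf in reading_order(kid, vertical_text)]


# Title across the top, then two paragraph columns (already merged into connected regions).
model = [[0] * 11 for _ in range(8)]
for c in range(1, 10):
    model[0][c] = 1
for r in range(2, 6):
    for c in list(range(1, 5)) + list(range(6, 10)):
        model[r][c] = 1
root = build_tree(model, (0, 0, 8, 11))
print(reading_order(root))
print(reading_order(root, vertical_text=True))
```

Starting with vertical projection instead only changes the `axis` argument; the vertical-text branch of `reading_order` reverses column order while keeping rows top to bottom, mirroring the reading order rule above.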
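According to some embodiments, analyzing the spatial layout of the connected regions may further include, after projection segmentation of the selected connected regions: determining whether the selected connected regions were rotated by a correction angle; and, if so, rotating the set of segmented bands back by the correction angle. FIG. 19 is a schematic diagram illustrating the layout model shown in FIG. 18 after it has been returned to its original tilted state and the bands have been sorted in reading order; the numbers 0 to 8 denote the indices and reading order of the bands. Returning the layout model to its original tilted state makes it easier to match the text lines in the original image to the rectangular blocks in the layout model in subsequent processing, which increases processing speed.
Returning now to FIG. 3, in step 340, the order of the text lines relative to each other is determined based on the layout structure.
According to some embodiments, determining the order of the text lines based on the layout structure may include: determining, according to the relative positions of the selected connected regions with respect to the bands in the set of segmented bands, the correspondence between the selected connected regions and the bands, where each band contains a respective group of selected connected regions; sorting the selected connected regions within each group according to the positional relationships between them; sorting the rectangular blocks within each selected connected region according to the positional relationships between them; and matching the text lines to the rectangular blocks in the selected connected regions according to the correspondence between the text lines and the rectangular blocks.
In some examples, the band in which each selected connected region lies can be determined from the position of the region's center or centroid relative to the bands. For example, if the center or centroid of a selected connected region falls within a band, the region is determined to lie in that band. In these examples, the selected connected regions within a band can be sorted based on the positions of their centers or centroids.
According to some embodiments, sorting the selected connected regions within a group may include: if the text lines are determined to be of the horizontal type, sorting them from top to bottom; and if the text lines are determined to be of the vertical type, sorting them from right to left.
FIG. 20 is a schematic diagram illustrating connected regions after they have been matched with the segmented bands and sorted according to an exemplary embodiment. As shown in FIG. 20, connected regions 0-5 are matched with the corresponding bands 0-5 shown in FIG. 19, connected regions 6-8 are matched with band 6, connected region 9 is matched with band 7, and connected regions 10-11 are matched with band 8.
After the connected regions have been sorted, the rectangular blocks within each connected region may be sorted.
According to some embodiments, sorting the rectangular blocks within each selected connected region includes: if the text lines are determined to be of the horizontal type, sorting the blocks in each selected connected region from top to bottom; and if the text lines are determined to be of the vertical type, sorting them from right to left.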
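Matching connected regions to bands by their centers and then sorting within each band can be sketched as below. The sketch assumes that bands and regions are axis-aligned rectangles `(top, left, bottom, right)` in the untilted model, i.e. before the bands are rotated back, and that `bands` is already in reading order; the helper names are hypothetical. Passing a single connected region as the only "band" and its rectangular blocks as `regions` gives the within-region block ordering as well.

```python
from typing import List, Sequence, Tuple

# Hypothetical rectangle format: (top, left, bottom, right) in untilted model coordinates.
Rect = Tuple[float, float, float, float]


def centre(rect: Rect) -> Tuple[float, float]:
    top, left, bottom, right = rect
    return (top + bottom) / 2.0, (left + right) / 2.0


def assign_and_sort(bands: Sequence[Rect], regions: Sequence[Rect],
                    vertical_text: bool = False) -> List[List[int]]:
    """Group region indices by the band containing their centre, then sort each group.

    Regions whose centre falls in no band are skipped.
    """
    groups: List[List[int]] = [[] for _ in bands]
    for idx, region in enumerate(regions):
        cy, cx = centre(region)
        for b, (top, left, bottom, right) in enumerate(bands):
            if top <= cy < bottom and left <= cx < right:
                groups[b].append(idx)
                break
    for group in groups:
        if vertical_text:
            group.sort(key=lambda i: -centre(regions[i])[1])  # right to left
        else:
            group.sort(key=lambda i: centre(regions[i])[0])   # top to bottom
    return groups


bands = [(0, 0, 10, 50), (0, 50, 10, 100)]
regions = [(6, 5, 9, 45), (1, 5, 5, 45), (1, 55, 9, 95)]
print(assign_and_sort(bands, regions))
```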
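When the layout model has the same size as the image, the coordinates of the text lines in the image coincide with those of the rectangular blocks in the layout model. When the layout model is scaled relative to the image, it suffices to apply the corresponding inverse scaling between the coordinates of the rectangular blocks in the layout model and those of the text lines in the image. The text lines in the image can thus be matched to the rectangular blocks in the selected connected regions according to the correspondence between them, thereby sorting the text lines in the image.
FIG. 21 is a schematic diagram illustrating the sorting of the text lines in the image 400 according to the layout analysis result, according to an exemplary embodiment. As shown in FIG. 21, text lines 0-5 lie in the corresponding connected regions 0-5 shown in FIG. 20, text lines 6-26 lie in connected region 6, text lines 27-35 in connected region 7, text line 36 in connected region 8, text lines 37-66 in connected region 9, text lines 67-92 in connected region 10, and text lines 93-105 in connected region 11.
According to some embodiments, step 620 of FIG. 6 (analyzing the spatial layout of the connected regions) may further include, before horizontal and vertical projection segmentation are performed recursively and alternately on the selected connected regions: if the text lines are determined to be of the horizontal type, decreasing the length of each rectangular block in the selected connected regions by several data elements at both ends in the length direction; and if the text lines are determined to be of the vertical type, decreasing the width of each such block by several data elements at both ends in the width direction.
Resizing the rectangular blocks corresponding to the text lines in the left-right direction before the recursive, alternating horizontal and vertical projection segmentation can eliminate interference from, for example, the background color of the image between paragraphs, improving segmentation accuracy.
Embodiments have been described above in which the layout type of the text lines is determined by default as horizontal or vertical during layout analysis (and can be switched manually). Some additional embodiments are described below in which the layout type of the text lines is recognized automatically. Automatic recognition of the layout type offers several advantages. For example, the order of the text lines can be determined correctly according to the automatically recognized layout type without manual switching by the user. When an image contains both text lines of a main layout type (for example, horizontal) and text lines of a secondary layout type (for example, vertical), it further enables some useful functions. For example, layout analysis may first be performed on the main-type text lines and then on the secondary-type text lines, so that the main-type text lines are recognized and announced first. This can improve the experience of users of the reading aid, because the main-type text lines are usually the content the user wants to learn about first.
According to some embodiments, before the layout structure of the text lines is analyzed based on the layout model, the main layout type of the text lines is recognized. The main layout type is one selected from the group consisting of the horizontal type and the vertical type. According to some embodiments, recognizing the main layout type may include: determining the respective geometric parameters of the rectangular blocks according to the coordinate information of the text lines in the image; and determining the main layout type of the text lines based on those geometric parameters.
In some examples, referring back to FIG. 4 and FIG. 5, the geometric parameters of each rectangular block 510 in the layout model 500 may be determined from the coordinate information of the text lines 410 in the image 400 and the correspondence between the text lines 410 and the rectangular blocks 510. For example, when the layout model 500 has the same size as the image 400, the coordinates of a block 510 in the layout model 500 are the same as those of the corresponding text line 410 in the image 400, and the geometric parameters of the block 510 can be determined directly from the coordinates (for example, the four vertex coordinates) of that text line 410.
According to some embodiments, the geometric parameters include at least one of the length direction, length, width direction, and width of each of the rectangular blocks 510. When text lines extend substantially in the left-right direction relative to the reader (i.e., horizontally typeset reading material), the length direction is the substantially left-right direction and the width direction is the direction substantially perpendicular to it (i.e., the substantially up-down direction); when text lines extend substantially in the up-down direction relative to the reader (i.e., vertically typeset reading material), the length direction is the substantially up-down direction and the width direction is the direction substantially perpendicular to it (i.e., the substantially left-right direction).
According to some embodiments, the text arrangement direction of the text line 410 corresponding to a rectangular block 510 is determined from the block's geometric parameters, so as to determine whether that text line is of the horizontal or the vertical type. In some embodiments, the layout type of a text line 410 can be obtained by determining the length direction of the corresponding block 510: if the block extends in the left-right direction, the text line is horizontal; if it extends in the up-down direction, the text line is vertical. If, over the whole text area of the image 400, the proportion of text lines 410 of one layout type (horizontal or vertical) exceeds a predetermined threshold, that layout type is the main layout type.
FIG. 22 is a flowchart illustrating an example process of determining the main layout type of the text lines based on the geometric parameters of the rectangular blocks. In this example, the rule is: if the ratio of the total area of the blocks corresponding to vertical text lines to the total area of all blocks is greater than or equal to a predetermined threshold, the main layout type is vertical; otherwise, it is horizontal.
In step 2210, a subset of the rectangular blocks is determined, consisting of the blocks that satisfy the following condition: the angle between the block's length direction and the column direction of the layout model is smaller than a threshold angle. According to some embodiments, the threshold angle may be, for example, 10°, 20°, or 30°, but is not limited to these examples and may be set according to the actual application. As used herein, a subset of a plurality of elements may include some or all of those elements; that is, the subset may be the "full set", a "proper subset", or the "empty set". In the full-set case, all of the rectangular blocks satisfy the condition; in the proper-subset case, some of them do; and in the empty-set case, none of them does.
In step 2220, the total area of the subset and the total area of all the rectangular blocks are determined, and in step 2230 it is determined whether the ratio of the former to the latter is smaller than a first threshold ratio. If it is (step 2230, "Yes"), the main layout type is determined to be horizontal (step 2240); otherwise (step 2230, "No"), it is determined to be vertical (step 2250). According to some embodiments, the first threshold ratio may be 80%, but it is not limited thereto and may be set according to the actual application.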
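The decision of FIG. 22 can be expressed directly in terms of block areas. In this sketch each block is described, as an assumption, by a direction vector `(dx, dy)` along its length and by its area; blocks whose length direction lies within `threshold_angle` of the column direction count as vertical text, and the example values 20° and 80% are used as defaults. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical block description: (dx, dy, area), with (dx, dy) along the block's length.
Block = Tuple[float, float, float]


def angle_to_column_deg(dx: float, dy: float) -> float:
    """Unsigned angle in [0, 90] degrees between the length direction and the column direction."""
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def main_layout_type(blocks: Sequence[Block], threshold_angle: float = 20.0,
                     first_ratio: float = 0.8) -> str:
    """'vertical' if near-vertical blocks cover at least first_ratio of the block area."""
    total = sum(area for _, _, area in blocks)
    vertical = sum(area for dx, dy, area in blocks
                   if angle_to_column_deg(dx, dy) < threshold_angle)
    ratio = vertical / total if total else 0.0  # an empty subset gives ratio 0 -> horizontal
    return "horizontal" if ratio < first_ratio else "vertical"


page = [(1.0, 0.0, 500.0), (1.0, 0.02, 400.0), (0.0, 1.0, 100.0)]
print(main_layout_type(page))
```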
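It will be understood that the above rule for determining the main layout type is merely exemplary; other rules may be used in other embodiments.
According to some embodiments, analyzing the layout structure of the text lines based on the layout model may further include analyzing the layout structure of the text lines of the main layout type. According to some embodiments, rectangular blocks corresponding to unimportant text in the image may be selectively discarded before the layout structure of the text lines is analyzed based on the layout model.
According to some embodiments, before the layout structure of the main-type text lines is analyzed, rectangular blocks of a secondary layout type are selectively removed or not removed from the plurality of rectangular blocks, where the secondary layout type is the other one selected from the group consisting of the horizontal type and the vertical type.
In some examples, text lines of the secondary layout type that occupy a small proportion of the area may be regarded as unimportant text. In such embodiments, before the layout structure of the main-type text lines is analyzed, the secondary layout type of the text lines may be determined based on the geometric parameters of the rectangular blocks. As described above, the main layout type may be one of the horizontal and vertical types (for example, horizontal), and the secondary layout type is then the other (for example, vertical). The rectangular blocks of the secondary layout type are then selectively removed or not removed, yielding the selected rectangular blocks. As used herein, the term "removing" may refer to changing the data values of the corresponding data elements of the layout model to the default value (for example, zero). Discarding some unimportant text helps avoid interrupting the reading order of the main-layout text during recognition and announcement, improving the user experience.
FIG. 23 is a flowchart illustrating an example process of selectively discarding the rectangular blocks corresponding to unimportant text in the image. As shown in FIG. 23, whether to remove the secondary-type blocks can be decided by computing the ratio of their area to the total area of all the rectangular blocks. First, in step 2310, the total area of the secondary-type blocks and the total area of all the rectangular blocks in the layout model are determined. Then, in step 2320, it is determined whether the ratio of the former to the latter is smaller than a second threshold ratio. If it is (step 2320, "Yes"), the secondary-type blocks are removed from the plurality of rectangular blocks (step 2330); if it is not (step 2320, "No"), they are not removed (step 2340). According to some embodiments, the second threshold ratio may be set according to the actual application, for example to 3%, 5%, or 7%; the present disclosure is not limited in this respect.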
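The area-ratio test of FIG. 23 is equally short. The sketch assumes each block already carries its layout type and area (for example, obtained with the length-direction test described for FIG. 22) and uses the example second threshold ratio of 5%; in a matrix-based model, "removing" would additionally mean resetting the blocks' data elements to zero, which this list-based sketch omits. `drop_minor_blocks` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

# Hypothetical block description: (layout type, area); type is "horizontal" or "vertical".
Block = Tuple[str, float]


def drop_minor_blocks(blocks: Sequence[Block], main_type: str,
                      second_ratio: float = 0.05) -> Tuple[List[Block], bool]:
    """Drop secondary-type blocks when their share of the total block area is below second_ratio.

    Returns the kept blocks and whether the secondary-type blocks were dropped; if they were
    kept, their text lines can be analysed after the main-type lines.
    """
    total = sum(area for _, area in blocks)
    minor = sum(area for kind, area in blocks if kind != main_type)
    if total and minor / total < second_ratio:
        return [b for b in blocks if b[0] == main_type], True
    return list(blocks), False


kept, dropped = drop_minor_blocks([("horizontal", 900.0), ("vertical", 30.0)], "horizontal")
print(kept, dropped)
```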
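After this operation, the process proceeds to the step of analyzing the layout structure of the main-type text lines. That analysis is similar to the analysis described above with respect to FIGS. 6 to 21 and is not repeated here for brevity.
According to some embodiments, after the layout structure of the main-type text lines has been analyzed, if the secondary-type blocks were not removed from the plurality of rectangular blocks, the layout structure of the secondary-type text lines may then be analyzed. That analysis is likewise similar to the analysis described above with respect to FIGS. 6 to 21 and is not repeated here for brevity.
Exemplary layout analysis methods according to the present disclosure have been described above with reference to the drawings. After layout analysis, subsequent processing may be performed; for example, in combination with the text recognition result, the text data recognized in each text line may be converted into sound data in accordance with the text-line sorting result, which may be used, for example, in audiobook applications and assistive applications for the visually impaired. When the text lines of the image include both horizontal and vertical lines and the secondary-type lines were not removed during layout analysis, the subsequent processing that announces text based on the recognition result may first recognize and announce the text in the main-type text lines and, only after that text has been announced, recognize and announce the text in the secondary-type text lines.
FIG. 24 is a structural block diagram illustrating a reading aid according to an exemplary embodiment of the present disclosure. As shown in FIG. 24, the reading aid 2400 includes: an image sensor 2410 (which may be implemented, for example, as a camera) configured to acquire the aforementioned image (which may be, for example, a still image or a video image and may contain text); and a chip circuit 2420 including circuit units configured to perform the steps of any of the foregoing methods.
As used herein, the term "circuit" may refer to, be part of, or include: an application-specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that executes one or more software or firmware programs, a combinational logic circuit that provides the described functionality, and/or other suitable hardware components. In some embodiments, a circuit, or functions associated with a circuit, may be implemented by one or more software or firmware modules. In some embodiments, a circuit may include logic that is at least partially operable in hardware. The embodiments described herein may be implemented as a system using any suitably configured hardware and/or software.
According to some embodiments, the chip circuit may further include a circuit unit configured to perform text recognition on the image to obtain text data, and a circuit unit configured to convert the text data of each text line into sound data in accordance with the text-line sorting result. The text recognition unit may use, for example, any text recognition (e.g., optical character recognition, OCR) software or circuit, and the conversion unit may use, for example, any text-to-speech software or circuit. These circuit units may be implemented, for example, with ASIC or FPGA chips. The reading aid 2400 may further include a sound output device 2430 (for example, a speaker or headphones) configured to output the sound data (i.e., voice data).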
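One aspect of the present disclosure may include an electronic device, which may include a processor and a memory storing a program, the program including instructions that, when executed by the processor, cause the processor to perform any of the foregoing methods. According to some embodiments, the program may further include instructions that, when executed by the processor, convert the text data of each text line into sound data in accordance with the text-line sorting result. According to some embodiments, such an electronic device may be, for example, a reading aid. According to some embodiments, such an electronic device may be another device (for example, a mobile phone, computer, or server) that communicates with the reading aid. In that case, the reading aid may send the captured image to the other device, which performs any of the foregoing methods and returns the processing results (for example, layout analysis results, text recognition results, and/or sound data converted from text data) to the reading aid, which then performs the subsequent processing (for example, playing the sound data to the user).
According to some implementations, the reading aid may be implemented as a wearable device, for example a device worn in the form of glasses, a head-mounted device (such as a helmet or hat), a device wearable on the ear, an accessory attachable to glasses (for example, to the frame or temples), an accessory attachable to a hat, and so on.
With the aid of the reading aid, visually impaired users can "read" conventional reading materials (such as books and magazines) using a reading posture similar to that of sighted readers. During "reading", the reading aid automatically performs layout analysis on the captured page image according to the methods of the foregoing embodiments to sort the text lines, converts the text in the text lines into sound in the order of the text lines, and outputs the sound through an output device such as a speaker or headphones for the user to listen to.
One aspect of the present disclosure may include a computer-readable storage medium storing a program, the program including instructions that, when executed by a processor of an electronic device, cause the electronic device to perform any of the foregoing methods. Referring to FIG. 25, a computing device 2500 will now be described, which is an example of a hardware device that may be applied to various aspects of the present disclosure. The computing device 2500 may be any machine configured to perform processing and/or computation, and may be, but is not limited to, a workstation, server, desktop computer, laptop computer, tablet computer, personal digital assistant, smartphone, in-vehicle computer, wearable device, or any combination thereof. According to some implementations, the reading aid or electronic device described above may also be implemented, wholly or at least partially, by the computing device 2500 or a similar device or system.
The computing device 2500 may include elements connected to or in communication with a bus 2502 (possibly via one or more interfaces). For example, the computing device 2500 may include the bus 2502, one or more processors 2504 (which may be used to implement the processor or chip circuit included in the aforementioned reading aid), one or more input devices 2506, and one or more output devices 2508. The one or more processors 2504 may be any type of processor and may include, but are not limited to, one or more general-purpose processors and/or one or more special-purpose processors (for example, special processing chips). The input device 2506 may be any type of device capable of inputting information to the computing device 2500 and may include, but is not limited to, a sensor (for example, the image-acquiring sensor described above), a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. The output device 2508 may be any type of device capable of presenting information and may include, but is not limited to, a display, a speaker (for example, an output device usable to output the sound data described above), a video/audio output terminal, a vibrator, and/or a printer. The computing device 2500 may also include or be connected to a storage device 2510, which (for example, usable to implement the computer-readable storage medium described above) may be any non-transitory storage device capable of storing data, and may include, but is not limited to, a magnetic disk drive, an optical storage device, solid-state memory, a floppy disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, an optical disc or any other optical medium, ROM (read-only memory), RAM (random-access memory), cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. The storage device 2510 may be detachable from an interface. The storage device 2510 may hold data/programs (including instructions)/code for implementing the methods and steps described above. The computing device 2500 may also include a communication device 2512. The communication device 2512 may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing device 2500 may also include a working memory 2514 (which may be used to implement the memory included in the aforementioned reading aid), which may be any type of working memory capable of storing programs (including instructions) and/or data useful for the operation of the processor 2504, and may include, but is not limited to, random-access memory and/or read-only memory devices.
Software elements (programs) may be located in the working memory 2514, including, but not limited to, an operating system 2516, one or more applications (i.e., application programs) 2518, drivers, and/or other data and code. Instructions for performing the methods and steps described above may be included in the one or more applications 2518. The executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (such as the storage device 2510 described above) and, when executed, may be loaded into the working memory 2514 (possibly after being compiled and/or installed). The executable code or source code of the instructions of the software elements (programs) may also be downloaded from a remote location.
When the computing device 2500 shown in FIG. 25 is applied to implementations of the present disclosure, the working memory 2514 may store program code for executing the flowcharts of the present disclosure and/or images containing text content to be recognized, and the applications 2518 may include optical character recognition applications provided by third parties (such as Adobe), speech conversion applications, editable word processing applications, and the like. The input device 2506 may be a sensor for acquiring images containing text content. The stored or acquired images containing text content may be processed by the OCR application into output results containing text; the output device 2508 is, for example, a speaker or headphones for voice announcement; and the processor 2504 is configured to perform method steps according to aspects of the present disclosure according to the program code in the working memory 2514.
It should also be understood that various modifications may be made according to specific requirements. For example, custom hardware may be used, and/or particular elements (such as the chip circuit described above) may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the disclosed methods and devices (such as the individual circuit units of the chip circuit described above) may be implemented by programming hardware (for example, programmable logic circuits including field-programmable gate arrays (FPGAs) and/or programmable logic arrays (PLAs)) in assembly language or a hardware programming language (such as VERILOG, VHDL, or C++) using logic and algorithms according to the present disclosure.
It should also be understood that the components of the computing device 2500 may be distributed over a network. For example, some processing may be performed by one processor while other processing is performed by another processor remote from it. Other components of the computing device 2500 may be similarly distributed. Thus, the computing device 2500 may be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and the scope of the present invention is not limited by them but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced by equivalent elements. Furthermore, the steps may be performed in an order different from that described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that emerge after the present disclosure.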
Claims (37)
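- A layout analysis method, comprising: acquiring coordinate information of a plurality of text lines in an image; creating a layout model of the image according to the coordinate information; analyzing a layout structure of the text lines based on the layout model; and determining an order of the text lines relative to each other based on the layout structure.
- The method of claim 1, wherein creating the layout model of the image according to the coordinate information comprises: filling data values into data elements of a data structure that correspond to the coordinate information, to obtain the layout model, wherein the data elements filled with data values form a plurality of rectangular blocks corresponding to respective text lines of the plurality of text lines.
- The method of claim 2, wherein analyzing the layout structure of the text lines based on the layout model comprises: selectively adjusting widths of the plurality of rectangular blocks so that the plurality of rectangular blocks are merged into a plurality of connected regions separated from each other; and analyzing a spatial layout of the plurality of connected regions to obtain the layout structure of the text lines.
- The method of claim 3, wherein selectively adjusting the widths of the plurality of rectangular blocks comprises, for each rectangular block: in response to the width of the rectangular block being less than or equal to a representative width of the plurality of rectangular blocks, increasing the width of the rectangular block by a first amount; in response to the width of the rectangular block being greater than the representative width and less than or equal to a first multiple of the representative width, increasing the width of the rectangular block by a second amount; in response to the width of the rectangular block being greater than the first multiple of the representative width and less than or equal to a second multiple of the representative width, not adjusting the width of the rectangular block; and in response to the width of the rectangular block being greater than the second multiple of the representative width, decreasing the width of the rectangular block by a third amount.
- The method of claim 4, wherein the representative width is an average width of a subset of the plurality of rectangular blocks, the subset consisting of the rectangular blocks of the plurality of rectangular blocks other than those whose widths are greater than a threshold width percentile.
- The method of claim 4, wherein the representative width is an average width of the plurality of rectangular blocks.
- The method of claim 3, wherein analyzing the spatial layout of the plurality of connected regions comprises: selectively correcting or not correcting an orientation of the plurality of connected regions in the layout model; selectively removing or not removing connected regions in the layout model that are directly adjacent, in a row direction, to either of two sides of the layout model, to obtain selected connected regions; and performing projection segmentation on the selected connected regions to obtain a set of segmented bands and an order of the segmented bands relative to each other.
- The method of claim 7, wherein selectively correcting or not correcting the orientation of the plurality of connected regions in the layout model comprises: determining whether the plurality of connected regions are in a tilted state relative to either a row direction or a column direction of the layout model; and in response to determining that the plurality of connected regions are in the tilted state, rotating the plurality of connected regions by a correction angle so that the plurality of connected regions are not in the tilted state.
- The method of claim 8, wherein determining whether the plurality of connected regions are in a tilted state relative to either the row direction or the column direction of the layout model comprises: searching the plurality of connected regions for a specific connected region whose minimum bounding rectangle has the largest area among the minimum bounding rectangles of the plurality of connected regions; determining whether a side of the minimum bounding rectangle of the specific connected region is parallel to either the row direction or the column direction; in response to determining that the side of the minimum bounding rectangle of the specific connected region is parallel to neither the row direction nor the column direction, determining that the plurality of connected regions are in the tilted state; and in response to determining that the side of the minimum bounding rectangle of the specific connected region is parallel to either the row direction or the column direction, determining that the plurality of connected regions are not in the tilted state.
- The method of claim 8 or 9, wherein selectively removing or not removing the connected regions in the layout model that are directly adjacent, in the row direction, to either of the two sides of the layout model comprises: in response to the plurality of connected regions not being in the tilted state, performing vertical projection segmentation on the layout model; and depending on a result of the vertical projection segmentation, selectively removing or not removing, from the plurality of connected regions, the connected regions directly adjacent, in the row direction, to either of the two sides of the layout model.
- The method of claim 10, wherein selectively removing or not removing, from the plurality of connected regions, the connected regions directly adjacent, in the row direction, to either of the two sides of the layout model comprises: in response to determining that the vertical projection segmentation does not split at least two bands from the layout model, not performing the removal; and in response to determining that the vertical projection segmentation splits at least two bands from the layout model, determining respective effective sizes of the at least two bands in the row direction, and, for each side band of the at least two bands that is directly adjacent, in the row direction, to either of the two sides of the layout model, performing the following operations: in response to two bands being split from the layout model, and the effective size of the side band in the row direction being smaller than a first threshold percentage of the largest of the respective effective sizes and smaller than a second threshold percentage of the effective size, in the row direction, of the other of the two bands, removing the connected regions in the side band; and in response to more than two bands being split from the layout model, and the effective size of the side band in the row direction being smaller than a third threshold percentage of the largest of the respective effective sizes and smaller than a fourth threshold percentage of the effective size, in the row direction, of the band directly adjacent to the side band, removing the connected regions in the side band.
- The method of claim 11, wherein the first threshold percentage is smaller than the second threshold percentage, and wherein the third threshold percentage is equal to the fourth threshold percentage.
- The method of claim 10, wherein analyzing the spatial layout of the plurality of connected regions further comprises, before performing vertical projection segmentation on the layout model: in response to determining that the plurality of text lines are of a horizontal type, increasing the length of each rectangular block by several data elements at both ends in the length direction; or in response to determining that the plurality of text lines are of a vertical type, increasing the width of each rectangular block by several data elements at both ends in the width direction.
- The method of claim 7, wherein performing projection segmentation on the selected connected regions comprises: performing horizontal projection segmentation and vertical projection segmentation recursively and alternately on the selected connected regions, so as to segment the set of segmented bands from the layout model; and determining, based on a reading order rule, the order of the segmented bands in the set relative to each other.
- The method of claim 14, wherein performing horizontal projection segmentation and vertical projection segmentation recursively and alternately on the selected connected regions comprises cyclically performing operations comprising: performing vertical projection segmentation on each horizontal band obtained by horizontal projection segmentation; and performing horizontal projection segmentation on each vertical band obtained by vertical projection segmentation, until no band can be further segmented by horizontal projection segmentation or vertical projection segmentation, wherein the bands that cannot be further segmented by horizontal projection segmentation or vertical projection segmentation form the set of segmented bands.
- The method of claim 15, wherein performing vertical projection segmentation on each horizontal band obtained by horizontal projection segmentation comprises: searching the horizontal band for a set of data columns, wherein, for each data column in the set, the sum of the data values lies in the range from zero to a first threshold, the first threshold being greater than zero; in response to finding the set of data columns, selecting from the set of data columns a vertical dividing line for splitting the horizontal band; and splitting the horizontal band using the selected vertical dividing line to obtain vertical bands.
- The method of claim 15, wherein performing horizontal projection segmentation on each vertical band obtained by vertical projection segmentation comprises: searching the vertical band for a set of data rows, wherein, for each data row in the set, the sum of the data values lies in the range from zero to a second threshold, the second threshold being greater than zero; in response to finding the set of data rows, selecting from the set of data rows a horizontal dividing line for splitting the vertical band; and splitting the vertical band using the selected horizontal dividing line to obtain horizontal bands.
- The method of claim 15, wherein determining the order of the segmented bands in the set relative to each other comprises: during the cyclically performed operations, recording, in a hierarchical tree data structure, the hierarchical relationships among horizontal bands, among vertical bands, and between horizontal bands and vertical bands, wherein leaf nodes of the hierarchical tree data structure represent the set of segmented bands; and traversing the leaf nodes according to the reading order rule, wherein the order in which the leaf nodes are traversed represents the order of the segmented bands in the set relative to each other.
- The method of claim 15, wherein the reading order rule comprises: in response to determining that the plurality of text lines are of a horizontal type, sorting vertical bands from left to right according to the positional relationships between the vertical bands, and sorting horizontal bands from top to bottom according to the positional relationships between the horizontal bands; or in response to determining that the plurality of text lines are of a vertical type, sorting vertical bands from right to left according to the positional relationships between the vertical bands, and sorting horizontal bands from top to bottom according to the positional relationships between the horizontal bands.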
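- The method according to claim 8, wherein analyzing the spatial layout of the plurality of connected regions further comprises, after performing projection segmentation on the selected connected regions: determining whether the selected connected regions have been rotated by the correction angle; and in response to determining that the selected connected regions have been rotated by the correction angle, rotating the set of segmented zones back by the correction angle.
- The method according to claim 20, wherein determining the order of the text lines relative to each other based on the layout structure comprises: determining, according to the relative positions of the selected connected regions with respect to each segmented zone in the set of segmented zones, a correspondence between the selected connected regions and the segmented zones, wherein each segmented zone contains a respective group of selected connected regions; sorting the selected connected regions in the respective group according to the positional relationships among them; sorting the rectangular blocks in each selected connected region according to the positional relationships among those rectangular blocks; and matching the plurality of text lines with the rectangular blocks in the selected connected regions according to the correspondence between the plurality of text lines and the plurality of rectangular blocks.
- The method according to claim 21, wherein sorting the selected connected regions in the respective group comprises: in response to determining that the plurality of text lines are of the horizontal layout type, sorting the selected connected regions in the respective group from top to bottom; or in response to determining that the plurality of text lines are of the vertical layout type, sorting the selected connected regions in the respective group from right to left.
- The method according to claim 21, wherein sorting the rectangular blocks in each selected connected region comprises: in response to determining that the plurality of text lines are of the horizontal layout type, sorting the rectangular blocks in each selected connected region from top to bottom; or in response to determining that the plurality of text lines are of the vertical layout type, sorting the rectangular blocks in each selected connected region from right to left.
- The method according to claim 14, wherein analyzing the spatial layout of the plurality of connected regions further comprises, before recursively and alternately performing horizontal projection segmentation and vertical projection segmentation on the selected connected regions: in response to determining that the plurality of text lines are of the horizontal layout type, reducing the length of each rectangular block in the selected connected regions by a number of data elements at both ends in the length direction; or in response to determining that the plurality of text lines are of the vertical layout type, reducing the width of each rectangular block in the selected connected regions by a number of data elements at both ends in the width direction.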
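- The method according to claim 2, further comprising, before analyzing the layout structure of the text lines based on the layout model: identifying a main layout type of the plurality of text lines, wherein the main layout type is one selected from the group consisting of a horizontal layout type and a vertical layout type.
- The method according to claim 25, wherein identifying the main layout type of the plurality of text lines comprises: determining respective geometric parameters of the plurality of rectangular blocks according to the coordinate information; and determining the main layout type of the plurality of text lines based on the respective geometric parameters of the plurality of rectangular blocks.
- The method according to claim 26, wherein the geometric parameters comprise at least one of the length direction, the length, the width direction and the width of each of the plurality of rectangular blocks.
- The method according to claim 27, wherein determining the main layout type of the plurality of text lines comprises: determining a subset of the plurality of rectangular blocks, the subset consisting of those rectangular blocks that satisfy the following condition: the angle between the length direction of the rectangular block and the column direction of the layout model is less than a threshold angle; determining the ratio of the total area of the subset to the total area of the plurality of rectangular blocks; in response to the ratio being less than a first threshold ratio, determining that the main layout type is the horizontal layout type; and in response to the ratio being not less than the first threshold ratio, determining that the main layout type is the vertical layout type.
- The method according to claim 25, wherein analyzing the layout structure of the text lines based on the layout model comprises: analyzing the layout structure of the text lines of the main layout type.
- The method according to claim 29, further comprising, before analyzing the layout structure of the text lines of the main layout type: selectively removing or not removing rectangular blocks of a secondary layout type from the plurality of rectangular blocks, wherein the secondary layout type is the other one selected from the group consisting of the horizontal layout type and the vertical layout type.
- The method according to claim 30, wherein selectively removing or not removing the rectangular blocks of the secondary layout type from the plurality of rectangular blocks comprises: determining the ratio of the total area of the rectangular blocks of the secondary layout type to the total area of the plurality of rectangular blocks; in response to the ratio being less than a second threshold ratio, removing the rectangular blocks of the secondary layout type from the plurality of rectangular blocks; and in response to the ratio being not less than the second threshold ratio, not removing the rectangular blocks of the secondary layout type from the plurality of rectangular blocks.
- The method according to claim 30, further comprising, after analyzing the layout structure of the text lines of the main layout type: in response to the rectangular blocks of the secondary layout type not having been removed from the plurality of rectangular blocks, analyzing the layout structure of the text lines of the secondary layout type.
- The method according to claim 2, wherein the data structure comprises a two-dimensional blank matrix.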
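- A chip circuit, comprising: circuit units configured to perform the method according to any one of claims 1-33.
- A reading aid, comprising: the chip circuit according to claim 34; and an image sensor configured to acquire the image.
- An electronic device, comprising: a processor; and a memory storing a program, the program comprising instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1-33.
- A computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method according to any one of claims 1-33.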
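The side-zone test of claims 11 and 12 can be illustrated with a minimal Python sketch. It assumes that vertical projection segmentation has already produced one effective size per zone, ordered left to right, and that "effective size" simply means the zone's extent in the row direction (the claims do not define it further). The function name and the default threshold percentages are hypothetical; they are chosen only so that the first is smaller than the second and the third equals the fourth, as claim 12 requires.

```python
from typing import List


def side_zones_to_remove(sizes: List[float],
                         p1: float = 0.2, p2: float = 0.3,
                         p3: float = 0.25, p4: float = 0.25) -> List[int]:
    """Return indices of side zones whose connected regions would be removed.

    `sizes` holds the assumed effective row-direction size of each zone,
    ordered left to right. p1..p4 stand for the first..fourth threshold
    percentages, written as fractions; the defaults are illustrative only.
    """
    n = len(sizes)
    if n < 2:  # fewer than two zones were segmented: no removal
        return []
    largest = max(sizes)
    removed: List[int] = []
    for i in (0, n - 1):  # only zones touching the left or right side
        s = sizes[i]
        if n == 2:
            # two zones: compare with the largest zone and with the other zone
            other = sizes[1 - i]
            if s < p1 * largest and s < p2 * other:
                removed.append(i)
        else:
            # more than two zones: compare with the largest zone and the direct neighbour
            neighbour = sizes[1] if i == 0 else sizes[n - 2]
            if s < p3 * largest and s < p4 * neighbour:
                removed.append(i)
    return removed
```

For example, `side_zones_to_remove([10, 200, 180, 15])` returns `[0, 3]`: both narrow margin zones fall below 25% of the widest zone and of their neighbours, so the connected regions in them would be discarded.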
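Claims 14 to 19 describe recursive, alternating horizontal and vertical projection segmentation, which resembles the classic XY-cut. The sketch below is one possible reading, not the patented implementation. It assumes the layout model is a 2D matrix of 0/1 data elements in which the rectangular blocks have been drawn, treats each data row or column whose sum lies in [0, threshold] as part of a gap (claims 16 and 17), and cuts at every gap; the claims only say a dividing line is selected from the gap set, so cutting at each gap is an assumption. `Zone`, `split`, `xy_cut` and `leaves_in_reading_order` are hypothetical names. The nested `children` lists stand in for the hierarchical tree of claim 18, and the traversal applies the reading order rule of claim 19.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def nonblank_runs(sums: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of lines whose projection sum exceeds threshold.
    Lines with 0 <= sum <= threshold form the gaps between runs."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(sums):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(sums)))
    return runs


@dataclass
class Zone:
    top: int
    bottom: int  # exclusive
    left: int
    right: int  # exclusive
    axis: Optional[str] = None  # "h" or "v": how this zone's children were cut
    children: List["Zone"] = field(default_factory=list)


def split(m: Matrix, z: Zone, axis: str, threshold: int) -> List[Zone]:
    if axis == "h":  # horizontal projection: one sum per data row
        sums = [sum(m[r][z.left:z.right]) for r in range(z.top, z.bottom)]
        return [Zone(z.top + a, z.top + b, z.left, z.right)
                for a, b in nonblank_runs(sums, threshold)]
    # vertical projection: one sum per data column
    sums = [sum(m[r][c] for r in range(z.top, z.bottom)) for c in range(z.left, z.right)]
    return [Zone(z.top, z.bottom, z.left + a, z.left + b)
            for a, b in nonblank_runs(sums, threshold)]


def xy_cut(m: Matrix, z: Zone, axis: str = "h", threshold: int = 0,
           prev_failed: bool = False) -> Zone:
    """Alternate cut directions until no zone can be split either way."""
    parts = split(m, z, axis, threshold)
    other = "v" if axis == "h" else "h"
    if len(parts) >= 2:
        z.axis = axis
        z.children = [xy_cut(m, p, other, threshold) for p in parts]
    elif not prev_failed:
        # cannot cut along this axis; try the other one before making a leaf
        return xy_cut(m, z, other, threshold, prev_failed=True)
    return z


def leaves_in_reading_order(z: Zone, vertical_text: bool = False) -> List[Zone]:
    """Horizontal cuts read top to bottom; vertical cuts read left to right,
    or right to left for vertical text."""
    if not z.children:
        return [z]
    kids = z.children[::-1] if (z.axis == "v" and vertical_text) else z.children
    out: List[Zone] = []
    for k in kids:
        out.extend(leaves_in_reading_order(k, vertical_text))
    return out
```

On a two-column page with a full-width line underneath, horizontal text yields the left column, the right column and then the bottom line; with `vertical_text=True` the two columns swap places while the bottom line stays last.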
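The final ordering step of claims 21 to 23 can be sketched as follows, under stated assumptions. Zones are taken as axis-aligned boxes already sorted in reading order, each connected region is a list of `(line_id, box)` pairs that carry the text-line/rectangular-block correspondence, and a region belongs to the zone containing the centre of its bounding box. The claims speak only of "relative position", so centre containment is an assumption. All names are hypothetical, and a region whose centre lies in no zone is simply skipped here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)
Line = Tuple[str, Box]                    # (text line id, its rectangular block)


def bounding_box(region: Sequence[Line]) -> Box:
    return (min(b[0] for _, b in region), min(b[1] for _, b in region),
            max(b[2] for _, b in region), max(b[3] for _, b in region))


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def inside(zone: Box, p: Tuple[float, float]) -> bool:
    # half-open test so that adjacent zones never both claim a point
    return zone[0] <= p[0] < zone[2] and zone[1] <= p[1] < zone[3]


def order_lines(zones: Sequence[Box], regions: Sequence[Sequence[Line]],
                vertical_text: bool = False) -> List[str]:
    def sort_key(b: Box) -> float:
        x, y = center(b)
        return -x if vertical_text else y  # right to left vs. top to bottom

    ordered: List[str] = []
    for zone in zones:
        # assign regions to this zone, then sort regions within the zone
        members = [r for r in regions if inside(zone, center(bounding_box(r)))]
        members.sort(key=lambda r: sort_key(bounding_box(r)))
        for region in members:
            # sort the rectangular blocks inside each region
            for line_id, _ in sorted(region, key=lambda line: sort_key(line[1])):
                ordered.append(line_id)
    return ordered
```

Sorting by the centre coordinate is the simplest choice that matches the "top to bottom" and "right to left" wording; a production system would probably also need tie-breaking for blocks that share a centre line.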
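Claims 28 and 31 decide the main layout type by an area ratio and then optionally drop blocks of the secondary type. The sketch below assumes each rectangular block is described by its length, its width and the direction of its long side in degrees measured from the row (x) axis; the threshold angle and both threshold ratios are hypothetical values, since the claims do not fix them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Block:
    length: float     # long side of the rectangular block
    width: float      # short side
    angle_deg: float  # assumed: direction of the long side, from the row (x) axis


def angle_to_column_axis(angle_deg: float) -> float:
    """Unsigned angle in [0, 90] between the block's long side and the column (y) axis."""
    d = abs(angle_deg - 90.0) % 180.0
    return min(d, 180.0 - d)


def is_vertical(b: Block, threshold_angle: float) -> bool:
    return angle_to_column_axis(b.angle_deg) < threshold_angle


def area(blocks: Sequence[Block]) -> float:
    return sum(b.length * b.width for b in blocks)


def main_layout_type(blocks: Sequence[Block], threshold_angle: float = 30.0,
                     first_threshold_ratio: float = 0.5) -> str:
    total = area(blocks)
    subset = [b for b in blocks if is_vertical(b, threshold_angle)]
    ratio = area(subset) / total if total else 0.0
    return "horizontal" if ratio < first_threshold_ratio else "vertical"


def drop_secondary(blocks: Sequence[Block], main: str, threshold_angle: float = 30.0,
                   second_threshold_ratio: float = 0.1) -> List[Block]:
    """Remove secondary-type blocks only if their area share is below the threshold."""
    want_vertical = main == "vertical"
    secondary = [b for b in blocks if is_vertical(b, threshold_angle) != want_vertical]
    total = area(blocks)
    if total and area(secondary) / total < second_threshold_ratio:
        return [b for b in blocks if is_vertical(b, threshold_angle) == want_vertical]
    return list(blocks)
```

Weighting by area rather than counting blocks means that a few long body-text lines outweigh many short vertical labels, which appears to be the intent of comparing total areas in claim 28.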
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/351,080 US11367296B2 (en) | 2020-07-13 | 2021-06-17 | Layout analysis |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010667074.6A CN111832476A (zh) | 2020-07-13 | 2020-07-13 | Layout analysis method, reading aid, circuit and medium |
CN202010667074.6 | 2020-07-13 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/351,080 Continuation US11367296B2 (en) | 2020-07-13 | 2021-06-17 | Layout analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022012121A1 (zh) | 2022-01-20 |
Family ID: 72900564
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/092338 WO2022012121A1 (zh) | 2020-07-13 | 2021-05-08 | Layout analysis method, reading aid, circuit and medium |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3940589B1 (zh) |
JP (1) | JP7132654B2 (zh) |
KR (1) | KR102399508B1 (zh) |
CN (1) | CN111832476A (zh) |
WO (1) | WO2022012121A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116167143A (zh) * | 2023-04-20 | 2023-05-26 | 江西少科智能建造科技有限公司 | Workstation arrangement method, system, storage medium and device |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
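CN111832476A (zh) * | 2020-07-13 | 2020-10-27 | 上海肇观电子科技有限公司 | Layout analysis method, reading aid, circuit and medium |
US11367296B2 (en) | 2020-07-13 | 2022-06-21 | NextVPU (Shanghai) Co., Ltd. | Layout analysis |
CN113033338B (zh) * | 2021-03-09 | 2024-03-29 | 太极计算机股份有限公司 | Method and device for identifying the position of front-page headline news in an electronic newspaper |
CN114494711B (zh) * | 2022-02-25 | 2023-10-31 | 南京星环智能科技有限公司 | Image feature extraction method, apparatus, device and storage medium |
CN114757144B (zh) * | 2022-06-14 | 2022-09-06 | 成都数之联科技股份有限公司 | Image document reconstruction method and apparatus, electronic device and storage medium |
CN114998885A (zh) * | 2022-06-23 | 2022-09-02 | 小米汽车科技有限公司 | Page data processing method and apparatus, vehicle and storage medium |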
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102479173A (zh) * | 2010-11-25 | 2012-05-30 | 北京大学 | Method and device for identifying the reading order of a layout |
US20150212654A1 (en) * | 2014-01-28 | 2015-07-30 | Comikka, Inc. | Architecture for providing dynamically sized image sequences |
CN109934210A (zh) * | 2019-05-17 | 2019-06-25 | 上海肇观电子科技有限公司 | Layout analysis method, reading aid, circuit and medium |
CN110969056A (zh) * | 2018-09-29 | 2020-04-07 | 杭州海康威视数字技术股份有限公司 | Document layout analysis method and device for document images, and storage medium |
CN111340037A (zh) * | 2020-03-25 | 2020-06-26 | 上海智臻智能网络科技股份有限公司 | Text layout analysis method and apparatus, computer device and storage medium |
CN111832476A (zh) * | 2020-07-13 | 2020-10-27 | 上海肇观电子科技有限公司 | Layout analysis method, reading aid, circuit and medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06215184A (ja) * | 1992-09-17 | 1994-08-05 | Fuji Facom Corp | Labeling device for extracted regions |
JP3683923B2 (ja) * | 1994-11-17 | 2005-08-17 | キヤノン株式会社 | Method for ordering character regions |
JP3940491B2 (ja) * | 1998-02-27 | 2007-07-04 | 株式会社東芝 | Document processing device and document processing method |
JP2004240643A (ja) | 2003-02-05 | 2004-08-26 | Toshiba Corp | Character recognition system, character recognition method and program |
JP2004272822A (ja) | 2003-03-12 | 2004-09-30 | Seiko Epson Corp | Character recognition device, character recognition method and computer program |
JP4856925B2 (ja) * | 2005-10-07 | 2012-01-18 | 株式会社リコー | Image processing device, image processing method and image processing program |
US8594422B2 (en) * | 2010-03-11 | 2013-11-26 | Microsoft Corporation | Page layout determination of an image undergoing optical character recognition |
US9330070B2 (en) * | 2013-03-11 | 2016-05-03 | Microsoft Technology Licensing, Llc | Detection and reconstruction of east asian layout features in a fixed format document |
CN109934209B (zh) * | 2019-05-17 | 2019-07-30 | 上海肇观电子科技有限公司 | Layout analysis method, reading aid, circuit and medium |
2020
- 2020-07-13 CN CN202010667074.6A patent/CN111832476A/zh active Pending
2021
- 2021-05-08 WO PCT/CN2021/092338 patent/WO2022012121A1/zh active Application Filing
- 2021-06-25 EP EP21181721.8A patent/EP3940589B1/en active Active
- 2021-07-05 KR KR1020210087974A patent/KR102399508B1/ko active IP Right Grant
- 2021-07-09 JP JP2021113960A patent/JP7132654B2/ja active Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116167143A (zh) * | 2023-04-20 | 2023-05-26 | 江西少科智能建造科技有限公司 | Workstation arrangement method, system, storage medium and device |
CN116167143B (zh) * | 2023-04-20 | 2023-08-15 | 江西少科智能建造科技有限公司 | Workstation arrangement method, system, storage medium and device |
Also Published As
Publication number | Publication date |
---|---|
KR20220008224A (ko) | 2022-01-20 |
JP7132654B2 (ja) | 2022-09-07 |
EP3940589A1 (en) | 2022-01-19 |
JP2022017202A (ja) | 2022-01-25 |
EP3940589B1 (en) | 2023-10-25 |
CN111832476A (zh) | 2020-10-27 |
KR102399508B1 (ko) | 2022-05-19 |
Similar Documents
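Publication | Title |
---|---|
WO2022012121A1 (zh) | Layout analysis method, reading aid, circuit and medium |
WO2020233378A1 (zh) | Layout analysis method, reading aid, circuit and medium |
CN108492343B (zh) | Image synthesis method for expanding training data for target recognition |
US10872420B2 (en) | Electronic device and method for automatic human segmentation in image |
CN111126394A (zh) | Text recognition method, reading aid, circuit and medium |
CN112036395A (zh) | Text classification and recognition method and device based on target detection |
EP2246808A2 (en) | Automated method for alignment of document objects |
US20210133437A1 (en) | System and method for capturing and interpreting images into triple diagrams |
CN110929805B (zh) | Neural network training method, target detection method, device, circuit and medium |
US10621428B1 (en) | Layout analysis on image |
WO2022227218A1 (zh) | Drug name recognition method and apparatus, computer device and storage medium |
WO2022121842A1 (zh) | Text image correction method and apparatus, device and medium |
WO2020248346A1 (zh) | Text detection |
CN111862124A (zh) | Image processing method, apparatus, device and computer-readable storage medium |
CN108520263B (zh) | Panoramic image recognition method, system and computer storage medium |
CN113850238A (zh) | Document detection method and apparatus, electronic device and storage medium |
US11367296B2 (en) | Layout analysis |
WO2020244076A1 (zh) | Face recognition method and apparatus, electronic device and storage medium |
WO2022121843A1 (zh) | Text image correction method and apparatus, device and medium |
CN113850239B (zh) | Multi-document detection method and apparatus, electronic device and storage medium |
CN110969161B (zh) | Image processing method, circuit, visual impairment assistance device, electronic device and medium |
WO2018061174A1 (ja) | Electronic book creation system, electronic book creation method and program |
CN112183253A (zh) | Data processing method and apparatus, electronic device and computer-readable storage medium |
CN112861735A (zh) | Text image recognition method and apparatus, device and medium |
JP2006065613A (ja) | Specific image region partitioning apparatus and method, and program for causing a computer to execute specific image region partitioning processing |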
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21843001; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 21843001; Country of ref document: EP; Kind code of ref document: A1 |