WO2022161293A1 - Image processing method and apparatus, electronic device and storage medium - Google Patents

Image processing method and apparatus, electronic device and storage medium

Info

Publication number
WO2022161293A1
WO2022161293A1 (PCT/CN2022/073310)
Authority
WO
WIPO (PCT)
Prior art keywords
text
typeset
image
content
contents
Prior art date
Application number
PCT/CN2022/073310
Other languages
English (en)
French (fr)
Inventor
何涛
罗欢
陈明权
Original Assignee
杭州大拿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州大拿科技股份有限公司
Publication of WO2022161293A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/103: Formatting, i.e. changing of presentation of documents
    • G06F40/109: Font handling; Temporal or kinetic typography
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words

Definitions

  • Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium.
  • Users can photograph paper documents to extract the relevant information and archive it electronically, which facilitates the management and storage of those paper documents.
  • For example, students use a large number of test papers, homework sheets, and workbooks while learning, and need to organize and review these materials; efficiently and conveniently managing test papers for storage, error recording, and similar purposes can significantly improve students' learning efficiency.
  • For example, in other usage scenarios, with the development of mobile networks, students often take online courses and hand in homework over the Internet, so they need to obtain homework documents with a clean background for submission via the web.
  • At least one embodiment of the present disclosure provides an image processing method, including: acquiring an image to be recognized; recognizing the image to be recognized to obtain a plurality of area frames, a plurality of pieces of area information in one-to-one correspondence with the plurality of area frames, and a plurality of contents to be typeset; and typesetting the plurality of contents to be typeset based on the image to be recognized and the plurality of pieces of area information, so as to obtain a typesetting document corresponding to the image to be recognized.
  • For example, recognizing the image to be recognized to obtain the plurality of area frames, the plurality of pieces of area information corresponding to the plurality of area frames, and the plurality of contents to be typeset includes: recognizing the image to be recognized through an object detection model to obtain the plurality of area frames and the plurality of pieces of area information, wherein the plurality of area frames include a plurality of first text boxes; and recognizing the plurality of first text boxes through a text recognition model to obtain a plurality of text contents in one-to-one correspondence with the plurality of first text boxes; wherein the plurality of contents to be typeset include one or more of the plurality of text contents.
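The two-stage recognition described in this claim (an object detection model produces area frames, then a text recognition model reads each first text box) can be sketched as below. This is an illustrative sketch only; the `toy_detect` and `toy_ocr` functions are stand-ins for the pre-trained models the disclosure assumes, not real networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class AreaFrame:
    box: Tuple[int, int, int, int]  # (x, y, width, height) in the image to be recognized
    category: str                   # part of the area information: "text", "picture", ...

def recognize(image, detect: Callable, ocr: Callable):
    """Stage 1: detect area frames; stage 2: OCR each first text box."""
    frames: List[AreaFrame] = detect(image)
    text_boxes = [f for f in frames if f.category == "text"]
    contents = [ocr(image, f.box) for f in text_boxes]  # one text content per text box
    return frames, contents

# Toy stand-ins for the pre-trained object detection and text recognition models
def toy_detect(image):
    return [AreaFrame((0, 0, 200, 20), "text"),
            AreaFrame((0, 30, 80, 60), "picture")]

def toy_ocr(image, box):
    return f"content at y={box[1]}"

frames, contents = recognize(None, toy_detect, toy_ocr)
print(len(frames), contents)  # 2 ['content at y=0']
```

Only the text boxes are OCR'd; picture frames are carried along as area frames and handled later as pictures to be typeset.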
  • For example, the image processing method provided by at least one embodiment of the present disclosure further includes: determining, according to the plurality of pieces of area information and the plurality of text contents, a plurality of text categories in one-to-one correspondence with the plurality of first text boxes, wherein the area information corresponding to any first text box among the plurality of first text boxes includes the text category of that first text box.
  • the plurality of area frames further include at least one picture frame
  • For example, recognizing the image to be recognized to obtain the plurality of area frames, the plurality of pieces of area information in one-to-one correspondence with the plurality of area frames, and the plurality of contents to be typeset further includes: extracting at least one picture to be typeset corresponding respectively to the at least one picture frame, wherein the plurality of contents to be typeset further include the at least one picture to be typeset.
  • the to-be-recognized image is an image including at least one topic
  • the plurality of area frames further include at least one topic frame in one-to-one correspondence with the at least one topic
  • Each topic frame includes at least one first text box within the area it covers in the image to be recognized, and each piece of area information includes position information of the corresponding area frame in the image to be recognized; determining the plurality of text categories in one-to-one correspondence with the plurality of first text boxes according to the plurality of pieces of area information and the plurality of text contents includes: determining, according to the position information in the plurality of pieces of area information, the correspondence between the at least one topic frame and the plurality of first text boxes; and determining the plurality of text categories based on the correspondence and the plurality of text contents.
  • For example, the at least one topic frame includes a first topic frame; in the first direction, the first topic frame has a first side; and the plurality of first text boxes include a first box to be processed.
  • the multiple text categories include topic names
  • Determining the plurality of text categories based on the correspondence and the plurality of text contents includes: in response to the correspondence indicating that the first box to be processed is located within the area covered by the first topic frame in the image to be recognized, and there being no area frame between the first box to be processed and the first side, determining that the text category of the first box to be processed is the topic name; or, in response to the correspondence indicating that the first box to be processed is located outside the area covered by the first topic frame in the image to be recognized, there being no area frame between the first box to be processed and the first side, and the text content corresponding to the first box to be processed including feature information of a major question, determining that the text category of the first box to be processed is the topic name.
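The positional rule in this claim, that a text box is classed as the topic name when it sits inside the topic frame with no other area frame between it and the frame's first side, can be sketched with axis-aligned boxes. Taking the first side to be the top edge, and the (x1, y1, x2, y2) coordinates, are illustrative assumptions.

```python
def overlaps(a, b):
    """True when boxes a and b, given as (x1, y1, x2, y2), intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def contains(outer, inner):
    """True when box `inner` lies fully inside box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def is_topic_name(text_box, topic_frame, other_frames):
    """Topic-name test: inside the topic frame, with no area frame in the
    strip between the text box and the frame's top (first) side."""
    if not contains(topic_frame, text_box):
        return False
    strip = (topic_frame[0], topic_frame[1], topic_frame[2], text_box[1])
    return not any(overlaps(strip, f) for f in other_frames)

topic = (0, 0, 100, 100)
name_box = (10, 5, 90, 25)
print(is_topic_name(name_box, topic, []))                  # True
print(is_topic_name((10, 50, 90, 70), topic, [name_box]))  # False: name_box lies above it
```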
  • For example, the plurality of first text boxes include a second box to be processed, the plurality of text categories include the title, the second box to be processed has a first edge, and the image to be recognized has a first edge.
  • Determining the plurality of text categories based on the correspondence and the plurality of text contents includes: in response to the distance between the first edge of the second box to be processed and the first edge of the image to be recognized being smaller than a preset distance, and the text content corresponding to the second box to be processed including title feature information, determining that the text category of the second box to be processed is the title.
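This edge-distance plus feature-keyword test can be sketched as below. The keyword list and the 40-pixel preset distance are illustrative assumptions, not values fixed by the disclosure.

```python
TITLE_KEYWORDS = ("exam", "test paper", "期末", "试卷")  # illustrative title feature words

def is_title(box_top_y, image_top_y, text, preset_distance=40):
    """Title test: the box's first edge lies close to the image's first edge
    and the text content carries title feature information."""
    near_edge = abs(box_top_y - image_top_y) < preset_distance
    has_feature = any(k in text.lower() for k in TITLE_KEYWORDS)
    return near_edge and has_feature

print(is_title(12, 0, "Grade 3 Math Final Exam"))   # True
print(is_title(300, 0, "Grade 3 Math Final Exam"))  # False: too far from the top edge
```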
  • For example, typesetting the plurality of contents to be typeset based on the image to be recognized and the plurality of pieces of area information, so as to obtain the typesetting document corresponding to the image to be recognized, includes: determining, based on the plurality of pieces of area information and the image to be recognized, a plurality of pieces of typesetting information respectively corresponding to the plurality of contents to be typeset; and typesetting the plurality of contents to be typeset based on the plurality of pieces of typesetting information to obtain the typesetting document.
  • For example, determining the plurality of pieces of typesetting information respectively corresponding to the plurality of contents to be typeset includes: performing classification processing on the image to be recognized through a classification model to determine the image category of the image to be recognized; obtaining a typesetting template corresponding to the image category according to the image category; and determining the plurality of pieces of typesetting information according to the typesetting template and the plurality of pieces of area information.
  • For example, determining the plurality of pieces of typesetting information according to the typesetting template and the plurality of pieces of area information includes, for the i-th content to be typeset among the plurality of contents to be typeset: in response to the i-th content to be typeset being text content, determining the area information of the area frame corresponding to the i-th content to be typeset, and determining the text category of the i-th content to be typeset according to that area information; and determining the typesetting information corresponding to the i-th content to be typeset according to the typesetting template and the text category of the i-th content to be typeset, where i is a positive integer less than or equal to the total number of the plurality of contents to be typeset.
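A typesetting template keyed by text category, as this step describes, can be as simple as a dictionary lookup. The template below and its values are hypothetical, one plausible shape for the template an image category would map to.

```python
# Hypothetical typesetting template for one image category (e.g. "test paper")
TEMPLATE = {
    "title":      {"font_size": 18, "bold": True,  "align": "center"},
    "topic_name": {"font_size": 14, "bold": True,  "align": "left"},
    "body":       {"font_size": 12, "bold": False, "align": "left"},
}

def typesetting_info(text_category, template=TEMPLATE):
    """Determine the typesetting information for one content to be typeset
    from its text category, falling back to body text for unknown categories."""
    return template.get(text_category, template["body"])

print(typesetting_info("title")["align"])        # center
print(typesetting_info("unknown")["font_size"])  # 12
```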
  • For example, typesetting the plurality of contents to be typeset to obtain the typesetting document includes: processing the plurality of contents to be typeset to obtain a plurality of display contents; determining the positional relationship between the plurality of display contents; and performing typesetting processing on the plurality of display contents based on the positional relationship between the plurality of display contents and the plurality of pieces of typesetting information to obtain the typesetting document.
  • For example, the plurality of contents to be typeset include at least one first content to be typeset and at least one second content to be typeset, and the at least one first content to be typeset corresponds to the at least one topic frame.
  • Performing question-number detection processing on the at least one first content to be typeset to obtain at least one intermediate display content includes: extracting the title information corresponding to the at least one topic frame to obtain at least one piece of title information; determining the positional relationship between the at least one topic frame; and determining, based on the positional relationship between the at least one topic frame and the at least one piece of title information, whether any question numbers are missing.
  • In response to question numbers being missing: extracting the missing question number information, determining the missing area corresponding to the missing question number information in the image to be recognized, completing the missing question number information based on the missing area to obtain the missing display content corresponding to the missing area, and using the missing display content and the at least one first content to be typeset as the at least one intermediate display content.
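Detecting missing question numbers from the extracted title information can be sketched as a gap search over the parsed number sequence. The leading-number format matched here ("3." or "3、") is an assumption about how the titles are written.

```python
import re

def missing_question_numbers(title_infos):
    """Parse leading question numbers like '3.' or '3、' from the title
    information of each topic frame and report any gaps in the sequence."""
    nums = sorted(int(m.group(1)) for t in title_infos
                  if (m := re.match(r"\s*(\d+)\s*[.、]", t)))
    if not nums:
        return []
    return [n for n in range(nums[0], nums[-1] + 1) if n not in nums]

titles = ["1. Fill in the appropriate unit", "2. Compare the sizes", "4. Solve"]
print(missing_question_numbers(titles))  # [3]
```

When the result is non-empty, the method goes back to the corresponding missing area in the image to complete the content; when it is empty, the first contents to be typeset are used as-is.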
  • In response to no question numbers being missing, the at least one first content to be typeset is used as the at least one intermediate display content.
  • For example, determining the positional relationship between the plurality of display contents includes: determining the positions of the plurality of area frames in the image to be recognized according to the position information in the plurality of pieces of area information; determining the positional relationship between the plurality of contents to be typeset based on the positions of the plurality of area frames in the image to be recognized; and determining the positional relationship between the plurality of display contents according to the positional relationship between the plurality of contents to be typeset.
  • For example, determining the positional relationship between the plurality of display contents according to the positional relationship between the plurality of contents to be typeset includes: determining, according to the positions of the plurality of area frames in the image to be recognized, whether the image to be recognized includes multiple image partitions; in response to the image to be recognized including multiple image partitions, determining a plurality of content sets to be typeset corresponding to the multiple image partitions, determining the positional relationship between the multiple image partitions in the image to be recognized, and determining the positional relationship between the plurality of content sets to be typeset based on the positional relationship between the multiple image partitions; and determining the positional relationship between the plurality of display contents based on the positional relationship between the plurality of content sets to be typeset and the positional relationship between the plurality of contents to be typeset.
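The partition-aware ordering this step describes, for instance reading a two-column test paper left column first, can be sketched as follows. Boxes are (x, y, w, h), and splitting at the page midline is an illustrative heuristic, not the disclosure's exact partition test.

```python
def display_order(boxes, image_width):
    """Order area frames for typesetting: when the boxes split cleanly into a
    left and a right image partition, emit the left partition first, each
    partition top to bottom; otherwise order all boxes top to bottom."""
    mid = image_width / 2
    left = sorted([b for b in boxes if b[0] + b[2] <= mid], key=lambda b: b[1])
    right = sorted([b for b in boxes if b[0] >= mid], key=lambda b: b[1])
    if left and right and len(left) + len(right) == len(boxes):
        return left + right
    return sorted(boxes, key=lambda b: b[1])

boxes = [(60, 0, 30, 10), (0, 20, 40, 10), (0, 0, 40, 10)]
print(display_order(boxes, 100))
# [(0, 0, 40, 10), (0, 20, 40, 10), (60, 0, 30, 10)]
```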
  • For example, performing typesetting processing on the plurality of display contents based on the positional relationship between the plurality of display contents and the plurality of pieces of typesetting information, so as to obtain the typesetting document, includes: performing typesetting processing on the plurality of display contents based on the plurality of pieces of typesetting information to obtain a plurality of typeset display contents; and arranging the plurality of typeset display contents in sequence according to the positional relationship between them to obtain the typesetting document.
  • For example, the text categories include handwritten text, and in response to the text category of a first text content among the plurality of text contents being handwritten text, the typesetting document does not include the first text content.
  • For example, recognizing the image to be recognized to obtain the plurality of area frames, the plurality of pieces of area information corresponding to the plurality of area frames, and the plurality of contents to be typeset further includes: deleting the first text content from the plurality of text contents to obtain at least one remaining text content, wherein the plurality of contents to be typeset include the remaining at least one text content but do not include the first text content.
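Excluding handwritten answers so the typeset document reproduces a blank copy of the paper, as this step describes, reduces to a filter on the text category. The dictionary field names here are hypothetical, not a structure defined by the disclosure.

```python
def drop_handwritten(text_contents):
    """Remove contents whose text category is handwritten text; the remaining
    contents become the contents to be typeset."""
    return [c for c in text_contents if c["category"] != "handwritten"]

contents = [
    {"text": "1. 3 + 4 =", "category": "printed"},
    {"text": "7",          "category": "handwritten"},  # a student's answer
]
print(drop_handwritten(contents))  # [{'text': '1. 3 + 4 =', 'category': 'printed'}]
```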
  • At least one embodiment of the present disclosure provides an image processing apparatus, including: an acquisition unit configured to acquire an image to be recognized; a recognition unit configured to recognize the image to be recognized to obtain a plurality of area frames, a plurality of pieces of area information in one-to-one correspondence with the plurality of area frames, and a plurality of contents to be typeset; and a typesetting unit configured to typeset the plurality of contents to be typeset based on the image to be recognized and the plurality of pieces of area information, so as to obtain a typesetting document corresponding to the image to be recognized.
  • At least one embodiment of the present disclosure provides an electronic device, including: a memory non-transitorily storing computer-executable instructions; and a processor configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the processor, implement the image processing method according to any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions that, when executed by a processor, implement the image processing method described in any embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of an image to be recognized according to at least one embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an image to be recognized with a region frame provided by at least one embodiment of the present disclosure
  • FIG. 4A is an exemplary flowchart of step S30 in the image processing method shown in FIG. 1;
  • FIG. 4B is an exemplary flowchart of step S302 in the image processing method shown in FIG. 4A;
  • FIG. 4C is a schematic diagram of a typesetting document including article paragraphs according to an embodiment of the present disclosure;
  • FIG. 4D is a schematic diagram of a typesetting document corresponding to the image to be recognized shown in FIG. 2 according to an embodiment of the present disclosure;
  • FIG. 5 is a schematic block diagram of an image processing apparatus according to at least one embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of an electronic device according to at least one embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium.
  • The image processing method includes: acquiring an image to be recognized; recognizing the image to be recognized to obtain a plurality of area frames, a plurality of pieces of area information corresponding to the plurality of area frames, and a plurality of contents to be typeset; and typesetting the plurality of contents to be typeset based on the image to be recognized and the plurality of pieces of area information to obtain a typesetting document corresponding to the image to be recognized.
  • The image processing method processes the image to be recognized with pre-trained models to obtain a plurality of contents to be typeset and their category information, obtains the typesetting format corresponding to each content to be typeset according to its category information, typesets the contents accordingly, and finally produces a typesetting document.
  • In this way, an electronic document corresponding to the image to be recognized can be obtained, facilitating various operations on it, such as storage, transmission, management, backup, and printing.
  • the image processing method provided by the embodiment of the present disclosure can be applied to the image processing apparatus provided by the embodiment of the present disclosure, and the image processing apparatus can be configured on an electronic device.
  • the electronic device may be a personal computer, a mobile terminal, etc.
  • the mobile terminal may be a hardware device such as a mobile phone or a tablet computer.
  • FIG. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure.
  • the image processing method provided by at least one embodiment of the present disclosure includes steps S10 to S30.
  • Step S10 Acquire an image to be recognized.
  • Step S20 Recognize the image to be recognized to obtain a plurality of area frames, a plurality of area information corresponding to the plurality of area frames one-to-one, and a plurality of contents to be typeset.
  • Step S30 Based on the image to be recognized and the plurality of area information, typesetting is performed on a plurality of contents to be typeset, so as to obtain a typesetting document corresponding to the image to be recognized.
  • For example, the image to be recognized in step S10 is an image containing questions.
  • For example, the image to be recognized is an image of a test paper, homework sheet, exercise book, or the like.
  • The test paper, homework, or exercise book may be a paper document, so that an electronic document of the paper test paper can be obtained through the image processing method provided by the embodiments of the present disclosure, allowing the test paper to be stored, transmitted, managed, backed up, printed, and so on.
  • the test papers may be test papers of various subjects, for example, Chinese, mathematics, foreign languages (eg, English, etc.), and similarly, the workbooks may also be workbooks of various subjects.
  • the title may include text content
  • the text content may include text in various languages, such as Chinese (Chinese characters and/or Pinyin), English, Japanese, etc.
  • The text content may also include various numbers (Chinese numerals, Roman numerals, Arabic numerals, etc.), symbols (e.g., greater-than and less-than signs, percent signs, etc.), and graphics (circles, rectangles, etc.); e.g., the text content can have various fonts, various colors, and so on.
  • the text content may include printed text content and handwritten text content, such as handwritten words and letters, handwritten numbers, handwritten symbols and graphics, and the like.
  • the title can also include other types of information such as pictures or tables.
  • The present disclosure does not specifically limit the content included in the image to be recognized.
  • the image to be recognized may be an image captured by an image acquisition device (eg, a digital camera or a mobile phone, etc.), and the image to be recognized may be a grayscale image or a color image.
  • the to-be-recognized image refers to a form in which the to-be-processed object (eg, test paper, homework, exercise book, etc.) is presented in a visual manner, such as a picture of the to-be-processed object.
  • the image to be recognized can also be obtained by scanning or the like.
  • the image to be recognized may be an image directly collected by an image collection device, or may be an image obtained after preprocessing the collected image.
  • the typeset document may be an electronic document.
  • FIG. 2 is a schematic diagram of an image to be recognized according to at least one embodiment of the present disclosure.
  • the to-be-recognized image is a test paper image
  • the to-be-recognized image includes a plurality of questions.
  • The multiple areas delimited by the black boxes in the example of Figure 2 correspond to multiple topics; for example, each topic here refers to a sub-question (numbered with Arabic numerals in Figure 2, e.g., "1. Fill in the appropriate unit"), and multiple sub-questions constitute a major question (such as "1. Fill in the blanks", "2. Multiple choice questions").
  • For example, as shown in topic 1 marked in FIG. 2, topic 1 may include text, where the text includes symbols, characters, numbers, handwritten text, etc.; as shown in topic 2 marked in FIG. 2, topic 2 may include text and tables; and as shown in topic 3 marked in FIG. 2, topic 3 may include text and pictures.
  • It should be noted that the images to be recognized and the topics they contain in the present disclosure are not limited to the situation shown in FIG. 2; the image to be recognized can also be an exercise book or the like, the topics can take other forms, and each topic included in the image to be recognized can also refer to a major question.
  • a pre-trained object detection model can be used to determine multiple area frames in the to-be-recognized image and the area information corresponding to the multiple area frames.
  • The area information may include the position information of the area frames in the image to be recognized; then the text content corresponding to each text box within an area box is determined by the text recognition model, and that text content is used as content to be typeset.
  • a pre-trained model can be used to directly acquire multiple area frames in the image to be recognized, area information corresponding to the multiple area frames, and text content corresponding to the text boxes in the area frames, and use the text content as the content to be typeset.
  • the pre-trained model can complete the functions of the aforementioned object detection model and text recognition model. That is to say, the recognition model for recognizing the image to be recognized may be multiple models, or may be one model, which is not limited in the present disclosure.
  • For example, step S20 may include: recognizing the image to be recognized through the object detection model to obtain multiple area frames and multiple pieces of area information, wherein the multiple area frames include multiple first text boxes; and recognizing the multiple first text boxes through the text recognition model to obtain multiple text contents in one-to-one correspondence with the multiple first text boxes.
  • the plurality of contents to be typeset includes one or more of the plurality of text contents.
  • step S20 may further include: determining a plurality of text categories corresponding to a plurality of first text boxes one-to-one according to a plurality of area information and a plurality of text contents.
  • The area information corresponding to any first text box among the plurality of first text boxes includes the text category of that first text box.
  • The object detection model can be a pre-trained neural network model, such as Faster R-CNN (Faster Region-based Convolutional Neural Network) or R-FCN (Region-based Fully Convolutional Network).
  • regions such as text and pictures in the image to be recognized can be identified, and different regions can be marked with different categories.
  • The region boxes can include text boxes, picture boxes, topic boxes, etc.
  • The topic frame may be composed of at least one text frame and/or at least one picture frame; for example, the topic frame may be the frame corresponding to "topic 1" in FIG. 2.
  • The area frames may also include a student information frame, a title frame, etc. Since student information, titles, and the like usually have specific formats and text features, the object detection model can also be trained so that it directly obtains the title frame and the student information frame by processing the image to be recognized.
  • Alternatively, the title box and the student information box can be treated as text boxes; that is to say, the object detection model need not give the area boxes corresponding to student information, titles, etc. their own categories, but can directly classify them as text boxes.
  • For example, step S20 may further include: extracting at least one picture to be typeset corresponding respectively to the at least one picture frame.
  • the plurality of contents to be typeset further include at least one image to be typeset.
  • For example, the image to be recognized may contain complex mathematical forms or formulas, such as the vertical (column) forms used in arithmetic; such a vertical form can be extracted as a picture to be typeset and included directly as part of the typesetting document, thereby simplifying the generation of the typesetting document.
  • Alternatively, the text content of each line in the vertical form can be obtained, and typesetting can be performed according to the positional relationship between the text contents in the vertical form and a template corresponding to the vertical form, so as to generate a typeset vertical form as part of the typesetting document.
  • a text box may appear in a picture frame identified by the object detection model, that is, some text content in the picture frame will also have a corresponding text box.
  • For example, the multiple area boxes further include at least one second text box located in the region where the at least one picture frame is located, and any picture frame in the at least one picture frame includes a picture; extracting the at least one picture to be typeset corresponding to the at least one picture frame includes: in response to the area covered by any picture frame in the image to be recognized including the areas covered by N second text boxes among the at least one second text box, taking the text content corresponding to the N second text boxes and the picture in that picture frame as a whole as the picture to be typeset corresponding to that picture frame, where N is a positive integer.
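The containment test in this claim, keeping the N second text boxes that fall inside a picture frame as part of that picture rather than as separate text contents, can be sketched with (x1, y1, x2, y2) boxes:

```python
def text_boxes_in_picture(picture_frame, second_text_boxes):
    """Return the text boxes whose covered area lies inside the picture frame;
    these are typeset as part of the picture, not as separate text contents."""
    px1, py1, px2, py2 = picture_frame
    return [t for t in second_text_boxes
            if px1 <= t[0] and py1 <= t[1] and t[2] <= px2 and t[3] <= py2]

picture = (0, 0, 100, 100)
inside, outside = (10, 10, 50, 30), (120, 10, 160, 30)
print(text_boxes_in_picture(picture, [inside, outside]))  # [(10, 10, 50, 30)]
```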
  • step S20 may further include: identifying the student information boxes through a character recognition model to obtain student information, wherein the multiple content to be typeset further includes student information.
  • step S20 may further include: identifying the title frames through a character recognition model to obtain title information, wherein the plurality of contents to be typeset further include title information.
  • Step S20 may also include: recognizing the header frame and the footer frame through a text recognition model to obtain header information and footer information, wherein the plurality of contents to be typeset further include the header information and the footer information. Then, in step S30, the header information and the footer information are typeset based on preset header and footer formats, generating a typesetting document with the header information and the footer information.
  • the table can be recognized as a table frame according to the table recognition model and converted into a spreadsheet.
  • the plurality of area frames further include at least one table frame, and the area information corresponding to each table frame includes table information.
  • Step S20 may further include: recognizing the image to be recognized through a table recognition model to obtain at least one table frame; and generating, from the at least one table frame and the table information corresponding to the at least one table frame, at least one table content corresponding to the at least one table frame, wherein the plurality of contents to be typeset further include the at least one table content.
  • a table in the image to be recognized corresponds to a table frame
  • a table frame may include multiple text boxes, each text box corresponds to the text content in the table
  • The table information includes the positional relationship between the table frame and the multiple text boxes included in the table frame; for example, a table is generated based on the number of rows and columns of the table, and the text content of each text box is filled into the corresponding position in the generated table to produce a table content.
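Generating a table content from the table's row/column counts and the text of its cells, as described here, can be sketched as filling a grid. The (row, column, text) cell records are an assumed intermediate form, not a structure the disclosure defines.

```python
def build_table(n_rows, n_cols, cells):
    """Create an n_rows x n_cols grid and fill each (row, col, text) record
    into its position, yielding one table content for the typeset document."""
    table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, text in cells:
        table[row][col] = text
    return table

print(build_table(2, 2, [(0, 0, "Name"), (0, 1, "Score"), (1, 0, "A")]))
# [['Name', 'Score'], ['A', '']]
```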
  • other table identification and generation methods may also be used to generate table content, which is not limited in the present disclosure.
  • The text recognition model may include a neural-network-based character recognition model, such as an OCR (Optical Character Recognition) model.
  • The text recognition model can recognize the text content of a text box and take that text content as content to be typeset.
  • The text content here can include printed text content and handwritten text content; that is, the text recognition model can output printed and handwritten text content without distinction, both as content to be typeset.
  • the text recognition model may also recognize the type of textual content, such as printed or handwritten.
• When the type of the text content is the handwriting type, a typesetting format different from that of printed text content can be set for it, for example, setting the font of handwritten text content to a handwriting-style font, to generate the typeset document.
• For example, the text categories include handwritten text. In response to the text category of a first text content of the plurality of text contents being handwritten text, step S20 may further include: deleting the first text content from the plurality of text contents to obtain at least one remaining text content, wherein the plurality of contents to be typeset include the remaining at least one text content but not the first text content.
• It should be noted that printed text content does not only refer to text, characters, graphics, and other content entered on an electronic device through an input device; handwritten text content can also be content written by hand on the page after printing.
• The present disclosure can also identify text whose text category is handwritten text according to actual needs and use it as part of the content to be typeset to generate a typeset document, which is not limited in the present disclosure.
• The text boxes can be further subdivided to determine the different text categories of different text boxes, so that the text content is typeset according to the format corresponding to its text category, yielding a typeset document with a better typesetting effect.
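A minimal sketch of the deletion step described above, assuming each recognized text content carries a category tag (the field names are illustrative assumptions):

```python
def remove_handwritten(text_contents):
    """Drop contents whose text category is handwritten, keeping the rest
    as the contents to be typeset (e.g. to produce a blank test paper)."""
    return [c for c in text_contents if c["category"] != "handwritten"]
```

The remaining contents then flow into the typesetting step unchanged.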
  • the image to be recognized is an image containing at least one topic
  • the plurality of area frames further include at least one topic frame corresponding to the at least one topic one-to-one
• the area covered by each topic frame in the image to be recognized includes at least one first text box
  • each area information includes the position information of the area frame corresponding to each area information in the image to be recognized
• For example, the plurality of text boxes include a plurality of first text boxes corresponding one-to-one to the plurality of text contents. Determining the plurality of text categories may include: determining a correspondence between the at least one topic frame and the plurality of first text boxes according to the position information in the plurality of area information; and determining the plurality of text categories based on the correspondence and the plurality of text contents.
• For example, the multiple text categories include the title of the big question, where the "big question title" refers to category headings such as "multiple choice" and "fill-in-the-blank" shown in Figure 2; because it needs special formatting (for example, an enlarged font, bold display, etc.), this type of text box needs to be identified from the multiple text boxes.
• For example, the at least one topic frame includes a first topic frame; in a first direction, the first topic frame has a first edge; and the plurality of first text boxes include a first frame to be processed.
• Determining the plurality of text categories may include: in response to the correspondence indicating that the first frame to be processed is located within the area covered by the first topic frame in the image to be recognized, and there being no area frame between the first frame to be processed and the first edge, determining that the text category of the first frame to be processed is the title of the big question; or, in response to the correspondence indicating that the first frame to be processed is located outside the area covered by the first topic frame in the image to be recognized, there being no area frame between the first frame to be processed and the first edge, and the text content corresponding to the first frame to be processed containing feature information of the big question, determining that the text category of the first frame to be processed is the title of the big question.
• Here, the first frame to be processed being located within the area covered by the first topic frame in the image to be recognized means that, on the image to be recognized, the area covered by the first frame to be processed lies within the area covered by the first topic frame.
  • the first direction may be a vertical direction.
• For example, the first edge can be the upper side of the first topic frame. The feature information of the big question indicates that the text content includes Chinese capitalized numerals, for example, "one" (一), "two" (二), "three" (三), and so on.
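The feature check just described might be sketched as follows; the numeral set and the separator characters are illustrative assumptions:

```python
# Chinese capitalized numerals commonly used to number big questions.
CHINESE_NUMERALS = "一二三四五六七八九十"

def has_big_question_feature(text):
    """True if the text starts with a Chinese numeral followed by an
    enumeration mark such as the ideographic comma."""
    return (
        len(text) >= 2
        and text[0] in CHINESE_NUMERALS
        and text[1] in "、．."
    )
```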
• For example, the multiple text categories also include the title, where the "title" refers to text located at the head of the test paper as shown in Figure 2, such as "Comprehensive Test Paper for Unit 1"; since it needs special formatting (such as a larger font size, bold display, etc.), this type of text box needs to be identified from the multiple text boxes.
  • the plurality of first text boxes include a second frame to be processed.
• In the first direction, the second frame to be processed has a first edge, and the image to be recognized also has a first edge.
• Determining the plurality of text categories may include: in the case that the correspondence indicates that the second frame to be processed is not located within the area covered by the at least one topic frame in the image to be recognized, in response to the distance between the first edge of the second frame to be processed and the first edge of the image to be recognized being less than a preset distance and the text content corresponding to the second frame to be processed including title feature information, determining that the text category of the second frame to be processed is the title.
• For example, the first edge of the image to be recognized can be its upper side, and the first edge of the second frame to be processed can be its upper side, so that whether the text category of the second frame to be processed is the title can be determined according to the position of the second frame to be processed in the image to be recognized and its corresponding text content.
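This title test could be sketched as follows; the keyword list used as "title feature information" is an assumption for illustration:

```python
def is_title_box(box, image_top, preset_distance, keywords=("试卷", "测试")):
    """Sketch of the title test: the box's upper side must be within
    `preset_distance` of the image's upper side, and its text must
    contain title feature information (the keyword list is assumed)."""
    near_top = (box["top"] - image_top) < preset_distance
    has_feature = any(k in box["text"] for k in keywords)
    return near_top and has_feature
```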
  • FIG. 3 is a schematic diagram of an image to be recognized with a region frame provided by at least one embodiment of the present disclosure.
  • the multiple boxes in FIG. 3 are multiple area frames obtained by recognizing the image to be recognized through step S20 .
  • the multiple area frames include a picture frame and a text frame.
• For example, the text box at the top is the text box whose text category is the title. Each question frame corresponds to one big question: question frame 1 corresponds to the first big question ("One, column vertical calculation" in Figure 3), and question frame 2 corresponds to the fifth big question ("Five, ..." in Figure 3). The text box located in the first row of a question frame is a text box whose text category is the big-question title (as shown in Figure 3). A text box whose text content is of the handwritten type is a text box whose text category is handwritten text (such as the text box "Answer: Xiao Cong's house is near the school" in Figure 3).
• It should be noted that the area frames shown in FIG. 3 are only examples; different forms of area frames can be generated when the object detection model is trained differently. For example, a text frame can contain multiple lines of text content, etc., which is not limited in the present disclosure.
  • format adjustment may be performed on the content to be typeset according to different typeset formats corresponding to different contents to be typeset, so as to generate a typesetting document.
  • FIG. 4A is an exemplary flowchart of step S30 in the image processing method shown in FIG. 1 .
  • step S30 in the image processing method may specifically include steps S301-S302.
  • step S301 based on the plurality of area information and the to-be-identified image, a plurality of layout information respectively corresponding to the plurality of contents to be typeset is determined.
  • step S301 may include: classifying the image to be recognized by a classification model to determine the image category of the image to be recognized; acquiring a layout template corresponding to the image category according to the image category; determining according to the layout template and the plurality of area information Multiple typographic information.
• For example, a classification model can be used to classify images to be recognized by subject, such as Chinese, mathematics, or English, to obtain the typesetting template corresponding to the subject category. Of course, other classification manners can also be used as required, which is not limited in the present disclosure.
  • the typesetting template can specify information such as the number of words per line, font size, font category, word spacing, line spacing, paragraph spacing, etc.
• For example, the font category for Chinese adopts Songti (SimSun), while the font category for English and numbers adopts Times New Roman.
• For example, the font size of the title is larger (for example, size three) and it is displayed in bold; the font size of the big-question title is larger (for example, small size three) and it is likewise displayed in bold.
• In step S301, determining the plurality of typesetting information according to the typesetting template and the plurality of area information may include, for the i-th content to be typeset among the multiple contents to be typeset: in response to the i-th content to be typeset being a text content, determining the area information of the area frame corresponding to the i-th content to be typeset, and determining the text category of the i-th content to be typeset according to that area information; and determining, according to the typesetting template and the text category, the typesetting information corresponding to the i-th content to be typeset, where i is a positive integer less than or equal to the total number of contents to be typeset.
• The typesetting information specifies the typesetting format of the content to be typeset. For example, if the text category of the content to be typeset is the big-question title, the typesetting information may include general typesetting formats such as the number of characters per line and word spacing, and may also include formats set specifically for the big-question title, such as a larger font size and bold display.
• In this way, the typesetting information is obtained from a preset typesetting template, so that the content to be typeset can be typeset accordingly; the complexity of typesetting is reduced, and a better typeset document can be obtained.
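The template lookup could be sketched as a simple mapping from text category to format; the concrete fonts and sizes below mirror the examples in the text but are assumptions, not values fixed by the disclosure:

```python
# Illustrative typesetting template (values assumed for the sketch).
TEMPLATE = {
    "title":        {"font": "SimSun", "size": 16, "bold": True},
    "big_question": {"font": "SimSun", "size": 15, "bold": True},
    "body":         {"font": "SimSun", "size": 12, "bold": False},
}

def typesetting_info(text_category):
    """Look up the typesetting format for a content's text category,
    falling back to the body format for unknown categories."""
    return TEMPLATE.get(text_category, TEMPLATE["body"])
```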
  • step S302 based on a plurality of typesetting information, typesetting is performed on a plurality of contents to be typeset, so as to obtain a typesetting document.
  • FIG. 4B is an exemplary flowchart of step S302 in the image processing method shown in FIG. 4A .
  • step S302 in the image processing method may specifically include steps S3021-S3023.
  • step S3021 a plurality of contents to be typeset are processed to obtain a plurality of display contents.
  • step S3022 the positional relationship among the plurality of display contents is determined.
  • step S3023 based on the positional relationship between the plurality of display contents and the plurality of typesetting information, typesetting processing is performed on the plurality of display contents to obtain a typesetting document.
  • the multiple contents to be typeset include at least one first content to be typeset and at least one second content to be typeset
• the at least one question frame contains the at least one first content to be typeset
• Step S3021 may include: performing question number detection processing on the at least one first content to be typeset to obtain at least one intermediate display content; and performing format processing on the at least one second content to be typeset and the at least one intermediate display content to obtain the multiple display contents.
  • the first content to be typeset here refers to the content to be typeset contained in the title box
  • the second content to be typeset refers to other content to be typeset except the content to be typeset included in the title box, such as titles, student information, etc.
• For example, the first content to be typeset contained in a question frame can be subjected to question number detection processing to determine whether a question number is missing and to restore any missing question number, so as to obtain a complete typeset document.
• The question number here can refer to the number in the title of a big question, such as the uppercase numerals "one", "two", "three", etc., or it can refer to the number of each sub-question, such as "1", "2", "3", etc.
• For example, performing question number detection processing on the at least one first content to be typeset to obtain the at least one intermediate display content may include: extracting the question number information corresponding to the at least one question frame to obtain at least one piece of question number information; determining the positional relationship between the at least one question frame; based on that positional relationship and the at least one piece of question number information, judging whether a question number is missing; in response to a question number being missing, extracting the missing question number information, determining the missing area in the image to be recognized corresponding to the missing question number information, completing the missing question number information based on the missing area to obtain the missing display content corresponding to the missing area, and taking the missing display content and the at least one first content to be typeset as the at least one intermediate display content; and in response to no question number being missing, taking the at least one first content to be typeset as the at least one intermediate display content.
• For example, judging whether a question number is missing may include: sorting the at least one piece of question number information based on the positional relationship between the at least one question frame to obtain a question number information sequence; in response to every two adjacent pieces of question number information in the sequence being consecutive, determining that no question number is missing; and in response to at least two adjacent pieces of question number information in the sequence being non-consecutive, determining that a question number is missing.
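The continuity check above can be sketched as follows, assuming the question numbers have already been extracted as integers and sorted by frame position:

```python
def missing_question_numbers(numbers_in_position_order):
    """Given question numbers sorted by the frames' positions, return the
    numbers absent between consecutive entries; an empty list means no gap."""
    missing = []
    for a, b in zip(numbers_in_position_order, numbers_in_position_order[1:]):
        missing.extend(range(a + 1, b))  # non-consecutive pair -> gap
    return missing
```

A non-empty result would then trigger the missing-area completion step described above.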
• For example, completing the missing question number information based on the missing area to obtain the missing display content corresponding to the missing area may include: in response to a question frame existing in the missing area, completing the missing question number information for that question frame to obtain the missing display content corresponding to the missing area, where the missing display content includes the missing question number information and the text content in the question frame; and in response to no question frame existing in the missing area, recognizing the missing area to obtain the missing display content corresponding to the missing area, where the missing display content includes the text content in the missing area and the missing question number information.
• Then, format processing can be performed on the at least one second content to be typeset and the at least one intermediate display content (collectively referred to as the content to be processed) to obtain the multiple display contents.
• For example, the format processing may include performing text segmentation processing on the content to be processed to obtain the multiple display contents, for example, grouping the content to be typeset belonging to the same paragraph into one display content; that is, one display content can correspond to one paragraph.
  • Each paragraph contains at least one line of text.
• For example, the stem of a question is usually a sentence ending with a symbol such as a period or a question mark; if there is no other text content after that symbol in the horizontal direction, it can be judged that the paragraph ends there.
• For example, whether segmentation is required can be determined based on the length of the text content: if there are three consecutive lines of text, the first and third lines are both long while the second line is short, and there is no picture between the second and third lines, it is judged that the paragraph ends at the second line and the third line belongs to a new paragraph.
  • it can be segmented according to the characteristics of the question type. For example, for a multiple-choice question, if the text content is an option and the option is on a different line from the upper and lower adjacent text boxes, it is judged that the option belongs to a new independent paragraph.
• For example, text content containing a sub-question number is judged to belong to a new paragraph.
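The segmentation heuristics described above (sentence-final punctuation closes a paragraph; a sub-question number opens one) could be sketched as two small predicates; the punctuation set and number pattern are illustrative assumptions:

```python
import re

def ends_paragraph(line):
    """A line likely closes a paragraph when it ends with sentence-final
    punctuation (period or question mark, Chinese or Western)."""
    return line.rstrip().endswith(("。", "？", ".", "?"))

def starts_new_paragraph(line):
    """A sub-question number such as '1.' or '(2)' signals a new paragraph."""
    return re.match(r"^\s*\(?\d+[).、.]", line) is not None
```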
• It should be noted that the above description takes as an example text that is typeset horizontally, from top to bottom.
• However, the present disclosure is not limited to this: if the text in the image to be recognized is typeset vertically from right to left, the above "upper side" can be "right side", the above "lower side" can be "left side", and the above "horizontally after" can be "vertically below".
• For example, an article paragraph can be identified according to whether the first line of text content in the content to be processed is indented by N characters, so as to typeset it according to the article format.
  • the format processing may include performing text segmentation processing on the to-be-processed content with the first line of text content indented by N characters, and typeset the first line of text content according to a preset indentation format, so as to obtain multiple display contents , where N is a positive integer greater than 1. It should be noted that, if the article paragraph includes multiple paragraphs, the text content of the first line of each paragraph in the article paragraph is typeset according to the preset indentation format.
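The indentation test above can be sketched as follows; treating full-width spaces (U+3000, common in Chinese text) as indentation characters is an assumption:

```python
def is_article_paragraph(first_line, n=2):
    """Treat a block as an article paragraph when its first line is indented
    by at least n characters; full-width spaces count as one character each."""
    stripped = first_line.lstrip(" \u3000")
    return len(first_line) - len(stripped) >= n
```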
  • FIG. 4C is a schematic diagram of a typesetting document including article paragraphs provided by at least one embodiment of the present disclosure.
• As shown in Figure 4C, the big-question title is displayed in bold. The first line under the big-question title (question stem 1 shown in Figure 4C) is not indented by two or more characters, so it is typeset according to the typesetting information of the question stem. The first line of the English paragraph is indented by two or more characters, so the English paragraph is judged to be an article paragraph, taken as one display content, and typeset according to the preset indentation format. If sub-question numbers appear in the text content (question stem 2 shown in FIG. 4C), each text content containing a sub-question number belongs to its own paragraph; each such text content is taken as one display content and typeset according to the typesetting information of the question stem.
• For example, the format processing may also include performing format conversion on content to be processed that contains special formats (such as fractions, superscripts, and subscripts), so as to obtain display content containing those special formats. Special formats can also be expressed in a dedicated representation, for example using LaTeX to represent mathematical symbols, so that the text recognition model can directly output display content shown in the typeset document without further format processing.
• For example, step S3022 may include: determining the positions of the plurality of area frames in the image to be recognized according to the position information in the plurality of area information; and determining the positional relationship between the multiple display contents according to the positional relationship between the multiple contents to be typeset.
• For example, when the image to be recognized is divided into columns or pages, each column or page is called an image partition; an image to be recognized may include, for example, two or three image partitions. Images to be recognized that have columns or pages need to be formatted so that the questions in the same column or page are attributed to the same page of the typeset document; for example, this can be done based on the position information in the area information corresponding to the area frames.
• For example, determining the positional relationship between the multiple display contents according to the positional relationship between the multiple contents to be typeset may include: determining, according to the positions of the plurality of area frames in the image to be recognized, whether the image to be recognized includes multiple image partitions; in response to the image to be recognized including multiple image partitions, determining the content sets to be typeset corresponding to the multiple image partitions respectively, determining the positional relationship between the multiple image partitions in the image to be recognized, and determining the positional relationship between the multiple content sets to be typeset based on the positional relationship between the multiple image partitions; and determining the positional relationship between the multiple display contents based on the positional relationship between the multiple content sets to be typeset and the positional relationship between the contents to be typeset within each set.
• For example, when determining whether the image to be recognized includes multiple image partitions, the determination may be made according to the positions of the plurality of area frames in the image to be recognized (e.g., their coordinates in the image). For example, as shown in Figure 2, when the content of the image to be recognized is clearly divided into two columns, the abscissa values of the upper-left corners of the question frames fall into two clearly separated groups, so whether there are multiple image partitions can be determined based on this feature.
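The two-column test just described could be sketched by looking for one wide gap among the frames' left x-coordinates; the gap threshold and field names are assumptions for illustration:

```python
def split_into_partitions(frames, image_width, gap_ratio=0.25):
    """Sketch of column detection: if the left x-coordinates of the area
    frames contain one gap wider than gap_ratio * image_width, split the
    frames into a left and a right partition; otherwise keep one partition."""
    xs = sorted(f["x"] for f in frames)
    gaps = [(b - a, a) for a, b in zip(xs, xs[1:])]
    if not gaps:
        return [frames]
    widest, at = max(gaps)
    if widest < gap_ratio * image_width:
        return [frames]  # single partition
    left = [f for f in frames if f["x"] <= at]
    right = [f for f in frames if f["x"] > at]
    return [left, right]
```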
• For example, step S3023 may include: performing typesetting processing on the multiple display contents based on the multiple typesetting information to obtain multiple typeset display contents; and then arranging the multiple typeset display contents in sequence according to the positional relationship between the multiple display contents, to obtain the typeset document.
  • FIG. 4D is a schematic diagram of a typesetting document corresponding to the to-be-recognized image shown in FIG. 2 according to an embodiment of the present disclosure.
• The typeset document corresponding to the image to be recognized shown in FIG. 2 includes three pages, namely page (1), page (2) and page (3), wherein pages (1) and (2) display the content of the left image partition in the image to be recognized shown in FIG. 2, and page (3) displays the content of the right image partition.
• For example, the table and the picture are displayed in the typeset document in the form of pictures (the table of the third big question is not shown); the table recognition model described above can also be used to generate a spreadsheet instead, which will not be repeated here.
• For example, the typeset document shown in FIG. 4D is a blank typeset document corresponding to the image to be recognized shown in FIG. 2.
• In summary, the image processing method provided by the present disclosure can process an image to be recognized to obtain a corresponding typeset document, and is optimized for the particular features of images containing questions (for example, photographed or scanned test papers and exercise books): recognition accuracy for this type of image is higher, the typeset document restores the original more faithfully, and the method provides an efficient and convenient way to manage and store test papers and record errors.
  • FIG. 5 is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure.
  • the image processing apparatus 500 may include: an acquisition unit 501 , an identification unit 502 and a typesetting unit 503 .
  • these modules may be implemented by hardware (eg, circuit) modules, software modules, or any combination of the two, and the following embodiments are the same, and will not be described again.
• For example, these modules may be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability.
  • the acquisition unit 501 is configured to acquire an image to be recognized.
  • the identifying unit 502 is configured to identify the to-be-identified image to obtain a plurality of area frames, a plurality of area information corresponding to the plurality of area frames, and a plurality of contents to be typeset.
  • the typesetting unit 503 is configured to typeset the plurality of contents to be typeset based on the image to be recognized and the plurality of area information, so as to obtain a typesetting document corresponding to the image to be recognized.
• For example, the acquiring unit 501, the identifying unit 502 and the typesetting unit 503 may include code and programs stored in a memory; a processor may execute the code and programs to implement some or all of the functions of the acquiring unit 501, the identifying unit 502 and the typesetting unit 503 described above.
  • the acquiring unit 501 , the identifying unit 502 and the typesetting unit 503 may be dedicated hardware devices for implementing some or all of the functions of the acquiring unit 501 , the identifying unit 502 and the typesetting unit 503 as described above.
  • the acquiring unit 501 , the identifying unit 502 and the typesetting unit 503 may be one circuit board or a combination of multiple circuit boards, for implementing the functions as described above.
  • the one circuit board or the combination of multiple circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processors; and (3) The firmware stored in the memory executable by the processor.
  • the acquiring unit 501 may be used to implement step S10 shown in FIG. 1
  • the identifying unit 502 may be used to implement step S20 shown in FIG. 1
• The typesetting unit 503 may be used to implement step S30 shown in FIG. 1. Therefore, for the specific description of the functions that the acquisition unit 501, the identification unit 502 and the typesetting unit 503 can implement, reference may be made to the relevant descriptions of steps S10 to S30 in the embodiments of the above image processing method, which will not be repeated here.
  • the image processing apparatus 500 can achieve technical effects similar to those of the aforementioned image processing method, which will not be repeated here.
  • the image processing apparatus 500 may include more or less circuits or units, and the connection relationship between the various circuits or units is not limited, and may be determined according to actual requirements .
  • the specific structure of each circuit or unit is not limited, and can be composed of analog devices, digital chips, or other suitable ways according to circuit principles.
  • FIG. 6 is a schematic diagram of an electronic device provided by at least one embodiment of the present disclosure.
  • the electronic device includes a processor 601 , a communication interface 602 , a memory 603 and a communication bus 604 .
  • the processor 601 , the communication interface 602 , and the memory 603 communicate with each other through the communication bus 604 , and the components such as the processor 601 , the communication interface 602 , and the memory 603 can also communicate through a network connection.
  • the present disclosure does not limit the type and function of the network. It should be noted that the components of the electronic device shown in FIG. 6 are only exemplary and not restrictive, and the electronic device may also have other components according to actual application requirements.
  • memory 603 is used for non-transitory storage of computer readable instructions.
• The processor 601 is configured to execute the computer-readable instructions; when the computer-readable instructions are executed by the processor 601, the image processing method according to any one of the foregoing embodiments is implemented.
  • the communication bus 604 may be a Peripheral Component Interconnect Standard (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 602 is used to enable communication between the electronic device and other devices.
  • the processor 601 and the memory 603 may be provided on the server side (or the cloud).
  • the processor 601 may control other components in the electronic device to perform desired functions.
  • the processor 601 may be a central processing unit (CPU), a network processing unit (NP), a tensor processing unit (TPU), a graphics processing unit (GPU), or other devices with data processing capability and/or program execution capability; it may also be Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
• The central processing unit (CPU) can have an X86, ARM, or similar architecture.
  • memory 603 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory, among others.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, and the like.
  • One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor 601 may execute the computer-readable instructions to implement various functions of the electronic device.
  • Various application programs, various data and the like can also be stored in the storage medium.
  • the electronic device may also include an image capture component.
  • the image acquisition component is used to acquire images.
  • the memory 603 is also used to store acquired images.
  • the image acquisition component may be a smartphone camera, a tablet camera, a personal computer camera, a digital camera lens, or even a web camera.
  • the acquired image to be recognized may be an original image directly acquired by the image acquiring component, or an image acquired after preprocessing the original image.
  • Preprocessing can eliminate irrelevant information or noise information in the original image, so as to better process the acquired image.
  • the preprocessing may include, for example, performing image augmentation (Data Augment), image scaling, gamma (Gamma) correction, image enhancement, or noise reduction filtering on the original image.
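As a concrete illustration of one of these preprocessing steps, the sketch below applies gamma correction to 8-bit pixel values through a 256-entry lookup table. It is a minimal pure-Python sketch for a row of pixels; a real pipeline would operate on whole image arrays (e.g. with NumPy or OpenCV).

```python
def gamma_correct(pixels, gamma=2.2):
    # out = 255 * (in / 255) ** (1 / gamma), precomputed as a lookup table
    table = [int(255.0 * (v / 255.0) ** (1.0 / gamma)) for v in range(256)]
    return [table[p] for p in pixels]

# Gamma > 1 brightens dark pixels; pure black and pure white are fixed points.
row = [0, 64, 128, 255]
corrected = gamma_correct(row)
```

The lookup-table form is the usual choice because the per-pixel work reduces to one table index, regardless of image size.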
  • FIG. 7 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • the storage medium 700 may be a non-transitory computer-readable storage medium on which one or more computer-readable instructions 701 may be stored non-transitory.
  • the computer readable instructions 701 may perform one or more steps in the image processing method according to the above when executed by a processor.
  • the storage medium 700 may be applied to the above-mentioned electronic device, for example, the storage medium 700 may include a memory in the electronic device.
  • the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, any combination of the above storage media, or other suitable storage media.

Abstract

An image processing method, an image processing apparatus, an electronic device, and a storage medium. The image processing method includes: acquiring an image to be recognized; recognizing the image to be recognized to obtain a plurality of region boxes, a plurality of pieces of region information in one-to-one correspondence with the region boxes, and a plurality of contents to be typeset; and typesetting the contents to be typeset on the basis of the image to be recognized and the region information, so as to obtain a typeset document corresponding to the image. The method yields an electronic document corresponding to the image to be recognized, which facilitates operations on the image such as storage, transmission, management, backup, and printing.

Description

Image processing method and apparatus, electronic device, and storage medium — Technical Field
Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium.
Background
Users can photograph paper documents to extract the relevant information and archive it electronically, making the paper documents easier to manage and store. For example, in some scenarios students work through large numbers of test papers, homework assignments, and workbooks during their studies, and need to organize these materials and practice them repeatedly; managing papers, storing them, and recording wrong answers efficiently and conveniently can significantly improve learning efficiency. In other scenarios, with the growth of mobile networks, students often attend classes online and submit homework over the network, and therefore need homework documents with a clean background for submission.
Summary
At least one embodiment of the present disclosure provides an image processing method, including: acquiring an image to be recognized; recognizing the image to be recognized to obtain a plurality of region boxes, a plurality of pieces of region information in one-to-one correspondence with the region boxes, and a plurality of contents to be typeset; and typesetting the contents to be typeset on the basis of the image to be recognized and the region information, so as to obtain a typeset document corresponding to the image.
For example, in the image processing method provided by at least one embodiment of the present disclosure, recognizing the image to obtain the region boxes, the region information, and the contents to be typeset includes: recognizing the image with an object detection model to obtain the region boxes and the region information, where the region boxes include a plurality of first text boxes; and recognizing the first text boxes with a text recognition model to obtain a plurality of text contents in one-to-one correspondence with the first text boxes, where the contents to be typeset include one or more of the text contents.
For example, the image processing method provided by at least one embodiment further includes: determining, according to the region information and the text contents, a plurality of text categories in one-to-one correspondence with the first text boxes, where the region information corresponding to any first text box includes that box's text category.
For example, in the image processing method provided by at least one embodiment, the region boxes further include at least one figure box, and the recognition step further includes: extracting at least one picture to be typeset respectively corresponding to the at least one figure box, where the contents to be typeset further include the at least one picture.
For example, in the image processing method provided by at least one embodiment, the image to be recognized contains at least one question, the region boxes further include at least one question box in one-to-one correspondence with the at least one question, the region covered by each question box in the image includes at least one first text box, and each piece of region information includes the position of the corresponding region box in the image. Determining the text categories from the region information and the text contents includes: determining, from the position information in the region information, the correspondence between the question boxes and the first text boxes; and determining the text categories on the basis of that correspondence and the text contents.
For example, in the image processing method provided by at least one embodiment, the at least one question box includes a first question box having a first edge in a first direction, the first text boxes include a first box to be processed, and the text categories include a big-question name. Determining the text categories on the basis of the correspondence and the text contents includes: in response to the correspondence indicating that the first box to be processed lies within the region covered by the first question box in the image, with no region box between the first box and the first edge, determining that the first box's text category is the big-question name; or, in response to the correspondence indicating that the first box lies outside the region covered by the first question box, with no region box between the first box and the first edge, and it being determined that the first box's text content contains big-question feature information, determining that the first box's text category is the big-question name.
For example, in the image processing method provided by at least one embodiment, the first text boxes include a second box to be processed, the text categories include a title, the second box has a first edge in the first direction, and the image to be recognized has a first border. Determining the text categories on the basis of the correspondence and the text contents includes: in a case where the correspondence indicates that the second box does not lie within any region covered by the question boxes, in response to the distance between the second box's first edge and the image's first border being less than a preset distance and the second box's text content containing title feature information, determining that the second box's text category is the title.
For example, in the image processing method provided by at least one embodiment, typesetting the contents to be typeset on the basis of the image and the region information to obtain the typeset document includes: determining, on the basis of the region information and the image, a plurality of pieces of typesetting information respectively corresponding to the contents to be typeset; and typesetting the contents on the basis of the typesetting information to obtain the typeset document.
For example, in at least one embodiment, determining the typesetting information on the basis of the region information and the image includes: classifying the image with a classification model to determine its image category; obtaining the typesetting template corresponding to that category; and determining the typesetting information from the template and the region information.
For example, in at least one embodiment, determining the typesetting information from the template and the region information includes, for the i-th content to be typeset (i being a positive integer no greater than the total number of contents to be typeset): in response to the i-th content being text content, determining the region information of its region box and, from that region information, the content's text category; and determining the content's typesetting information from the template and that text category.
For example, in at least one embodiment, typesetting the contents on the basis of the typesetting information to obtain the typeset document includes: processing the contents to obtain a plurality of display contents; determining the positional relationships among the display contents; and typesetting the display contents on the basis of those relationships and the typesetting information to obtain the typeset document.
For example, in at least one embodiment, the contents to be typeset include at least one first content to be typeset and at least one second content to be typeset, the at least one question box containing the at least one first content. Processing the contents to obtain the display contents includes: performing question-number detection on the at least one first content to obtain at least one intermediate display content; and performing format processing on the at least one second content and the at least one intermediate display content to obtain the display contents.
For example, in at least one embodiment, performing question-number detection on the at least one first content to obtain at least one intermediate display content includes: extracting the question-number information corresponding to the question boxes to obtain at least one piece of question-number information; determining the positional relationships among the question boxes; and judging, on the basis of those relationships and the question-number information, whether any question number is missing. In response to a number being missing: extracting the missing question-number information, determining the corresponding missing region in the image, and completing the missing information on the basis of that region to obtain the missing display content corresponding to the missing region; then taking the missing display content together with the at least one first content as the intermediate display contents. In response to no number being missing: taking the at least one first content as the intermediate display contents.
For example, in at least one embodiment, determining the positional relationships among the display contents includes: determining the positions of the region boxes in the image from the position information; determining the positional relationships among the contents to be typeset on the basis of those positions; and determining the relationships among the display contents from the relationships among the contents to be typeset.
For example, in at least one embodiment, determining the relationships among the display contents from the relationships among the contents to be typeset includes: determining, from the positions of the region boxes in the image, whether the image comprises multiple image partitions; in response to the image comprising multiple partitions, determining the sets of contents to be typeset respectively corresponding to the partitions and the positional relationships among the partitions in the image, and determining, on the basis of the relationships among the partitions, the relationships among the content sets; and determining the relationships among the display contents on the basis of the relationships among the content sets and among the contents to be typeset.
For example, in at least one embodiment, typesetting the display contents on the basis of their positional relationships and the typesetting information to obtain the typeset document includes: typesetting the display contents on the basis of the typesetting information to obtain a plurality of typeset display contents; and arranging the typeset display contents in order according to the positional relationships among the display contents to obtain the typeset document.
For example, in at least one embodiment, the text categories include handwritten text. In response to a first text content among the text contents having the category handwritten text, and the typeset document not containing that content, the recognition step further includes: deleting the first text content from the text contents to obtain the remaining at least one text content, where the contents to be typeset include the remaining text contents but not the first text content.
At least one embodiment of the present disclosure provides an image processing apparatus, including: an acquisition unit configured to acquire an image to be recognized; a recognition unit configured to recognize the image to obtain a plurality of region boxes, a plurality of pieces of region information in one-to-one correspondence with the region boxes, and a plurality of contents to be typeset; and a typesetting unit configured to typeset the contents on the basis of the image and the region information to obtain a typeset document corresponding to the image.
At least one embodiment of the present disclosure provides an electronic device, including: a memory non-transitorily storing computer-executable instructions; and a processor configured to run the computer-executable instructions, which, when run by the processor, implement the image processing method according to any embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the image processing method according to any embodiment of the present disclosure.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and do not limit it.
FIG. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an image to be recognized provided by at least one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the image to be recognized with region boxes, provided by at least one embodiment of the present disclosure;
FIG. 4A is an example flowchart of step S30 of the image processing method shown in FIG. 1;
FIG. 4B is an example flowchart of step S302 of the image processing method shown in FIG. 4A;
FIG. 4C is a schematic diagram of a typeset document containing an article passage, provided by an embodiment of the present disclosure;
FIG. 4D is a schematic diagram of the typeset document corresponding to the image shown in FIG. 2, provided by an embodiment of the present disclosure;
FIG. 5 is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an electronic device provided by at least one embodiment of the present disclosure; and
FIG. 7 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art from the described embodiments without creative effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, technical or scientific terms used in the present disclosure have the ordinary meaning understood by a person of ordinary skill in the field to which this disclosure belongs. "First", "second", and similar words do not denote any order, quantity, or importance, but merely distinguish different components. "Include", "comprise", and similar words mean that the elements or items preceding the word cover those listed after it and their equivalents, without excluding other elements or items. "Connected", "coupled", and similar words are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like indicate only relative positions, which may change accordingly when the absolute position of the described object changes. To keep the following description clear and concise, detailed descriptions of some known functions and components are omitted.
At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium. The image processing method includes: acquiring an image to be recognized; recognizing the image to obtain a plurality of region boxes, region information in one-to-one correspondence with the region boxes, and a plurality of contents to be typeset; and typesetting the contents on the basis of the image and the region information to obtain a typeset document corresponding to the image.
The image processing method provided by at least one embodiment processes the image with pre-trained models to obtain the contents to be typeset and their category information, retrieves the typesetting format corresponding to each category so as to typeset the contents, and finally produces the typeset document. The method thereby yields an electronic document corresponding to the image, which facilitates operations such as storage, transmission, management, backup, and printing.
The image processing method provided by embodiments of the present disclosure can be applied to the image processing apparatus provided by embodiments of the present disclosure, which can be configured on an electronic device. The electronic device may be a personal computer, a mobile terminal, or the like; the mobile terminal may be a hardware device such as a mobile phone or tablet computer.
Embodiments of the present disclosure are described in detail below with reference to the drawings, but the disclosure is not limited to these specific embodiments.
FIG. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure.
As shown in FIG. 1, the image processing method provided by at least one embodiment includes steps S10 to S30.
Step S10: acquire an image to be recognized.
Step S20: recognize the image to obtain a plurality of region boxes, region information in one-to-one correspondence with the region boxes, and a plurality of contents to be typeset.
Step S30: typeset the contents on the basis of the image and the region information to obtain a typeset document corresponding to the image.
In some embodiments of the present disclosure, the image acquired in step S10 contains questions, e.g. an image of a test paper, homework assignment, or workbook. These may be paper documents, so the method produces an electronic document of, say, a paper test, which can then be stored, transmitted, managed, backed up, printed, and so on. The papers may be from any subject, e.g. Chinese, mathematics, or a foreign language such as English; likewise the workbooks.
For example, a question may include text content, which may be in various languages, e.g. Chinese (characters and/or pinyin), English, or Japanese. Text content may also include various numerals (Chinese, Roman, Arabic, etc.), symbols (e.g. greater-than, less-than, percent signs), and graphics (circles, rectangles, etc.), and may have various fonts and colors. Text content may be printed or handwritten; handwritten content includes, for example, handwritten words and letters, handwritten digits, and handwritten symbols and graphics.
For example, a question may also include other types of information such as pictures or tables. The present disclosure places no specific restriction on what the questions in the image contain.
For example, the image to be recognized may be captured by an image acquisition device (e.g. a digital camera or mobile phone) and may be a grayscale or color image. Note that "image to be recognized" refers to any visual presentation of the object to be processed (e.g. a test paper, homework, or workbook), such as a picture of that object. The image may also be obtained by scanning. It may be the image directly captured by the acquisition device, or an image obtained by preprocessing the captured image.
For example, the typeset document may be an electronic document.
FIG. 2 is a schematic diagram of an image to be recognized provided by at least one embodiment. As shown in FIG. 2, the image is a test-paper image containing multiple questions. The regions delimited by the black boxes in FIG. 2 correspond respectively to the questions; here each question is a sub-question (introduced by an Arabic numeral, e.g. "1. Fill in a suitable unit in the brackets"), and several sub-questions form a big question (e.g. "I. Fill in the blanks", "II. Multiple choice"). For example, as labeled in FIG. 2, question 1 may include text (symbols, characters, digits, handwriting, etc.); question 2 may include text and a table; question 3 may include text and a picture. Applying the image processing method of at least one embodiment to this paper yields the corresponding typeset document.
Note that the image to be recognized and its questions are not limited to the situation shown in FIG. 2: the image may also be a workbook or the like, and a question may take other forms; e.g. each question in the image may also be a big question.
For example, a pre-trained object detection model can determine the region boxes in the image and their region information (e.g. the position of each box in the image), after which a text recognition model determines the text content of each text box, the text content serving as content to be typeset. Alternatively, a single pre-trained model can directly produce the region boxes, their region information, and the text contents of the text boxes; such a model fulfills the functions of both the object detection model and the text recognition model. In other words, recognition of the image may be performed by multiple models or by one model; the present disclosure does not restrict this.
For example, step S20 may include: recognizing the image with an object detection model to obtain the region boxes and the region information, the region boxes including a plurality of first text boxes; and recognizing the first text boxes with a text recognition model to obtain the text contents in one-to-one correspondence with them.
For example, the contents to be typeset include one or more of the text contents.
For example, step S20 may further include: determining, from the region information and the text contents, the text categories in one-to-one correspondence with the first text boxes.
For example, the region information of any first text box includes that box's text category.
For example, the object detection model may be a pre-trained neural-network model such as Faster R-CNN (Faster Region-Convolutional Neural Networks) or R-FCN (Region-based Fully Convolutional Network). Based on such a pre-trained model, the text, picture, and other regions in the image can be picked out and the different region boxes labeled with different classes. For example, the region boxes may include text boxes, figure boxes, and question boxes, where a question box may be composed of at least one text box and/or at least one figure box; e.g. a question box may be the box corresponding to "question 1" in FIG. 2, i.e. one question corresponds to one question box. In some embodiments, the region boxes may also include student-information boxes and title boxes: since student information and titles usually have specific formats and textual features, the object detection model can be trained so that it produces title boxes and student-information boxes directly from the image.
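The detect-then-recognize structure of step S20 can be sketched as a small pipeline. The two model callables below are illustrative stand-ins, not the disclosure's trained networks; a real system would plug in e.g. a Faster R-CNN detector and an OCR model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in image coordinates

@dataclass
class Region:
    box: Box
    category: str   # e.g. "text", "figure", "question"
    text: str = ""  # filled in by the recognizer for text regions only

def recognize(image, detect: Callable, ocr: Callable) -> List[Region]:
    # Object detection proposes labeled region boxes, then the text
    # recognition model reads the content of each text box.
    regions = [Region(box, category) for box, category in detect(image)]
    for region in regions:
        if region.category == "text":
            region.text = ocr(image, region.box)
    return regions

# Stand-in models for illustration only.
def fake_detect(image):
    return [((0, 0, 200, 20), "text"), ((0, 30, 200, 80), "figure")]

def fake_ocr(image, box):
    return "1. Fill in a suitable unit"

regions = recognize(None, fake_detect, fake_ocr)
```

Keeping the detector and recognizer behind plain callables mirrors the text's point that recognition may be one model or several.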
Note that title boxes and student-information boxes may also both be treated as text boxes; that is, the object detection model need not classify the region boxes corresponding to student information, titles, etc. separately, and may simply classify them as text boxes.
For example, when the region boxes further include at least one figure box, step S20 may further include: extracting the at least one picture to be typeset respectively corresponding to the figure box(es). In that case, the contents to be typeset also include the picture(s).
For example, in some embodiments the image contains complex mathematical forms or formulas, such as column (vertical) arithmetic. These can be treated as figure boxes and included in the typeset document in picture form, which simplifies document generation. Alternatively, for a column calculation, the text content of each row can be obtained and typeset according to the positional relationships among the rows and a template for column arithmetic, producing a typeset column calculation that becomes part of the document.
For example, in some embodiments text boxes may appear inside a figure box recognized by the object detection model, i.e. some text inside a figure also receives its own text box. Before the text recognition model extracts text content, these text boxes must be removed, and the figure box together with the text inside it treated as a single figure box.
For example, the region boxes further include at least one second text box located within the region of the figure box(es), and any figure box includes a picture. Extracting the picture(s) to be typeset includes: in response to the region covered by a figure box including the regions covered by N second text boxes (N being a positive integer), taking the text contents of the N second text boxes together with the picture in that figure box, as a whole, as the picture to be typeset corresponding to the figure box.
For example, when the region boxes further include a student-information box, step S20 may further include: recognizing the student-information box with the text recognition model to obtain the student information, the contents to be typeset then also including that information.
For example, when the region boxes further include a title box, step S20 may further include: recognizing the title box with the text recognition model to obtain the title information, the contents to be typeset then also including the title information.
For example, some images may carry special information in the header and footer that needs to appear in the typeset document. The region boxes obtained by the object detection model may then also include a header box and a footer box, and step S20 may further include: recognizing the header and footer boxes with the text recognition model to obtain header and footer information, the contents to be typeset then also including them. Afterwards, in step S30, the header and footer information is typeset according to preset header and footer formats, generating a typeset document with header and footer information.
Note that student information, title information, header information, and footer information can all take the form of text.
For example, tables may exist in the image; a table can be recognized as a table box by a table recognition model and converted into an electronic table. For example, the region boxes further include at least one table box, the region information of each table box including table information, and step S20 may further include: recognizing the image with the table recognition model to obtain the table box(es); and generating, on the basis of the table box(es) and the corresponding table information, the table content(s) corresponding to them, the contents to be typeset then also including the table content(s).
For example, in some embodiments one table in the image corresponds to one table box, which may include multiple text boxes, each corresponding to text in the table; the table information includes the positional relationship between the table box and the text boxes it contains. For example, a table is generated from the table's row and column counts, and the text contents of the text boxes are filled into the corresponding positions of the generated table to produce a table content. Other table recognition and generation approaches may also be used; the present disclosure does not restrict this.
For example, the text recognition model may include a neural-network-based character recognition model, such as an OCR (Optical Character Recognition) model. The text recognition model can recognize the text content of a text box and output it as content to be typeset; the text content here may include both printed and handwritten content, i.e. the model may output printed and handwritten content as contents to be typeset without distinguishing between them.
For example, in some embodiments the text recognition model can also identify the type of the text content, i.e. printed or handwritten. When the content is handwritten, it can be given a specific typesetting format different from printed content, e.g. a handwriting font, when generating the typeset document.
For example, in some embodiments the text categories include handwritten text: when a text content's type is handwritten, the text box's category is handwritten text. For example, in response to a first text content among the text contents having the category handwritten text, and the typeset document not containing that content, step S20 may further include: deleting the first text content from the text contents to obtain the remaining at least one text content, the contents to be typeset including the remaining text contents but not the first one.
Note that "printed text content" does not refer only to characters, symbols, and graphics entered on an electronic device through an input device; printed text content can also be a printed reproduction of a user's handwriting.
In other words, applying the image processing method provided by the present disclosure to an image of a test paper, workbook, etc. with handwritten content can generate a document of the paper or workbook with the handwriting removed, making repeated practice convenient. The method can also, as actually needed, recognize text whose category is handwritten text and include it as part of the contents to be typeset when generating the document; the present disclosure does not restrict this.
To obtain a good display effect, different texts need different typesetting formats, e.g. font, font size, bold, italic, indentation. The text boxes can be further subdivided and the different text categories of different boxes determined, so that each text content is typeset in the format corresponding to its category, yielding a well-typeset document.
For example, the image to be recognized contains at least one question, the region boxes further include at least one question box in one-to-one correspondence with the question(s), the region covered by each question box in the image includes at least one first text box, and each piece of region information includes the position of its region box in the image. Determining the text categories from the region information and the text contents may include: determining, from the position information in the region information, the correspondence between the question boxes and the first text boxes; and determining the text categories on the basis of that correspondence and the text contents.
For example, the text categories include big-question name. Here "big-question name" refers to category-level question-group names such as "Multiple choice" or "Fill in the blanks" shown in FIG. 2; since these require special formatting (e.g. enlarged font, bold display), such text boxes must be picked out from the text boxes.
For example, the question box(es) include a first question box having a first edge in a first direction, and the first text boxes include a first box to be processed. Determining the text categories on the basis of the correspondence and the text contents may include: in response to the correspondence indicating that the first box lies within the region covered by the first question box in the image, with no region box between the first box and the first edge, determining that the first box's category is the big-question name; or, in response to the correspondence indicating that the first box lies outside the region covered by the first question box, with no region box between the first box and the first edge, and it being determined that the first box's text content contains big-question feature information, determining that the first box's category is the big-question name.
Note that "the first box to be processed lies within the region covered by the first question box in the image" means that, on the image, the region covered by the first box lies inside the region covered by the first question box. In the example of FIG. 2, the first direction may be the vertical direction.
For example, when the first box lies within the region covered by the first question box, the first edge may be the top edge of the first question box, since the big-question name is usually on the first line; when the first box lies outside that region, the first edge may likewise be the top edge of the first question box, since the big-question name is usually adjacent to it. For example, big-question feature information indicates that the text content includes Chinese capital numerals, e.g. "一", "二", "三".
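The big-question-name rule just described can be sketched as a geometric test over boxes. The (x, y, w, h) box convention, the simplified vertical "blocking" test, and the ordinal regex are assumptions for illustration, not the disclosure's exact implementation.

```python
import re

CN_ORDINAL = re.compile(r"^[一二三四五六七八九十]+[、.]")  # "一、", "二、", ...

def is_big_question_name(text_box, text, question_box, sibling_boxes):
    # Boxes are (x, y, w, h) with y growing downward. The box is a
    # big-question name if (a) it lies inside the question box with no
    # sibling box between it and the question box's top edge, or (b) it
    # lies above the question box, unblocked, and its text starts with a
    # Chinese capital numeral (the "big-question feature information").
    tx, ty, tw, th = text_box
    qx, qy, qw, qh = question_box
    inside = qx <= tx and qy <= ty and tx + tw <= qx + qw and ty + th <= qy + qh
    # Simplified vertical test: a sibling blocks if it starts above this box
    # and extends below the question box's top edge.
    blocked = any(oy < ty and oy + oh > qy for (ox, oy, ow, oh) in sibling_boxes)
    if inside and not blocked:
        return True
    return ty + th <= qy and not blocked and bool(CN_ORDINAL.match(text))

# The first line inside the question box is classified as the name;
# the second line, blocked by the first, is not.
first = is_big_question_name((10, 5, 80, 10), "一、填空题", (0, 0, 100, 200),
                             [(10, 40, 80, 10)])
second = is_big_question_name((10, 40, 80, 10), "1. 在括号里填合适的单位",
                              (0, 0, 100, 200), [(10, 5, 80, 10)])
```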
For example, the text categories also include title. Here "title" refers to text at the head of the paper, such as "第一单元综合测试卷" ("Unit 1 comprehensive test") in FIG. 2; since it requires special formatting (e.g. larger font size, bold display), such text boxes must also be picked out from the text boxes.
For example, the first text boxes include a second box to be processed with a first edge in the first direction, and the image has a first border. Determining the text categories on the basis of the correspondence and the text contents includes: in a case where the correspondence indicates that the second box does not lie within any region covered by the question boxes, in response to the distance between the second box's first edge and the image's first border being less than a preset distance and the second box's text content containing title feature information, determining that the second box's category is the title.
Since the title usually sits at the head of the paper near its top edge, the first border may be the image's top edge and the first edge the second box's top edge; whether the second box's category is title can then be determined from its position in the image and its text content.
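The title rule can be sketched similarly. The keyword tuple below stands in for the learned "title feature information" and the pixel threshold for the preset distance; both are assumptions for illustration.

```python
def is_title(text_box, text, preset_distance=40,
             title_features=("测试卷", "试卷", "Test", "Exam")):
    # The box's top edge must lie within a preset distance of the image's
    # top border, and its text must contain title feature information.
    x, y, w, h = text_box
    return y < preset_distance and any(k in text for k in title_features)

near_top = is_title((100, 10, 300, 30), "第一单元综合测试卷")
mid_page = is_title((100, 500, 300, 30), "第一单元综合测试卷")
```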
FIG. 3 is a schematic diagram of the image to be recognized with region boxes, provided by at least one embodiment. The boxes in FIG. 3 are the region boxes obtained by recognizing the image in step S20; they include, for example, figure boxes and text boxes. The topmost text box in the image is a text box of category title. Each question box corresponds to one big question: e.g. question box 1 corresponds to the first big question ("I. Column calculation" in FIG. 3) and question box 2 to the fifth ("V. ..." in FIG. 3). The text box on the first line inside a question box is a text box of category big-question name (as shown for the second big question in FIG. 3). A text box whose content is handwritten is a text box of category handwritten text (e.g. the box "答：小聪家离学校近" — "Answer: Xiaocong's home is closer to school" — in FIG. 3).
Note that the region boxes in FIG. 3 are only one example; training the object detection model differently can produce region boxes of different forms, e.g. a text box may contain multiple lines of text. The present disclosure does not restrict this.
For example, after the contents to be typeset are obtained, their formats can be adjusted according to the typesetting formats corresponding to the different contents, so as to generate the typeset document.
FIG. 4A is an example flowchart of step S30 of the image processing method shown in FIG. 1. As shown in FIG. 4A, step S30 may specifically include steps S301-S302.
In step S301, the typesetting information respectively corresponding to the contents to be typeset is determined on the basis of the region information and the image.
For example, step S301 may include: classifying the image with a classification model to determine its image category; obtaining the typesetting template corresponding to that category; and determining the typesetting information from the template and the region information.
For example, since the textual characteristics of different subjects differ, a classification model can divide images by subject, e.g. Chinese, mathematics, English, so as to obtain the typesetting template corresponding to that subject category. Other classification schemes can of course also be used as needed; the present disclosure does not restrict this.
For example, the typesetting template may specify information such as characters per line, font size, font family, character spacing, line spacing, and paragraph spacing: e.g. SimSun (宋体) as the font family for Chinese, "Times New Roman" for English and digits, a larger bold font (e.g. size 3) for the title, and a larger bold font (e.g. small size 3) for big-question names.
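A typesetting template of this kind is naturally a mapping from text category to format settings. The concrete fonts, sizes, and the template's subject name below are illustrative, echoing the example values above.

```python
# Text categories map to format settings (values are illustrative).
MATH_PAPER_TEMPLATE = {
    "title":             {"align": "center", "cjk_font": "宋体",
                          "latin_font": "Times New Roman", "size_pt": 16, "bold": True},
    "big_question_name": {"align": "left", "cjk_font": "宋体",
                          "latin_font": "Times New Roman", "size_pt": 15, "bold": True},
    "stem":              {"align": "left", "cjk_font": "宋体",
                          "latin_font": "Times New Roman", "size_pt": 12, "bold": False},
}

def typesetting_info(text_category, template=MATH_PAPER_TEMPLATE):
    # Step S301: look up the format by text category, falling back to the
    # question-stem style for categories the template does not single out.
    return template.get(text_category, template["stem"])
```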
For example, in step S301, determining the typesetting information from the template and the region information may include, for the i-th content to be typeset: in response to the i-th content being text content, determining the region information of its region box and, from that region information, the content's text category; and determining the content's typesetting information from the template and that text category, i being a positive integer no greater than the total number of contents to be typeset.
The typesetting information specifies the typesetting format of the content: e.g. if the content's text category is big-question name, its typesetting information may include general settings such as characters per line and character spacing, as well as settings set specifically for big-question names, such as font size and bold weight.
Obtaining the typesetting information from a preset typesetting template so as to typeset the contents reduces the complexity of typesetting and yields a better-typeset document.
In step S302, the contents are typeset on the basis of the typesetting information to obtain the typeset document.
For example, FIG. 4B is an example flowchart of step S302 of the image processing method shown in FIG. 4A. As shown in FIG. 4B, step S302 may specifically include steps S3021-S3023.
In step S3021, the contents to be typeset are processed to obtain a plurality of display contents.
In step S3022, the positional relationships among the display contents are determined.
In step S3023, the display contents are typeset on the basis of those relationships and the typesetting information, to obtain the typeset document.
For example, the contents to be typeset include at least one first content and at least one second content, the question box(es) containing the first content(s). Step S3021 may include: performing question-number detection on the first content(s) to obtain at least one intermediate display content; and performing format processing on the second content(s) and the intermediate display content(s) to obtain the display contents.
For example, here a first content to be typeset is content contained in a question box, and a second content is any other content, e.g. the title or student information.
For example, to avoid missing questions because the image is incomplete or for similar reasons, question-number detection can be performed on the first contents contained in the question boxes to determine whether any question number is missing and to recover the missing question, so that a complete typeset document is obtained. Here a question number can be the number in a big-question name, e.g. the capital numerals "一", "二", "三", or the number of each sub-question, e.g. "1", "2", "3".
For example, the question-number detection may include: extracting the question-number information corresponding to the question boxes to obtain at least one piece of question-number information; determining the positional relationships among the question boxes; and judging, on the basis of those relationships and the question-number information, whether any number is missing. If a number is missing: extract the missing question-number information, determine the corresponding missing region in the image, and complete the missing information on the basis of that region to obtain the missing display content corresponding to the region; then take the missing display content and the first content(s) as the intermediate display contents. If no number is missing: take the first content(s) as the intermediate display contents.
For example, judging whether a number is missing on the basis of the positional relationships among the question boxes and the question-number information may include: ordering the question numbers according to the positional relationships among the question boxes to obtain a number sequence; in response to every pair of adjacent numbers in the sequence being consecutive, determining that no number is missing; in response to at least two adjacent numbers not being consecutive, determining that a number is missing.
For example, completing the missing number on the basis of the missing region to obtain the corresponding missing display content may include: in response to a question box existing in the missing region, completing the missing number for that box, the missing display content including the missing number and the box's text content; in response to no question box existing in the region, recognizing the missing region to obtain the missing display content, which includes the region's text content and the missing number.
For example, in some embodiments the recognized number sequence skips a number, e.g. the recognized sequence is "1, 3, 4, 5, ...": the question numbered "2" is then judged lost, i.e. the missing number is "2", and the region between the question boxes numbered "1" and "3" is determined to be the missing region. It is then checked whether a question box exists in the missing region: if so, the number can be completed from text boxes in the region not assigned to a question box, or completed for the question box directly from the derived missing number; if not, step S20 can be run again on the missing region to obtain the corresponding missing display content.
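The gap test on the recognized number sequence can be sketched in a few lines; the sequence "1, 3, 4, 5" yields the missing number "2".

```python
def missing_question_numbers(numbers):
    # Given question numbers in reading order, report every gap between
    # consecutive recognized numbers, e.g. [1, 3, 4, 5] -> [2].
    missing = []
    for prev, cur in zip(numbers, numbers[1:]):
        missing.extend(range(prev + 1, cur))
    return missing
```

What to do with each reported gap (re-running recognition on the region between the neighboring boxes, or completing the number directly) follows the branches described above.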
After question-number detection has been performed on the first content(s), format processing can be applied to the resulting second content(s) and intermediate display content(s) (below, collectively called "contents to be processed") to obtain the display contents.
For example, since the format of the typeset document may differ from the layout of the image (e.g. a different number of characters per line) and a question may contain multiple lines of text, the text contents of different text lines of the same question must be appended and re-paragraphed. That is, format processing may include paragraph segmentation of the contents to be processed to obtain the display contents: e.g. contents belonging to the same paragraph are grouped into one display content, i.e. one display content may correspond to one paragraph, each paragraph containing at least one line of text.
For example, if two text boxes in a question box are vertically adjacent and the character count of each is at least a preset threshold, the text contents of the two boxes are judged to belong to the same paragraph.
For example, since the stem of a question is usually a single sentence, if a text box's content ends with a symbol such as a full stop or question mark, and no other text content follows that symbol in the horizontal direction, a paragraph can be judged to end.
For example, whether to segment can be judged from the length of the text content. For example, if, among three consecutive lines of text, the first and third lines are relatively long while the second is relatively short, and there is no picture between the second and third lines, the paragraph is judged to end at the second line and the third line belongs to a new paragraph.
For example, segmentation can also follow question-type features. For example, for multiple-choice questions, if the text content is an option and the option lies on a different line from the adjacent text boxes above and below, the option is judged to belong to a new standalone paragraph.
For example, for a question containing several sub-questions, if a sub-question number appears in the text content and does not sit horizontally to the right of the content of the adjacent text box above, the content containing the sub-question number is judged to belong to a new paragraph.
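The line-merging heuristics above can be sketched as a single pass over recognized text lines. The length threshold and terminal-punctuation set are assumed tuning parameters; English lines are rejoined with a space here, whereas Chinese lines would be joined directly.

```python
def segment_paragraphs(lines, min_len=20, terminals="。？！.?!"):
    # A recognized line continues the current paragraph only if the previous
    # line is long (likely a wrapped sentence) and does not end in terminal
    # punctuation; otherwise it starts a new paragraph.
    paragraphs = []
    for line in lines:
        prev = paragraphs[-1][-1] if paragraphs else None
        if prev is not None and len(prev) >= min_len and prev[-1] not in terminals:
            paragraphs[-1].append(line)   # continuation of a wrapped sentence
        else:
            paragraphs.append([line])     # new paragraph
    return [" ".join(p) for p in paragraphs]

paras = segment_paragraphs([
    "This question stem is long and wraps onto",
    "the next line of the page.",
    "2. A new question",
])
```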
Note that the description above takes as its example text typeset horizontally from top to bottom; the present disclosure is not limited to this. If the text in the image is typeset vertically from right to left, then "above" becomes "to the right", "below" becomes "to the left", and "horizontally to the right" becomes "vertically above".
For example, since article paragraphs usually indent the first line of text by N characters, an article passage can be recognized by whether the first line of a content to be processed is indented by N characters, so that it is typeset in article format. For example, format processing may include segmenting into paragraphs the contents whose first line is indented by N characters, and typesetting the first line with a preset indent, to obtain the display contents; here N is a positive integer greater than 1. Note that if the article passage includes several paragraphs, the first line of each paragraph is typeset with that preset indent.
FIG. 4C is a schematic diagram of a typeset document containing an article passage, provided by at least one embodiment. As shown in FIG. 4C, the big-question name is displayed in bold black; the first line under the big-question name (stem 1 in FIG. 4C) is typeset with the stem's typesetting information because it is not indented by two or more characters; the first line of the English passage is indented by more than two characters, so the passage is judged to be an article passage, taken as one display content, and typeset with the preset indent; sub-question numbers appear in the text after the English passage (stem 2 in FIG. 4C) and do not sit horizontally to the right of the previous text line, so each content containing a sub-question number is judged a paragraph, taken as one display content, and typeset with the stem's typesetting information.
For example, since the text recognition model may use special internal representations for special formats such as fractions, superscripts, and subscripts, format processing may also include converting the contents containing such formats, so as to obtain the display contents corresponding to them — e.g. converting the internal representation of a fraction into the display form shown in the typeset document (see the formula image PCTCN2022073310-appb-000001).
For example, special formats such as fractions, superscripts, and subscripts can also be expressed in a dedicated notation, e.g. the LaTeX notation for mathematical symbols, so that the text recognition model can directly output display contents for the typeset document without format processing.
For example, step S3022 may include: determining the positions of the region boxes in the image from the position information in the region information; determining the positional relationships among the contents to be typeset on the basis of those positions; and determining the relationships among the display contents from the relationships among the contents to be typeset.
For example, as shown in FIG. 2, the image may be divided into columns or pages, each column or page being called an image partition; e.g. a test paper usually has two or three pages per side, i.e. it forms two or three image partitions. An image with columns or pages needs format processing so that the questions of the same column or page end up on the same page of the typeset document; this can be done, for example, using the position information in the region information of the region boxes.
For example, determining the relationships among the display contents from the relationships among the contents to be typeset may include: determining, from the positions of the region boxes in the image, whether the image comprises multiple image partitions; in response to the image comprising multiple partitions, determining the content sets respectively corresponding to the partitions and the positional relationships among the partitions in the image, and determining, from the relationships among the partitions, the relationships among the content sets; and determining the relationships among the display contents on the basis of the relationships among the content sets and among the contents to be typeset.
For example, whether the image comprises multiple partitions can be determined from the positions of the region boxes in the image (e.g. their coordinates in the image). For example, as shown in FIG. 2, when the content of the image clearly falls into two columns, the x-coordinates of the question boxes' top-left corners show a large gap; this feature can be used to judge whether multiple partitions exist. Question boxes whose top-left x-coordinates differ within a preset threshold are taken as one content set, and the display contents of one content set are arranged on the same page of the typeset document.
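The column test on top-left x-coordinates can be sketched as follows; the gap threshold is an assumed preset value.

```python
def split_into_partitions(question_boxes, gap_threshold=200):
    # Sort question boxes by the x-coordinate of their top-left corner and
    # start a new partition (column) wherever the jump exceeds the threshold.
    partitions = []
    for box in sorted(question_boxes, key=lambda b: b[0]):
        if partitions and box[0] - partitions[-1][-1][0] <= gap_threshold:
            partitions[-1].append(box)
        else:
            partitions.append([box])
    return partitions

# Two columns of a two-page test sheet; boxes are (x, y, w, h).
cols = split_into_partitions([(30, 40, 400, 60), (35, 120, 400, 60),
                              (520, 40, 400, 60), (525, 130, 400, 60)])
```

Each resulting partition corresponds to one content set whose display contents go onto the same page of the typeset document.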
For example, step S3023 may include: typesetting the display contents on the basis of the typesetting information to obtain a plurality of typeset display contents; and arranging the typeset display contents in order according to the positional relationships among the display contents to obtain the typeset document.
After the display contents are obtained, typesetting them on the basis of the typesetting information includes: typesetting the display contents in the formats the typesetting information specifies, e.g. font, font size, line spacing, paragraph spacing, character spacing, to obtain the typeset display contents; and then arranging the typeset display contents in order according to the positional relationships among the display contents, thereby obtaining the typeset document.
FIG. 4D is a schematic diagram of the typeset document corresponding to the image shown in FIG. 2, provided by an embodiment of the present disclosure.
As shown in FIG. 4D, the typeset document corresponding to the image of FIG. 2 includes three pages, namely pages (1), (2), and (3): pages (1) and (2) display the content of the left image partition of FIG. 2, and page (3) displays the content of the right partition.
As shown in FIG. 4D, different text categories have different formats; the title, big-question names, stems, and other text contents are displayed in different formats. For example, the format corresponding to the title is centered, SimSun (宋体) for Chinese, "Times New Roman" for English, font size 3, bold; the format for big-question names is left-aligned, SimSun for Chinese, "Times New Roman" for English, font size 4, bold; the format for stems is left-aligned, SimSun for Chinese, "Times New Roman" for English, small font size 4.
As shown in FIG. 4D, tables and pictures are both displayed in the document in picture form (the table of the third big question is not shown); for tables, the table recognition model described above can also be applied to generate electronic tables, which is not repeated here.
In addition, it can be seen that the handwritten content of FIG. 2 has been deleted in the typeset document: the document shown in FIG. 4D is a blank typeset document corresponding to the object to be recognized in FIG. 2. In other words, a user can, as needed, generate a typeset document of the test paper without handwritten content, enabling repeated practice, backup, and so on.
The image processing method provided by the present disclosure can process an image to be recognized to obtain the corresponding typeset document, and is optimized for the particular characteristics of images containing questions (e.g. images of test papers, workbooks, etc. obtained by photographing or scanning): recognition accuracy for such images is higher and the corresponding typeset documents are more faithful, providing an efficient and convenient way to manage test papers, store them, and record wrong questions.
At least one embodiment of the present disclosure further provides an image processing apparatus; FIG. 5 is a schematic block diagram of an image processing apparatus provided by at least one embodiment.
As shown in FIG. 5, the image processing apparatus 500 may include an acquisition unit 501, a recognition unit 502, and a typesetting unit 503.
For example, these modules may be implemented as hardware (e.g. circuit) modules, software modules, or any combination of the two; the same holds for the embodiments below and is not repeated. For example, these units may be implemented by a central processing unit (CPU), graphics processing unit (GPU), tensor processing unit (TPU), field-programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability, together with the corresponding computer instructions.
For example, the acquisition unit 501 is configured to acquire the image to be recognized.
For example, the recognition unit 502 is configured to recognize the image to obtain the region boxes, the region information in one-to-one correspondence with them, and the contents to be typeset.
For example, the typesetting unit 503 is configured to typeset the contents on the basis of the image and the region information to obtain the typeset document corresponding to the image.
For example, the acquisition unit 501, recognition unit 502, and typesetting unit 503 may include code and programs stored in a memory; a processor may execute that code and those programs to realize some or all of the functions of the units as described above. For example, the units may be dedicated hardware devices realizing some or all of those functions. For example, the units may be one circuit board or a combination of circuit boards realizing the functions described above. In this embodiment, the circuit board or combination of circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processors; and (3) processor-executable firmware stored in the memories.
Note that the acquisition unit 501 may implement step S10 shown in FIG. 1, the recognition unit 502 step S20, and the typesetting unit 503 step S30. For a specific account of the functions these units can realize, see the descriptions of steps S10 to S30 in the method embodiments above; repetition is omitted. Moreover, the apparatus 500 achieves technical effects similar to those of the method, which are likewise not repeated here.
Note that in embodiments of the present disclosure the apparatus 500 may include more or fewer circuits or units, and the interconnections among them are unrestricted and may be set according to actual needs. The specific construction of each circuit or unit is unrestricted: they may be built from analog devices according to circuit principles, from digital chips, or in other applicable ways.
At least one embodiment of the present disclosure further provides an electronic device; FIG. 6 is a schematic diagram of an electronic device provided by at least one embodiment.
For example, as shown in FIG. 6, the electronic device includes a processor 601, a communication interface 602, a memory 603, and a communication bus 604. The processor 601, communication interface 602, and memory 603 communicate with one another via the communication bus 604; these components may also communicate via a network connection. The present disclosure does not restrict the type or function of the network. Note that the components of the electronic device shown in FIG. 6 are exemplary, not limiting; the device may have other components according to actual application needs.
For example, the memory 603 is used to non-transitorily store computer-readable instructions, and the processor 601, when executing them, implements the image processing method of any embodiment above. For the specific implementation of each step of the method and related explanation, see the method embodiments above; repetition is omitted.
For example, other implementations of the image processing method realized by the processor 601 executing the instructions stored in the memory 603 are the same as those mentioned in the method embodiments and are not repeated here.
For example, the communication bus 604 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, which does not mean there is only one bus or one type of bus.
For example, the communication interface 602 is used to enable communication between the electronic device and other devices.
For example, the processor 601 and the memory 603 may be provided on the server side (or in the cloud).
For example, the processor 601 may control other components of the electronic device to perform desired functions. The processor 601 may be a central processing unit (CPU), network processor (NP), tensor processing unit (TPU), graphics processing unit (GPU), or another device with data processing capability and/or program execution capability; it may also be a digital signal processor (DSP), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component. The CPU may be of X86 or ARM architecture, among others.
For example, the memory 603 may include any combination of one or more computer program products, which may include computer-readable storage media in various forms, e.g. volatile and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory; non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, and flash memory. One or more computer-readable instructions may be stored on the computer-readable storage media, and the processor 601 may run them to realize the various functions of the electronic device. Various applications, various data, and the like may also be stored in the storage media.
For example, in some embodiments the electronic device further includes an image acquisition component used to acquire images; the memory 603 is also used to store the acquired images.
For example, the image acquisition component may be a smartphone camera, a tablet camera, a personal-computer camera, a digital camera lens, or even a webcam.
For example, the acquired image to be recognized may be the raw image directly captured by the acquisition component, or an image obtained by preprocessing the raw image. Preprocessing can remove irrelevant or noisy information from the raw image so that the acquired image can be processed better, and may include, for example, data augmentation, image scaling, gamma correction, image enhancement, or noise-reduction filtering of the raw image.
For example, for a detailed account of how the electronic device performs image processing, see the relevant descriptions in the method embodiments; repetition is omitted.
FIG. 7 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 7, the storage medium 700 may be a non-transitory computer-readable storage medium on which one or more computer-readable instructions 701 may be stored non-transitorily. For example, when executed by a processor, the instructions 701 may perform one or more steps of the image processing method described above.
For example, the storage medium 700 may be applied in the electronic device above; e.g. it may include the memory in the electronic device.
For example, the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), flash memory, any combination of the above storage media, or other suitable storage media.
For example, for an account of the storage medium 700, see the description of the memory in the electronic-device embodiments; repetition is omitted.
A few further points on the present disclosure:
(1) The drawings of the embodiments of the present disclosure involve only the structures concerned in those embodiments; other structures may follow conventional designs.
(2) For clarity, in the drawings used to describe the embodiments, the thicknesses and dimensions of layers or structures are exaggerated. It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" or "under" another element, it may be directly on or under that element, or intermediate elements may be present.
(3) Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another to obtain new embodiments.
The above are only specific implementations of the present disclosure, but the scope of protection of the present disclosure is not limited to them; the scope of protection shall be that of the claims.

Claims (20)

  1. An image processing method, characterized by comprising:
    acquiring an image to be recognized;
    recognizing the image to be recognized to obtain a plurality of region boxes, a plurality of pieces of region information in one-to-one correspondence with the plurality of region boxes, and a plurality of contents to be typeset;
    typesetting the plurality of contents to be typeset on the basis of the image to be recognized and the plurality of pieces of region information, to obtain a typeset document corresponding to the image to be recognized.
  2. The method according to claim 1, characterized in that recognizing the image to be recognized to obtain the plurality of region boxes, the plurality of pieces of region information, and the plurality of contents to be typeset comprises:
    recognizing the image to be recognized with an object detection model to obtain the plurality of region boxes and the plurality of pieces of region information, wherein the plurality of region boxes comprise a plurality of first text boxes;
    recognizing the plurality of first text boxes with a text recognition model to obtain a plurality of text contents in one-to-one correspondence with the plurality of first text boxes;
    wherein the plurality of contents to be typeset comprise one or more of the plurality of text contents.
  3. The method according to claim 2, characterized by further comprising:
    determining, according to the plurality of pieces of region information and the plurality of text contents, a plurality of text categories in one-to-one correspondence with the plurality of first text boxes,
    wherein the region information corresponding to any first text box among the plurality of first text boxes comprises the text category of that first text box.
  4. The method according to claim 2, characterized in that the plurality of region boxes further comprise at least one figure box,
    recognizing the image to be recognized to obtain the plurality of region boxes, the plurality of pieces of region information, and the plurality of contents to be typeset further comprises:
    extracting at least one picture to be typeset respectively corresponding to the at least one figure box,
    wherein the plurality of contents to be typeset further comprise the at least one picture to be typeset.
  5. The method according to claim 3, characterized in that the image to be recognized is an image containing at least one question, and the plurality of region boxes further comprise at least one question box in one-to-one correspondence with the at least one question,
    the region covered by each question box in the image to be recognized comprises at least one first text box, and each piece of region information comprises position information of the corresponding region box in the image to be recognized,
    determining the plurality of text categories according to the plurality of pieces of region information and the plurality of text contents comprises:
    determining, according to the position information in the plurality of pieces of region information, the correspondence between the at least one question box and the plurality of first text boxes;
    determining the plurality of text categories on the basis of the correspondence and the plurality of text contents.
  6. The method according to claim 5, characterized in that the at least one question box comprises a first question box, the first question box having a first edge in a first direction,
    the plurality of first text boxes comprise a first box to be processed, and the plurality of text categories comprise a big-question name,
    determining the plurality of text categories on the basis of the correspondence and the plurality of text contents comprises:
    in response to the correspondence indicating that the first box to be processed is located within the region covered by the first question box in the image to be recognized, and there being no region box between the first box to be processed and the first edge, determining that the text category of the first box to be processed is the big-question name; or,
    in response to the correspondence indicating that the first box to be processed is located outside the region covered by the first question box in the image to be recognized, there being no region box between the first box to be processed and the first edge, and it being determined that the text content corresponding to the first box to be processed contains big-question feature information, determining that the text category of the first box to be processed is the big-question name.
  7. The method according to claim 5, characterized in that the plurality of first text boxes comprise a second box to be processed, and the plurality of text categories comprise a title,
    in the first direction, the second box to be processed has a first edge and the image to be recognized has a first border,
    determining the plurality of text categories on the basis of the correspondence and the plurality of text contents comprises:
    in a case where the correspondence indicates that the second box to be processed is not located within the region covered by the at least one question box in the image to be recognized, in response to the distance between the first edge of the second box to be processed and the first border of the image to be recognized being less than a preset distance, and the text content corresponding to the second box to be processed containing title feature information, determining that the text category of the second box to be processed is the title.
  8. The method according to claim 5, characterized in that typesetting the plurality of contents to be typeset on the basis of the image to be recognized and the plurality of pieces of region information, to obtain the typeset document corresponding to the image to be recognized, comprises:
    determining, on the basis of the plurality of pieces of region information and the image to be recognized, a plurality of pieces of typesetting information respectively corresponding to the plurality of contents to be typeset;
    typesetting the plurality of contents to be typeset on the basis of the plurality of pieces of typesetting information, to obtain the typeset document.
  9. The method according to claim 8, characterized in that determining, on the basis of the plurality of pieces of region information and the image to be recognized, the plurality of pieces of typesetting information respectively corresponding to the plurality of contents to be typeset comprises:
    classifying the image to be recognized with a classification model to determine the image category of the image to be recognized;
    obtaining, according to the image category, a typesetting template corresponding to the image category;
    determining the plurality of pieces of typesetting information according to the typesetting template and the plurality of pieces of region information.
  10. The method according to claim 9, characterized in that determining the plurality of pieces of typesetting information according to the typesetting template and the plurality of pieces of region information comprises:
    for the i-th content to be typeset among the plurality of contents to be typeset:
    in response to the i-th content to be typeset being text content, determining the region information of the region box corresponding to the i-th content to be typeset, and determining the text category of the i-th content to be typeset according to that region information;
    determining the typesetting information corresponding to the i-th content to be typeset according to the typesetting template and the text category of the i-th content to be typeset,
    wherein i is a positive integer not greater than the total number of the plurality of contents to be typeset.
  11. The method according to claim 8, characterized in that typesetting the plurality of contents to be typeset on the basis of the plurality of pieces of typesetting information, to obtain the typeset document, comprises:
    processing the plurality of contents to be typeset to obtain a plurality of display contents;
    determining the positional relationships among the plurality of display contents;
    typesetting the plurality of display contents on the basis of the positional relationships among the plurality of display contents and the plurality of pieces of typesetting information, to obtain the typeset document.
  12. The method according to claim 11, characterized in that the plurality of contents to be typeset comprise at least one first content to be typeset and at least one second content to be typeset, the at least one question box containing the at least one first content to be typeset,
    processing the plurality of contents to be typeset to obtain the plurality of display contents comprises:
    performing question-number detection on the at least one first content to be typeset to obtain at least one intermediate display content;
    performing format processing on the at least one second content to be typeset and the at least one intermediate display content, to obtain the plurality of display contents.
  13. The method according to claim 12, characterized in that performing question-number detection on the at least one first content to be typeset to obtain the at least one intermediate display content comprises:
    extracting the question-number information corresponding to the at least one question box to obtain at least one piece of question-number information;
    determining the positional relationships among the at least one question box;
    judging, on the basis of the positional relationships among the at least one question box and the at least one piece of question-number information, whether a question number is missing,
    in response to a question number being missing:
    extracting the missing question-number information, determining the missing region corresponding to the missing question-number information in the image to be recognized, and completing the missing question-number information on the basis of the missing region, to obtain the missing display content corresponding to the missing region;
    taking the missing display content and the at least one first content to be typeset as the at least one intermediate display content,
    in response to no question number being missing:
    taking the at least one first content to be typeset as the at least one intermediate display content.
  14. The method according to claim 11, characterized in that determining the positional relationships among the plurality of display contents comprises:
    determining the positions of the plurality of region boxes in the image to be recognized according to the position information in the plurality of pieces of region information;
    determining the positional relationships among the plurality of contents to be typeset on the basis of the positions of the plurality of region boxes in the image to be recognized;
    determining the positional relationships among the plurality of display contents according to the positional relationships among the plurality of contents to be typeset.
  15. The method according to claim 14, characterized in that determining the positional relationships among the plurality of display contents according to the positional relationships among the plurality of contents to be typeset comprises:
    determining, according to the positions of the plurality of region boxes in the image to be recognized, whether the image to be recognized comprises a plurality of image partitions;
    in response to the image to be recognized comprising a plurality of image partitions, determining a plurality of sets of contents to be typeset respectively corresponding to the plurality of image partitions, and determining the positional relationships among the plurality of image partitions in the image to be recognized;
    determining the positional relationships among the plurality of sets of contents to be typeset on the basis of the positional relationships among the plurality of image partitions;
    determining the positional relationships among the plurality of display contents on the basis of the positional relationships among the plurality of sets of contents to be typeset and the positional relationships among the plurality of contents to be typeset.
  16. The method according to claim 11, characterized in that typesetting the plurality of display contents on the basis of the positional relationships among the plurality of display contents and the plurality of pieces of typesetting information, to obtain the typeset document, comprises:
    typesetting the plurality of display contents on the basis of the plurality of pieces of typesetting information, to obtain a plurality of typeset display contents;
    arranging the plurality of typeset display contents in order according to the positional relationships among the plurality of display contents, to obtain the typeset document.
  17. The method according to claim 3, characterized in that the text categories comprise handwritten text,
    in response to the text category of a first text content among the plurality of text contents being handwritten text, and the typeset document not containing the first text content,
    recognizing the image to be recognized to obtain the plurality of region boxes, the plurality of pieces of region information, and the plurality of contents to be typeset further comprises:
    deleting the first text content from the plurality of text contents to obtain the remaining at least one text content, wherein the plurality of contents to be typeset comprise the remaining at least one text content and do not comprise the first text content.
  18. An image processing apparatus, characterized by comprising:
    an acquisition unit configured to acquire an image to be recognized;
    a recognition unit configured to recognize the image to be recognized to obtain a plurality of region boxes, a plurality of pieces of region information in one-to-one correspondence with the plurality of region boxes, and a plurality of contents to be typeset;
    a typesetting unit configured to typeset the plurality of contents to be typeset on the basis of the image to be recognized and the plurality of pieces of region information, to obtain a typeset document corresponding to the image to be recognized.
  19. An electronic device, characterized by comprising:
    a memory non-transitorily storing computer-executable instructions;
    a processor configured to run the computer-executable instructions,
    wherein the computer-executable instructions, when run by the processor, implement the image processing method according to any one of claims 1-17.
  20. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the image processing method according to any one of claims 1-17.
PCT/CN2022/073310 2021-01-29 2022-01-21 Image processing method and apparatus, electronic device, and storage medium WO2022161293A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110129765.5A 2021-01-29 2021-01-29 Image processing method and apparatus, electronic device, and storage medium (图像处理方法及装置、电子设备和存储介质)
CN202110129765.5 2021-01-29

Publications (1)

Publication Number Publication Date
WO2022161293A1 true WO2022161293A1 (zh) 2022-08-04

Family

ID=75813027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073310 WO2022161293A1 (zh) 2021-01-29 2022-01-21 图像处理方法及装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN112801084A (zh)
WO (1) WO2022161293A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801084A (zh) * 2021-01-29 2021-05-14 杭州大拿科技股份有限公司 图像处理方法及装置、电子设备和存储介质
CN114458979A (zh) * 2022-02-10 2022-05-10 珠海读书郎软件科技有限公司 一种用于辅助分页识别的智能台灯、识别方法及其存储介质
CN115690806B (zh) * 2022-10-11 2023-06-13 杭州瑞成信息技术股份有限公司 一种基于图像数据处理的非结构化文档格式识别方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008944A (zh) * 2019-02-20 2019-07-12 平安科技(深圳)有限公司 基于模板匹配的ocr识别方法及装置、存储介质
CN110414529A (zh) * 2019-06-26 2019-11-05 深圳中兴网信科技有限公司 试卷信息提取方法、系统及计算机可读存储介质
WO2020177584A1 (zh) * 2019-03-01 2020-09-10 华为技术有限公司 一种图文排版方法及其相关装置
CN111931731A (zh) * 2020-09-24 2020-11-13 北京易真学思教育科技有限公司 判题方法、装置、电子设备及存储介质
CN111950557A (zh) * 2020-08-21 2020-11-17 珠海奔图电子有限公司 错题处理方法、图像形成装置及电子设备
CN112801084A (zh) * 2021-01-29 2021-05-14 杭州大拿科技股份有限公司 图像处理方法及装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN112801084A (zh) 2021-05-14

Similar Documents

Publication Publication Date Title
WO2022161293A1 (zh) 图像处理方法及装置、电子设备和存储介质
US10846553B2 (en) Recognizing typewritten and handwritten characters using end-to-end deep learning
CN109858036B (zh) 一种文书划分方法及装置
CN112101367A (zh) 文本识别方法、图像识别分类方法、文档识别处理方法
WO2022166833A1 (zh) 图像处理方法和装置、电子设备和存储介质
CN113486828B (zh) 图像处理方法、装置、设备和存储介质
CN111507330A (zh) 习题识别方法、装置、电子设备及存储介质
CN110889406B (zh) 一种习题数据卡的信息采集方法、系统及终端
WO2022166707A1 (zh) 图像处理方法和装置、电子设备和存储介质
Elanwar et al. Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model
CN113673294A (zh) 文献关键信息的提取方法、装置、计算机设备和存储介质
CN110852131B (zh) 一种考试卡的信息采集方法、系统及终端
CN112036330A (zh) 一种文本识别方法、文本识别装置及可读存储介质
CN116384344A (zh) 一种文档转换方法、装置及存储介质
Lin et al. Multilingual corpus construction based on printed and handwritten character separation
WO2022042181A1 (zh) 对象识别处理方法、处理装置、电子设备和存储介质
CN115050025A (zh) 基于公式识别的知识点抽取方法及装置
CN112686253A (zh) 一种用于电子白板的屏幕文字提取系统及方法
CN113065316A (zh) 将方正小样文件动态转换成html并录入题库、从题库选题组稿并生成小样文件的方法
CN112181231A (zh) 板书输入方法、系统及装置
JP2020053891A (ja) 情報処理装置、情報処理方法及びプログラム
KR102646428B1 (ko) 인공지능 학습 모델을 이용한 유사 글자 추출 방법 및 장치
Panjwani et al. Script-agnostic reflow of text in document images
Henke Building and Improving an OCR Classifier for Republican Chinese Newspaper Text
JP7430219B2 (ja) 文書情報構造化装置、文書情報構造化方法およびプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22745168

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22745168

Country of ref document: EP

Kind code of ref document: A1