CN115410216B - Ancient book text informatization processing method and system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115410216B
CN115410216B (application CN202211341307.9A)
Authority
CN
China
Prior art keywords
result
text
ancient book
book text
informatization
Prior art date
Legal status
Active
Application number
CN202211341307.9A
Other languages
Chinese (zh)
Other versions
CN115410216A
Inventor
李世杰
马晋
金沛然
闫升
曹承瑞
韩国民
Current Assignee
Henan Wenshubao Intelligent Technology Research Institute Co ltd
Xi'an Wenshubao Technology Co ltd
Tianjin Hengda Wenbo Science & Technology Co ltd
Original Assignee
Henan Wenshubao Intelligent Technology Research Institute Co ltd
Xi'an Wenshubao Technology Co ltd
Tianjin Hengda Wenbo Science & Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Henan Wenshubao Intelligent Technology Research Institute Co ltd, Xi'an Wenshubao Technology Co ltd and Tianjin Hengda Wenbo Science & Technology Co ltd
Priority to CN202211341307.9A
Publication of CN115410216A
Application granted
Publication of CN115410216B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19107 Clustering techniques
    • G06V 30/19173 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an ancient book text informatization processing method, a system, electronic equipment and a storage medium, wherein the method comprises the following steps: pre-labeling the ancient book text image training sample by using an ancient book text informatization model to obtain a pre-labeling result; carrying out expert verification on the pre-labeling result to obtain a manual labeling result; training the ancient book text informatization model by utilizing a deep neural network to obtain the trained ancient book text informatization model; inputting the ancient book text image verification sample into the trained ancient book text informatization model, and testing the trained ancient book text informatization model to obtain an ancient book text processing result; repeatedly carrying out pre-labeling operation, manual labeling operation, model training operation and model testing operation to obtain a trained ancient book text informatization model; and performing informatization processing on the ancient book text image to be processed by using the trained ancient book text informatization model to obtain an informatization processing result.

Description

Ancient book text informatization processing method and system, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of text positioning and character recognition, in particular to an ancient book text informatization processing method, an ancient book text informatization processing system, electronic equipment and a storage medium.
Background
The ancient books contain the brilliant civilization and excellent traditional culture of the Chinese nation, and the protection of ancient books is of great significance for inheriting and spreading this excellent traditional culture and for enhancing cultural confidence. At present, technologies such as Optical Character Recognition (OCR) based on artificial intelligence are widely applied to the protection of ancient books, so that the rich knowledge contained in the ancient books is presented to the public in digital form; meanwhile, digitizing ancient books with artificial-intelligence-based OCR allows the ancient books to be better passed on.
However, in the prior art, the informatization of ancient books still suffers from problems such as low character recognition accuracy, disordered typesetting and low retrieval efficiency.
Disclosure of Invention
In view of the foregoing technical problems, the present invention provides an ancient book text informatization processing method, system, electronic device and storage medium, so as to solve at least one of the above technical problems.
According to a first aspect of the present invention, there is provided an ancient book text informatization processing method, including:
pre-labeling the ancient book text image training sample by using an ancient book text informatization model to obtain a pre-labeling result, wherein the ancient book text informatization model comprises a detection submodule, a filtering submodule, an identification submodule and a layout analysis submodule;
according to a preset check rule, carrying out expert check on the pre-labeling result and carrying out manual labeling on the wrong pre-labeling result again to obtain a manual labeling result;
training the ancient book text informatization model by utilizing a deep neural network according to the manual labeling result to obtain a trained ancient book text informatization model;
inputting the ancient book text image verification sample into the trained ancient book text informatization model, testing the trained ancient book text informatization model according to a preset test rule to obtain an ancient book text processing result output by the tested ancient book text informatization model, and screening the ancient book text processing result to be used as a pre-labeling result of the training sample in the next round of informatization processing process;
repeatedly performing pre-labeling operation, manual labeling operation, model training operation and model testing operation according to preset iteration conditions to obtain a trained ancient book text informatization model;
performing informatization processing on the ancient book text image to be processed by utilizing the trained ancient book text informatization model to obtain an informatization processing result, wherein the informatization processing result comprises a text detection box, a text detection box filtering result, a character recognition result and a layout analysis result;
and according to the retrieval request and the information processing result of the user, finishing customized accurate retrieval and/or fuzzy retrieval request by utilizing the trained ancient book text information model.
According to an embodiment of the present invention, the detection submodule includes a single-stage target detection deep neural network having a channel attention mechanism;
the filtering submodule comprises a pixel-level semantic segmentation network with a text confidence coefficient prediction function;
the identification submodule comprises a preprocessing unit, a feature extraction unit consisting of a deep residual network and a classification unit consisting of a plurality of loss branches;
the classification unit comprises a classification layer taking cross entropy as a loss function and a feature embedding layer taking triplet loss as the loss function;
the layout analysis submodule comprises a graph neural network for text relation regression and/or a clustering unit, wherein the clustering unit is used for framing text lines layer by layer through a clustering method.
According to an embodiment of the present invention, the pre-labeling processing performed on the ancient book text image training sample by using the ancient book text informatization model to obtain a pre-labeling result includes:
processing the ancient book text image training sample by using a detection submodule to obtain a text detection box, wherein the text detection box is used for text positioning of the ancient book text image;
performing pixel-level regression on the ancient book text image training sample by using a filtering submodule to obtain a text region confidence map, performing text confidence calculation on a text detection box by using the text region confidence map, and filtering the calculation result according to a preset filtering threshold value to obtain a text detection box filtering result;
processing the filtering result of the text detection box by using an identification submodule to obtain an ancient book text image block set, and performing character identification on the ancient book text image block set by using an identification submodule to obtain a character identification result;
and processing the ancient book text image training sample by using a layout analysis submodule according to the text detection box filtering result to obtain a layout analysis result, wherein the layout analysis result is used for determining the sequence and the row-column relationship among the characters according to the character position distribution.
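For illustration, the pre-labeling flow described above (detection, confidence-based filtering, character recognition, layout analysis) could be composed roughly as in the following Python sketch; all submodule interfaces (detect, filter, recognize, order) are hypothetical names assumed for this example and are not defined in the patent.

```python
# Minimal sketch of the pre-labeling pipeline; every submodule interface used
# here is a hypothetical placeholder, not the patent's actual API.
def pre_label(image, detector, filterer, recognizer, layout_analyzer):
    boxes = detector.detect(image)                  # text detection boxes
    kept_boxes = filterer.filter(image, boxes)      # text detection box filtering result
    chars = [recognizer.recognize(image, box) for box in kept_boxes]  # character recognition
    order = layout_analyzer.order(kept_boxes)       # reading order / row-column relations
    return {"boxes": boxes, "kept_boxes": kept_boxes,
            "chars": chars, "layout_order": order}
```

The dictionary returned here simply mirrors the four components of the pre-labeling result listed above.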
According to the embodiment of the present invention, the performing expert verification on the pre-labeling result and manually labeling the wrong pre-labeling result again according to the preset verification rule to obtain the manual labeling result includes:
verifying the pre-labeling result by an expert to obtain a verification result, wherein the verification result comprises a text detection box verification result and a character recognition verification result;
under the condition that the text detection box check result is failed, carrying out text detection box deletion operation and text detection box addition operation on the pre-marked result through an expert;
and under the condition that the character recognition verification result is failed, sorting the character recognition results according to the character confidence degrees of the character recognition results by an expert, and screening the top N character recognition results or directly changing the character recognition results, wherein N is a positive integer.
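As a small illustration of the top-N screening step, the sketch below sorts recognition candidates by confidence before expert review; the (character, confidence) data layout is an assumption made for this example.

```python
def top_n_candidates(candidates, n=10):
    """Sort recognition candidates by confidence and keep the top N for expert screening.

    `candidates` is assumed to be a list of (character, confidence) pairs; the
    layout is illustrative, not specified by the patent.
    """
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:n]

# The expert then either picks one of the top-N candidates or types a correction.
print(top_n_candidates([("卷", 0.91), ("春", 0.05), ("奉", 0.03)], n=2))
```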
According to an embodiment of the present invention, the training of the ancient book text information model by using the deep neural network according to the artificial labeling result to obtain the trained ancient book text information model includes:
processing an ancient book text training sample by using the detection submodule, comprising: detecting the manual labeling result through a target detection algorithm to obtain an initial text detection box prediction result, comparing the text detection box prediction result with the manual labeling result to obtain a first loss value, and training the parameters of the detection submodule through gradient backpropagation;
filtering the text detection box prediction result by using the filtering submodule, comparing the filtering result with the manual labeling result to obtain a second loss value, and training the parameters of the filtering submodule through gradient backpropagation;
and performing feature extraction and character classification on the manual labeling result by using the identification submodule, inputting the manual labeling result, the feature extraction result and the character classification result into a loss function to obtain a third loss value, and training the parameters of the identification submodule through gradient backpropagation.
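The per-submodule training just described (one loss per submodule, each followed by gradient backpropagation) might look roughly like the following PyTorch-style sketch; the module and criterion objects are assumptions made for illustration, not the patent's exact networks or losses.

```python
def train_step(batch, detector, filterer, recognizer, optimizers, criteria):
    """One illustrative training step; all callables are hypothetical placeholders."""
    images, gt_boxes, gt_text_mask, char_patches, gt_chars = batch

    # First loss: detection predictions vs. manually labeled boxes.
    loss_det = criteria["detection"](detector(images), gt_boxes)
    optimizers["detector"].zero_grad(); loss_det.backward(); optimizers["detector"].step()

    # Second loss: pixel-level confidence map vs. labeled text regions.
    loss_filt = criteria["filtering"](filterer(images), gt_text_mask)
    optimizers["filterer"].zero_grad(); loss_filt.backward(); optimizers["filterer"].step()

    # Third loss: feature extraction + character classification on cropped patches.
    logits, embeddings = recognizer(char_patches)
    loss_rec = criteria["recognition"](logits, embeddings, gt_chars)
    optimizers["recognizer"].zero_grad(); loss_rec.backward(); optimizers["recognizer"].step()

    return loss_det.item(), loss_filt.item(), loss_rec.item()
```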
According to the embodiment of the invention, the ancient book text informatization processing method further comprises the step of carrying out pixel-level segmentation on the ancient book text image to be processed by utilizing the trained ancient book text informatization model according to the image segmentation requirements and the informatization processing results of the user, so as to obtain the customized segmentation results.
According to an embodiment of the present invention, the performing, according to the user image segmentation requirement and the information processing result, pixel-level segmentation on the ancient book text image to be processed by using the trained ancient book text information model to obtain a customized segmentation result includes:
according to the user image segmentation requirements and the information processing results, preprocessing the text detection box filtering results in the information processing results by using the trained ancient book text information model to obtain ancient book text image blocks, performing maximum inter-class variance local binarization on the ancient book text image blocks, and processing the binarization results to obtain customized segmentation results.
According to a second aspect of the present invention, there is provided an ancient book text informatization processing system, including:
the pre-labeling module is used for performing pre-labeling processing on an ancient book text image training sample by using an ancient book text informatization model to obtain a pre-labeling result, wherein the ancient book text informatization model comprises a detection submodule, a filtering submodule, an identification submodule and a layout analysis submodule;
the marking module is used for carrying out expert verification on the pre-marking result and carrying out manual marking on the wrong pre-marking result again according to a preset verification rule to obtain a manual marking result;
the training module is used for training the ancient book text informatization model by utilizing the deep neural network according to the manual labeling result to obtain the trained ancient book text informatization model;
the test module is used for inputting the ancient book text image verification sample into the trained ancient book text informatization model, testing the trained ancient book text informatization model according to a preset test rule to obtain an ancient book text processing result output by the tested ancient book text informatization model, and screening the ancient book text processing result to be used as a pre-labeling result of the training sample in the next round of informatization processing process;
the version control module is used for repeatedly performing pre-labeling operation, manual labeling operation, model training operation and model testing operation according to preset iteration conditions to obtain a trained ancient book text informatization model;
the informatization processing module is used for performing informatization processing on the ancient book text image to be processed by utilizing the trained ancient book text informatization model to obtain an informatization processing result, wherein the informatization processing result comprises a text detection box, a text detection box filtering result, a character recognition result and a layout analysis result;
and the retrieval module is used for completing customized accurate retrieval and/or fuzzy retrieval requests by utilizing the trained ancient book text informatization model according to the retrieval requests and the informatization processing results of the users.
According to a third aspect of the present invention, there is provided an electronic apparatus comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for ancient book text informatization processing described above.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the ancient book text informatization processing method described above.
According to the ancient book text informatization processing method provided by the invention, the ancient book text informatization processing model is trained and tested with ancient book text image training samples obtained through multiple rounds of pre-labeling and expert verification; the resulting model can improve the accuracy of character recognition on ancient book text images while preserving the typesetting order of the characters in the original ancient book text, and using the output of the model can greatly improve users' efficiency and convenience when retrieving ancient books.
Drawings
Fig. 1 is a flowchart of an ancient book text informatization processing method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an ancient book text informatization processing procedure according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an ancient book text informatization processing strategy according to an embodiment of the invention;
FIG. 4 is a flow chart of obtaining pre-annotated results according to an embodiment of the invention;
FIG. 5 is a flow diagram for obtaining manual annotation results according to an embodiment of the invention;
FIG. 6 is a flow chart of obtaining a trained ancient book text informatization model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an ancient book text informatization processing system according to an embodiment of the invention;
FIG. 8 is a schematic diagram of an ancient book text informatization processing system according to another embodiment of the present invention;
FIG. 9 is a graphical user interface diagram of a system 800 according to another embodiment of the invention;
FIG. 10 is a diagram illustrating a pre-labeling effect according to another embodiment of the present invention;
FIG. 11 is a schematic diagram of the graphical user interface of the detect annotation submodule, according to another embodiment of the present invention;
FIG. 12 is a schematic diagram of a graphical user interface of a segmentation annotation sub-module according to another embodiment of the invention;
FIG. 13 is a pictorial representation of the graphical user interface of the identify label submodule, in accordance with another embodiment of the present invention;
FIG. 14 is a schematic diagram of a network structure of the filtering (segmentation) network training submodule according to another embodiment of the present invention;
FIG. 15 is a flow chart of the operation of a recognizer training submodule, according to another embodiment of the present invention;
FIG. 16 (a) is a diagram illustrating the ancient book text detection test result of the test module 850 according to another embodiment of the present invention;
FIG. 16 (b) is a diagram illustrating the filtering test results of the ancient book text of the testing module 850 according to another embodiment of the present invention;
FIG. 16 (c) is a diagram illustrating the ancient book text recognition test result of the testing module 850 according to another embodiment of the present invention;
FIG. 16 (d) is a diagram illustrating the ancient book text layout analysis result of the testing module 850 according to another embodiment of the present invention;
FIG. 17 is a diagram illustrating the retrieval function of the retrieval module 870 according to another embodiment of the present invention;
FIG. 18 is a schematic diagram of the operation of a segmented panel 960 according to another embodiment of the present invention;
fig. 19 is a block diagram of an electronic device adapted to implement an ancient book text informatization processing method according to an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
Ancient books are a cultural treasure of China; digitizing and informatizing ancient book texts not only protects the ancient books themselves but also promotes the spread of ancient book culture among the public. Through the ancient book text informatization processing method, the invention can improve the recognition accuracy of ancient book characters and give the correct, clear typesetting of the corresponding ancient books.
Fig. 1 is a flowchart of an ancient book text informatization processing method according to an embodiment of the invention.
As shown in FIG. 1, the ancient book text information processing method includes operations S110 to S170.
In operation S110, an ancient book text information model is used to perform a pre-labeling process on the ancient book text image training sample, so as to obtain a pre-labeling result, wherein the ancient book text information model includes a detection sub-module, a filtering sub-module, an identification sub-module, and a layout analysis sub-module.
The pre-labeling result comprises a text detection box, a text detection box filtering result, a character recognition result and a layout analysis result, wherein the text detection box is used for positioning characters in the ancient book text and providing information for subsequent ancient book text processing; the text detection box filtering result can be used for filtering the positioned characters, thereby improving the recognition accuracy of the informatization model; the character recognition result is the character recognized by the informatization model. The ancient book text informatization processing model provided by the invention can recognize various scripts, such as seal script, bronze script, regular script, running script or cursive script; the layout analysis result is used to determine the order between recognized characters, the order between lines, etc. The text detection box, the text detection box filtering result, the character recognition result and the layout analysis result can be regarded as the output labels of the ancient book text image training sample after pre-labeling processing.
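For concreteness, the output labels attached to one training sample after pre-labeling could be represented roughly as below; the field names are illustrative assumptions, not terms defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CharAnnotation:
    box: Tuple[int, int, int, int]   # text detection box (x0, y0, x1, y1)
    keep: bool                       # text detection box filtering result
    char: str                        # character recognition result
    order: int                       # position in the layout/reading order

@dataclass
class PreLabelResult:
    image_path: str
    chars: List[CharAnnotation] = field(default_factory=list)
```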
In operation S120, according to a preset verification rule, performing expert verification on the pre-labeling result and manually labeling the wrong pre-labeling result again to obtain a manual labeling result.
The manual labeling result is obtained by manually intervening in the pre-labeling result and correcting the erroneous items in it, so as to improve the accuracy of the labels of the ancient book text image training samples input into the ancient book text informatization model.
In operation S130, the ancient book text information model is trained by using the deep neural network according to the manual labeling result, so as to obtain a trained ancient book text information model.
The trained ancient book text informatization model can output a training result, the training result is the same as the contents contained in the pre-labeling result and the manual labeling result, and the training result also comprises a text detection box, a text detection box filtering result, a character recognition result and a layout analysis result. The output labels of the training results are the text detection box, the text detection box filtering result, the character recognition result and the layout analysis result corresponding to the ancient book text image training sample.
In operation S140, the ancient book text image verification sample is input into the trained ancient book text information model, and the trained ancient book text information model is tested according to the preset test rule, so as to obtain the ancient book text processing result output by the tested ancient book text information model, and the ancient book text processing result is screened and then used as the pre-labeling result of the training sample of the next round of information processing process.
The ancient book text processing result has the same contents as the pre-labeling result, the manual labeling result and the training result, namely the text detection box, the text detection box filtering result, the character recognition result and the layout analysis result, and can be used as a label of the ancient book text image training sample. The verification label of a labeled ancient book text verification sample comprises the text detection box, the text detection box filtering result, the character recognition result and the layout analysis result corresponding to that verification sample. The ancient book text processing result can be used as training samples for the next round of training (that is, a portion of the test results can be extracted as training samples for the next round), and can also serve users' customized segmentation requirements and/or customized query and retrieval.
In operation S150, according to the preset iteration condition, the pre-labeling operation, the manual labeling operation, the model training operation, and the model testing operation are repeatedly performed to obtain the trained ancient book text information model.
In operation S160, the trained ancient book text information model is used to perform information processing on the ancient book text image to be processed, so as to obtain an information processing result, where the information processing result includes a text detection box, a text detection box filtering result, a character recognition result, and a layout analysis result.
Fig. 2 is a schematic diagram of an ancient book text informatization processing procedure according to an embodiment of the invention.
As shown in fig. 2, the ancient book text image to be processed is input into the trained ancient book text informatization model. The trained model first performs text position detection on the image to determine the positions of the characters and obtain text detection boxes enclosing the characters; it then filters the text detection boxes so that character recognition can be performed, and the layout analysis submodule in the trained model completes the layout analysis and ordering of the ancient book text image, ensuring that the recognized characters are presented to the end user in the correct order; meanwhile, the recognized text information is extracted so as to facilitate user retrieval and query.
In operation S170, a customized accurate search and/or fuzzy search request is performed using the trained ancient book text informatization model according to the user search request and the information processing result.
According to the ancient book text informatization processing method provided by the invention, the ancient book text informatization processing model is trained and tested with ancient book text image training samples obtained through multiple rounds of pre-labeling and expert verification. The resulting model makes full use of the predictive capability of a deep neural network, greatly improves labeling efficiency, improves the accuracy of character recognition on ancient book text images while preserving the typesetting order of the characters in the original text, and its output can greatly improve users' efficiency and convenience when retrieving ancient books.
According to an embodiment of the present invention, the detection sub-module includes a single-stage target detection deep neural network with a channel attention mechanism.
The filtering sub-module includes a pixel level semantic segmentation network with text confidence prediction function.
The identification submodule comprises a preprocessing unit, a feature extraction unit consisting of a deep residual network and a classification unit consisting of a plurality of loss branches.
The classification unit comprises a classification layer taking cross entropy as a loss function and a feature embedding layer taking triplet loss as the loss function.
The layout analysis submodule comprises a graph neural network for text relation regression and/or a clustering unit, wherein the clustering unit is used for framing text lines layer by layer through a clustering method.
The detection network, the segmentation network, the identification network and the layout analysis network can carry out all-around analysis, identification and information extraction on the ancient book text image to be processed, and the reliability of the ancient book text informatization processing result is ensured.
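As one illustrative possibility for the clustering-based layout analysis (not the patent's specific algorithm), the centers of the text detection boxes can be clustered into columns and then ordered for traditional right-to-left, top-to-bottom reading; the use of DBSCAN and the reading direction are assumptions made for this sketch.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def reading_order(boxes, eps=30):
    """Return box indices in reading order for vertically set ancient text.

    `boxes` is a list of (x0, y0, x1, y1) detection results; columns are found by
    clustering x-centers with DBSCAN, sorted right-to-left, and characters within
    a column are sorted top-to-bottom. All of this is illustrative.
    """
    boxes = np.asarray(boxes)
    x_centers = ((boxes[:, 0] + boxes[:, 2]) / 2).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=1).fit_predict(x_centers)

    order = []
    for col in sorted(set(labels), key=lambda c: -x_centers[labels == c].mean()):
        idx = np.where(labels == col)[0]
        order.extend(idx[np.argsort(boxes[idx, 1])].tolist())  # top-to-bottom in column
    return order
```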
In order to better explain the ancient book text informatization processing method provided by the invention, the technical scheme of the invention is further explained in detail with reference to fig. 3.
Fig. 3 is a schematic diagram of an ancient book text informatization processing strategy according to an embodiment of the invention.
As shown in fig. 3, DNN is used to represent one of the text detection network, the text segmentation network and the character recognition network, and DNN_iter_3, for example, denotes the network obtained in the third training iteration. The training data of the classifier in each training round benefits not only from the accumulation of labeling results across rounds, but also from the improvement in manual correction efficiency brought about by the rising accuracy of the deep network.
Through training optimization of the deep neural network in the sub-modules, the ancient book text information model with high character recognition accuracy can be obtained.
FIG. 4 is a flow chart of obtaining pre-annotated results according to an embodiment of the invention.
As shown in fig. 4, the pre-labeling of the ancient book text image training sample by using the ancient book text informatization model to obtain the pre-labeling result includes operations S410 to S440.
In operation S410, the ancient book text image training sample is processed by the detection sub-module to obtain a text detection box, wherein the text detection box is used for text positioning of the ancient book text image.
In operation S420, the filtering submodule performs pixel-level regression on the ancient book text image training sample to obtain a text region confidence map, performs text confidence calculation on the text detection box by using the text region confidence map, and filters the calculation result according to a preset filtering threshold to obtain a text detection box filtering result.
In operation S430, the recognition sub-module is used to process the filtering result of the text detection box to obtain an image block set of the ancient book text, and the recognition sub-module is used to perform character recognition on the image block set of the ancient book text to obtain a character recognition result.
In operation S440, the ancient book text image training sample is processed by the layout analysis submodule according to the text detection box filtering result, so as to obtain a layout analysis result, where the layout analysis result is used to determine the sequence and row-column relationship between the characters according to the character position distribution.
Through the pre-labeling, the workload of manual labeling can be reduced, and the test result of the previous round can be utilized, so that the accuracy of the pre-labeling result is improved, and the labeling progress is accelerated.
FIG. 5 is a flow chart for obtaining manual annotation results according to an embodiment of the invention.
As shown in fig. 5, the performing expert verification on the pre-annotation result and manually annotating the incorrect pre-annotation result again according to the preset verification rule to obtain the manual annotation result includes operations S510 to S530.
In operation S510, the pre-labeling result is checked by the expert to obtain a checking result, where the checking result includes a text detection box checking result and a character recognition checking result.
In operation S520, in case that the text detection box verification result is failed, the text detection box deleting operation and the text detection box adding operation are performed on the pre-labeling result by the expert.
In operation S530, in the case that the character recognition verification result is failed, the expert sorts the character recognition results according to their character confidence degrees, and screens the top N character recognition results or directly changes the character recognition results, where N is a positive integer.
Through the expert verification and manual labeling, the ancient book text image training samples are marked with labels with higher accuracy, and the ancient book text informatization processing models are trained by the ancient book text training samples with the labels with high reliability, so that the models with higher identification accuracy are obtained.
FIG. 6 is a flow chart of obtaining a trained ancient book text informatization model according to an embodiment of the invention.
As shown in fig. 6, the training of the ancient book text information model by using the deep neural network according to the manual labeling result to obtain the trained ancient book text information model includes operations S610 to S630.
In operation S610, an ancient book text training sample is processed with the detection submodule, including: detecting the manual labeling result through a target detection algorithm to obtain an initial text detection box prediction result, comparing the text detection box prediction result with the manual labeling result to obtain a first loss value, and training the parameters of the detection submodule through gradient backpropagation.
In operation S620, the filtering submodule is used to filter the text detection box prediction result, the filtering result is compared with the manual labeling result to obtain a second loss value, and the parameters of the filtering submodule are trained through gradient backpropagation.
In operation S630, feature extraction and character classification are performed on the manual labeling result by using the identification submodule, the manual labeling result, the feature extraction result and the character classification result are input into the loss function to obtain a third loss value, and the parameters of the identification submodule are trained through gradient backpropagation.
According to the embodiment of the invention, the ancient book text informatization processing method further comprises the step of carrying out pixel-level segmentation on the ancient book text image to be processed by utilizing the trained ancient book text informatization model according to the image segmentation requirements and the informatization processing results of the user, so as to obtain the customized segmentation results.
According to an embodiment of the present invention, the performing, according to the user image segmentation requirement and the informatization processing result, pixel-level segmentation on the ancient book text image to be processed by using the trained ancient book text informatization model to obtain the customized segmentation result includes: according to the user image segmentation requirement and the information processing result, preprocessing a text detection box filtering result in the information processing result by using a trained ancient book text information model to obtain an ancient book text image block, performing maximum inter-class variance local binarization on the ancient book text image block, and processing the binarization result to obtain a customized segmentation result.
Customized segmentation results include, but are not limited to: text display format, background display format.
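The "maximum inter-class variance" binarization referred to above is Otsu's method; a minimal OpenCV sketch of applying it to one cropped text block is given below (the surrounding cropping and post-processing steps are assumed for illustration).

```python
import cv2

def binarize_block(gray_block):
    """Apply Otsu (maximum inter-class variance) binarization to one text block.

    `gray_block` is assumed to be a single-channel grayscale crop taken from a
    filtered text detection box; post-processing into the desired text/background
    display format is omitted here.
    """
    _, binary = cv2.threshold(gray_block, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

# Usage sketch (paths and coordinates are hypothetical):
# page = cv2.imread("ancient_page.png", cv2.IMREAD_GRAYSCALE)
# mask = binarize_block(page[y0:y1, x0:x1])
```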
Fig. 7 is a schematic structural diagram of an ancient book text informatization processing system according to an embodiment of the invention.
As shown in fig. 7, the ancient book text informatization processing system 700 includes a pre-labeling module 710, a labeling module 720, a training module 730, a testing module 740, a version control module 750, an informatization processing module 760 and a retrieval module 770.
And the pre-labeling module 710 is configured to perform pre-labeling processing on the ancient book text image training samples by using an ancient book text informatization model to obtain a pre-labeling result, where the ancient book text informatization model includes a detection submodule, a filtering submodule, an identification submodule, and a layout analysis submodule.
And the marking module 720 is configured to perform expert verification on the pre-marking result according to a preset verification rule and perform manual marking again on the wrong pre-marking result to obtain a manual marking result.
And the training module 730 is used for training the ancient book text information model by using the deep neural network according to the artificial labeling result to obtain the trained ancient book text information model.
The testing module 740 is configured to input the ancient book text image verification sample into the trained ancient book text informatization model, test the trained ancient book text informatization model according to a preset testing rule, obtain an ancient book text processing result output by the tested ancient book text informatization model, and filter the ancient book text processing result to obtain a pre-labeling result of the training sample of the next round of informatization processing process.
And the version control module 750 is configured to repeatedly perform a pre-labeling operation, a manual labeling operation, a model training operation, and a model testing operation according to a preset iteration condition to obtain a trained ancient book text informatization model.
The informatization processing module 760 is configured to perform informatization processing on the ancient book text image to be processed by using the trained ancient book text informatization model to obtain an informatization processing result, where the informatization processing result includes a text detection box, a text detection box filtering result, a character recognition result, and a layout analysis result.
The retrieving module 770 is used for completing customized accurate retrieval and/or fuzzy retrieval request by using the trained ancient book text informatization model according to the user retrieval request and the informatization processing result.
The ancient book text informatization system provided by the invention has the characteristics of high training parallelism, high labeling efficiency and high recognition accuracy, can be applied to ancient book text informatization for different scripts (such as seal script, bronze script, regular script, running script and cursive script), and greatly expands the application scenarios of the system.
In order to better illustrate the solution according to the invention, another example is provided below to illustrate the advantages of the solution according to the invention from a number of perspectives.
Fig. 8 is a schematic diagram of an ancient book text informatization processing system according to another embodiment of the invention.
As shown in fig. 8, the ancient book text informatization processing system 800 of this embodiment includes a version control module 810, a pre-labeling module 820, a labeling module 830, a training module 840, a testing module 850, a segmentation module 860 and a retrieval module 870. The ancient book text informatization processing system 800 provided by another embodiment of the invention has several advantages. First, the processing is serialized: the extraction of text information from an ancient book picture is divided into a plurality of relatively independent stages. The advantage of this design is that the training data and the training process of each module are relatively independent and can be optimized separately in terms of training data and network structure, improving the overall effect; if the final effect is not ideal, each stage can be examined independently so as to quickly locate the bottleneck and solve the problem in a targeted manner. In this pipeline, the detection submodule is used for locating text, the filtering network adopts a segmentation network to perform pixel-level regression over the text box region as a text confidence used for filtering, the recognition network determines the character category at each detected position, and layout analysis determines the order and row-column relationship among the characters according to the character position distribution. Second, training parallelism is high: as can be seen from fig. 8, although the labeling stage and the module testing stage are sequential, classifier training can be processed in parallel, which improves system efficiency. Third, labeling efficiency is high: the predictive capability of the deep network is exploited as much as possible to reduce the workload of manual labeling, for which a progressive labeling scheme is designed; for example, in the early stage of the system when the amount of manual labeling is insufficient, the labeling results are produced as approximate (non-accurate) predictions, and manual labeling only needs to correct them on this basis to serve as the accurate labels for the next iteration.
Fig. 9 is a graphical user interface diagram of a system 800 according to another embodiment of the invention.
As shown in fig. 9, the Graphical User Interface (GUI) of the system 800 is mainly divided into 9 panel areas during actual operation: a test/inference panel 900, mainly used for testing and inference; a version control panel 910, which mainly controls the round number of the ancient book text informatization processing and provides the version number of the system 800; a detection labeling panel 920, which mainly contains the detection-related pre-labeling, manual labeling, detection training and detection testing functions; a segmentation labeling panel 930, which mainly contains the segmentation (filtering)-related manual correction, segmentation training and segmentation testing/pre-labeling functions; a layout analysis panel 940, which mainly contains the layout analysis calculation, automatic layout analysis and order labeling functions; a recognition labeling panel 950, which mainly contains the batch pre-recognition, recognition labeling, recognition training and test evaluation functions; a segmentation panel 960, used for customized text segmentation through a set binarization threshold; a retrieval panel 970, whose main function is to retrieve the text recognition results of ancient book texts according to preset words to be retrieved; an image list display and retrieval area 980, which mainly provides display, selection and search of the current list of images to be processed; and an image annotation display area 990, which mainly presents the current image and information such as annotation and prediction results visually, so as to facilitate further manual operation, analysis and debugging.
Each of the modules of the system 800 is described in detail below in conjunction with the system operator interface diagram shown in fig. 9.
The version control module 810 is configured to reduce the labeling workload: the manual labeling of each iteration is a correction based on the pre-labeling produced by a classifier trained with all previously labeled data as training data. At the very beginning there is neither training data nor a classifier, so labeling can only be carried out from scratch (called labeling round 0), and the labeling result is recorded as "labeling v = 0"; the version numbers of the classifier training, testing and pre-labeling performed on this basis are all version 1, and each version is stored in its own separate storage space. Starting from the first round of labeling, the labeling result includes a copy of the round-0 labeling result, so that repeated labeling is avoided; meanwhile, the labeling work only needs to correct the prediction results (pre-labels) of the most recently trained deep network, so labeling efficiency improves round by round. With the accumulation of labeled data and the improvement of the deep network's prediction accuracy, the amount of labeled data grows rapidly; after the n-th round of labeling, the labeling results cover all the data to be labeled, the deep network trained in the (n + 1)-th round is the automatic recognition system that is ultimately needed, and the test results of the (n + 1)-th round are the final output. It should be noted that not all data from each round of labeling is used to train the DNN (DNN represents one of the detection, segmentation and recognition networks): about 20% of the data is split off and kept as a validation set in order to test the "health" of the current DNN.
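As a rough illustration of this version-controlled iteration (round 0 labeled from scratch, later rounds correcting pre-labels, with roughly 20% of the labeled data held out for validation), the control flow might look like the sketch below; the callables, the batch schedule and the split handling are assumptions, not the patent's implementation.

```python
import random

def iterative_labeling(image_batches, train_model, pre_label, expert_correct):
    """Illustrative progressive labeling loop; all callables are hypothetical."""
    labeled, model = [], None
    for round_id, batch in enumerate(image_batches):
        if model is None:            # round 0: no classifier yet, label from scratch
            corrected = [expert_correct(img, pre=None) for img in batch]
        else:                        # later rounds: expert only corrects pre-labels
            corrected = [expert_correct(img, pre=pre_label(model, img)) for img in batch]
        labeled.extend(corrected)

        random.shuffle(labeled)      # hold out roughly 20% as a validation set
        split = int(0.8 * len(labeled))
        model = train_model(train=labeled[:split], val=labeled[split:],
                            version=round_id + 1)
    return model
```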
The pre-labeling module 820 is mainly used for running the detection, segmentation and recognition models trained in the previous version over the current data set, so as to provide pre-labels for the labeling work of the current version and reduce manual workload. The detection labeling panel 920 performs detection, segmentation (or filtering), recognition and layout analysis step by step and finally outputs the detection, recognition and other results of the image to be tested (or to be pre-labeled); however, since the main purpose of this module is to provide prior information for the labeling work, its results are stored separately.
FIG. 10 is a diagram illustrating a pre-labeling effect according to another embodiment of the present invention.
As shown in fig. 10, after the "test/inference" button is clicked for the input image, the current image undergoes several cascaded processing stages, and the results are stored, displayed and output: the detection result (shown in black in the image annotation display area 990), the segmentation result (the black and gray regions shown in the image annotation display area 990), the recognition result corresponding to each detection result (the white text at the lower right corner of each detection box in the image annotation display area 990), and the layout analysis result (the text order, indicated by a white vertical line). Finally, the extracted information is displayed, line by line and in order, in the text interaction area on the left.
The labeling module 830 includes a detection labeling sub-module, a segmentation labeling sub-module, and an identification labeling sub-module.
FIG. 11 is a schematic diagram of a GUI of a sub-module for detecting annotation according to another embodiment of the present invention.
The detection labeling submodule corresponds to the "detection frame manual correction" button of the "detection" panel in the manual labeling function area of the GUI shown in fig. 10. Fig. 11 is an example of manual labeling of a text box in the GUI: for distinction, the current labeling result is not enclosed by a dashed box, i.e. the "scroll" character (the last one in the 5th column from right to left) in the image annotation display area 990; at the same time, the image annotation display area 990 can display the result just labeled, while earlier labeled results are the characters enclosed by dashed boxes, for example, all the characters in the second column from right to left. After the 1st round of labeling, manual correction can be performed on the results of automatic machine labeling; the basic manner is: click the detection frame labeling button, the system loads the current picture and at the same time loads and displays the pre-detection result, and inaccurate detection boxes are adjusted with the mouse.
In the segmentation labeling submodule, "segmentation" differs from pixel-level segmentation of the text itself: here it refers to pixel-level segmentation of the detection box, that is, pixel-by-pixel regression inside the detection box serves as the text confidence of the current detection box, so that the detection results can be filtered. Therefore, the ground-truth image for the segmentation label can be generated by directly rasterizing the detection box labels. However, for certain application scenarios, for example when incomplete characters at the edge of the image should not be detected, this can be handled by adding or deleting detection box labels in the segmentation labeling step.
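Since the segmentation ground truth can be generated directly from the detection box labels, that rasterization might look like the short NumPy sketch below; the box format is an assumption made for illustration.

```python
import numpy as np

def boxes_to_mask(image_shape, boxes):
    """Rasterize detection box labels into a binary segmentation ground-truth mask.

    `boxes` is assumed to be a list of (x0, y0, x1, y1) labels; boxes that should
    not be detected (e.g. incomplete characters at the image edge) are simply
    removed from the list beforehand, mirroring the add/delete step above.
    """
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1
    return mask
```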
FIG. 12 is a schematic diagram of a graphical user interface of a segmentation annotation sub-module according to another embodiment of the present invention.
In the GUI shown in fig. 9, the current image can be segmented and labeled by clicking the "segmentation manual correction" button. As shown in fig. 12, the current image and the detection labeling result are loaded first, then additions/deletions are made on that basis; right-clicking the mouse saves the current labeling result and switches to the segmentation labeling of the next unlabeled image. For example, the "volume" character (the last character in the 5th column from right to left) shown in the image annotation display area 990 in fig. 12 can be segmented and labeled manually.
FIG. 13 is a schematic diagram of the operation of a graphical user interface of an identify label submodule according to another embodiment of the invention.
The main functions of the identification tag submodule are explained in detail below with reference to fig. 13.
Fig. 13 shows the GUI of the recognition labeling submodule for ancient book text recognition. The main functions of the recognition labeling submodule include the following aspects: clicking the "recognition labeling" button makes the system load the first image that has not yet been recognition-labeled; automatic layout analysis is performed based on the detection box labeling results to generate the text order and row-column distribution information; the detection box region with sequence number 1 is copied to the small-picture area of the "recognition" panel; if this is round-0 labeling and there is no pre-recognition result, the current character must be entered manually (for example with a pinyin input method), and pressing Enter automatically moves to labeling the next character; for round 1 and subsequent rounds of labeling (or round 0 when a pre-recognition result exists), the system displays the pre-recognized candidate results, from high to low confidence, in 10 selectable text edit boxes, a candidate character can be copied into the recognition labeling edit box via the keyboard keys F1-F10 or by right-clicking the mouse, and pressing Enter confirms the labeling result and moves to the next character; whether or not a pre-recognition result exists, the character to be labeled can be selected by clicking inside a text box in the large-picture area with the mouse, allowing labeling in jumping order; alternatively, characters can be entered in sequence in the large text area, and the system automatically splits the lines and matches the labeling results to the large-image area. In either mode, each labeling area (mainly the large image and the large edit box) displays the labeling results synchronously. For example, if a traditional character (the first character in the first column from right to left) in the image annotation display area 990 is recognition-labeled, all of its recognition results are displayed on the recognition labeling panel 950, typically the top 10 ranked candidates.
The training module 840 includes a detector training sub-module, a filter (partition) network training sub-module, and a recognizer training sub-module.
For the detector training submodule, the network structure is built on YoloV5 because of the superior detection performance of this single-stage deep neural network. YoloV5 achieves a large effective receptive field in local regions by fusing a large number of backbone features at different resolutions, so the network can make accurate inferences even when characters are not clearly rendered. It performs poorly, however, when characters are unevenly distributed (for example, a few small characters embedded between a group of large ones). A channel attention mechanism is therefore added to the C3 module, giving the judgment of large versus small characters within a region a more specific channel-wise attention expression and improving the detection result; a sketch follows.
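The following PyTorch sketch illustrates the idea of inserting a squeeze-and-excitation style channel attention block into a simplified C3-like module; the block structure, channel counts and reduction ratio are illustrative assumptions and do not reproduce the actual YoloV5 source or the patented network.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # squeeze: global spatial average
            nn.Conv2d(channels, hidden, 1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid(),                          # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(x)                      # reweight feature channels

class C3WithAttention(nn.Module):
    """Simplified C3-style block: two 1x1 branches, a conv stack, concat, then attention."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1)
        self.cv2 = nn.Conv2d(c_in, c_hidden, 1)
        self.m = nn.Sequential(*[nn.Conv2d(c_hidden, c_hidden, 3, padding=1) for _ in range(n)])
        self.cv3 = nn.Conv2d(2 * c_hidden, c_out, 1)
        self.attn = ChannelAttention(c_out)

    def forward(self, x):
        y = torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1)
        return self.attn(self.cv3(y))
```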
Fig. 14 is a schematic diagram of a network structure of a filtering (partitioning) network training submodule according to another embodiment of the present invention.
The filtering (segmentation) network training submodule is described in detail below with reference to fig. 14. As shown in fig. 14, considering that ancient book images may contain characters showing through from the back of the paper, incomplete characters at the image edges, and other noise, a text-confidence calculation step based on a segmentation network is added to filter the text. The network structure adopts U2-Net, whose main structure is shown in fig. 14: a U-Net structure first encodes features from high resolution down to low resolution and then upsamples step by step back to the original resolution, with feature fusion at each matching resolution level. Each segmentation unit itself takes a U-Net form; in the head, a segmentation map is regressed at each upsampled resolution, and the final segmentation result is produced by a feature fusion operation. The sketch below shows how the resulting pixel-wise confidence can be used to filter detection boxes.
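Independently of the U2-Net details, the filtering step itself can be sketched as follows: average the pixel-wise text confidence inside each detection box and drop boxes whose mean falls below a threshold. The box format and the threshold value are assumptions for illustration.

```python
import numpy as np

def filter_boxes_by_confidence(prob_map, boxes, threshold=0.5):
    """prob_map: HxW per-pixel text confidence in [0, 1]; boxes: list of (x1, y1, x2, y2)."""
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = map(int, box)
        region = prob_map[y1:y2, x1:x2]
        if region.size and float(region.mean()) >= threshold:
            kept.append(box)   # keep boxes whose mean text confidence is high enough
    return kept
```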
FIG. 15 is a flow chart of the operation of the recognizer training submodule, according to another embodiment of the present invention.
The recognizer training submodule is described in detail below in conjunction with fig. 15. As shown in fig. 15, because the cross-entropy loss commonly used in classification networks generalizes poorly on its own, the invention learns a feature scale space through a triplet loss and then achieves text classification through a classification loss. The features of each text image block are extracted by a feature extraction network; the triplet loss is computed from these features together with the text class labels, and the cross-entropy loss is computed after passing the features through a classification layer. The CNN feature extraction layers include, but are not limited to, ResNet18, ResNet34, ResNet50, ResNet101, ResNet152 and ResNeXt; the classification network includes, but is not limited to, a fully connected layer or a multi-layer perceptron.
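A minimal PyTorch sketch of such a two-branch objective is given below: a ResNet18 backbone (one of the listed options) feeds a feature embedding trained with a triplet loss and a classification layer trained with cross entropy. The backbone choice, embedding size, loss weighting and the naive triplet arrangement of the batch are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class Recognizer(nn.Module):
    def __init__(self, num_classes, embed_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()                 # expose the 512-d pooled feature
        self.backbone = backbone
        self.embed = nn.Linear(512, embed_dim)      # feature embedding branch
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feat = self.embed(self.backbone(x))
        return feat, self.classifier(feat)

def recognizer_loss(feat, logits, labels, margin=0.3, alpha=1.0):
    """Cross entropy on the classification branch plus a triplet loss on the embedding."""
    ce = nn.functional.cross_entropy(logits, labels)
    # Assumes the batch is arranged as consecutive (anchor, positive, negative) triplets;
    # real training would typically use online (batch-hard) triplet mining instead.
    anchor, positive, negative = feat[0::3], feat[1::3], feat[2::3]
    tri = nn.TripletMarginLoss(margin=margin)(anchor, positive, negative)
    return ce + alpha * tri
```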
FIG. 16 (a) is a diagram illustrating the ancient book text detection test result of the test module 850 according to another embodiment of the present invention.
Fig. 16 (b) is a schematic diagram illustrating the ancient book text filtering test result of the testing module 850 according to another embodiment of the invention.
FIG. 16 (c) is a diagram illustrating the ancient book text recognition test result of the testing module 850 according to another embodiment of the present invention.
Fig. 16 (d) is a diagram illustrating the ancient book text layout analysis result of the testing module 850 according to another embodiment of the present invention.
The test module 850 is described in detail with reference to figs. 16 (a) to 16 (d). The test module 850 includes a text detection submodule, a text filtering submodule, a text recognition submodule and a layout analysis submodule. In the "detection" panel of the GUI, clicking the "detection test/pre-label" button runs a detection test on the current image (or on all images in the list, depending on whether the "single image" check box in the "function panel" is checked), and the result is displayed with dotted boxes; as shown in fig. 16 (a), the dotted boxes are the ancient book text positions obtained by forward inference of the trained ancient book detection neural network on the current ancient book text image. The text filtering submodule runs a segmentation-stage test on the current image via the "segmentation test/pre-label" button in the "segmentation labeling" panel; as shown in fig. 16 (b), the dark background (the region where the characters of the ancient book text are located) indicates the pixel-by-pixel text-confidence regression result of forward inference by the trained ancient book text filtering network on the current ancient book image, where the darkness indicates the text confidence, so that the average confidence of each text detection box can subsequently be calculated for filtering. The text recognition submodule runs a recognition test on the current image via the "batch pre-recognition" button in the "recognition labeling" panel, on the premise that the current image already has a detection test result; as shown in fig. 16 (c), the white characters beside the text are the text categories obtained by forward inference of the trained ancient book text recognition network on the image blocks in each detection box of the current image. The layout analysis submodule arranges the text boxes in context order according to their position information, with specific algorithms including but not limited to GNN, KNN, LSTM and the like; as shown in fig. 16 (d), the thin white lines connect the center points of the detection boxes in the reading order computed for all detection boxes of the current image by the layout analysis algorithm of this patent. A rule-based sketch of such an ordering follows.
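A rule-based ordering of detection boxes for vertically written text can be sketched as follows: boxes are grouped into columns by their x-centers, columns are read right to left and characters top to bottom. This is only a simplified stand-in for the GNN/KNN/LSTM-based layout analysis described in the patent; the column tolerance is an assumed parameter.

```python
def order_boxes(boxes, col_tol=0.6):
    """boxes: list of (x1, y1, x2, y2); return them in right-to-left, top-to-bottom reading order."""
    if not boxes:
        return []
    avg_w = sum(x2 - x1 for x1, _, x2, _ in boxes) / len(boxes)
    columns = []                                                  # each column is a list of boxes
    for box in sorted(boxes, key=lambda b: -(b[0] + b[2]) / 2):   # scan right to left
        cx = (box[0] + box[2]) / 2
        for col in columns:
            col_cx = sum((b[0] + b[2]) / 2 for b in col) / len(col)
            if abs(cx - col_cx) < col_tol * avg_w:                # same column as an existing group
                col.append(box)
                break
        else:
            columns.append([box])                                 # start a new column
    ordered = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda b: b[1]))           # top to bottom within a column
    return ordered
```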
The segmentation performed by the segmentation module 860 differs from the text region segmentation described earlier: it is segmentation at the text pixel level (like that shown in fig. 16 (b)). Although it has no direct effect on the recognition of the ancient document by the recognition module, pixel-level segmentation of the text contributes to data enhancement, and it also serves the needs of particular users. The segmentation algorithm uses the detection-box information and applies OTSU local binarization within each detection box, based on the assumption that the brightness of the text pixels inside a detection box is substantially consistent; the segmentation result shown in fig. 16 (b) can be obtained by graying the original image shown in fig. 16 (a) and applying local binarization, as in the sketch below.
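The per-box binarization can be sketched with OpenCV as follows: each detection-box crop is converted to grey and thresholded with Otsu's maximum between-class variance method, with dark ink mapped to foreground. The box format and the inverted-threshold convention are assumptions for illustration.

```python
import cv2
import numpy as np

def segment_boxes(image_bgr, boxes):
    """Build a full-size binary text mask from per-box Otsu thresholding."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mask = np.zeros_like(gray)
    for x1, y1, x2, y2 in (map(int, b) for b in boxes):
        crop = gray[y1:y2, x1:x2]
        if crop.size == 0:
            continue
        # THRESH_BINARY_INV: dark ink becomes foreground (255) on light paper.
        _, binary = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        mask[y1:y2, x1:x2] = binary
    return mask
```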
The retrieval module 870 mainly serves to facilitate user information queries (retrieval). Retrieval includes both precise keyword retrieval (all text positions in all pictures that completely contain the keyword are shown in a list) and fuzzy retrieval (pictures and text positions containing the retrieval information are sorted by retrieval similarity and shown in a list), as in the sketch below.
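The two retrieval modes can be sketched as follows over a simple per-image index of recognized text; the index structure (a dict from image id to recognized text) and the use of difflib similarity for the fuzzy mode are illustrative assumptions, not the patented retrieval algorithm.

```python
import difflib

def exact_search(pages, keyword):
    """pages: dict of image_id -> recognized text; return images that contain the keyword."""
    return [img_id for img_id, text in pages.items() if keyword in text]

def fuzzy_search(pages, query, top_k=10):
    """Rank images by the similarity of the best-matching text window to the query."""
    scored = []
    for img_id, text in pages.items():
        windows = [text[i:i + len(query)] for i in range(max(1, len(text) - len(query) + 1))]
        best = max(difflib.SequenceMatcher(None, query, w).ratio() for w in windows)
        scored.append((best, img_id))
    return [img_id for _, img_id in sorted(scored, reverse=True)[:top_k]]
```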
The function of the search module 870 is described in further detail below with reference to fig. 17.
Fig. 17 is a diagram illustrating the retrieving function of the retrieving module 870 according to another embodiment of the present invention.
As shown in fig. 17, to search all documents in the database containing the character "old", the character "old" is entered into the "word to be searched" text box of the "retrieval panel 970" and Enter is pressed; the system then retrieves all documents containing the current character and displays them in the document list area of the "retrieval panel". Selecting any document in the list and right-clicking it makes the system display the current image in the large area on the right and mark all regions containing the searched character with solid-line boxes.
Fig. 18 is a schematic diagram of the operation of a segmented panel 960 according to another embodiment of the present invention.
As shown in fig. 18, clicking the "text segmentation" button of the "segmentation panel 960" obtains the segmentation result of the current image: the system performs text segmentation based on local binarization on the current image using the text detection information. When any character in the large image area on the right is clicked with the mouse, the segmentation (binarization) result of that character is displayed in the graphic display area of the "segmentation panel". As shown in fig. 18 for the mouse-clicked "" character, the current character is marked by a solid box, and its segmentation result is displayed in the graphic display area of "segmentation panel 960".
Fig. 19 schematically shows a block diagram of an electronic device adapted to implement the ancient book text informatization processing method according to an embodiment of the present invention.
As shown in fig. 19, an electronic device 1900 according to an embodiment of the present invention includes a processor 1901 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1902 or a program loaded from a storage section 1908 into a Random Access Memory (RAM) 1903. The processor 1901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1901 may also include onboard memory for caching purposes. The processor 1901 may comprise a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 1903, various programs and data necessary for the operation of the electronic apparatus 1900 are stored. The processor 1901, ROM 1902, and RAM 1903 are connected to each other via a bus 1904. The processor 1901 performs various operations of the method flow according to embodiments of the present invention by executing programs in the ROM 1902 and/or RAM 1903. Note that the programs may also be stored in one or more memories other than the ROM 1902 and the RAM 1903. The processor 1901 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 1900 may also include an input/output (I/O) interface 1905, which is likewise connected to the bus 1904. The electronic device 1900 may also include one or more of the following components connected to the I/O interface 1905: an input section 1906 including a keyboard, a mouse, and the like; an output section 1907 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 1908 including a hard disk and the like; and a communication section 1909 including a network interface card such as a LAN card or a modem. The communication section 1909 performs communication processing via a network such as the Internet. A drive 1910 is also connected to the I/O interface 1905 as needed. A removable medium 1911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1910 as necessary, so that a computer program read from it is installed into the storage section 1908 as needed.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer readable storage medium may be a non-volatile computer readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, a computer-readable storage medium may include the ROM 1902 and/or the RAM 1903 described above and/or one or more memories other than the ROM 1902 and the RAM 1903.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that the features described in the various embodiments and/or in the claims of the invention may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in the invention. In particular, the features recited in the various embodiments and/or claims of the present invention may be combined and/or integrated in various ways without departing from the spirit or teaching of the invention. All such combinations and/or integrations fall within the scope of the present invention.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above-mentioned embodiments are only examples of the present invention and should not be construed as limiting it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (7)

1. An ancient book text informatization processing method, wherein the ancient book text font comprises seal script, regular script, running script or cursive script, and the ancient book text informatization processing method is characterized by comprising the following steps:
pre-labeling the ancient book text image training sample by using an ancient book text informatization model to obtain a pre-labeling result, wherein the ancient book text informatization model comprises a detection submodule, a filtering submodule, an identification submodule and a layout analysis submodule;
according to a preset check rule, carrying out expert check on the pre-labeling result and carrying out manual labeling on the wrong pre-labeling result again to obtain a manual labeling result;
training the ancient book text informatization model by utilizing a deep neural network according to the manual labeling result to obtain a trained ancient book text informatization model;
inputting an ancient book text image verification sample into the trained ancient book text informatization model, testing the trained ancient book text informatization model according to a preset test rule to obtain an ancient book text processing result output by the tested ancient book text informatization model, and screening the ancient book text processing result to be used as a pre-labeling result of a training sample in the next round of informatization processing process;
repeatedly performing pre-labeling operation, manual labeling operation, model training operation and model testing operation according to preset iteration conditions to obtain a trained ancient book text informatization model;
performing informatization processing on the ancient book text image to be processed by using the trained ancient book text informatization model to obtain an informatization processing result, wherein the informatization processing result comprises a text detection box, a text detection box filtering result, a character recognition result and a layout analysis result;
according to the user retrieval request and the information processing result, completing customized accurate retrieval and/or fuzzy retrieval requests by utilizing the trained ancient book text information model;
according to the image segmentation requirements of the user and the informatization processing result, pixel-level segmentation is carried out on the ancient book text image to be processed by using the trained ancient book text informatization model, and a customized segmentation result is obtained;
the detection submodule comprises a single-stage target detection deep neural network with a channel attention mechanism, wherein the single-stage target detection deep neural network is constructed on the basis of YoloV 5;
wherein the filtering submodule comprises a pixel level semantic segmentation network with a text confidence prediction function;
the identification submodule comprises a preprocessing unit, a feature extraction unit consisting of a depth residual error network and a classification unit consisting of a plurality of loss branches;
the classification unit comprises a classification layer taking cross entropy as a loss function and a feature embedding layer taking triple loss as the loss function;
the layout analysis submodule comprises a graph neural network and/or a clustering unit for text relation regression, wherein the clustering unit is used for framing text lines layer by layer through a clustering method, and the layout analysis submodule is used for sequentially arranging the text detection boxes up and down through GNN, KNN and LSTM;
wherein, according to the image segmentation requirements of the user and the informatization processing result, performing pixel-level segmentation on the ancient book text image to be processed by using the trained ancient book text informatization model to obtain the customized segmentation result comprises the following steps:
preprocessing a text detection box filtering result in the informatization processing result by using the trained ancient book text informatization model according to the user image segmentation requirements and the informatization processing result to obtain an ancient book text image block, performing maximum inter-class variance local binarization on the ancient book text image block, and processing the binarization result to obtain the customized segmentation result.
2. The method of claim 1, wherein the pre-labeling of the training sample of the ancient book text image by the ancient book text informatization model to obtain a pre-labeling result comprises:
processing the ancient book text image training sample by using the detection submodule to obtain a text detection box, wherein the text detection box is used for text positioning of the ancient book text image;
performing pixel-level regression on the ancient book text image training sample by using the filtering submodule to obtain a text region confidence map, performing text confidence calculation on the text detection box by using the text region confidence map, and filtering the calculation result according to a preset filtering threshold value to obtain a text detection box filtering result;
processing the filtering result of the text detection box by using the identification submodule to obtain an ancient book text image block set, and performing character identification on the ancient book text image block set by using the identification submodule to obtain a character identification result;
and processing the ancient book text image training sample by using the layout analysis submodule according to the text detection box filtering result to obtain a layout analysis result, wherein the layout analysis result is used for determining the sequence and row-column relationship among the characters according to the character position distribution.
3. The method according to claim 1, wherein the performing expert verification on the pre-labeling result and manually labeling the wrong pre-labeling result again according to a preset verification rule to obtain a manual labeling result comprises:
verifying the pre-labeling result by an expert to obtain a verification result, wherein the verification result comprises a text detection box verification result and a character recognition verification result;
under the condition that the text detection box verification result is a failure, the expert deletes text detection boxes from and adds text detection boxes to the pre-labeling result;
and under the condition that the character recognition verification result is a failure, sorting, by the expert, the character recognition results according to their character confidence degrees, and selecting from the top N character recognition results or directly modifying the character recognition result, wherein N is a positive integer.
4. The method of claim 1, wherein the training of the ancient book text information model by using a deep neural network according to the artificial labeling result to obtain the trained ancient book text information model comprises:
processing the ancient book text training sample by using the detection submodule, wherein the processing comprises: detecting the artificial labeling result through a target detection algorithm to obtain an initial text detection box prediction result, comparing the text detection box prediction result with the artificial labeling result to obtain a first loss value, and training parameters of the detection submodule through gradient back-propagation;
filtering the text detection box prediction result by using the filtering submodule, comparing the filtering result with the manual labeling result to obtain a second loss value, and training the parameters of the filtering submodule through gradient back-propagation;
and performing feature extraction and character classification on the manual labeling result by using the recognition submodule, inputting the manual labeling result, the feature extraction result and the character classification result into a loss function to obtain a third loss value, and training parameters of the recognition submodule through gradient back-propagation.
5. An ancient book text informatization processing system, wherein the font of the ancient book text comprises seal script, regular script, running script or cursive script, characterized by comprising:
the system comprises a pre-labeling module, a text information model processing module and a text analysis module, wherein the pre-labeling module is used for pre-labeling an ancient book text image training sample by using an ancient book text information model to obtain a pre-labeling result, and the ancient book text information model comprises a detection submodule, a filtering submodule, an identification submodule and a layout analysis submodule;
the labeling module is used for carrying out expert verification on the pre-labeling result and carrying out manual labeling again on the wrong pre-labeling result according to a preset verification rule to obtain a manual labeling result;
the training module is used for training the ancient book text informatization model by utilizing a deep neural network according to the manual labeling result to obtain a trained ancient book text informatization model;
the test module is used for inputting the ancient book text image verification sample into the trained ancient book text informatization model, testing the trained ancient book text informatization model according to a preset test rule to obtain an ancient book text processing result output by the tested ancient book text informatization model, and screening the ancient book text processing result to be used as a pre-labeling result of the training sample in the next round of informatization processing process;
the version control module is used for repeatedly performing pre-labeling operation, manual labeling operation, model training operation and model testing operation according to preset iteration conditions to obtain a trained ancient book text informatization model;
the informatization processing module is used for performing informatization processing on the ancient book text image to be processed by utilizing the trained ancient book text informatization model to obtain an informatization processing result, wherein the informatization processing result comprises a text detection box, a text detection box filtering result, a character recognition result and a layout analysis result;
the retrieval module is used for completing customized accurate retrieval and/or fuzzy retrieval requests by utilizing the trained ancient book text informatization model according to the user retrieval requests and the informatization processing results;
the segmentation module is used for performing pixel-level segmentation on the ancient book text image to be processed by using the trained ancient book text informatization model according to the image segmentation requirements of the user and the informatization processing result to obtain a customized segmentation result;
the detection submodule comprises a single-stage target detection deep neural network with a channel attention mechanism, wherein the single-stage target detection deep neural network is constructed on the basis of YoloV 5;
wherein the filtering submodule comprises a pixel level semantic segmentation network with a text confidence prediction function;
the identification submodule comprises a preprocessing unit, a feature extraction unit consisting of a depth residual error network and a classification unit consisting of a plurality of loss branches;
the classification unit comprises a classification layer taking cross entropy as a loss function and a feature embedding layer taking triple loss as the loss function;
the layout analysis submodule comprises a graph neural network and/or a clustering unit for text relation regression, wherein the clustering unit is used for framing text lines layer by layer through a clustering method, and the layout analysis submodule is used for sequentially arranging the text detection boxes up and down through GNN, KNN and LSTM;
wherein, according to the image segmentation requirements of the user and the informatization processing result, performing pixel-level segmentation on the ancient book text image to be processed by using the trained ancient book text informatization model to obtain the customized segmentation result comprises the following steps:
preprocessing a text detection box filtering result in the informatization processing result by using the trained ancient book text informatization model according to the user image segmentation requirements and the informatization processing result to obtain an ancient book text image block, performing maximum inter-class variance local binarization on the ancient book text image block, and processing the binarization result to obtain the customized segmentation result.
6. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1-4.
7. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the method of any one of claims 1-4.
CN202211341307.9A 2022-10-31 2022-10-31 Ancient book text informatization processing method and system, electronic equipment and storage medium Active CN115410216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211341307.9A CN115410216B (en) 2022-10-31 2022-10-31 Ancient book text informatization processing method and system, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115410216A CN115410216A (en) 2022-11-29
CN115410216B true CN115410216B (en) 2023-02-10

Family

ID=84168758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211341307.9A Active CN115410216B (en) 2022-10-31 2022-10-31 Ancient book text informatization processing method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115410216B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115117B (en) * 2023-08-31 2024-02-09 南京诺源医疗器械有限公司 Pathological image recognition method based on small sample, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985293A (en) * 2018-06-22 2018-12-11 深源恒际科技有限公司 A kind of image automation mask method and system based on deep learning
CN112949648A (en) * 2021-03-12 2021-06-11 上海眼控科技股份有限公司 Method and equipment for acquiring training sample data set of image segmentation model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507351B (en) * 2020-04-16 2023-05-30 华南理工大学 Ancient book document digitizing method
CN111985462B (en) * 2020-07-28 2024-08-06 天津恒达文博科技股份有限公司 Ancient character detection, identification and retrieval system based on deep neural network
CN113158808B (en) * 2021-03-24 2023-04-07 华南理工大学 Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction
CN113989484A (en) * 2021-11-02 2022-01-28 古联(北京)数字传媒科技有限公司 Ancient book character recognition method and device, computer equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant