CN116416629B - Electronic file generation method, device, equipment and medium - Google Patents

Electronic file generation method, device, equipment and medium

Info

Publication number
CN116416629B
Authority
CN
China
Prior art keywords
target
character
archive
file
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310684514.2A
Other languages
Chinese (zh)
Other versions
CN116416629A (en)
Inventor
刘鹏
郑蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Quantum Weiye Information Technology Co ltd
Original Assignee
Beijing Quantum Weiye Information Technology Co ltd
Application filed by Beijing Quantum Weiye Information Technology Co ltd
Priority to CN202310684514.2A
Publication of CN116416629A
Application granted
Publication of CN116416629B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/245 Font recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455 Discrimination between machine-print, hand-print and cursive writing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present application relates to the field of data processing, and in particular to a method, an apparatus, a device, and a medium for generating an electronic archive. The method includes: acquiring an image corresponding to a file to be processed; determining recorder information according to the file catalog page image, and determining target handwriting information corresponding to the recorder information; splitting the archive content page character images to obtain component images of the archive content page character images; matching the component images against the target component writing library to determine the target components corresponding to the component images; determining the target characters corresponding to the archive content page character images according to the target components corresponding to the archive content page character images; and generating an electronic archive based on the target characters corresponding to the archive content page character images and the recorder information. The application has the effect of generating the electronic archive accurately.

Description

Electronic file generation method, device, equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for generating an electronic file.
Background
An archive refers to original records in various forms that have preservation value and are formed directly by people in various social activities. The value of archives to social development and human life is an important expression of their ability to meet social needs. Archive preservation is therefore of great significance.
To facilitate archive storage, the prior art uses OCR (optical character recognition) technology to scan and recognize handwritten archives and generates an electronic document from the recognized content. However, because the characters in a handwritten archive may be joined up or written in a nonstandard way, recognition of handwritten archives is prone to errors, and the generated electronic archive is therefore inaccurate.
Therefore, how to accurately recognize the content of a handwritten archive and generate an accurate electronic archive is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to generate an electronic archive more accurately, the application provides an electronic archive generation method, apparatus, device and medium.
In a first aspect, the present application provides a method for generating an electronic file, which adopts the following technical scheme:
an electronic archive generation method, comprising:
acquiring an image corresponding to a file to be processed, wherein the image corresponding to the file to be processed at least comprises a file catalog page image and a file content page image, and the file content page image comprises a plurality of file content page character images;
determining recorder information according to the archive catalog page images, and determining target handwriting information corresponding to the recorder information according to the recorder information, wherein the target handwriting information comprises a target component writing library;
splitting the archive content page character image to obtain component images of the archive content page character image; matching the component images against the target component writing library to determine the target components corresponding to the component images;
determining the target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image; and generating an electronic archive based on the target characters corresponding to the plurality of archive content page character images and the recorder information.
The present application may be further configured in a preferred example to:
the generating an electronic archive based on the target characters corresponding to the plurality of archive content page character images and the recorder information includes:
performing file type identification according to the file catalog page image, determining the file type corresponding to the file to be processed, and determining a target electronic template according to the file type;
generating target archive content according to the target characters corresponding to the archive content page character images;
and filling the target archive content and the recorder information into the target electronic template to generate the electronic archive.
The present application may be further configured in a preferred example to:
each archive content page character image carries a character code related to the arrangement order of the archive content page character images, and each target character corresponds one-to-one to the character code carried by the archive content page character image corresponding to that target character;
the generating the target archive content according to the target characters corresponding to the archive content page character images includes:
performing punctuation recognition on all target characters to obtain all target punctuation marks; sorting the character codes corresponding to all the target punctuation marks by magnitude, and determining the punctuation code corresponding to each target punctuation mark;
grouping all character codes according to all punctuation codes, and determining a plurality of character code groups;
for each character code group, determining the code corresponding to the character code group according to the punctuation codes, and arranging the target characters corresponding to the character codes in the character code group in order to obtain a target sentence;
and arranging all the target sentences in order according to the codes of the character code groups corresponding to the target sentences, to obtain the target archive content.
The present application may be further configured in a preferred example to:
the filling the target archive content and the recorder information into the target electronic template to generate the electronic archive includes:
splitting the target archive content by archive subclass using the target electronic template, and determining the sub-content corresponding to each archive subclass and the format information of the sub-content;
obtaining the standard format information corresponding to each archive subclass in the target electronic template;
for each archive subclass, judging whether the format information of the sub-content corresponding to the archive subclass conforms to the standard format information; if not, determining, by semantic analysis according to the standard format information, similar content that is similar to the sub-content, and replacing the sub-content with the similar content to obtain replaced sub-content as the target sub-content; if so, taking the sub-content corresponding to the archive subclass as the target sub-content;
and filling all target sub-contents corresponding to the target archive content and the recorder information into the target electronic template to generate the electronic archive.
The present application may be further configured in a preferred example to:
the determining the target character corresponding to the archival content page character image according to the target component corresponding to the archival content page character image comprises the following steps:
determining at least two initial target characters corresponding to the archive content page character image according to the target components corresponding to the archive content page character image;
for each initial target character, acquiring the adjacent characters of the archive content page character image corresponding to the initial target character, and determining an initial word according to the initial target character and the adjacent characters;
determining valid words from the at least two initial words; and taking the initial target characters corresponding to all the valid words as the target characters corresponding to the archive content page character image.
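The word-based disambiguation in this preferred example can be sketched as follows. This is a minimal illustration rather than the application's actual implementation: the lexicon `VALID_WORDS`, the function name `pick_target_characters`, and the choice to form two-character words with the immediately preceding or following character are all assumptions made for the example.

```python
# Hypothetical lexicon of valid words; a real system would load a full dictionary.
VALID_WORDS = {"档案", "文件", "记录"}


def pick_target_characters(initial_targets, prev_char=None, next_char=None,
                           valid_words=VALID_WORDS):
    """Keep the initial target characters that form a valid word with a neighbor."""
    kept = []
    for candidate in initial_targets:
        # Build the initial words from the candidate and its adjacent characters.
        initial_words = []
        if prev_char is not None:
            initial_words.append(prev_char + candidate)
        if next_char is not None:
            initial_words.append(candidate + next_char)
        # A candidate survives only if at least one of its initial words is valid.
        if any(word in valid_words for word in initial_words):
            kept.append(candidate)
    return kept


# "案" and "桉" are built from the same components (安 and 木), so both may be
# proposed as initial target characters; the preceding character "档" settles it.
result = pick_target_characters(["案", "桉"], prev_char="档")
print(result)
```

Returning a list rather than a single character mirrors the wording above, which keeps the initial target characters of all valid words.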
The present application may be further configured in a preferred example to:
the determining the target character corresponding to the archival content page character image according to the target component corresponding to the archival content page character image comprises the following steps:
determining the font structure of the characters according to the character image of the archive content page;
and carrying out character combination according to the font structure and the target component corresponding to the archival content page character image to obtain the target character corresponding to the archival content page character image.
The present application may be further configured in a preferred example to:
before determining the font structure of the characters according to the character image of the archive content page, the method further comprises the following steps:
performing character combination on the target components to obtain at least one initial character;
judging whether the number of initial characters is greater than 1;
if not, taking the initial character as the target character;
correspondingly, the determining the font structure of the character according to the archive content page character image includes:
if so, determining the font structure of the character according to the archive content page character image.
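The two preferred examples above (combine the components first, and consult the font structure only when more than one character is possible) can be sketched as below. The composition table, the structure labels `"top-bottom"` and `"left-right"`, and the `detect_structure` callback are hypothetical stand-ins; the source does not specify how the font structure is detected or how component-to-character combinations are stored.

```python
def _key(components):
    # Sort so that component order does not matter; a tuple (not a set) keeps
    # duplicates, since one character may contain the same component twice.
    return tuple(sorted(components))


# Hypothetical composition data: (components, font structure, character).
_ENTRIES = [
    (["安", "木"], "top-bottom", "案"),
    (["木", "安"], "left-right", "桉"),
    (["日", "月"], "left-right", "明"),
]
COMPOSITION_TABLE = {(_key(c), s): ch for c, s, ch in _ENTRIES}


def combine_components(components, detect_structure):
    """Return the target character for the given target components."""
    key = _key(components)
    # Character combination: every character that can be built from the components.
    initial_characters = [ch for (k, _), ch in COMPOSITION_TABLE.items() if k == key]
    if len(initial_characters) <= 1:
        # At most one initial character: the font structure is not needed.
        return initial_characters[0] if initial_characters else None
    # More than one initial character: determine the font structure from the image.
    structure = detect_structure()
    return COMPOSITION_TABLE.get((key, structure))


print(combine_components(["木", "安"], lambda: "top-bottom"))
```

Passing the structure detector as a callable keeps the potentially expensive image analysis out of the common case where the components already identify a single character.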
In a second aspect, the present application provides an electronic file generating apparatus, which adopts the following technical scheme:
an electronic archive generating device, comprising:
the file image acquisition module is used for acquiring images corresponding to files to be processed, wherein the images corresponding to the files to be processed at least comprise file catalog page images and file content page images, and the file content page images comprise a plurality of file content page character images;
the target handwriting information determining module is used for determining recorder information according to the archive catalog page image and determining target handwriting information corresponding to the recorder information according to the recorder information, wherein the target handwriting information comprises a target component writing library;
the component determining module is used for splitting the archive content page character image to obtain component images of the archive content page character image, and for matching the component images against the target component writing library to determine the target components corresponding to the component images;
the electronic archive generating module is used for determining the target characters corresponding to the archive content page character images according to the target components corresponding to the archive content page character images, and for generating an electronic archive based on the target characters corresponding to the archive content page character images and the recorder information.
In a third aspect, the present application provides an electronic device, which adopts the following technical scheme:
at least one processor;
a memory;
at least one application program, wherein the at least one application program is stored in the memory and configured to be executed by the at least one processor, the at least one application program being configured to perform the electronic archive generation method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the electronic archive generation method of any of the first aspects.
In summary, the application at least comprises the following beneficial technical effects:
In the related art, handwritten files are recognized directly with OCR technology; because the characters of a handwritten file may be joined up or written in a nonstandard way, handwritten characters may be recognized inaccurately, and the generated electronic file is then inaccurate. By comparison, in the embodiment of the application, the handwritten content of the file to be processed is obtained by acquiring the image corresponding to the file to be processed. Because different people have different writing habits and write the same character differently, the writing habits of the recorder are determined by acquiring the target handwriting information corresponding to the recorder information, which can improve the accuracy of the target components determined based on the target component writing library in the target handwriting information. Improving the accuracy of the target characters determined from the target components allows the content of the handwritten file to be recognized accurately, and the electronic file is generated accurately based on the target characters corresponding to the plurality of file content page character images and the recorder information.
Drawings
Fig. 1 is a schematic application scenario diagram of an electronic file generating method according to an embodiment of the present application.
Fig. 2 is a flowchart of an electronic file generating method according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic file generating apparatus according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to fig. 1 to 4.
The present embodiment is merely illustrative of the present application and is not intended to limit it. After reading this specification, those skilled in the art may modify the present embodiment as needed without creative contribution, and such modifications are protected by patent law as long as they fall within the scope of the present application.
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below. The described embodiments are evidently some, but not all, embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. Unless otherwise specified, the character "/" herein generally indicates that the associated objects are in an "or" relationship.
Embodiments of the application are described in further detail below with reference to the drawings.
The embodiment of the application provides an electronic file generation method which is executed by electronic equipment, wherein the electronic equipment can be a server or terminal equipment, and the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service. The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto, and the terminal device and the server may be directly or indirectly connected through a wired or wireless communication manner, which is not limited herein.
As shown in fig. 1, a user uses a terminal to send an electronic file generation request carrying the image of the file to be processed to the electronic device. The electronic device implements the electronic file generation method and can therefore obtain an electronic file based on the image of the file to be processed carried in the request.
As shown in fig. 2, the method includes steps S101 to S104, wherein:
Step S101: acquiring an image corresponding to the file to be processed, wherein the image corresponding to the file to be processed at least comprises a file catalog page image and a file content page image, and the file content page image comprises a plurality of file content page character images.
The image corresponding to the file to be processed may be captured by a camera device, or the electronic device may obtain an image corresponding to the file to be processed that the user uploads through the terminal.
In the implementation in which the image corresponding to the file to be processed is captured by the camera device, step S101 may specifically include: the terminal prompts the user to place a target page of the file to be processed at a target position in a preset placement manner, wherein the preset placement manner and the target page differ between acquiring the file catalog page image and acquiring the file content page image, while the target position is the same; the target page is the file catalog page when the file catalog page image is acquired, and the file content page when the file content page image is acquired; the camera device captures an initial image of the target page and transmits it back to the electronic device; the electronic device determines a target recognition range corresponding to the target page by recognizing the edge of the target page in the initial image, and crops the initial image based on the target recognition range to obtain the image corresponding to the target page.
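The cropping step can be illustrated with a deliberately simplified sketch. It assumes a grayscale image stored as a list of rows, a bright page on a dark background, and a fixed brightness threshold; the source does not say how the page edge is recognized, so the bounding-box approach and the names `crop_to_page` and `background_threshold` are assumptions for illustration only.

```python
def crop_to_page(image, background_threshold=50):
    """Crop a grayscale image (list of rows of ints) to the bounding box of the page.

    Pixels brighter than background_threshold are treated as belonging to the page.
    """
    if not image or not image[0]:
        return []
    # Rows and columns that contain at least one page pixel.
    rows = [r for r, row in enumerate(image)
            if any(p > background_threshold for p in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > background_threshold for row in image)]
    if not rows or not cols:
        return []  # no page found in the initial image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # The bounding box plays the role of the target recognition range.
    return [row[left:right + 1] for row in image[top:bottom + 1]]


initial_image = [
    [0,   0,   0, 0],
    [0, 200, 210, 0],
    [0, 220, 230, 0],
    [0,   0,   0, 0],
]
page_image = crop_to_page(initial_image)
print(page_image)
```

A production system would more likely use a proper edge or contour detector and correct for perspective; the bounding box is only the simplest stand-in for a target recognition range.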
The file catalog page image includes information such as recorder identification information, the file name, the recording time, and the catalog of documents within the file, where the recorder identification information may be any identification information that can represent the identity of the recorder, such as a recorder number or a recorder name. The file content page image includes information such as document names and document contents. The recorder identification information, file name, recording time, catalog of documents, document names and document contents each comprise a plurality of character images, and a character image may be a handwritten character image or a printed character image.
Step S102: determining recorder information according to the file catalog page image, and determining target handwriting information corresponding to the recorder information according to the recorder information, wherein the target handwriting information comprises a target component writing library.
Different recorders produce different handwriting in handwritten files: the standard presentation style of the same character differs from one recorder's handwriting to another's, and the same standard presentation style may correspond to different characters for different recorders. This scheme therefore determines the standard presentation styles of the handwriting in the file to be processed by determining the identity of the recorder, which reduces character errors caused by differences in handwriting style and improves the accuracy of character recognition. The standard presentation style of a character is the style the recorder commonly uses when handwriting that character; it can be obtained by collecting and recognizing a large number of the recorder's handwritten files, and can be stored in the electronic device in advance.
The target handwriting information corresponding to the recorder information can be determined from the recorder information by matching. The target component writing library is the correspondence between handwritten component images and the standard presentation styles of the handwriting, and each handwritten component image together with its corresponding standard presentation style corresponds to a unique component character.
Step S103: splitting the character images of the archives pages to obtain radical images of the character images of the archives pages; and according to the matching of the component images and the target component writing library, determining the target component corresponding to the component images.
According to the matching of the component images and the target component writing library, the target component corresponding to the component images is determined, which specifically comprises: matching the component image with a target component writing library to obtain the similarity of the component image and the standard presentation style of each handwriting; and taking the standard presentation style of the handwriting with the maximum similarity as a target standard presentation style, wherein the radical characters corresponding to the target standard presentation style are target radicals.
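The maximum-similarity selection can be sketched as follows. The source does not name a similarity measure, so this example assumes that each component image has been reduced to a set of "ink" pixel coordinates and uses Jaccard similarity; the function names and the tiny writing library are hypothetical.

```python
def similarity(pixels_a, pixels_b):
    """Jaccard similarity between two sets of ink-pixel coordinates (assumed measure)."""
    if not pixels_a and not pixels_b:
        return 1.0
    return len(pixels_a & pixels_b) / len(pixels_a | pixels_b)


def match_component(component_pixels, writing_library):
    """Return the component character whose standard style is most similar.

    writing_library: list of (standard_style_pixels, component_character) pairs,
    standing in for one recorder's target component writing library.
    """
    _, target_component = max(
        writing_library,
        key=lambda entry: similarity(component_pixels, entry[0]),
    )
    return target_component


# Hypothetical library with two standard presentation styles.
library = [
    ({(0, 0), (0, 1), (1, 0)}, "木"),
    ({(2, 2), (2, 3)}, "安"),
]
target = match_component({(0, 0), (0, 1)}, library)
print(target)
```

In practice a minimum-similarity threshold would probably be needed so that a component image resembling nothing in the library is not forced onto its nearest entry; the source text does not address that case.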
It should be noted that, for a given archive content page character image, if the target components corresponding to two or more of its component images are the same, the components of that archive content page character image include at least two identical target components.
Step S104: determining the target characters corresponding to the archive content page character images according to the target components corresponding to the archive content page character images; and generating an electronic archive based on the target characters corresponding to the archive content page character images and the recorder information.
In the related art, handwritten files are recognized directly with OCR technology; because the characters of a handwritten file may be joined up or written in a nonstandard way, handwritten characters may be recognized inaccurately, and the generated electronic file is then inaccurate. By comparison, in the embodiment of the application, the handwritten content of the file to be processed is obtained by acquiring the image corresponding to the file to be processed. Because different people have different writing habits and write the same character differently, the writing habits of the recorder are determined by acquiring the target handwriting information corresponding to the recorder information, which can improve the accuracy of the target components determined based on the target component writing library in the target handwriting information. More accurate target components yield more accurate target characters, so the content of the handwritten file is recognized accurately, and the electronic file is generated accurately based on the target characters corresponding to the plurality of file content page character images and the recorder information.
In one possible implementation manner of the embodiment of the present application, step S104, generating an electronic archive based on target characters and recorder information corresponding to each of the plurality of archive content page character images, may specifically include steps SA1 to SA3 (not shown in the drawings), where:
Step SA1: performing file type identification according to the file catalog page image, determining the file type corresponding to the file to be processed, and determining the target electronic template according to the file type.
Files can be divided into different types under different rules. The embodiment of the application does not limit the division rules, which a technician can set according to actual requirements; a plurality of file types and the corresponding division rules are prestored in the electronic device.
Specifically, the characters within a preset file name recognition range in the file catalog page image are recognized to obtain the file name, where the preset file name recognition range can be set in advance by a technician according to where file names have commonly appeared in historical files and stored in the electronic device; keywords are extracted from the file name to obtain a target keyword, which may be a keyword such as "data" or "personnel"; the file type corresponding to the target keyword is looked up in a preset correspondence between file types and keywords, where the preset correspondence complies with the set division rules; the file type corresponding to the target keyword is taken as the file type of the file to be processed; and the target electronic template corresponding to that file type is determined among all the electronic templates.
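The keyword-to-type-to-template lookup can be sketched as two dictionary lookups. The mapping contents, the template identifiers, and the substring-based keyword extraction are all hypothetical; the source leaves the division rules and the keyword extraction method to the implementer.

```python
# Hypothetical preset correspondence between keywords and file types.
KEYWORD_TO_TYPE = {
    "data": "data archive",
    "personnel": "personnel archive",
}

# Hypothetical electronic templates, one per file type.
TYPE_TO_TEMPLATE = {
    "data archive": "data_template",
    "personnel archive": "personnel_template",
}


def select_template(file_name):
    """Return (file_type, template) for a recognized file name, or (None, None)."""
    lowered = file_name.lower()
    for keyword, file_type in KEYWORD_TO_TYPE.items():
        # Simplified keyword extraction: a plain substring test.
        if keyword in lowered:
            return file_type, TYPE_TO_TEMPLATE[file_type]
    return None, None


selection = select_template("Personnel Roster 2021")
print(selection)
```

Returning `(None, None)` when no keyword matches is a choice made for this sketch; the source does not describe what happens when no file type can be determined.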
Step SA2: generating target archive content according to the target characters corresponding to the archive content page character images.
The target archive content comprises a plurality of section names and their corresponding section contents, arranged in order. Each section name is adjacent to its corresponding section content, with the section name first and the section content after it.
Step SA3: filling the target archive content and the recorder information into the target electronic template to generate the electronic archive.
In the embodiment of the application, the target electronic template corresponding to the file to be processed is determined by obtaining the file type corresponding to the file to be processed. After the target archive content is generated from all the target characters, the target archive content and the recorder information are filled into the target electronic template to generate the electronic archive. Compared with filling in the target archive content without a template, the electronic archive obtained by this scheme is more standardized.
In one possible implementation of the embodiment of the present application, each archive content page character image carries a character code related to the arrangement order of the archive content page character images, each target character corresponds one-to-one to the character code carried by the archive content page character image corresponding to that target character, and step SA2 may specifically include steps SA2-1 to SA2-4 (not shown in the figure), where:
Step SA2-1: performing punctuation recognition on all target characters to obtain all target punctuation marks; and sorting the character codes corresponding to all the target punctuation marks by magnitude to determine the punctuation code corresponding to each target punctuation mark.
The punctuation code indicates the order in which the corresponding target punctuation mark appears among all the target punctuation marks.
For example, all target characters are "you", "good", "very", "open", "heart", "recognize", "know", "you", "and". "; "you" corresponds to character code 1, "good" corresponds to character code 2, "," corresponds to character code 3, "very" corresponds to character code 4, "on" corresponds to character code 5, "heart" corresponds to character code 6, "recognize" corresponds to character code 7, "recognize" corresponds to character code 8, "you" corresponds to character code 9, ". "corresponding character code 10; "," and ". "is the target punctuation mark,", corresponding to punctuation code 1, ". "corresponds to punctuation code 2.
Step SA2-2: grouping all character codes according to all punctuation codes to determine a plurality of character code groups.
Specifically, the character codes not larger than a first character code are taken as a first character code group, where the first character code is the character code corresponding to the target punctuation mark with the smallest punctuation code; and, for each two adjacent punctuation codes, the character codes lying between the two corresponding character codes, together with the larger of those two character codes, are taken as a second character code group.
The character code groups include the first character code group and/or second character code groups. When the archive content contains only one sentence, the first character code group is the only character code group. Each punctuation mark is treated as the end of a short sentence, so each character code group corresponds to one short sentence.
For example, for the target characters "you", "good", ",", "very", "open", "heart", "recognize", "know", "you" and ".", the target punctuation mark with the smallest punctuation code is ",", which corresponds to character code 3, so character codes 1, 2 and 3 form the first character code group; character codes 3 and 10 correspond to the two adjacent punctuation codes, so character codes 4 to 10 form a second character code group.
Step SA2-3: for each character code group, determining the code corresponding to the character code group according to the punctuation codes; and arranging the target characters corresponding to the character codes in the character code group in sequence to obtain a target sentence.
Specifically, the punctuation code is used as the code corresponding to the character code group, and this code represents the position of the short sentence in the archive content; the target characters in the character code group are arranged in order of their character codes to obtain the target sentence.
For example, for the same target characters, the code corresponding to the first character code group is 1 and the code corresponding to the second character code group is 2; arranging the characters of the first character code group yields the target sentence "Hello,", and likewise the second character code group yields the target sentence "very happy to know you.".
Step SA2-4: arranging all target sentences in sequence according to the codes corresponding to their character code groups to obtain the target archive content.
For example, for the same target characters, the target sentence whose character code group corresponds to code 1 is placed first and the target sentence whose character code group corresponds to code 2 is placed after it, yielding "Hello, very happy to know you.".
In the embodiment of the application, when the number of target characters is large, obtaining the target archive content by directly arranging every target character in sequence according to its character code takes a long time. In this scheme, all character codes are grouped so that each character code group corresponds to the target characters forming one sentence; for each character code group, the target characters are arranged according to their character codes to obtain a target sentence, so that a plurality of target sentences can be arranged at the same time and the total time for obtaining all target sentences is reduced; all target sentences are then arranged in sequence, so the target archive content is obtained more quickly.
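Steps SA2-1 to SA2-4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `PUNCTUATION` set, and the handling of characters that follow the last punctuation mark are assumptions made only for illustration, and the sentences are built sequentially here rather than at the same time.

```python
# Assumed set of punctuation marks; the source does not list them.
PUNCTUATION = {",", ".", ";", "!", "?"}


def group_character_codes(coded_chars):
    """Steps SA2-1 and SA2-2: map punctuation code -> list of character codes.

    coded_chars maps each character code (int) to its target character.
    """
    # SA2-1: sort the character codes of the punctuation marks by value;
    # the rank of each one (starting at 1) is its punctuation code.
    punct_char_codes = sorted(
        code for code, ch in coded_chars.items() if ch in PUNCTUATION
    )
    groups = {}
    previous = float("-inf")
    # SA2-2: each group runs from just after the previous punctuation mark
    # up to and including the current punctuation mark.
    for punct_code, boundary in enumerate(punct_char_codes, start=1):
        groups[punct_code] = [c for c in coded_chars if previous < c <= boundary]
        previous = boundary
    # Assumption: characters after the last punctuation mark form a final group.
    trailing = [c for c in coded_chars if c > previous]
    if trailing:
        groups[len(punct_char_codes) + 1] = trailing
    return groups


def build_archive_content(coded_chars):
    """Steps SA2-3 and SA2-4: build each sentence, then join them in order."""
    groups = group_character_codes(coded_chars)
    sentences = {
        group_code: "".join(coded_chars[c] for c in sorted(codes))
        for group_code, codes in groups.items()
    }
    return "".join(sentences[group_code] for group_code in sorted(sentences))


# Single letters stand in for recognized characters, supplied out of order.
example = {3: ",", 1: "a", 2: "b", 5: ".", 4: "c"}
content = build_archive_content(example)
```

Because each group depends only on its own character codes, the per-group sorting could be dispatched to separate workers, which is the source of the time saving the description refers to.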
In one possible implementation manner of the embodiment of the present application, step SA3, in which the target archive content and the recorder information are filled in according to the target electronic template to generate the electronic archive, may specifically include steps SA3-1 to SA3-6 (not shown in the figure), where:
Step SA3-1: splitting the target archive content into archive subclasses according to the target electronic template, and determining the sub-content corresponding to each archive subclass and the format information of that sub-content.
The target electronic template comprises a plurality of preset plates. Each preset plate comprises at least a preset plate name and a filling position for the preset plate content; each preset plate name has corresponding preset format information, which includes information such as the font format, the paragraph format and the word count.
Specifically, the plate content that is adjacent to and immediately follows a plate name in the target archive content is taken as the plate content corresponding to that plate name; the plate content corresponding to each plate name is taken as a sub-content, and the format information of each sub-content is identified.
A preset plate is equivalent to an archive subclass.
Step SA3-2: obtaining the standard format information corresponding to each archive subclass in the target electronic template.
Specifically, a target preset plate name corresponding to each plate name is determined, and preset format information corresponding to the target preset plate name is used as standard format information of the sub-content.
Step SA3-3: for each archive subclass, judging whether the format information of the corresponding sub-content conforms to the standard format information.
Here, identical format information conforms and differing format information does not.
Step SA3-4: if it does not conform, determining, by semantic analysis and according to the standard format information, similar content that is similar to the sub-content, and replacing the sub-content with the similar content to obtain the replaced sub-content as the target sub-content.
Here, non-conformity with the standard format information indicates that the format of the sub-content of the archive subclass needs to be adjusted. "Similar" means semantically similar, that is, the semantic similarity is greater than a preset semantic similarity, which can be determined from a large amount of historical semantic-analysis data. The word count of the similar content must meet a preset word count requirement.
Specifically, if it does not conform, similar content that is similar to the sub-content is determined through semantic analysis, with its word count kept within a preset word count range; the format of that similar content is converted into the preset format to obtain similar content conforming to the standard format information; and the sub-content is replaced with its corresponding similar content, which serves as the target sub-content.
Step SA3-5: if it conforms, taking the sub-content corresponding to the archive subclass as the target sub-content.
Here, conformity with the standard format information indicates that the sub-content of the archive subclass needs no format adjustment.
Step SA3-6: filling all target sub-contents corresponding to the target archive content, together with the recorder information, into the target electronic template to generate the electronic archive.
In the embodiment of the application, for each archive subclass, ensuring that the format information of the corresponding sub-content in the target archive content conforms to the standard format information guarantees that the sub-content is correctly formatted; filling the target electronic template with target archive content composed of correctly formatted sub-contents, together with the recorder information, yields an electronic archive with a more standardized format.
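The conformity check and replacement in steps SA3-3 to SA3-6 might be sketched as follows. All names here (`FormatInfo`, `SubContent`, `fill_template`, `truncate_words`) are hypothetical, the word count is handled as a separate limit rather than as part of the format information, and `truncate_words` is only a stand-in for the semantic analysis, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FormatInfo:
    font: str
    paragraph: str


@dataclass(frozen=True)
class SubContent:
    text: str
    fmt: FormatInfo


def to_target_sub_content(
    sub: SubContent,
    standard: FormatInfo,
    max_words: int,
    find_similar: Callable[[str, int], str],
) -> SubContent:
    # SA3-3 / SA3-5: identical format information conforms; keep the sub-content.
    if sub.fmt == standard:
        return sub
    # SA3-4: obtain similar content within the word limit, then apply the
    # standard format to it.
    return SubContent(text=find_similar(sub.text, max_words), fmt=standard)


def fill_template(
    template: Dict[str, FormatInfo],
    sub_contents: Dict[str, SubContent],
    recorder: str,
    max_words: int,
    find_similar: Callable[[str, int], str],
) -> dict:
    # SA3-6: one target sub-content per preset plate, plus the recorder information.
    plates = {
        name: to_target_sub_content(sub_contents[name], standard, max_words, find_similar)
        for name, standard in template.items()
    }
    return {"recorder": recorder, "plates": plates}


def truncate_words(text: str, max_words: int) -> str:
    # Placeholder only: a real system would return semantically similar text.
    return " ".join(text.split()[:max_words])
```

Passing the similarity function in as a parameter keeps the format check independent of whichever semantic analysis method is actually used.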
It may be understood that when the number of target components is not less than 3, there are at least two initial target characters corresponding to the archive content page character image. In one possible implementation manner of the embodiment of the present application, determining, in step S104, the target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image may specifically include step S1041a, step S1042a and step S1043a (none shown in the figure), where:
Step S1041a: determining at least two initial target characters corresponding to the archive content page character image according to the target components corresponding to the archive content page character image.
For example, if the target components are "mouth" (口) components, the at least two initial target characters may be "lu" (吕), "", "back" (回), and the like.
Step S1042a: for each initial target character, acquiring the characters adjacent to the archive content page character image corresponding to that initial target character; and determining an initial word according to the initial target character and the adjacent characters.
Step S1043a: determining valid words from the at least two initial words; and taking the initial target characters corresponding to all valid words as the target characters corresponding to the archive content page character image.
Specifically, determining valid words from the at least two initial words may include: obtaining the number of historical occurrences of each initial word and taking the initial word with the largest number of historical occurrences as a valid word, where the number of historical occurrences can be determined from the archive content of the electronic archives already stored in the electronic equipment.
In the embodiment of the application, determining at least two initial target characters for the archive content page character image, rather than directly determining a single target character, captures the several characters that the handwritten character may represent and reduces the probability of a wrong target character caused by a wrong direct determination. After the initial word formed with the adjacent characters is determined for each initial target character, the valid word is determined from the at least two initial words; using the semantic relation with the context, the more plausible initial word is selected as the valid word, which further reduces the probability of determining the wrong target character.
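The selection in steps S1042a and S1043a can be sketched as a frequency lookup. This is an illustration under assumptions: the function name is hypothetical, the initial word is assumed to be the preceding character, the candidate and the following character concatenated (the description does not say how the word is formed), and Latin letters stand in for handwritten characters.

```python
from collections import Counter


def choose_target_character(candidates, previous_char, next_char, history):
    """Return the candidate whose initial word occurs most often in history.

    history is a Counter of words taken from existing electronic archives;
    words that never occurred count as 0.
    """
    def occurrences(candidate):
        # Assumed word formation: neighbour + candidate + neighbour.
        initial_word = previous_char + candidate + next_char
        return history[initial_word]

    return max(candidates, key=occurrences)


# "c?t" could be read as "cot" or "cat"; the archive history favours "cat".
history = Counter({"cat": 9, "cot": 2})
best = choose_target_character(["o", "a"], "c", "t", history)
```

When two candidates have the same count, `max` returns the first one, so the order of the candidate list acts as a tie-breaker in this sketch.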
In one possible implementation manner of the embodiment of the present application, determining, in step S104, the target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image may specifically include step S1041b and step S1042b (neither shown in the figure), where:
Step S1041b: determining the font structure of the character according to the archive content page character image.
Specifically, the relative positions of the target components in the archive content page character image are determined, where the relative positions include up-down, left-right and the like; the font structure corresponding to the relative positions is looked up in a preset correspondence between relative positions and font structures and is taken as the font structure of the character. The preset correspondence may be pre-stored in the electronic equipment by a technician according to font writing specifications.
Step S1042b: combining the target components corresponding to the archive content page character image according to the font structure to obtain the target character corresponding to the archive content page character image.
Specifically, a placement position is determined for each target component according to the font structure; and combining the characters based on all the placement positions to determine the target character.
In the embodiment of the application, randomly combining the target components yields target characters with a high error rate; by contrast, this scheme combines the target components according to the font structure of the character, so the target character can be obtained more quickly and accurately.
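Steps S1041b and S1042b amount to two lookups: relative position to font structure, then placed components to character. The sketch below assumes toy tables with English glosses in place of real components and characters; the table contents and all names are hypothetical.

```python
# Assumed preset correspondence between relative position and font structure.
STRUCTURE_BY_RELATIVE_POSITION = {
    "up-down": "top-bottom",
    "left-right": "left-right",
}

# Placement positions offered by each font structure, in component order.
PLACEMENTS = {
    "top-bottom": ("top", "bottom"),
    "left-right": ("left", "right"),
}

# Toy character table keyed by the placed components (glosses, not real glyphs).
CHARACTER_TABLE = {
    (("top", "mouth"), ("bottom", "mouth")): "lu",
    (("left", "sun"), ("right", "moon")): "bright",
}


def combine_components(relative_position, components):
    """Return the character formed by the components, or None if unknown."""
    # S1041b: look up the font structure for the observed relative position.
    structure = STRUCTURE_BY_RELATIVE_POSITION[relative_position]
    # S1042b: assign each component a placement position, then combine.
    placed = tuple(zip(PLACEMENTS[structure], components))
    return CHARACTER_TABLE.get(placed)


character = combine_components("up-down", ["mouth", "mouth"])
```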
In one possible implementation manner of the embodiment of the present application, the method may further include, before step S1041b, steps SB1 to SB3 (not shown in the figure), where:
Step SB1: combining the target components into characters to obtain at least one initial character.
Specifically, the target components are combined according to each preset font structure to obtain an initial reference character for each font structure; initial reference characters that do not exist are filtered out to obtain the at least one initial character.
Step SB2: determining whether the number of initial characters is greater than 1.
Step SB3: if not, the initial character is taken as the target character.
Correspondingly, step S1041b, determining the font structure of the character according to the character image of the archive content page may specifically include:
if yes, determining the font structure of the characters according to the character image of the archive content page.
In the embodiment of the application, whether the number of initial characters is greater than 1 is judged before the target components are combined according to the font structure; when the target components admit only one combination, this avoids the time that would otherwise be spent obtaining the target character through the font structure.
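The shortcut in steps SB1 to SB3 can be sketched as follows, assuming a hypothetical table of known characters keyed by font structure and components, and a callable that performs the image-based structure analysis only when it is needed. Returning `None` when no combination exists is an assumption; the description does not cover that case.

```python
PRESET_STRUCTURES = ("top-bottom", "left-right", "enclosure")


def resolve_target_character(components, known_characters, structure_from_image):
    """Return the target character, analysing the image only when required."""
    key = tuple(components)
    # SB1: combine under every preset structure and drop non-existent characters.
    initial = [
        known_characters[(structure, key)]
        for structure in PRESET_STRUCTURES
        if (structure, key) in known_characters
    ]
    if not initial:
        return None  # assumption: no valid combination at all
    # SB2 / SB3: a single initial character is taken directly as the target.
    if len(initial) == 1:
        return initial[0]
    # Otherwise fall back to step S1041b and determine the structure from the image.
    structure = structure_from_image()
    return known_characters.get((structure, key))
```

Since `structure_from_image` is passed as a callable, the comparatively expensive image analysis is skipped whenever the components admit only one existing character.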
In a possible implementation manner of the embodiment of the present application, after the electronic archive is generated in step S104 based on the target characters corresponding to each of the plurality of archive content page character images and the recorder information, the method may further include step SC1 and step SC2 (neither shown in the figure), where:
Step SC1: acquiring an update instruction carrying an archive number and an update requirement.
Step SC2: determining a target electronic archive according to the archive number, and updating the target electronic archive according to the update requirement.
Specifically, the storage position information of the archive corresponding to the archive number is obtained by looking up the archive number; the content stored at the position indicated by the storage position information is taken as the target electronic archive; and the electronic archive automatically generated by recognizing the handwritten archive is written to that position in place of the target electronic archive to complete the archive update.
In the embodiment of the application, the electronic archive of the handwritten archive is generated automatically, and the content of the successfully generated electronic archive is then automatically updated into the target electronic archive, which reduces manual intervention and improves the efficiency of updating the target electronic archive.
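The lookup-and-replace update of steps SC1 and SC2 might look like the sketch below. The two dictionaries (`index` from archive number to storage position, `storage` from storage position to archive) and the function name are assumptions standing in for whatever storage the electronic equipment actually uses, and the update requirement is reduced to a full replacement.

```python
def update_archive(index, storage, archive_number, generated_archive):
    """Replace the stored archive for archive_number; return the previous one."""
    # Look up the storage position by archive number (KeyError if unknown).
    location = index[archive_number]
    # The content at that position is the target electronic archive.
    previous = storage.get(location)
    # Write the newly generated electronic archive in its place.
    storage[location] = generated_archive
    return previous
```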
The above embodiments describe the electronic archive generation method from the viewpoint of the method flow; the following embodiments describe an electronic archive generating apparatus from the viewpoint of virtual modules or virtual units.
An embodiment of the present application provides an electronic file generating apparatus, as shown in fig. 3, where the electronic file generating apparatus specifically may include:
the to-be-processed archive image obtaining module 201 is configured to obtain an image corresponding to an archive to be processed, where the image corresponding to the archive to be processed at least includes an archive catalog page image and an archive content page image, and the archive content page image includes a plurality of archive content page character images;
the target handwriting information determining module 202 is configured to determine recorder information according to the archive catalog page image, and determine target handwriting information corresponding to the recorder information according to the recorder information, where the target handwriting information includes a target component writing library;
the component determining module 203 is configured to split the archive content page character image to obtain component images of the archive content page character image, and to match the component images against the target component writing library to determine the target components corresponding to the component images;
the electronic archive generating module 204 is configured to determine a target character corresponding to the archive content page character image according to a target component corresponding to the archive content page character image; and generating an electronic archive based on the target characters and the recorder information corresponding to the character images of the archive content pages.
In one possible implementation manner of the embodiment of the present application, when executing the generation of the electronic archive based on the target characters and the recorder information corresponding to the character images of the plurality of archive content pages, the electronic archive generation module 204 is specifically configured to:
identifying file types according to the file catalog page images, determining file types corresponding to files to be processed, and determining target electronic templates according to the file types;
generating target archive content according to target characters corresponding to the archive content character images;
and filling the contents of the target file and the information of the recorder according to the target electronic template to generate an electronic file.
In one possible implementation manner of the embodiment of the present application, each archive content page character image carries a character code related to its position in the arrangement sequence of the archive content page character images, and each target character corresponds one-to-one to the character code carried by its archive content page character image; when generating the target archive content according to the target characters corresponding to the plurality of archive content page character images, the electronic archive generating module 204 is specifically configured to:
performing punctuation recognition in all target characters to obtain all target punctuation marks; arranging the character codes corresponding to all the target punctuation marks according to the size, and determining the punctuation codes corresponding to all the target punctuation marks;
Grouping all character codes according to all punctuation codes, and determining a plurality of character code groups;
for each character coding group, determining the corresponding code of the character coding group according to punctuation codes; sequentially arranging the target characters corresponding to the character codes in the character code groups to obtain target sentences;
and according to the corresponding codes of the character coding groups corresponding to all the target sentences, sequentially arranging all the target sentences to obtain the content of the target file.
In one possible implementation manner of the embodiment of the present application, when the electronic archive generating module 204 performs content filling on the target archive content and the recorder information according to the target electronic template, the electronic archive generating module is specifically configured to:
splitting the target archive content into archive subclasses according to the target electronic template, and determining the sub-content corresponding to each archive subclass and the format information of that sub-content;
obtaining standard format information corresponding to each archive subclass in a target electronic template;
for each archive subclass, judging whether the format information of the corresponding sub-content conforms to the standard format information; if it does not conform, determining, by semantic analysis and according to the standard format information, similar content that is similar to the sub-content, and replacing the sub-content with the similar content to obtain the replaced sub-content as the target sub-content; if it conforms, taking the sub-content corresponding to the archive subclass as the target sub-content;
and filling all target sub-contents corresponding to the target archive content, together with the recorder information, into the target electronic template to generate the electronic archive.
In one possible implementation manner of the embodiment of the present application, when the electronic archive generating module 204 determines the target character corresponding to the archive content page character image according to the target component corresponding to the archive content page character image, the electronic archive generating module is specifically configured to:
determining at least two initial target characters corresponding to the archival content page character images according to target components corresponding to the archival content page character images;
for each initial target character, acquiring the characters adjacent to the archive content page character image corresponding to that initial target character; determining an initial word according to the initial target character and the adjacent characters;
determining valid words from at least two initial words; and taking the initial target characters corresponding to all the effective words as target characters corresponding to the character images of the archive content page.
In one possible implementation manner of the embodiment of the present application, for determining the target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image, the electronic archive generating module 204 further includes:
a font structure determining unit, configured to determine the font structure of the character according to the archive content page character image;
and a target character determining unit, configured to combine the target components corresponding to the archive content page character image according to the font structure to obtain the target character corresponding to the archive content page character image.
In one possible implementation manner of the embodiment of the present application, the electronic archive generating device further includes:
the initial character determining module is used for combining the characters of the target components to obtain at least one initial character;
the initial character quantity judging module is used for judging whether the number of initial characters is greater than 1; triggering the direct determination module when the number of initial characters is not greater than 1; and triggering the font structure determining unit when the number of initial characters is greater than 1;
and the direct determination module is used for taking the initial character as a target character.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, a specific working process of the electronic file generating apparatus described above may refer to a corresponding process in the foregoing method embodiment, which is not described herein again.
An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 300 includes a processor 301 and a memory 303, where the processor 301 is coupled to the memory 303, for example via a bus 302. Optionally, the electronic device may also include a transceiver 304. It should be noted that, in practical applications, the number of transceivers 304 is not limited to one, and the structure of the electronic device does not limit the embodiment of the present application.
The processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 301 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 302 may include a path for transferring information between the above components. Bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Bus 302 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or only one type of bus.
The memory 303 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 303 is used for storing the application program code for executing the scheme of the present application, and execution is controlled by the processor 301. The processor 301 is configured to execute the application program code stored in the memory 303 to implement what is shown in the foregoing method embodiments.
The electronic device includes, but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., in-vehicle navigation terminals); stationary terminals such as digital TVs and desktop computers; and servers and the like. The electronic device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
Embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. In the related art, when a handwritten archive is recognized by OCR technology, the characters of the handwritten archive may be written in a connected (joined-up) manner, so handwritten characters are recognized inaccurately and the generated electronic archive is inaccurate. In this scheme, the handwritten content of the archive to be processed is obtained by acquiring the image corresponding to the archive to be processed. Because different people have different writing habits and write the same character differently, the writing habits of the recorder are determined by acquiring the target handwriting information corresponding to the recorder information, which improves the accuracy of the target components determined from the target component writing library in the target handwriting information. Improving the accuracy of the target characters determined from the target components allows the content of the handwritten archive to be recognized accurately, and the electronic archive is generated accurately based on the target characters corresponding to each of the plurality of archive content page character images and the recorder information.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations should and are intended to be comprehended within the scope of the present application.

Claims (9)

1. An electronic archive generating method, comprising:
acquiring an image corresponding to a file to be processed, wherein the image corresponding to the file to be processed at least comprises a file catalog page image and a file content page image, and the file content page image comprises a plurality of file content page character images;
Determining recorder information according to the archive catalog page images, and determining target handwriting information corresponding to the recorder information according to the recorder information, wherein the target handwriting information comprises a target component writing library;
splitting the archive content page character image to obtain a component image of the archive content page character image; matching the component image against the target component writing library to determine the target component corresponding to the component image;
determining a target character corresponding to the archival content page character image according to a target component corresponding to the archival content page character image; generating an electronic file based on the target characters corresponding to the file content page character images and the recorder information;
the determining the target character corresponding to the archival content page character image according to the target component corresponding to the archival content page character image comprises the following steps:
determining at least two initial target characters corresponding to the archive content page character images according to target components corresponding to the archive content page character images;
for each initial target character, acquiring adjacent characters of the archival content page character image corresponding to the initial target character; determining an initial word according to the initial target character and the adjacent character;
Acquiring the historical occurrence times of each initial word, and taking the initial word corresponding to the largest historical occurrence times as an effective word, wherein the historical occurrence times can be determined through the file content of the existing electronic file; and taking the initial target characters corresponding to all the effective words as target characters corresponding to the character images of the archive content page.
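As an illustration only, the word-frequency disambiguation recited in claim 1 could be sketched in Python as follows. The function name `pick_target_character`, the construction of a two-character word from each neighbor, the tie-breaking rule (the first candidate wins) and the sample counts are all assumptions made for this sketch; the claim does not specify them.

```python
def pick_target_character(candidates, left_neighbor, right_neighbor, history_counts):
    """Pick the candidate whose word with an adjacent character occurs most often.

    candidates      -- initial target characters for one character image
    left_neighbor   -- recognized character before the image, or None
    right_neighbor  -- recognized character after the image, or None
    history_counts  -- hypothetical dict: word -> occurrences in existing archives
    """
    best_char, best_count = None, -1
    for cand in candidates:
        # Build the initial words from the candidate and its adjacent characters.
        words = []
        if left_neighbor:
            words.append(left_neighbor + cand)
        if right_neighbor:
            words.append(cand + right_neighbor)
        # Words never seen in existing archives count as zero occurrences.
        count = max((history_counts.get(w, 0) for w in words), default=0)
        if count > best_count:
            best_char, best_count = cand, count
    return best_char


# Made-up counts, as if tallied from existing electronic archives.
history_counts = {"档案": 120, "档板": 3}
chosen = pick_target_character(["挡", "档"], None, "案", history_counts)
print(chosen)  # prints 档
```

Here the two candidates 挡 and 档 share a component, and the word 档案 ("archive") is the only one with a non-zero historical count, so 档 is selected.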
2. The electronic archive generation method of claim 1, wherein the generating an electronic archive based on the target characters corresponding to the respective archive content page character images and the recorder information comprises:
performing archive type identification according to the archive catalog page image to determine the archive type corresponding to the archive to be processed, and determining a target electronic template according to the archive type;
generating target archive content according to the target characters corresponding to the respective archive content page character images;
and performing content filling on the target archive content and the recorder information according to the target electronic template to generate the electronic archive.
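The template selection and filling of claim 2 might look like the following minimal Python sketch. The archive type names, the template texts and the function `generate_electronic_archive` are hypothetical; the claim only states that a target electronic template is chosen by archive type and then filled with the target archive content and the recorder information.

```python
from string import Template

# Hypothetical electronic templates, keyed by archive type.
TEMPLATES = {
    "personnel": Template("Personnel archive\nRecorder: $recorder\n\n$content"),
    "engineering": Template("Engineering archive\nRecorder: $recorder\n\n$content"),
}


def generate_electronic_archive(archive_type, target_content, recorder):
    """Select the template for the archive type and fill in the two fields."""
    template = TEMPLATES[archive_type]
    return template.substitute(recorder=recorder, content=target_content)


archive_text = generate_electronic_archive("personnel", "Body text.", "Zhang San")
print(archive_text)
```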
3. The electronic archive generation method of claim 2, wherein the archive content page character images carry character codes associated with the arrangement order of the archive content page character images, and the target characters are in one-to-one correspondence with the character codes carried by the archive content page character images to which the target characters correspond;
the generating target archive content according to the target characters corresponding to the respective archive content page character images comprises:
performing punctuation recognition on all the target characters to obtain all target punctuation marks; sorting the character codes corresponding to all the target punctuation marks by magnitude to determine the punctuation codes corresponding to all the target punctuation marks;
grouping all the character codes according to all the punctuation codes to determine a plurality of character code groups;
for each character code group, determining a code corresponding to the character code group according to the punctuation codes; arranging the target characters corresponding to the character codes in the character code group in order to obtain a target sentence;
and arranging all the target sentences in order according to the codes of the character code groups corresponding to the target sentences to obtain the target archive content.
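The grouping of claim 3 can be sketched as follows, under stated assumptions: character codes are taken to be integers that increase with reading order, a punctuation mark is taken to close the sentence it ends, and the code of a group is taken to be the number of punctuation codes smaller than the codes in that group. The names `PUNCTUATION` and `assemble_sentences` are hypothetical.

```python
from bisect import bisect_left

# Assumed punctuation set; the claim does not enumerate the marks.
PUNCTUATION = set("，。；！？、,.;!?")


def assemble_sentences(coded_chars):
    """coded_chars: dict mapping character code (int) -> target character."""
    # Sort the character codes of all punctuation marks to get the punctuation codes.
    punct_codes = sorted(c for c, ch in coded_chars.items() if ch in PUNCTUATION)
    groups = {}
    for code in coded_chars:
        # Group code = how many punctuation codes are strictly smaller than this
        # code, so each punctuation mark falls into the group it terminates.
        group_code = bisect_left(punct_codes, code)
        groups.setdefault(group_code, []).append(code)
    # Order characters inside each group, then order the groups themselves.
    return [
        "".join(coded_chars[c] for c in sorted(groups[g]))
        for g in sorted(groups)
    ]


# Characters arrive unordered; the codes restore the reading order.
coded = {3: "。", 1: "你", 2: "好", 5: "见", 4: "再", 6: "！"}
sentences = assemble_sentences(coded)
content = "".join(sentences)
print(sentences)  # ['你好。', '再见！']
```

Trailing characters that are not followed by any punctuation mark end up in a final group of their own in this sketch.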
4. The electronic archive generation method of claim 2, wherein the performing content filling on the target archive content and the recorder information according to the target electronic template to generate the electronic archive comprises:
splitting the target archive content by archive subclass using the target electronic template, and determining the sub-content corresponding to each archive subclass and format information of the sub-content;
obtaining standard format information corresponding to each archive subclass in the target electronic template;
for each archive subclass, judging whether the format information of the sub-content corresponding to the archive subclass conforms to the standard format information; if not, determining, by semantic analysis and according to the standard format information, similar content that is similar to the sub-content, and replacing the sub-content with the similar content to obtain replaced sub-content as target sub-content; if so, taking the sub-content corresponding to the archive subclass as the target sub-content;
and performing content filling according to the target electronic template, based on all the target sub-contents corresponding to the target archive content and the recorder information, to generate the electronic archive.
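A rough Python sketch of the conformity check and replacement in claim 4 is given below. It is not the claimed semantic analysis: plain string similarity from the standard library's `difflib` is swapped in as a stand-in. The `STANDARD_FORMATS` table, the `education` subclass and the fallback of keeping the original text when nothing similar is found are assumptions.

```python
import re
from difflib import get_close_matches

# Hypothetical standard format information per archive subclass:
# a regular expression for the format and the standard values that satisfy it.
STANDARD_FORMATS = {
    "education": {
        "pattern": r"Bachelor|Master|Doctorate",
        "standard_values": ["Bachelor", "Master", "Doctorate"],
    },
}


def to_target_sub_content(subclass, sub_content):
    """Return sub_content if it conforms, else the closest standard value."""
    spec = STANDARD_FORMATS[subclass]
    if re.fullmatch(spec["pattern"], sub_content):
        return sub_content  # conforms: used unchanged as target sub-content
    # String similarity stands in for semantic analysis here.
    similar = get_close_matches(
        sub_content, spec["standard_values"], n=1, cutoff=0.6
    )
    # Assumption: keep the original text when nothing similar is found.
    return similar[0] if similar else sub_content


print(to_target_sub_content("education", "Masters"))    # Master
print(to_target_sub_content("education", "Doctorate"))  # Doctorate
```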
5. The electronic archive generation method of claim 1, wherein the determining a target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image comprises:
determining a font structure of the character according to the archive content page character image;
and performing character combination according to the font structure and the target components corresponding to the archive content page character image to obtain the target character corresponding to the archive content page character image.
6. The electronic archive generation method of claim 5, further comprising, prior to the determining a font structure of the character according to the archive content page character image:
performing character combination on the target components to obtain at least one initial character;
judging whether the number of the initial characters is greater than 1;
if not, taking the initial character as the target character;
correspondingly, the determining a font structure of the character according to the archive content page character image comprises:
if so, determining the font structure of the character according to the archive content page character image.
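The interplay of claims 5 and 6 can be sketched as follows: the components are combined first, and the font structure is consulted only when more than one character is possible. The `COMPOSITIONS` table, the structure labels and the `detect_structure` callable (a stand-in for analysis of the character image) are hypothetical.

```python
# Hypothetical composition table: set of components -> {font structure: character}.
COMPOSITIONS = {
    frozenset({"力", "口"}): {"left-right": "加", "top-bottom": "另"},
    frozenset({"女", "子"}): {"left-right": "好"},
}


def combine_components(components, detect_structure):
    """Combine target components into a target character.

    detect_structure is a callable returning the font structure of the
    character image; it is only invoked when the combination is ambiguous.
    """
    candidates = COMPOSITIONS.get(frozenset(components), {})
    if not candidates:
        return None  # no known character for these components
    if len(candidates) == 1:
        # Exactly one initial character: take it as the target character.
        return next(iter(candidates.values()))
    # More than one initial character: decide by the font structure.
    return candidates.get(detect_structure())


print(combine_components(["女", "子"], lambda: "top-bottom"))  # 好
print(combine_components(["口", "力"], lambda: "top-bottom"))  # 另
```

Deferring the structure detection in this way avoids image analysis for characters whose components admit only one combination.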
7. An electronic archive generating device, comprising:
an archive image acquisition module, configured to acquire an image corresponding to an archive to be processed, wherein the image corresponding to the archive to be processed comprises at least an archive catalog page image and an archive content page image, and the archive content page image comprises a plurality of archive content page character images;
a target handwriting information determining module, configured to determine recorder information according to the archive catalog page image and to determine target handwriting information corresponding to the recorder information according to the recorder information, wherein the target handwriting information comprises a target component writing library;
a component determining module, configured to split the archive content page character image to obtain component images of the archive content page character image, and to match the component images against the target component writing library to determine the target components corresponding to the component images;
an electronic archive generating module, configured to determine a target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image, and to generate an electronic archive based on the target characters corresponding to the respective archive content page character images and the recorder information;
wherein, when determining the target character corresponding to the archive content page character image according to the target components corresponding to the archive content page character image, the electronic archive generating module is configured to perform:
determining at least two initial target characters corresponding to the archive content page character image according to the target components corresponding to the archive content page character image; for each initial target character, acquiring an adjacent character of the archive content page character image corresponding to the initial target character; determining an initial word according to the initial target character and the adjacent character; acquiring the number of historical occurrences of each initial word, and taking the initial word with the largest number of historical occurrences as an effective word, wherein the number of historical occurrences can be determined from the archive content of existing electronic archives; and taking the initial target characters corresponding to all the effective words as the target characters corresponding to the archive content page character images.
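For the matching performed by the component determining module, one possible reading is a nearest-sample lookup in the recorder's component writing library. The sketch below assumes small binary bitmaps of equal size and uses a pixel-wise Hamming distance; the claims do not name a matching metric, so the metric, the bitmap format and the names `hamming` and `match_component` are assumptions.

```python
def hamming(a, b):
    """Count differing pixels between two equally sized binary bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_component(component_image, writing_library):
    """Return the label of the closest sample in the writing library.

    writing_library: dict mapping component label -> list of sample bitmaps
    written by the same recorder (hypothetical layout).
    """
    best_label, best_dist = None, None
    for label, samples in writing_library.items():
        for sample in samples:
            dist = hamming(component_image, sample)
            if best_dist is None or dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Toy 3x3 bitmaps standing in for one recorder's handwriting samples.
library = {
    "十": [[[0, 1, 0], [1, 1, 1], [0, 1, 0]]],
    "口": [[[1, 1, 1], [1, 0, 1], [1, 1, 1]]],
}
query = [[0, 1, 0], [1, 1, 1], [0, 1, 1]]  # a slightly sloppy 十
print(match_component(query, library))  # 十
```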
8. An electronic device, comprising:
at least one processor;
a memory;
at least one application program, wherein the at least one application program is stored in the memory and configured to be executed by the at least one processor, the at least one application program being configured to perform the electronic archive generation method according to any one of claims 1 to 6.
9. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the electronic archive generation method of any one of claims 1 to 6.
CN202310684514.2A 2023-06-12 2023-06-12 Electronic file generation method, device, equipment and medium Active CN116416629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310684514.2A CN116416629B (en) 2023-06-12 2023-06-12 Electronic file generation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310684514.2A CN116416629B (en) 2023-06-12 2023-06-12 Electronic file generation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116416629A CN116416629A (en) 2023-07-11
CN116416629B CN116416629B (en) 2023-08-29

Family

ID=87049648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310684514.2A Active CN116416629B (en) 2023-06-12 2023-06-12 Electronic file generation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116416629B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000330546A (en) * 1999-05-25 2000-11-30 Hitachi Ltd Font forming device and storage medium for forming font image
CN1324068A (en) * 2000-03-29 2001-11-28 松下电器产业株式会社 Explanatory and search for handwriting sloppy Chinese characters based on shape of radicals
CN107832768A (en) * 2017-11-23 2018-03-23 盐城线尚天使科技企业孵化器有限公司 Efficient method to go over files and marking system based on deep learning
CN109697905A (en) * 2017-10-20 2019-04-30 深圳市鹰硕技术有限公司 A kind of exam paper marking system
CN111523537A (en) * 2020-04-13 2020-08-11 联讯益康医疗信息技术(武汉)有限公司 Character recognition method, storage medium and system
CN115761781A (en) * 2023-01-06 2023-03-07 江苏狄诺尼信息技术有限责任公司 Note image data identification system for engineering electronic archives

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8428358B2 (en) * 2005-05-31 2013-04-23 Microsoft Corporation Radical-base classification of East Asian handwriting

Also Published As

Publication number Publication date
CN116416629A (en) 2023-07-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant