CN117350249B - Control configuration method and system for automatically inputting electronic document data - Google Patents

Info

Publication number
CN117350249B
Authority
CN
China
Prior art keywords
filled
pattern
matching
item
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311653676.6A
Other languages
Chinese (zh)
Other versions
CN117350249A (en)
Inventor
杨春
李龙
匡坪
钟家骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baimus Chengdu Digital Technology Co ltd
Original Assignee
Baimus Chengdu Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baimus Chengdu Digital Technology Co ltd filed Critical Baimus Chengdu Digital Technology Co ltd
Priority to CN202311653676.6A priority Critical patent/CN117350249B/en
Publication of CN117350249A publication Critical patent/CN117350249A/en
Application granted granted Critical
Publication of CN117350249B publication Critical patent/CN117350249B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/835 Query processing
    • G06F16/8373 Query execution

Abstract

The invention belongs to the technical field of electronic information data processing and provides a control configuration method and system for automatically inputting electronic document data. It aims to solve the low efficiency and high error rate of manually setting fill-in controls in HTML documents during program development of an MES system. The main scheme comprises the following steps: configuring a pattern matching library and constructing a correspondence between the pattern matching library and an input control library; a pattern recognition scanner scans the HTML document, automatically identifies the type of each item to be filled, labels the item, and generates a labeled HTML document and a label record; a control configurator scans the labeled HTML document, automatically selects the control corresponding to each label from the input control library based on the label record, and calls the selected control to configure the item to be filled.

Description

Control configuration method and system for automatically inputting electronic document data
Technical Field
The invention relates to the technical field of electronic information data processing, in particular to a control configuration method and a control configuration system for automatically inputting electronic document data.
Background
In industrial production, various forms and text documents contain data that must be filled in during the production process. The filled data include text, numerical values, pictures and the like, and enterprises currently use paper documents or electronic documents for this purpose. With the spread of informatization and intelligent technology in production management, manufacturing execution systems (MES) are receiving increasing attention. When a production enterprise purchases or customizes MES software, it often requires that the management files in the system keep their original format. These files cover the production information and data filled in at each production link, such as material consumption on a production form, names for production quality spot checks, test data, dates, and live records of the running state of production equipment. The files are numerous and the types of filled data vary; even for an ordinary biological product enterprise with only one product line, the documents to be filled run to at least one hundred pages.
MES software is generally developed in a B/S (browser/server) mode. Existing electronic management files (such as WPS, Word and Excel files) must be converted to web pages that keep the original format, the blank places where information and data are to be filled must be marked, and each mark must be linked to a corresponding processing module. For example, a read-write instruction or a dedicated fill-in control is embedded at the mark, so that when the system runs the HTML document a dedicated read-write page pops up for entering information and data, or the processing module connects to a data interface to import data generated by other systems or devices. The common way to mark the items to be filled in an HTML document is currently to set one or more tags manually, and the patent literature consulted does not disclose a specific implementation for setting these tags.
Disclosure of Invention
The invention aims to provide a control configuration method and system for automatically inputting electronic document data, so as to solve the low efficiency and high error rate of manually setting fill-in controls in HTML documents during program development of an MES system.
To solve the above technical problem, the invention adopts the following technical scheme:
In one aspect, the invention provides a control configuration method for automatically inputting electronic document data, which comprises the following steps:
configuring a pattern matching library, and constructing a correspondence between the pattern matching library and an input control library;
a pattern recognition scanner scans the HTML document, automatically identifies the type of each item to be filled, labels the item, and generates a labeled HTML document and a label record;
a control configurator scans the labeled HTML document, automatically selects the control corresponding to each label from the input control library based on the label record, and calls the selected control to configure the item to be filled.
As a further optimization, configuring the pattern matching library and constructing the correspondence between the pattern matching library and the input control library means that:
the pattern matching library comprises pattern character strings, identification symbols and control correspondences. A pattern character string is a content string that contains general content together with content to be filled in according to actual requirements; the content to be filled in is the item to be filled. When the actual service or scene differs, the general content of the pattern character string and the item to be filled differ accordingly.
As a further optimization, the types of pattern character strings include date, time, text, numerical value, picture and voice, and each type is configured with a set of symmetrical mark symbols. A pattern character string is configured by selecting the matching type in the pattern matching library, setting the mark symbol for that type, and adding the corresponding pattern character string; in use, the items to be filled in the HTML document are matched against the configured pattern character strings.
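The configuration described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the invention: the names `PatternEntry` and `PatternLibrary`, the control names and the sample symbols and pattern strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PatternEntry:
    match_type: str                 # e.g. "date", "text", "signature"
    symbol: str                     # mark symbol placed before and after a match
    control: str                    # name of the mapped entry control (hypothetical)
    patterns: List[str] = field(default_factory=list)


class PatternLibrary:
    """Maps many pattern strings to one entry per matching type."""

    def __init__(self) -> None:
        self.entries: Dict[str, PatternEntry] = {}

    def configure(self, match_type: str, symbol: str, control: str,
                  patterns: List[str]) -> None:
        # Select the matching type, set its mark symbol, add its pattern strings.
        self.entries[match_type] = PatternEntry(match_type, symbol, control, list(patterns))

    def lookup(self, text: str) -> Optional[PatternEntry]:
        # Exact lookup only; fuzzy matching is a separate concern.
        for entry in self.entries.values():
            if text in entry.patterns:
                return entry
        return None


# Sample configuration (symbols and strings are assumptions for illustration).
library = PatternLibrary()
library.configure("date", "&", "DateControl", ["Date:", "____ year ____ month ____ day"])
library.configure("signature", "%", "PictureControl", ["Operator:", "Reviewer:"])
```

Keeping one entry per matching type means that several pattern strings share a single mark symbol and a single control, which is the many-to-one relation the text describes.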
As a further optimization, the pattern recognition scanner scanning the HTML document, automatically identifying the type of each item to be filled, labeling the item, and generating the labeled HTML document and label record means that:
the pattern recognition scanner scans the items to be filled in the HTML document against the configured pattern character strings. After a pattern character string is successfully recognized and matched, the scanner reads the matching type and mark symbol of that pattern character string, places the mark symbol before and after the content that matches the pattern character string, records the corresponding marking information in the tag record table, and counts the number of items of the same matching type in the HTML document as the basis for the control configurator's scan. This completes the labeling of a single item to be filled;
the pattern recognition scanner repeats the labeling of single items to be filled while scanning the content of the HTML document until the end of the document is reached, and generates the HTML document and label record in which all items to be filled are labeled.
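The labeling pass can be sketched in Python as follows. This is a simplified illustration: the `PATTERNS` table, the record field names and the sample document are assumptions, and the sketch runs a regular expression over the raw HTML text, whereas a real scanner would need to skip tag names and attributes.

```python
import re
from collections import Counter

# Hypothetical library: matching type -> (mark symbol, pattern strings)
PATTERNS = {
    "date": ("&", ["____ year ____ month ____ day"]),
    "signature": ("%", ["Operator:", "Reviewer:"]),
}


def scan_and_mark(html):
    """Wrap each matched pattern string in its mark symbol; return (marked_html, records)."""
    lookup = {p: (t, s) for t, (s, ps) in PATTERNS.items() for p in ps}
    # Longest patterns first, so a longer pattern wins over a shorter prefix.
    ordered = sorted(lookup, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in ordered))
    counts = Counter()
    records = []

    def mark(m):
        match_type, symbol = lookup[m.group(0)]
        counts[match_type] += 1
        records.append({
            "symbol": symbol,
            "type": match_type,
            "code": counts[match_type],   # per-type count, starting at 1
            "point": len(records) + 1,    # configuration point number
            "state": "unconfigured",
        })
        return f"{symbol}{m.group(0)}{symbol}"

    return regex.sub(mark, html), records


marked, records = scan_and_mark(
    "<p>Operator: </p><p>Reviewer: </p><p>____ year ____ month ____ day</p>"
)
```

Each record carries a per-type code and a document-wide configuration point number, so a later pass can pair every mark in the document with its record.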
As a further optimization, the control configurator scanning the labeled HTML document, automatically selecting the control corresponding to each label from the input control library based on the label record, and calling the selected control to configure the item to be filled means that:
after the HTML document and label record with all items labeled have been generated, the control configurator is started and scans the labeled HTML document. When the first conforming mark is identified, the configurator reads the corresponding record in the tag record table of the corresponding type, looks up the control type according to that record, calls the control insertion function to add the control at the marked position in the HTML document, and updates the state in the record to configured. This completes the control configuration of a single item to be filled;
the control configurator repeats the control configuration of single items to be filled while scanning the content of the HTML document until the end of the document is reached, and generates the HTML document with control configuration completed.
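A minimal Python sketch of this configuration pass is given below. The `CONTROLS` snippets, the `SYMBOL_TO_TYPE` table and the record layout are assumptions made for illustration; the text does not specify how a control is rendered, so the sketch simply appends an HTML input element after the marked content.

```python
import re

# Hypothetical control library: matching type -> HTML snippet for the entry control
CONTROLS = {
    "date": '<input type="date">',
    "signature": '<input type="file" accept="image/*">',
}
SYMBOL_TO_TYPE = {"&": "date", "%": "signature"}


def configure_controls(marked_html, records):
    """Append a control after each marked item and flag its record as configured."""
    pending = {(r["type"], r["code"]): r for r in records}
    seen = {}
    symbols = "".join(re.escape(s) for s in SYMBOL_TO_TYPE)
    # A mark symbol, the shortest possible content, then the same symbol again.
    regex = re.compile(rf"([{symbols}])(.+?)\1")

    def insert(m):
        match_type = SYMBOL_TO_TYPE[m.group(1)]
        seen[match_type] = seen.get(match_type, 0) + 1
        record = pending.get((match_type, seen[match_type]))
        if record is None or match_type not in CONTROLS:
            return m.group(0)          # keep the marks for manual configuration
        record["state"] = "configured"
        return m.group(2) + CONTROLS[match_type]

    return regex.sub(insert, marked_html)
```

Marks without a matching record or control are left in place, so a later check can still locate them.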
As a further optimization, after control configuration has been completed for all items to be filled in the labeled HTML document, a configuration check function re-marks the items to be filled that were not recognized, rescans the label records that were not configured, and provides positioning information as a prompt for manual control configuration.
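The rescan of unconfigured records can be sketched in a few lines of Python. The record fields (`type`, `code`, `point`, `state`) and the wording of the prompt are assumptions for illustration only.

```python
def manual_configuration_prompts(records):
    """List positioning hints for label records whose control was not configured."""
    return [
        f"{r['type']} item {r['code']} (configuration point {r['point']}) "
        "needs manual configuration"
        for r in records
        if r["state"] != "configured"
    ]
```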
As a further optimization, configuring the pattern matching library and constructing the correspondence between the pattern matching library and the input control library further includes: setting a matching similarity parameter for the pattern recognition scanner, and setting exact matching or fuzzy matching for each pattern character string;
the similarity is the number of matched characters divided by the number of characters in the item to be filled. Once the similarity parameter is set, a match succeeds when the similarity between the matched content and the pattern character string is greater than the set parameter; the corresponding string of the item to be filled is then incorporated into the pattern matching library and the library content is updated;
setting exact or fuzzy matching for a pattern character string determines whether the item to be filled must completely match the configured pattern character string. With exact matching, the item to be filled must match the pattern character string completely; with fuzzy matching, recognition is performed according to the set similarity parameter.
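The similarity measure and the two matching modes can be sketched as follows. The text does not say how the matched characters are counted; as an assumption, this sketch counts the characters in the matching blocks found by Python's standard `difflib.SequenceMatcher`. The function names and the default threshold of 0.5 are likewise illustrative.

```python
from difflib import SequenceMatcher


def similarity(item, pattern):
    """Matched characters divided by the number of characters in the item to be filled."""
    if not item:
        return 0.0
    blocks = SequenceMatcher(None, item, pattern, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / len(item)


def matches(item, pattern, fuzzy=False, threshold=0.5):
    """Exact mode requires similarity 1; fuzzy mode requires similarity above the threshold."""
    score = similarity(item, pattern)
    return score > threshold if fuzzy else score == 1.0
```

For example, with the pattern "Operator:" the item "Operator name:" has 9 matched characters out of 14, a similarity of about 0.64, so it is accepted in fuzzy mode with a threshold of 0.5 and rejected in exact mode.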
As a further optimization, when the pattern recognition scanner scans the HTML document, automatically identifies the type of each item to be filled, labels the item, and generates the labeled HTML document and label record, the method further includes:
judging whether the item to be filled and the pattern character string match completely;
calculating the similarity between the item to be filled and the pattern character string, and judging whether the item is of the matching type;
incorporating the character string of the item to be filled into the pattern matching library.
As a further optimization, judging whether the item to be filled and the pattern character string match completely means that: the pattern recognition scanner scans the item against the configured pattern character strings and, after a pattern character string is successfully recognized and matched, calculates the matching similarity as the number of matched characters divided by the number of characters in the item to be filled; a similarity equal to 1 indicates a complete match;
calculating the similarity between the item to be filled and the pattern character string and judging whether the item is of the matching type means that: for a pattern character string set to fuzzy matching in the pattern matching library, when the pattern recognition scanner identifies an item to be filled that conforms to the content of the pattern character string while processing the HTML document, it performs a similarity check by calculating the ratio of the number of matched characters to the number of characters in the item to be filled, and compares this ratio with the set fuzzy matching similarity parameter. If the similarity is greater than the set parameter, the next step is performed: the character string of the item to be filled is incorporated into the pattern matching library by inserting it directly into the corresponding mapping table, and the HTML document mark and label record are generated. If the similarity is smaller than the set parameter, the current content is skipped and treated as a failed match, that is, not a target string;
if several items to be filled are scanned in the same document with different similarities, then among those whose similarity is greater than the set parameter, the one with the maximum similarity is selected for labeling and matching, and its character string is incorporated into the pattern matching library;
incorporating the character string of the item to be filled into the pattern matching library means that: the character string is inserted directly into the corresponding mapping table, so that the pattern matching library is completed automatically by similarity-based incremental learning over the document content.
In another aspect, the invention also provides a control configuration system for automatically inputting electronic document data, which implements the above control configuration method and comprises:
a pattern and control library, comprising a pattern matching library and an input control library. The pattern matching library is used to identify the region and type of each item to be filled in the HTML document and to match the corresponding identifier: the characters, numbers or region types of the items to be filled are enumerated into a set, each corresponding to a different identification symbol, and stored as a library file in the system; scanning and comparison are carried out by establishing standard pattern character strings with their corresponding symbols and constructing a mapping table. The input control library defines, according to the data category of the item to be filled, the corresponding read-write functions, identifiers and callable instructions that form the different fill-in controls, and stores these controls as a control library for automatic configuration and selection;
a pattern recognition scanner, used to scan the HTML document. When an item to be filled is found, it is matched against the strings enumerated in the pattern matching library; after a successful match, the corresponding identifier is read and placed before and after the item to be filled as a configuration point, and a recognition record, called the identification tag record, is formed. The identification tag record holds the information needed to recognize the items to be filled automatically, and its fields include: the matched identification symbol, the type of the item to be filled, the code, the configuration point number and the configuration state. Each type of item to be filled is counted separately; the code for each type starts from 1 and counts the items of that type in the document;
a control configurator, used to scan the HTML document. It pauses when it finds the identifier of a configuration point, reads the corresponding record in the identification tag record file, selects the corresponding operation control from the input control library, configures the control at the configuration point, and changes the configuration state in the identification tag record to configured.
The beneficial effects of the invention are as follows: with the control configuration method and system for automatically inputting electronic document data, the data entry controls for the blanks in management files converted to HTML can be configured automatically during development of an MES software system. An automated program system is built to complete the configuration of the data entry controls, which reduces the programmer's workload for this part, improves the efficiency of processing HTML documents, and reduces errors, missed configurations and mismatches.
Drawings
FIG. 1 is a system configuration block diagram of a control configuration system for automatic entry of electronic document data in embodiment 1 of the present invention;
FIG. 2 is a flowchart of a control configuration method for automatically inputting electronic document data in embodiment 2 of the present invention;
FIG. 3 is a flowchart of a control configuration method for automatically inputting electronic document data in embodiment 3 of the present invention;
FIG. 4 is a flowchart of a control configuration method for automatically inputting electronic document data in embodiment 4 of the present invention;
FIG. 5 is a flowchart of the incremental-learning automatic configuration of electronic document data entry controls based on similarity determination in embodiment 5 of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Example 1
The embodiment provides a control configuration system for automatically inputting electronic document data, and a system structure block diagram thereof is shown in fig. 1, wherein the system comprises:
The pattern and control library comprises a pattern matching library and an input control library. The pattern matching library is used to identify the region and type of each item to be filled in the HTML document and to match the corresponding identifier: the characters, numbers or region types of the items to be filled are enumerated into a set, each corresponding to a different identification symbol, and stored as a library file in the system; scanning and comparison are carried out by establishing standard pattern character strings with their corresponding symbols and constructing a mapping table. The input control library defines, according to the data category of the item to be filled, the corresponding read-write functions, identifiers and callable instructions that form the different fill-in controls, and stores these controls as a control library for automatic configuration and selection;
The region and type of an item to be filled can be divided into date, text, numerical value, special symbol and so on, and each type enumerates all possible expressions. For example, the date type includes "year month day", "____ year ____ month ____ day", "time:" and so on; the text type includes "( )", "[ ]" and so on.
The identification symbol is a set of symmetrical symbols. For example, the date type may be "≡ &", the signature type "%", the numerical value type "#", and so on.
In production and manufacturing, the items to be filled are of many types and take different forms. Identifying the type of an item by string comparison while establishing corresponding mark symbols has the following problems:
(1) Items to be filled that belong to the same type may appear in many different forms, for example "year month day", "____ year ____ month ____ day", "time:" and "month and day 2023". Because the characters and formats differ, the strings differ, and the pattern matching library must store a number of corresponding strings. If each string corresponded to its own mark symbol, several symbols would be needed to mark the date type, which would make the mapping table between item types and mark symbols complex, consume resources throughout the process, and reduce efficiency.
(2) The strings constructed in the pattern matching library are incomplete. While processing a large number of documents, new types of items to be filled may well be encountered that do not exist in the pattern matching library. The mapping table between item types and identifiers can be supplemented manually in response to a system prompt, but this affects system efficiency and user experience.
To solve the above problems, this embodiment proposes an incremental learning method based on similarity determination, which comprises the following steps:
Step 1: Initialize the pattern matching library by constructing a mapping table of the common, known types of items to be filled and their corresponding mark symbols. Each type of item to be filled corresponds to a unique identifier, so the mapping table is a many-to-one relation.
Step 2: Scan the HTML document to obtain the string of an item to be filled, and search the mapping table character by character for a match. If there is a match, perform operations 2.1 to 2.3; if there is no match, perform Step 3.
2.1: Calculate the similarity: the number of characters that can be matched divided by the number of characters in the string of the item to be filled.
2.2: Set a similarity threshold and judge: if the similarity equals 1, the match is complete, and 2.3 is performed; if the similarity is greater than 0.5 (settable), the string of the item to be filled is incorporated into the pattern matching library, inserted into the mapping table after confirmation so that it corresponds to the mark symbol of the matching type, and 2.3 is performed. Otherwise, perform Step 3.
2.3: Read the corresponding mark symbol from the mapping table and mark the item to be filled (the fragment of the item to be filled).
Step 3: When the similarity is 0 or less than 0.5 (settable), prompt for a new category and define its identifier. After confirmation, add the new category and its identifier to the original mapping table and update the mapping table. This process is incremental learning.
Step 4: Continue scanning and repeat Step 2.
The input control library can be divided into text, date, numerical value, picture and voice types according to the data category of the item to be filled. For each type, the corresponding read-write functions, identifiers and callable instructions are defined to form a fill-in control, and the controls are stored as a control library for automatic configuration and selection. The control library may also support other types of controls or custom controls, for example a program that reads in interface data. The input control library may include:
text entry control: supports reading and selecting document content for writing, and also supports manual input;
date entry control: can read the system date, allows a date to be entered manually or selected, and supports writing in various date formats;
numerical value entry control: supports reading single values, groups of data, historical data, continuous trend data and discrete point data and writing them to configuration points;
picture entry control: supports scanned pictures and photos, and supports reading online signature pictures and writing them to configuration points;
voice entry control: supports speech recognition with speech-to-text conversion, reading and writing to configuration points;
After the corresponding input box is clicked, these controls can pop up keyboard input or voice dialogue input, convert the input information into control content and display it, register the operation time, select the data source, and pre-process the data to a certain extent.
In implementing the control library, a standard data mapping set can be constructed in which the different control components are matched to the type identifiers in the library, establishing the category code of each item to be filled and its corresponding control operation.
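Such a mapping set between type identifiers and control components can be sketched as a small Python registry. The class names, the type codes and the `write` methods are hypothetical; only three of the control types are shown, and the date control accepts ISO-formatted input only.

```python
from datetime import date


class TextControl:
    def write(self, value):
        return str(value)


class DateControl:
    def read_system_date(self):
        return date.today().isoformat()

    def write(self, value):
        # Accept ISO input and normalise it; other formats would need their own parsers.
        return date.fromisoformat(value).isoformat()


class NumericControl:
    def write(self, value):
        return float(value)


# Category code of the item to be filled -> control component
CONTROL_LIBRARY = {"text": TextControl, "date": DateControl, "numeric": NumericControl}


def control_for(type_code):
    """Return a control instance for the given category code."""
    try:
        return CONTROL_LIBRARY[type_code]()
    except KeyError:
        raise LookupError(f"no control registered for type {type_code!r}") from None
```

A custom control, such as a reader for interface data, could be added by registering another class under a new type code.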
The pattern recognition scanner is used to scan the HTML document. When an item to be filled is found, it is matched against the strings enumerated in the pattern matching library; after a successful match, the corresponding identifier is read and placed before and after the item to be filled as a configuration point, and a recognition record, called the identification tag record, is formed. The identification tag record holds the information needed to recognize the items to be filled automatically, and its fields include: the matched identification symbol, the type of the item to be filled, the code, the configuration point number and the configuration state. Each type of item to be filled is counted separately; the code for each type starts from 1 and counts the items of that type in the document;
the control configurator is used to scan the HTML document. It pauses when it finds the identifier of a configuration point, reads the corresponding record in the identification tag record file, selects the corresponding operation control from the input control library, configures the control at the configuration point, and changes the configuration state in the identification tag record to configured.
Example 2
This embodiment provides a control configuration method for automatically inputting electronic document data, a flowchart of which is shown in fig. 2. The method comprises the following steps:
S1, configuring a pattern matching library, and constructing a correspondence between the pattern matching library and an input control library;
S2, a pattern recognition scanner scans the HTML document, automatically identifies the type of each item to be filled, labels the item, and generates a labeled HTML document and a label record;
S3, a control configurator scans the labeled HTML document, automatically selects the control corresponding to each label from the input control library based on the label record, and calls the selected control to configure the item to be filled.
Example 3
The embodiment provides a control configuration method for automatically inputting electronic document data, the flow chart of which is shown in fig. 3, and the embodiment can be suitable for automatically configuring the input content of an HTML file.
As shown in fig. 3, the embodiment specifically includes the following steps:
s11, configuring a pattern matching library, and constructing a corresponding relation between the pattern library and a control library, namely, establishing a mapping table, and completing configuration.
Specifically, the pattern matching library consists of pattern character strings, marker symbols and control correspondences. A pattern character string is a content string that contains general content together with content to be filled according to actual requirements; the content to be filled according to actual requirements is the item to be filled. The general content and the items to be filled of a pattern character string differ with the actual service or scenario. For example, a pattern character string may be a date or time to be filled, commonly "_____ hour _____ minute", "date:", "year month day" or "_____ year _____ month _____ day _____ hour _____ minute"; a pattern character string may also be of the text type; as another example, a pattern character string may be a signature, and signature types common in factory production scenarios include "operator:", "rechecker:", "field clearance responsible person:" and "QA review:";
each pattern character string may be configured with a symmetrical set of marker symbols, for example "+|!" for the text type, "&" for the date type and "%" for the signature type;
in the pattern matching library, pattern character strings are configured under a selected matching type: for example, the matching type "date type" is selected, its marker symbol is set to "&", and the corresponding pattern character strings are added. In use, the items to be filled in the file are matched against the configured pattern character strings.
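The configuration in S11 can be pictured as a small mapping table. The following is a minimal sketch, not the patented implementation: the class name `MatchType`, the function `build_library` and the control names `date_picker` and `signature_box` are hypothetical, and the English pattern strings are illustrative renderings of the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class MatchType:
    """One row of a hypothetical mapping table for a matching type."""
    marker: str    # symmetrical marker symbol placed around a match
    control: str   # assumed name of the input control to insert
    patterns: list = field(default_factory=list)  # configured pattern strings


def build_library():
    # The marker symbols "&" and "%" follow the text; control names are assumed.
    return {
        "date": MatchType("&", "date_picker",
                          ["date:", "_____ hour _____ minute"]),
        "signature": MatchType("%", "signature_box",
                               ["operator:", "rechecker:"]),
    }


library = build_library()
```

Keeping the marker symbol, the control and the pattern strings in one record per matching type means a single lookup answers both "how is this item marked" and "which control belongs to it".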
S12, the pattern recognition scanner scans the HTML document and marks the labels, generating a labeled file and a label record.
HTML is a markup language comprising a series of tags, through which document formats can be unified and scattered network resources connected into a logical whole. An HTML page can provide an online document preview service: after a text template file with a set format is converted into HTML, it can be previewed directly in the browser without downloading it or opening it with local office software, which saves download time, reduces memory usage on the machine and improves the efficiency of producing the text.
The pattern recognition scanner scans the items to be filled in the HTML file against the configured pattern character strings. After an item is successfully recognized and matched, the scanner reads the matching type and marker symbol of the corresponding pattern character string, places the marker symbol before and after the content that matches the pattern character string, records the corresponding marking information in the label record table, and counts the number of items of the same matching type in the document, which serves as the basis for the scan by the control matcher. At this point the pattern recognition scan of a single item to be filled is complete. The pattern recognition scanner repeats these actions over the content of the HTML document until the end of the document, and generates the HTML file after pattern recognition processing.
For example, the date-type marker symbol is set to "&", and the date-type pattern character strings are configured as "_____ hour _____ minute" and "date:". After processing by the pattern recognition scanner, "&_____ hour _____ minute&" and "&date:&" appear where the HTML matches the matching content. The pattern character string and the matching content are themselves descriptive or explanatory text: "&_____ hour _____ minute&" identifies the item to be filled as hour and minute information, and "&date:&" identifies the item to be filled as year, month and day information;
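The marking pass described in S12 can be sketched as a single regular-expression substitution. This is only an illustration under stated assumptions: the function name `scan_and_mark`, the record fields and the `"unconfigured"` status value are hypothetical, exact string matching is used, and the sample HTML is invented.

```python
import re


def scan_and_mark(html, library):
    """library maps type name -> (marker symbol, list of pattern strings)."""
    lookup = {}
    for type_name, (marker, patterns) in library.items():
        for pattern in patterns:
            lookup[pattern] = (type_name, marker)
    if not lookup:
        return html, [], {}

    # Try longer pattern strings first so a short one cannot shadow them.
    ordered = sorted(lookup, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in ordered))

    records, counts = [], {}

    def mark(match):
        content = match.group(0)
        type_name, marker = lookup[content]
        counts[type_name] = counts.get(type_name, 0) + 1
        records.append({"type": type_name, "marker": marker,
                        "content": content, "status": "unconfigured"})
        # Place the marker symbol before and after the matched content.
        return marker + content + marker

    return regex.sub(mark, html), records, counts


library = {"date": ("&", ["date:"]), "signature": ("%", ["operator:"])}
marked, records, counts = scan_and_mark("<p>operator: </p><p>date: </p>", library)
```

The three return values correspond to the labeled HTML, the label record table and the per-type counts mentioned in the text.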
S13, the control configurator scans the labeled HTML document and maps controls to the labels, yielding the processed document.
After the pattern recognition scanner has finished, the control configurator is started and scans the processed HTML document. When the first conforming marker symbol is recognized, the configurator reads the corresponding record in the marker symbol record table of the corresponding type, looks up the corresponding control type according to the record, calls the control insertion function to add the control at the corresponding marked position in the HTML, and updates the state in the corresponding record to "configured". At this point the control configuration of a single item to be filled is complete. The pattern recognition scanner repeats these actions over the content of the HTML document until the end of the document, and generates the HTML file with the controls configured.
For example, the items to be filled that represent dates are located by the marker symbol "&" configured for the date type, and the control configurator inserts a date control at each corresponding marked position; after processing, the corresponding date control style is presented. Further control rule configuration can be added on top of the HTML file whose controls have been configured, such as whether the control is mandatory, the date formatting style and the date input mode. The control rule configuration items may be preset or defaulted, and may be configured or changed by the user.
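The control insertion in S13 might look as follows. This is a hedged sketch: the function name `configure_controls`, the record fields, single-character marker symbols, and the choice to drop the markers and append the control directly after the matched text are all assumptions; `<input type="date">` is a standard HTML element used here only as a stand-in for the date control.

```python
import re


def configure_controls(marked_html, records, controls):
    """controls maps marker symbol -> HTML snippet of the control to insert."""
    if not controls:
        return marked_html
    marker_class = "".join(re.escape(m) for m in controls)
    # A marked span is: marker, content, then the same marker again.
    span = re.compile(rf"([{marker_class}])(.+?)\1")

    def insert(match):
        marker, content = match.group(1), match.group(2)
        for record in records:
            if (record["marker"] == marker and record["content"] == content
                    and record["status"] == "unconfigured"):
                record["status"] = "configured"
                return content + controls[marker]
        return match.group(0)  # no pending record: leave the span untouched

    return span.sub(insert, marked_html)


records = [{"marker": "&", "content": "date:", "status": "unconfigured"}]
result = configure_controls("<p>&date:&</p>", records,
                            {"&": '<input type="date">'})
```

Spans that have no pending record are left marked, which gives a later verification pass something to find.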
Example 4
Fig. 4 is a flowchart of the control configuration method for automatically inputting electronic document data according to embodiment 4 of the present invention. This embodiment is an optimization of embodiment 3 above: on top of the automatic configuration of the input content of the HTML file, it adds an automatic or manual processing mode for configuration verification. Technical details not described in this embodiment can be found in embodiment 3 above.
As shown in Fig. 4, this embodiment specifically includes the following steps:
S21, configuring a pattern matching library and constructing the correspondence between the pattern matching library and the control library.
S22, the pattern recognition scanner scans the HTML file and marks the labels, generating a labeled file and a label record.
S23, the control configurator scans the labeled HTML document and maps controls to the labels.
S24, configuration verification: unprocessed items to be filled are re-marked and the unconfigured label records are rescanned; for items that need manual processing, the system provides positioning information, and the configured document is obtained.
After recognition by the pattern recognition scanner and control mapping by the control configurator, content of items to be filled that was not recognized can be matched and marked again by the configuration verification function, and controls are mapped again to the labeled pattern character string content. Configuration verification uses fuzzy matching: it scans the HTML document for items to be filled that conform to the content of the pattern character strings, for marker symbols that conform to those set in the pattern matching library, and for records in the label record that have not been changed to the configured state; what the configuration verification function marks are possible items to be filled. The processing also includes filtering interference items in the text template file. Specifically, an interference item is content that resembles an item to be filled and contains one of the marker symbols of the labeled pattern character strings; such items are easily misidentified when searching for items to be filled and can therefore be filtered out. For example, the function marks an item to be filled that carries the leading and trailing marker symbols (such as the date-type marker "&") but does not contain the content of a pattern character string, and that was therefore neither recognized by the pattern recognition scanner nor recorded in the label record table by the control configurator, so as to improve the accuracy of the HTML page conversion. After being marked, the HTML can be processed manually.
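One way to picture the verification pass is a function that reports two things: marked spans in the HTML that have no entry in the label record, and records that never reached the configured state. The sketch below assumes single-character marker symbols and a hypothetical record layout; the name `verify_configuration` and the use of character offsets as positioning information are assumptions, not details from the patent.

```python
import re


def verify_configuration(html, records, markers):
    """Return (unrecorded marked spans with offsets, unconfigured records)."""
    marker_class = "".join(re.escape(m) for m in markers)
    span = re.compile(rf"([{marker_class}])(.*?)\1")
    known = {(r["marker"], r["content"]) for r in records}

    # Spans that carry leading and trailing markers but were never recorded.
    unrecorded = [(m.start(), m.group(0)) for m in span.finditer(html)
                  if (m.group(1), m.group(2)) not in known]
    # Records that the control configurator has not processed yet.
    unconfigured = [r for r in records if r["status"] != "configured"]
    return unrecorded, unconfigured


records = [{"marker": "&", "content": "date:", "status": "unconfigured"}]
unrecorded, unconfigured = verify_configuration(
    "<p>&date:&</p><p>&&</p>", records, ["&"])
```

The offsets in `unrecorded` could be used to point a human reviewer at the positions that need manual handling.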
Example 5
Fig. 5 is a flowchart of the similarity-based incremental learning automatic configuration of electronic document data entry controls according to an embodiment of the present invention. This embodiment is an optimization of embodiment 4: on top of the automatic configuration of the input content of the HTML file, it adds similarity-based incremental learning of pattern character strings. Technical details not described in this embodiment can be found in embodiment 3 and embodiment 4 above.
As shown in Fig. 5, this embodiment specifically includes the following steps:
S31, configuring a pattern matching library and constructing the correspondence between the pattern matching library and the control library.
Compared with the configuration of embodiment 3, the configuration here adds a parameter for the matching similarity supported by the pattern recognition scanner and a setting of exact or fuzzy matching for the pattern character strings.
The similarity is the number of matched characters divided by the number of characters of the item to be filled, and the similarity parameter is the threshold applied to it. For example, setting the similarity parameter to 0.7 means that when the similarity between the matched content and the content of the pattern character string is greater than 70%, the match succeeds, the corresponding character string of the item to be filled is incorporated into the pattern matching library, and the content of the library is updated.
The exact or fuzzy matching setting determines whether an item to be filled must completely match the configured pattern character string. When exact matching is set, the item to be filled must completely match the pattern character string; when fuzzy matching is set, fuzzy recognition is performed according to the set similarity parameter.
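The similarity ratio and the two matching modes can be sketched in a few lines. The text does not say how "matched characters" are counted, so this sketch assumes the total size of the matching blocks found by Python's `difflib.SequenceMatcher`; the function names are hypothetical, and the strings "日期" and "日期：" are assumed Chinese originals of "date" and "date:".

```python
from difflib import SequenceMatcher


def similarity(item, pattern):
    """Matched characters divided by the length of the item to be filled."""
    if not item:
        return 0.0
    blocks = SequenceMatcher(None, item, pattern).get_matching_blocks()
    matched = sum(block.size for block in blocks)  # assumed counting rule
    return matched / len(item)


def is_match(item, pattern, mode="exact", threshold=0.7):
    if mode == "exact":
        return item == pattern          # exact mode: strings must be identical
    return similarity(item, pattern) > threshold  # fuzzy mode: above threshold


score = similarity("日期：", "日期")   # two of three characters match
fuzzy_ok = is_match("日期：", "日期", mode="fuzzy", threshold=0.6)
exact_ok = is_match("日期：", "日期")
```

Because the denominator is the length of the item alone, a short item fully contained in a longer pattern also scores 1 under this counting rule, which is why the exact mode in the sketch compares the strings directly instead of relying on the ratio.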
S32, the pattern recognition scanner scans the HTML file and marks the labels.
S33, judging whether the item to be filled and the pattern character string match completely.
The pattern recognition scanner scans the items to be filled against the configured pattern character strings. After an item is recognized and matched, the matching similarity is calculated as the number of matched characters divided by the number of characters of the item to be filled; a similarity equal to 1 indicates a complete match. The scanner then reads the matching type and marker symbol of the corresponding pattern character string, places the marker symbol before and after the content that matches the pattern character string, records the corresponding marking information in the label record table, and counts the number of items of the same matching type in the document, which serves as the basis for the scan by the control matcher. At this point the pattern recognition scan of a single item to be filled is complete. The pattern recognition scanner repeats these actions over the content of the HTML document until the end of the document, and generates the HTML file after pattern recognition processing.
For example, the signature-type marker symbol is set to "%", and the signature-type pattern character strings are configured as "operator:" and "rechecker:", both with exact matching. After processing by the pattern recognition scanner, only items to be filled that completely conform to the content of the pattern character strings are marked, appearing as "%operator:%" and "%rechecker:%". If an item to be filled in the original HTML document, such as "operator", fails to match, no "%" marker is added before or after its content.
S34, calculating the similarity between the item to be filled and the pattern character string, and judging whether it conforms to the matching type.
For a pattern character string set to fuzzy matching in the pattern matching library, when the scanner, while processing the HTML document, identifies an item to be filled that conforms to the content of the pattern character string, it performs a similarity check: it calculates the ratio of the number of characters that can be matched to the number of characters of the item to be filled and compares this ratio with the set fuzzy matching similarity parameter (in this embodiment the parameter may be 0.6). If the similarity is greater than the set parameter, processing continues: the character string corresponding to the item to be filled is incorporated into the pattern matching library and inserted directly into the corresponding mapping table, and the HTML document marker and label record are generated accordingly. If the similarity is smaller than the set parameter, the current matching content is skipped; the match is considered to have failed and the character string is not a target match.
For example, the date-type marker symbol is set to "&", and the date-type pattern character strings are configured as "_____ hour _____ minute", "date:", "year month day" and "_____ year _____ month _____ day _____ hour _____ minute", all with fuzzy matching and a similarity parameter of 0.6. During processing by the pattern recognition scanner, a match succeeds when the fuzzy matching similarity is greater than 0.6. For "date" and "date:", for instance, the matching similarity is about 0.67, which is greater than the set 0.6, so the match succeeds and the character string "date" is incorporated into the pattern matching library as the content of an item to be filled. The next time it runs, the pattern recognition scanner repeats the above actions on the HTML document content using the new pattern matching library, so the character string "date" is recognized directly as a completely matching character string.
If several items to be filled are scanned in the same document with differing similarities, then, provided the similarity is greater than the set parameter, the match with the greatest similarity is selected for labeling and the corresponding character string is incorporated into the pattern matching library.
If several possibly matching character strings are scanned in a document, they are sorted by similarity from largest to smallest and the pattern character string with the greatest similarity is used for configuration, which optimizes the accuracy of the scanner's fuzzy matching. For example, with the pattern character strings set to "date" and "date:", traversing the pattern character strings for the matching content yields similarities of 0.66 and 0.9 respectively; the pattern character string in the pattern matching library that best fits the content to be filled, namely "date:" with a similarity of 0.9, is selected, the label is marked at the corresponding position in the HTML document, and the newly added label is recorded.
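The selection rule above (keep candidates above the threshold, sort by similarity, take the largest) can be sketched directly. The function name `best_pattern` is hypothetical, and the scores are simply the figures quoted in the example rather than values computed here.

```python
def best_pattern(scores, threshold):
    """scores maps pattern string -> similarity; return the best one or None."""
    eligible = [(p, s) for p, s in scores.items() if s > threshold]
    if not eligible:
        return None  # nothing passes the threshold: treat as no match
    # Sort from largest to smallest similarity and take the first entry.
    ranked = sorted(eligible, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0]


chosen = best_pattern({"date": 0.66, "date:": 0.9}, threshold=0.6)
```

With the quoted figures, both candidates pass the 0.6 threshold and "date:" is chosen because 0.9 is the larger similarity.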
S35, incorporating the character string of the item to be filled into the pattern matching library.
The character string corresponding to the item to be filled is incorporated into the pattern matching library and inserted directly into the corresponding mapping table, so that the pattern matching library is gradually improved according to the document content, manual supplementary entry is reduced, and similarity-based incremental learning is achieved.
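The incremental update in S35 amounts to appending a newly accepted string to the pattern list of its matching type. A minimal sketch, assuming the mapping table is a plain dictionary of lists and that the similarity has already been computed elsewhere; the name `learn_pattern` is hypothetical.

```python
def learn_pattern(library, type_name, item, score, threshold):
    """Add item to the patterns of type_name when its score passes."""
    patterns = library.setdefault(type_name, [])
    if score > threshold and item not in patterns:
        patterns.append(item)  # inserted directly into the mapping table
        return True
    return False


library = {"date": ["date:"]}
added = learn_pattern(library, "date", "date", 0.67, 0.6)
# On a later run "date" is already in the library and matches exactly.
known_next_time = "date" in library["date"]
```

Checking for an existing entry before appending keeps repeated scans of similar documents from growing the library with duplicates.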
S36, generating the labeled file and the label record.
S37, the control configurator scans the labeled HTML document and maps controls to the labels.
S38, configuration verification: unprocessed items to be filled are re-marked and the unconfigured label records are rescanned; for items that need manual processing, the system provides positioning information, and the configured document is obtained.
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention; those skilled in the art can make various modifications and variations to it. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (3)

1. A control configuration method for automatically inputting electronic document data, characterized by comprising the following steps:
configuring a pattern matching library, and constructing a correspondence between the pattern matching library and an input control library;
scanning an HTML document with a pattern recognition scanner, automatically recognizing the type of each item to be filled and labeling the item to be filled, and generating the labeled HTML document and a label record;
scanning the labeled HTML document with a control configurator, automatically selecting the control corresponding to each label from the input control library based on the label record, and invoking the selected control to perform control configuration on the item to be filled;
wherein configuring the pattern matching library and constructing the correspondence between the pattern matching library and the input control library means that:
the pattern matching library comprises pattern character strings, marker symbols and control correspondences, wherein a pattern character string is a content string comprising general content and content to be filled according to actual requirements, the content to be filled according to actual requirements is the item to be filled, and the general content and the items to be filled of the pattern character strings differ with the actual service or scenario;
the pattern character strings cover date, time, text, numerical value, picture and voice types, and each pattern character string is configured with a symmetrical set of marker symbols; in the pattern matching library, pattern character strings are configured under a selected matching type: after the matching type is selected, the corresponding marker symbol is set and the corresponding pattern character strings are added, and in use the items to be filled in the HTML document are matched against the configured pattern character strings;
scanning the HTML document with the pattern recognition scanner, automatically recognizing the type of each item to be filled and labeling the item to be filled, and generating the labeled HTML document and the label record means that:
the pattern recognition scanner scans the items to be filled in the HTML document against the configured pattern character strings; after an item is successfully recognized and matched, the matching type and marker symbol of the corresponding pattern character string are read, the marker symbol is placed before and after the content that matches the pattern character string, the corresponding marking information is recorded in a marker symbol record table, and the number of items of the same matching type in the HTML document is counted and serves as the basis for the scan by the control matcher, whereupon the labeling of a single item to be filled is complete;
the pattern recognition scanner repeats the labeling of a single item to be filled, scanning the content of the HTML document until the end of the document, and generates the HTML document and the label record after all items to be filled have been labeled;
configuring the pattern matching library and constructing the correspondence between the pattern matching library and the input control library further comprises: setting a parameter for the matching similarity supported by the pattern recognition scanner, and setting exact matching or fuzzy matching for the pattern character strings;
the similarity is the number of matched characters divided by the number of characters of the item to be filled; once the similarity parameter is set, when the similarity between the matched content and the content of the pattern character string is greater than the set similarity parameter, the match succeeds, the corresponding character string of the item to be filled is incorporated into the pattern matching library, and the content of the pattern matching library is updated;
the exact or fuzzy matching setting determines whether the item to be filled must completely match the configured pattern character string: when exact matching is set, the item to be filled must completely match the pattern character string, and when fuzzy matching is set, fuzzy recognition is performed according to the set similarity parameter;
scanning the HTML document with the pattern recognition scanner, automatically recognizing the type of each item to be filled and labeling the item to be filled, and generating the labeled HTML document and the label record further comprises:
judging whether the item to be filled and the pattern character string match completely;
calculating the similarity between the item to be filled and the pattern character string, and judging whether it conforms to the matching type;
incorporating the character string of the item to be filled into the pattern matching library;
wherein judging whether the item to be filled and the pattern character string match completely means that: the pattern recognition scanner scans the items to be filled against the configured pattern character strings, and after an item is successfully recognized and matched, the matching similarity is calculated as the number of matched characters divided by the number of characters of the item to be filled, a similarity equal to 1 indicating a complete match;
calculating the similarity between the item to be filled and the pattern character string and judging whether it conforms to the matching type means that: for a pattern character string set to fuzzy matching in the pattern matching library, when the pattern recognition scanner, while processing the HTML document, identifies an item to be filled that conforms to the content of the pattern character string, it performs a similarity check, calculates the ratio of the number of characters that can be matched to the number of characters of the item to be filled, and compares the ratio with the set fuzzy matching similarity parameter; if the similarity is greater than the set parameter, processing continues, the character string corresponding to the item to be filled is incorporated into the pattern matching library and inserted directly into the corresponding mapping table, and the HTML document marker and label record are generated accordingly; if the similarity is smaller than the set parameter, the current matching content is skipped, the match is considered to have failed, and the character string is not a target match;
if several items to be filled are scanned in the same document with differing similarities, then, provided the similarity is greater than the set parameter, the match with the greatest similarity is selected for labeling and the corresponding character string is incorporated into the pattern matching library;
incorporating the character string of the item to be filled into the pattern matching library means that: the character string corresponding to the item to be filled is incorporated into the pattern matching library and inserted directly into the corresponding mapping table, and the pattern matching library is automatically improved by similarity-based incremental learning according to the document content.
2. The control configuration method for automatically inputting electronic document data according to claim 1, wherein scanning the labeled HTML document with the control configurator, automatically selecting the control corresponding to each label from the input control library based on the label record, and invoking the selected control to perform control configuration on the item to be filled means that:
after the HTML document and the label record have been generated with all items to be filled labeled, the control configurator is started and scans the generated HTML document; when the first conforming marker symbol is recognized, the corresponding label record in the marker symbol record table of the corresponding type is read, the corresponding control type is looked up according to the label record, the control insertion function is called to add the control at the corresponding marked position in the HTML document, and the state in the corresponding label record is updated to configured, whereupon the control configuration of a single item to be filled is complete;
the pattern recognition scanner repeats the control configuration of a single item to be filled, scanning the content of the HTML document until the end of the document, and generates the HTML document with the control configuration completed.
3. The control configuration method for automatically inputting electronic document data according to claim 1, wherein after the control configuration of all items to be filled in the labeled HTML document is completed, unrecognized items to be filled are re-marked by a configuration verification function, the unconfigured label records are rescanned, and a positioning information prompt is provided for manual control configuration.
CN202311653676.6A 2023-12-05 2023-12-05 Control configuration method and system for automatically inputting electronic document data Active CN117350249B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311653676.6A CN117350249B (en) 2023-12-05 2023-12-05 Control configuration method and system for automatically inputting electronic document data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311653676.6A CN117350249B (en) 2023-12-05 2023-12-05 Control configuration method and system for automatically inputting electronic document data

Publications (2)

Publication Number Publication Date
CN117350249A CN117350249A (en) 2024-01-05
CN117350249B true CN117350249B (en) 2024-02-09

Family

ID=89365338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311653676.6A Active CN117350249B (en) 2023-12-05 2023-12-05 Control configuration method and system for automatically inputting electronic document data

Country Status (1)

Country Link
CN (1) CN117350249B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096588A (en) * 2011-03-24 2011-06-15 南京朗睿软件科技有限公司 Control-containing page establishing method
EP2717178A1 (en) * 2012-10-04 2014-04-09 Tata Consultancy Services Limited Analysis and specification creation for web documents
CN109145260A (en) * 2018-08-24 2019-01-04 北京科技大学 A kind of text information extraction method
CN109840318A (en) * 2019-01-04 2019-06-04 上海上湖信息技术有限公司 A kind of filling method and system of form item
CN112860242A (en) * 2021-03-02 2021-05-28 大连海事大学 Automatic mapping method for interaction data of wheel simulator
CN112949274A (en) * 2021-03-04 2021-06-11 廖凌浩 Document data entry method and system
CN113094617A (en) * 2021-03-30 2021-07-09 厦门立林科技有限公司 Web element positioning method and application and storage medium thereof
CN114861623A (en) * 2022-05-09 2022-08-05 深圳市富途网络科技有限公司 Protocol template generation method and device, electronic equipment and storage medium
CN117113957A (en) * 2023-08-04 2023-11-24 欧冶工业品股份有限公司 Method and system for generating on-line structure digital document template

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于B/S的跨平台用户界面可配置算法研究;赵双双 等;《浙江理工大学学报》;第33卷(第05期);399-404 *
深层网查询表单标签识别技术研究;陈丽君;;电脑开发与应用(第02期);66-68 *

Also Published As

Publication number Publication date
CN117350249A (en) 2024-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant