CN116702727A - Form processing method, device, equipment and medium - Google Patents

Info

Publication number
CN116702727A
Authority
CN
China
Prior art keywords
initial
data
target
header
column data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310500574.4A
Other languages
Chinese (zh)
Inventor
刘子龙
师军港
胡少群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Baiqiu New Online Commerce Digital Technology Co ltd
Original Assignee
Shanghai Baiqiu New Online Commerce Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Baiqiu New Online Commerce Digital Technology Co ltd
Priority to CN202310500574.4A
Publication of CN116702727A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a table processing method, device, equipment, and medium. The method comprises: acquiring an initial table; determining a target relation table from the relation tables based on the category of the initial table; extracting header data from the initial table to obtain initial header data; performing word segmentation on the initial header data to obtain word-segmented header data; extracting the column data corresponding to the initial header data to obtain initial column data; analyzing the data type of the initial column data to obtain an initial column data type; comparing the word-segmented header data and the initial column data type with the target relation table to obtain a comparison result; and, when the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as a target header, and taking the initial column data as the target column data corresponding to the target header. The scheme can automatically process table data with irregular headers and improves processing efficiency.

Description

Form processing method, device, equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for processing a table.
Background
A platform bill is generally presented as a data table, and the data in the table usually has to be edited and organized manually before it can be used for actual business needs. In the traditional workflow, an operator exports the relevant bill from the platform back end and then processes it manually. As business grows, manual processing becomes too time-consuming and labor-intensive, so the required content needs to be extracted from the bill automatically by the system.
Most existing Excel (spreadsheet software) processing logic uses a template to fix the positions of the header and the data columns in advance, and the system identifies and extracts the data needed for subsequent processing according to that preset template. This approach can extract table data automatically, but it has a drawback: each data column in the imported Excel file is restricted to holding the data that the template assigns to it. Because of factors such as bill type and bill date, the source data exported from the platform often cannot fully match the preset template, or the source bill must be manually converted into the bill format preset by the system, which is time-consuming and laborious and reduces table processing efficiency.
Disclosure of Invention
To overcome the defects of the prior art, the application provides a table processing method, device, equipment, and medium that can automatically process table data with irregular headers and improve table processing efficiency.
In order to solve the technical problems, the application provides the following technical scheme:
According to a first aspect of an embodiment of the present application, there is provided a table processing method, including:
acquiring an initial table;
determining a target relation table from the relation tables based on the category of the initial table, the relation table being obtained based on a sample table;
extracting header data from the initial table to obtain initial header data;
performing word segmentation on the initial header data to obtain word-segmented header data;
extracting the column data corresponding to the initial header data to obtain initial column data;
analyzing the data type of the initial column data to obtain an initial column data type;
comparing the word-segmented header data and the initial column data type with the target relation table to obtain a comparison result;
and, when the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as a target header, and taking the initial column data as the target column data corresponding to the target header.
Further, comparing the word-segmented header data and the initial column data type with the target relation table to obtain a comparison result, and, when the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as the target header and taking the initial column data as the target column data corresponding to the target header, includes:
comparing the word-segmented header data with the sample headers in the target relation table to obtain a first result;
when the first result indicates a match, comparing the initial column data type with the data type of the sample column data to obtain a second result;
and, when the second result indicates a match, taking the matching result with the highest matching degree among the sample headers as the target header, and taking the initial column data as the target column data.
Further, the method further comprises:
when the first result indicates no match, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type, until the first result indicates a match.
Further, the method further comprises:
when the first result indicates a match but the second result indicates no match, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type.
Further, the method further comprises:
when the first result indicates a match, ranking the matching results in the first result by matching degree, retaining the sample header with the highest matching degree as the target header, and releasing (unmatching) the remaining matching results; wherein the word-segmented header data matches the word segmentation result of at least one sample header.
According to a second aspect of an embodiment of the present application, there is provided a table processing device, the device including:
an initial table acquisition module, configured to acquire an initial table;
a target relation table determining module, configured to determine a target relation table from the relation tables based on the category of the initial table, the relation table being obtained based on a sample table;
an initial header acquisition module, configured to extract header data from the initial table to obtain initial header data;
a word segmentation processing module, configured to perform word segmentation on the initial header data to obtain word-segmented header data;
an initial column acquisition module, configured to extract the column data corresponding to the initial header data to obtain initial column data;
an initial column analysis module, configured to analyze the data type of the initial column data to obtain an initial column data type;
a comparison module, configured to compare the word-segmented header data and the initial column data type with the target relation table to obtain a comparison result;
and a target data acquisition module, configured to, when the comparison result indicates a match, take the matching result with the highest matching degree among the sample headers of the target relation table as a target header and take the initial column data as the target column data corresponding to the target header.
According to a third aspect of the embodiment of the present application, there is provided an electronic device, including a processor and a memory, where at least one instruction or at least one program is stored in the memory, where the at least one instruction or the at least one program is loaded and executed by the processor to implement any one of the table processing methods described above.
According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement any of the above-described table processing methods.
By adopting the above technical scheme, the application has the following beneficial effects:
According to the table processing method, device, equipment, and medium, word segmentation is performed on the initial header data in the initial table, and data type analysis is performed on the initial column data; the resulting word-segmented header data and initial column data type are compared with the target relation table; and, when the comparison result indicates a match, the matching result with the highest matching degree among the sample headers of the target relation table is taken as the target header, and the initial column data is taken as the target column data. This scheme can automatically process table data with irregular headers, improves table processing efficiency, and avoids the time and labor otherwise spent manually converting a source bill into the bill format preset by the system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a table processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a practical application of a table processing method according to an embodiment of the present application;
Fig. 3 is a block diagram of a table processing device according to an embodiment of the present application;
Fig. 4 is a block diagram of the hardware structure of an electronic device running a table processing method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the present application. In the description of the embodiments of the present application, it should be understood that the directions or positional relationships indicated by the terms "upper", "lower", "top", "bottom", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience in describing the present application and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. Moreover, the terms "first," "second," and the like are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein.
Referring to Fig. 1, a flow chart of a table processing method according to an embodiment of the application is shown, where the table processing method includes:
step S101: acquiring an initial table;
step S102: determining a target relation table from the relation tables based on the category of the initial table, the relation table being obtained based on a sample table;
step S103: extracting header data from the initial table to obtain initial header data;
step S104: performing word segmentation on the initial header data to obtain word-segmented header data;
step S105: extracting the column data corresponding to the initial header data to obtain initial column data;
step S106: analyzing the data type of the initial column data to obtain an initial column data type;
step S107: comparing the word-segmented header data and the initial column data type with the target relation table to obtain a comparison result;
step S108: when the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as a target header, and taking the initial column data as the target column data corresponding to the target header.
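Steps S103 and S104 can be pictured with a minimal sketch. It assumes the simplest reading of the dictionary-based variant, in which only header text that matches an entry of a word segmentation library is kept. The names `unique_headers`, `segment`, and `SPECIAL_CHARS`, the forward longest-match strategy, and the English header strings are all hypothetical illustrations and are not taken from the application.

```python
# Hypothetical set of special characters that are filtered out before segmentation.
SPECIAL_CHARS = set(',.;":})')


def unique_headers(header_row):
    """De-duplicate header cells while keeping their original order (step S103)."""
    return list(dict.fromkeys(header_row))


def segment(header, vocabulary):
    """Segment one header by forward longest match against a word library (step S104).

    Only text that matches a library entry is kept; anything else is skipped.
    """
    text = "".join(ch for ch in header if ch not in SPECIAL_CHARS)
    words_by_length = sorted(vocabulary, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        for word in words_by_length:
            if text.startswith(word, pos):
                tokens.append(word)
                pos += len(word)
                break
        else:
            pos += 1  # no library entry starts here, so skip one character
    return tokens


# Illustrative word library; in the application it is derived from the sample table.
vocabulary = {"customer return", "quantity", "payable quantity", "proportion"}
tokens = segment("customer return quantity:", vocabulary)
# tokens == ["customer return", "quantity"]
```

With the jieba variant, the custom bill terms would instead be loaded with `jieba.load_userdict` and the header segmented with `jieba.lcut`; that path is not shown here because it needs the third-party package and a dictionary file.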
In a specific embodiment, an initial table is acquired in step S101; the initial table may be the table containing the bill source data.
In step S102, a target relation table matching the category of the initial table is determined from the relation tables. Specifically, the category information carried in the initial table may be compared with the category information carried in each relation table, and the relation table whose category information matches is determined as the target relation table. The relation table is determined based on a sample table, and the sample table is determined based on sorted historical bills. Specifically, the sample table may be input into a first neural network for training, and the parameters of the first neural network are adjusted repeatedly according to the training result until the training result meets a first preset condition. The parameters include department, business, and the like; the first preset condition is set according to the corresponding first neural network parameters and requires that those parameters meet a preset standard, which may be obtained from an industry standard, set manually, or the like.
In step S103, the header data in the initial table is extracted and de-duplicated to obtain the initial header data.
In step S104, word segmentation is performed on the initial header data. Specifically, a word segmentation library may be determined based on the sample table, and the initial header data is matched against it, the matched header data being the word-segmented header data. Alternatively, jieba, a Chinese word segmentation library for Python (a computer programming language), is used to segment the header list. Because bills contain many bill-specific nouns that are not in the jieba dictionary, these nouns need to be added to the jieba dictionary as custom entries; importing the custom dictionary makes the word segmentation more accurate. Custom special characters, for example , . ; " : } ), are filtered out and do not take part in word segmentation or storage.
In step S105, the initial column data is extracted.
In step S106, the data type of the initial column data is analyzed to obtain the initial column data type. Specifically, the data type of the initial column data is first obtained from the initial column data; a column data type library is then determined based on the sample table, and the data type of the initial column data is matched against this library, the matched data type being the initial column data type. Optionally, if the data type is integer or floating point, the content is marked with its sign (+ or -); if the data type is character, the content is marked with its data length range. The initial column data and the corresponding initial header data thus form a correspondence table.
In step S107, the word-segmented header data and the initial column data type are compared with the target relation table to obtain a comparison result. Because a Chinese word segmentation result is a combination of several phrases, several candidate matches may be found when a header is matched.
In step S108, when the comparison result indicates a match, the matching result with the highest matching degree among the sample headers of the target relation table is taken as the target header, and the initial column data is taken as the target column data corresponding to the target header. The matching weight may be obtained by inputting the sample table into a second neural network for training and adjusting the parameters of the second neural network repeatedly according to the training result until the training result meets a second preset condition; or by performing fault-tolerant training and self-learning on the initial header data type and the matched data type.
For example, the header "customer return quantity" is segmented into "customer return" and "quantity". In the relation table it can match two sample headers: "payable quantity as a proportion of customer return quantity" and "customer return quantity as a proportion of payable quantity". To improve the accuracy of header recognition, keyword weights are first set for the segmented words in the relation table: the keyword of "payable quantity as a proportion of customer return quantity" is "payable quantity", and the keyword of "customer return quantity as a proportion of payable quantity" is "customer return quantity". The matching weight is then calculated as keyword weight × (number of matched phrases / total number of phrases), the matching weights are ranked, and the header with the highest weight is taken as the candidate column. The word segmentation result of "payable quantity as a proportion of customer return quantity" is ['payable quantity', 'customer return', 'quantity', 'proportion'], so the matching weight of "customer return quantity" against it is (2/4) × 100% = 50%; its matching weight against "customer return quantity as a proportion of payable quantity" is (2/4) × 200% = 100%. The match with the former is therefore released, and the latter is stored as the candidate column.
The data content corresponding to the header is then acquired, and its type, length, and range are determined and compared with the corresponding type marks in the correspondence table. If the comparison succeeds, the column is written into the new sheet; if it fails, the column is deleted from the candidate columns.
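The keyword-weight calculation in the example above can be checked with a short sketch. It assumes that the matching weight is the keyword weight multiplied by the share of a sample header's phrases that also occur in the segmented source header, with a keyword weight of 1 for the first sample header and 2 (the 200% figure) for the second. The function name `matching_weight`, the English renderings of the header names, and the segmentation given for the second sample header are assumptions made for illustration.

```python
def matching_weight(header_tokens, sample_tokens, keyword_weight):
    """Return keyword weight x (matched phrases / total phrases) for one sample header."""
    if not sample_tokens:
        return 0.0
    header_set = set(header_tokens)
    matched = sum(1 for token in sample_tokens if token in header_set)
    return keyword_weight * matched / len(sample_tokens)


# Segmented source header from the example.
header_tokens = ["customer return", "quantity"]

# Sample header -> (segmentation, keyword weight). The second segmentation and
# both numeric keyword weights are assumptions chosen to reproduce 50% and 100%.
relation_table = {
    "payable quantity as a proportion of customer return quantity": (
        ["payable quantity", "customer return", "quantity", "proportion"],
        1.0,
    ),
    "customer return quantity as a proportion of payable quantity": (
        ["customer return", "quantity", "payable quantity", "proportion"],
        2.0,
    ),
}

weights = {
    name: matching_weight(header_tokens, tokens, keyword_weight)
    for name, (tokens, keyword_weight) in relation_table.items()
}
# weights: 0.5 for the first sample header and 1.0 for the second
```

Note that "payable quantity" is a single phrase, so it does not count as a match for the standalone phrase "quantity"; this is why only two of the four phrases match in each case.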
In an alternative embodiment, steps S107 to S108 may include:
comparing the word-segmented header data with the sample headers in the target relation table to obtain a first result;
when the first result indicates a match, comparing the initial column data type with the data type of the sample column data to obtain a second result;
and, when the second result indicates a match, taking the matching result with the highest matching degree among the sample headers as the target header, and taking the initial column data as the target column data.
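The two-stage comparison can be sketched as follows, together with a simple form of the column data type analysis of step S106 (a sign mark for numeric columns, a length range for character columns). The names `analyze_column`, `types_match`, and `two_stage_compare`, the dictionary layout of the type mark, and the rule that a source column matches when its signs or length range fall within those of the sample column are assumptions for illustration only.

```python
def analyze_column(values):
    """Build a type mark for a column: sign set for numbers, length range for text."""
    texts = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not texts:
        return {"type": "empty"}
    for kind, cast in (("int", int), ("float", float)):
        try:
            numbers = [cast(t) for t in texts]
        except ValueError:
            continue  # not every cell parses as this numeric type
        signs = {"-" if n < 0 else "+" for n in numbers}
        return {"type": kind, "signs": signs}
    lengths = [len(t) for t in texts]
    return {"type": "str", "min_len": min(lengths), "max_len": max(lengths)}


def types_match(initial, sample):
    """Assumed rule: same type, and signs or length range contained in the sample's."""
    if initial["type"] != sample["type"]:
        return False
    if initial["type"] in ("int", "float"):
        return initial["signs"] <= sample["signs"]
    if initial["type"] == "str":
        return (sample["min_len"] <= initial["min_len"]
                and initial["max_len"] <= sample["max_len"])
    return True


def two_stage_compare(header_tokens, column_values, sample):
    """Return (first_result, second_result); the second is None if the first fails."""
    first = any(token in sample["tokens"] for token in header_tokens)
    if not first:
        return False, None
    second = types_match(analyze_column(column_values), sample["mark"])
    return True, second


sample = {
    "tokens": ["customer return", "quantity"],
    "mark": {"type": "int", "signs": {"+"}},
}
result = two_stage_compare(["customer return", "quantity"], ["3", "5"], sample)
# result == (True, True)
```

A column containing a negative value, such as `["3", "-5"]`, passes the header comparison but fails the type comparison under this rule, which corresponds to the case where the first result matches and the second does not.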
In an alternative embodiment, the method may further include:
when the first result indicates no match, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type, until the first result indicates a match.
In an alternative embodiment, the method may further include:
when the first result indicates a match but the second result indicates no match, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type.
In an alternative embodiment, the method may further include:
when the first result indicates a match, ranking the matching results in the first result by matching degree, retaining the sample header with the highest matching degree as the target header, and releasing (unmatching) the remaining matching results; wherein the word-segmented header data matches the word segmentation result of at least one sample header.
The execution code of the mismatch cancellation process is as follows:
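The applicant's listing does not appear in this text. Purely as an illustration, and not as the applicant's code, the ranking and unmatching step described above could look like the following sketch. It assumes the matching weights have already been computed per sample header; the function name `keep_best_match` and the example header labels are hypothetical.

```python
def keep_best_match(match_weights):
    """Rank candidate sample headers by weight, keep the best, release the rest.

    match_weights maps each matched sample header to its matching weight.
    Returns (target_header, released_headers).
    """
    if not match_weights:
        return None, []
    ranked = sorted(match_weights.items(), key=lambda item: item[1], reverse=True)
    target_header = ranked[0][0]
    released_headers = [name for name, _ in ranked[1:]]
    return target_header, released_headers


target, released = keep_best_match({"sample header A": 0.5, "sample header B": 1.0})
# target == "sample header B", released == ["sample header A"]
```

Because Python's sort is stable, candidates with equal weights keep their insertion order, so the first of several equally weighted candidates is retained in this sketch.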
Fig. 2 is a schematic diagram of a practical application of a table processing method according to an embodiment of the application; in a specific application scenario, the method may include:
step S201: importing the bill source data;
step S202: in sequence, determining the bill category, segmenting the header, extracting the data content of the column corresponding to the header, and analyzing the data type;
step S203: using the segmented header to search for a corresponding header under the corresponding bill category in the relation table;
step S204: determining whether a corresponding header exists; if yes, going to step S205; if no, jumping back to step S202;
step S205: comparing the data type in the relation table with the data type in the source data;
step S206: determining whether a corresponding data type exists; if yes, going to step S207; if no, jumping back to step S202;
step S207: extracting the column data from the source data and forming a new data table according to the columns in the relation table.
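The flow of steps S201 to S207 can be sketched end to end under several simplifying assumptions: headers are segmented by splitting on whitespace as a stand-in for word segmentation, the matching degree is the share of a sample header's phrases found in the source header, and "jumping back to step S202" is read as moving on to the next source column. The names `infer_type` and `build_table` and the example data are hypothetical.

```python
def infer_type(values):
    """Classify a column as 'int', 'float' or 'str' from its cell text."""
    for kind, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(str(value))
        except ValueError:
            continue  # at least one cell is not of this numeric type
        return kind
    return "str"


def build_table(source, relation):
    """Map source columns onto relation-table headers and return the new table.

    source:   header -> list of cell values (the imported bill data, S201)
    relation: target header -> {"tokens": [...], "type": "int" | "float" | "str"}
    """
    result = {}
    for header, values in source.items():               # S202
        tokens = set(header.split())                     # stand-in for segmentation
        column_type = infer_type(values)
        best_name, best_score = None, 0.0
        for name, spec in relation.items():              # S203
            shared = len(tokens & set(spec["tokens"]))
            score = shared / max(len(spec["tokens"]), 1)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:                            # S204: no matching header
            continue
        if relation[best_name]["type"] != column_type:   # S205 and S206
            continue
        result[best_name] = values                       # S207
    return result


source = {
    "order no": ["A1", "A2"],
    "return quantity": ["3", "5"],
    "remark": ["x", "y"],
}
relation = {
    "customer return quantity": {"tokens": ["customer", "return", "quantity"], "type": "int"},
    "order number": {"tokens": ["order", "number"], "type": "str"},
}
new_table = build_table(source, relation)
# new_table == {"order number": ["A1", "A2"], "customer return quantity": ["3", "5"]}
```

The "remark" column shares no phrase with any sample header, so it is left out of the new table.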
As can be seen from the above technical solutions, in the embodiments of the present application, word segmentation is performed on the initial header data in the initial table, and data type analysis is performed on the initial column data; the resulting word-segmented header data and initial column data type are compared with the target relation table; and, when the comparison result indicates a match, the matching result with the highest matching degree among the sample headers of the target relation table is taken as the target header, and the initial column data is taken as the target column data. This scheme can automatically process table data with irregular headers, improves table processing efficiency, and avoids the time and labor otherwise spent manually converting a source bill into the bill format preset by the system.
Corresponding to the table processing method provided in the foregoing embodiment, the embodiment of the present application further provides a table processing device, and since the table processing device provided in the embodiment of the present application corresponds to the table processing method provided in the foregoing embodiment, implementation of the foregoing table processing method is also applicable to the table processing device provided in the present embodiment, and will not be described in detail in the present embodiment.
Referring to Fig. 3, a block diagram of a table processing device according to an embodiment of the application is shown; the device comprises:
001: an initial table acquisition module, configured to acquire an initial table;
002: a target relation table determining module, configured to determine a target relation table from the relation tables based on the category of the initial table, the relation table being obtained based on a sample table;
003: an initial header acquisition module, configured to extract header data from the initial table to obtain initial header data;
004: a word segmentation processing module, configured to perform word segmentation on the initial header data to obtain word-segmented header data;
005: an initial column acquisition module, configured to extract the column data corresponding to the initial header data to obtain initial column data;
006: an initial column analysis module, configured to analyze the data type of the initial column data to obtain an initial column data type;
007: a comparison module, configured to compare the word-segmented header data and the initial column data type with the target relation table to obtain a comparison result;
008: a target data acquisition module, configured to, when the comparison result indicates a match, take the matching result with the highest matching degree among the sample headers of the target relation table as a target header and take the initial column data as the target column data corresponding to the target header.
In a specific embodiment, an initial table is acquired by the initial table acquisition module; the initial table may be the table containing the bill source data.
A target relation table matching the category of the initial table is determined from the relation tables by the target relation table determining module. Specifically, the category information carried in the initial table may be compared with the category information carried in each relation table, and the relation table whose category information matches is determined as the target relation table. The relation table is determined based on a sample table, and the sample table is determined based on sorted historical bills. Specifically, the sample table may be input into a first neural network for training, and the parameters of the first neural network are adjusted repeatedly according to the training result until the training result meets a first preset condition. The parameters include department, business, and the like; the first preset condition is set according to the corresponding first neural network parameters and requires that those parameters meet a preset standard, which may be obtained from an industry standard, set manually, or the like.
The header data in the initial table is extracted by the initial header acquisition module and de-duplicated to obtain the initial header data.
Word segmentation is performed on the initial header data by the word segmentation processing module. Specifically, a word segmentation library may be determined based on the sample table, and the initial header data is matched against it, the matched header data being the word-segmented header data. Alternatively, jieba, a Chinese word segmentation library for Python (a computer programming language), is used to segment the header list. Because bills contain many bill-specific nouns that are not in the jieba dictionary, these nouns need to be added to the jieba dictionary as custom entries; importing the custom dictionary makes the word segmentation more accurate. Custom special characters, for example , . ; " : } ), are filtered out and do not take part in word segmentation or storage.
The initial column data is extracted by the initial column acquisition module.
The data type of the initial column data is analyzed by the initial column analysis module to obtain the initial column data type. Specifically, the data type of the initial column data is first obtained from the initial column data; a column data type library is then determined based on the sample table, and the data type of the initial column data is matched against this library, the matched data type being the initial column data type. Optionally, if the data type is integer or floating point, the content is marked with its sign (+ or -); if the data type is character, the content is marked with its data length range. The initial column data and the corresponding initial header data thus form a correspondence table.
The word-segmented header data and the initial column data type are compared with the target relation table by the comparison module to obtain a comparison result. Because a Chinese word segmentation result is a combination of several phrases, several candidate matches may be found when a header is matched.
When the comparison result indicates a match, the matching result with the highest matching degree among the sample headers of the target relation table is taken as the target header, and the initial column data is taken as the target column data corresponding to the target header. The matching weight may be obtained by inputting the sample table into a second neural network for training and adjusting the parameters of the second neural network repeatedly according to the training result until the training result meets a second preset condition; or by performing fault-tolerant training and self-learning on the initial header data type and the matched data type.
For example, the header "customer return quantity" is segmented into "customer return" and "quantity". In the relation table it can match two sample headers: "payable quantity as a proportion of customer return quantity" and "customer return quantity as a proportion of payable quantity". To improve the accuracy of header recognition, keyword weights are first set for the segmented words in the relation table: the keyword of "payable quantity as a proportion of customer return quantity" is "payable quantity", and the keyword of "customer return quantity as a proportion of payable quantity" is "customer return quantity". The matching weight is then calculated as keyword weight × (number of matched phrases / total number of phrases), the matching weights are ranked, and the header with the highest weight is taken as the candidate column. The word segmentation result of "payable quantity as a proportion of customer return quantity" is ['payable quantity', 'customer return', 'quantity', 'proportion'], so the matching weight of "customer return quantity" against it is (2/4) × 100% = 50%; its matching weight against "customer return quantity as a proportion of payable quantity" is (2/4) × 200% = 100%. The match with the former is therefore released, and the latter is stored as the candidate column.
And acquiring corresponding data content of the table head, judging the type, the length and the range of the data content, comparing the data content with corresponding type marks in the corresponding table, successfully writing the comparison into the column of the new sheet table, and deleting the comparison failure from the alternative columns.
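The weighting rule in the example above (keyword weight multiplied by the share of matched phrases) can be illustrated with a minimal Python sketch. It is not the applicant's code: the function name `matching_weight`, the English phrase strings, the phrase list of the second sample header, and the keyword weights 1.0 and 2.0 are assumptions chosen so that the numbers reproduce the 50% and 100% of the example.

```python
from typing import Dict, List, Tuple


def matching_weight(header_tokens: List[str], sample_tokens: List[str],
                    keyword_weight: float) -> float:
    """Keyword weight x (matched phrases / total phrases of the sample header)."""
    if not sample_tokens:
        return 0.0
    matched = sum(1 for token in sample_tokens if token in header_tokens)
    return keyword_weight * matched / len(sample_tokens)


# Segmented incoming header from the example.
header = ["customer return", "quantity"]

# Sample headers of the relation table: (segmented phrases, keyword weight).
# The phrase list of the second header and both weights are assumptions.
samples: Dict[str, Tuple[List[str], float]] = {
    "ratio of payable quantity to customer return quantity":
        (["payable quantity", "customer return", "quantity", "proportion"], 1.0),
    "ratio of customer return quantity to payable quantity":
        (["customer return", "quantity", "payable quantity", "proportion"], 2.0),
}

# Weight of the incoming header against every sample header.
weights = {name: matching_weight(header, tokens, kw)
           for name, (tokens, kw) in samples.items()}

# The sample header with the highest weight becomes the candidate column.
candidate = max(weights, key=weights.get)
```

Under these assumed inputs the first sample header scores 0.5 and the second 1.0, so the second is kept as the candidate column.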
In an alternative embodiment, the apparatus may further include:
the first result acquisition module is used for comparing the header data after word segmentation with the sample headers in the target relation table to obtain a first result;
the second result acquisition module is used for comparing the initial column data type with the data type of the sample column data, in the case that the first result indicates a match, to obtain a second result;
and the target result acquisition module is used for taking the matching result with the highest matching degree among the sample headers as the target header and taking the initial column data as the target column data, in the case that the second result indicates a match.
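The two-stage comparison performed by these modules might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the type inference by trial parsing, the dictionary layout of a sample header, and the names `infer_column_type` and `two_stage_compare` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def _all_parse(values: List[str], cast: Callable[[str], object]) -> bool:
    """Return True if every cell can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
        return True
    except ValueError:
        return False


def infer_column_type(values: List[str]) -> str:
    """Classify a column as 'int', 'float' or 'str' from its cell text."""
    if values and _all_parse(values, int):
        return "int"
    if values and _all_parse(values, float):
        return "float"
    return "str"


def two_stage_compare(header_tokens: List[str], column_values: List[str],
                      sample: Dict) -> Tuple[bool, Optional[bool]]:
    """Return (first result, second result); the second is None if not evaluated."""
    # First result: the segmented header shares at least one phrase with the sample header.
    first = any(token in sample["tokens"] for token in header_tokens)
    if not first:
        return False, None
    # Second result: evaluated only when the first result indicates a match.
    second = infer_column_type(column_values) == sample["type"]
    return True, second


# Hypothetical sample header entry of a target relation table.
sample_header = {"tokens": ["customer return", "quantity"], "type": "int"}
result = two_stage_compare(["customer return", "quantity"], ["3", "-1"], sample_header)
```

Returning `None` for the second result when the header comparison fails mirrors the description, in which the column type is compared only after the first result indicates a match.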
In an alternative embodiment, the apparatus may further include:
and the first repeating module is used for, in the case that the first result indicates no match, repeating the steps from determining a target relation table from the relation tables through analyzing the data type of the initial column data to obtain the initial column data type, until the first result indicates a match.
In an alternative embodiment, the apparatus may further include:
and the second repeating module is used for, in the case that the first result indicates a match but the second result does not, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type, until the second result indicates a match.
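One possible reading of the two repeating modules is a loop that tries candidate relation tables in turn until both the header comparison and the column type comparison succeed. The sketch below assumes that reading; the data layout and the name `find_target` are hypothetical, and the source does not specify how the next candidate table is chosen.

```python
from typing import Dict, List, Optional, Tuple


def find_target(relation_tables: List[Dict], header_tokens: List[str],
                column_type: str) -> Optional[Tuple[str, str]]:
    """Try candidate relation tables in turn until both comparisons match."""
    for table in relation_tables:
        for sample in table["headers"]:
            # First result: header phrases overlap with the sample header.
            first = any(token in sample["tokens"] for token in header_tokens)
            # Second result: column data types agree (only relevant if first matched).
            second = first and sample["type"] == column_type
            if second:
                return table["name"], sample["name"]
    # No relation table produced a match on both results.
    return None


# Hypothetical candidate relation tables for one table category.
tables = [
    {"name": "t1", "headers": [
        {"name": "order number", "tokens": ["order", "number"], "type": "str"}]},
    {"name": "t2", "headers": [
        {"name": "return quantity", "tokens": ["return", "quantity"], "type": "str"}]},
    {"name": "t3", "headers": [
        {"name": "return quantity", "tokens": ["return", "quantity"], "type": "int"}]},
]
found = find_target(tables, ["return", "quantity"], "int")
```

Here `t1` fails the first comparison, `t2` passes the first but fails the second, and `t3` passes both, so the loop stops at `t3`.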
In an alternative embodiment, the apparatus may further include:
the matching degree screening module is used for, in the case that the first result indicates a match, sorting the matching results in the first result by matching degree, retaining the matching result with the highest matching degree among the sample headers as the target header, and unmatching the other matching results; here the header data after word segmentation matches the word segmentation result of at least one sample header.
The execution code of the unmatching process is as follows:
[Code listing: embedded image, not reproduced in the extracted text.]
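Since the original listing is not available in this text, the following is only a hypothetical sketch of what an unmatching step could look like: given the matching degree of each matched sample header, it keeps the highest one as the target header and releases the rest. The function name `keep_best_match` and the input format are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def keep_best_match(match_weights: Dict[str, float]) -> Tuple[Optional[str], List[str]]:
    """Keep the sample header with the highest matching degree; release the others."""
    if not match_weights:
        return None, []
    # Sort matched sample headers by matching degree, highest first.
    ranked = sorted(match_weights.items(), key=lambda item: item[1], reverse=True)
    target_header = ranked[0][0]
    # Every other matched header is unmatched (released).
    released = [name for name, _ in ranked[1:]]
    return target_header, released


target, unmatched = keep_best_match({
    "ratio of payable quantity to customer return quantity": 0.5,
    "ratio of customer return quantity to payable quantity": 1.0,
})
```

With the weights from the description's example, the 100% header is retained and the 50% header is released.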
It should be noted that the apparatus provided in the foregoing embodiment is described using only the above division of functional modules as an example. In practical applications, the functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not repeated here.
The table processing apparatus performs word segmentation on the initial header data in the initial table and analyzes the data type of the initial column data, then compares the word-segmented header data and the initial column data type with the target relation table; in the case that the comparison result indicates a match, the matching result with the highest matching degree among the sample headers of the target relation table is taken as the target header, and the initial column data is taken as the target column data. With this technical solution, table data with irregular headers can be processed automatically, which improves table processing efficiency and avoids the time-consuming and labor-intensive manual parsing of source bills into the bill format preset by the system.
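The word segmentation step with a custom dictionary and special-character filtering can be sketched without third-party dependencies. The description names jieba, where custom bill nouns would typically be registered with `jieba.add_word` or `jieba.load_userdict` and the header cut with `jieba.lcut`; the sketch below deliberately swaps in a plain forward-maximum-matching segmenter instead, so it is a stand-in and not jieba's algorithm. The dictionary entries, the filter list and the Chinese sample strings are illustrative assumptions, not taken from the source.

```python
from typing import List, Set

# Hypothetical list of special characters excluded from segmentation.
SPECIAL_CHARS = set(",.;:\"'，。；：、")


def segment(header: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    # Drop special characters and whitespace before segmenting.
    text = "".join(ch for ch in header
                   if ch not in SPECIAL_CHARS and not ch.isspace())
    max_len = max((len(word) for word in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical custom dictionary of bill nouns.
bill_dictionary = {"客退", "数量", "应付数量", "比例"}
tokens = segment("客退数量：", bill_dictionary)
```

A custom dictionary matters here for the same reason the description gives: without the bill nouns, the segmenter would fall back to single characters and the header phrases could not be matched against the relation table.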
The embodiment of the application also provides an electronic device comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the table processing method provided by the method embodiments.
The memory may be used to store software programs and modules, and the processor performs various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided by the embodiments of the present application may be performed in a computer terminal, a server, or a similar computing device; that is, the electronic device may include a computer terminal, a server, or a similar computing device. Fig. 4 is a block diagram of the hardware structure of an electronic device for running a table processing method according to an embodiment of the present application. As shown in Fig. 4, the internal structure of the electronic device may include, but is not limited to, a processor, a network interface and a memory. The processor, the network interface and the memory in the electronic device may be connected by a bus or in other ways; Fig. 4 in the embodiment of the present disclosure takes connection by a bus as an example.
The processor (or CPU, Central Processing Unit) is the computing core and control core of the electronic device. The network interface may optionally include a standard wired interface and a wireless interface (e.g., Wi-Fi, a mobile communication interface, etc.). The memory is the storage device in the electronic device and is used for storing programs and data. It will be appreciated that the memory here may be a high-speed RAM storage device or a non-volatile storage device, such as at least one magnetic disk storage device; optionally, it may also be at least one storage device located remotely from the processor. The memory provides a storage space that stores the operating system of the electronic device, which may include, but is not limited to: Windows (an operating system), Linux (an operating system), Android (a mobile operating system), iOS (a mobile operating system), etc.; the application is not limited in this regard. The storage space also stores one or more instructions adapted to be loaded and executed by the processor; these instructions may be one or more computer programs (including program code). In the embodiment of the present disclosure, the processor loads and executes the one or more instructions stored in the memory to implement the table processing method provided in the above method embodiments.
The embodiment of the application also provides a computer readable storage medium in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the table processing method provided by the method embodiments.
Optionally, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing program code.
It should be noted that the order of the embodiments of the present application is for description only and does not represent the relative merits of the embodiments. The foregoing describes specific embodiments of this specification; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference may be made to the relevant parts of the description of the method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is only illustrative of the present application and is not to be construed as limiting it; any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (8)

1. A form processing method, comprising:
acquiring an initial form;
determining a target relationship table from the relationship tables based on the category of the initial table; the relation table is obtained based on a sample table;
extracting header data in the initial table to obtain initial header data;
performing word segmentation on the initial header data to obtain header data subjected to word segmentation;
extracting column data corresponding to the initial header data to obtain initial column data;
analyzing the data type of the initial column data to obtain the initial column data type;
comparing the header data after word segmentation processing and the initial column data type with the target relation table to obtain a comparison result;
and in the case that the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as a target header, and taking the initial column data as target column data corresponding to the target header.
2. The form processing method according to claim 1, characterized in that comparing the header data after word segmentation processing and the initial column data type with the target relation table to obtain a comparison result, and, in the case that the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as a target header and taking the initial column data as target column data corresponding to the target header, comprises:
comparing the header data after word segmentation with the sample headers in the target relation table to obtain a first result;
in the case that the first result indicates a match, comparing the initial column data type with the data type of the sample column data to obtain a second result;
and in the case that the second result indicates a match, taking the matching result with the highest matching degree among the sample headers as the target header, and taking the initial column data as the target column data.
3. The form processing method according to claim 2, characterized in that the method further comprises:
in the case that the first result indicates no match, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type, until the first result indicates a match.
4. A form processing method according to claim 3, characterized in that the method further comprises:
in the case that the first result indicates a match but the second result does not, repeating the steps from determining a target relation table from the relation tables based on the category of the initial table through analyzing the data type of the initial column data to obtain the initial column data type.
5. The form processing method according to any one of claims 2 to 4, characterized in that the method further comprises:
in the case that the first result indicates a match, sorting the matching results in the first result by matching degree, retaining the matching result with the highest matching degree among the sample headers as the target header, and performing unmatching processing on the other matching results; wherein the header data after word segmentation matches the word segmentation result of at least one sample header.
6. A form processing apparatus, the apparatus comprising:
the initial form acquisition module is used for acquiring an initial form;
the target relation table determining module is used for determining a target relation table from relation tables based on the category of the initial table; the relation table is obtained based on a sample table;
the initial header acquisition module is used for extracting header data in the initial table to obtain initial header data;
the word segmentation processing module is used for carrying out word segmentation processing on the initial header data to obtain header data subjected to word segmentation processing;
the initial column acquisition module is used for extracting column data corresponding to the initial header data to obtain initial column data;
the initial column analysis module is used for analyzing the data type of the initial column data to obtain the initial column data type;
the comparison module is used for comparing the header data after word segmentation processing and the initial column data type with the target relation table to obtain a comparison result;
and the target data acquisition module is used for, in the case that the comparison result indicates a match, taking the matching result with the highest matching degree among the sample headers of the target relation table as a target header and taking the initial column data as target column data corresponding to the target header.
7. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement a form processing method according to any one of claims 1 to 5.
8. A computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the form processing method of any one of claims 1 to 5.
CN202310500574.4A 2023-05-06 2023-05-06 Form processing method, device, equipment and medium Pending CN116702727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310500574.4A CN116702727A (en) 2023-05-06 2023-05-06 Form processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310500574.4A CN116702727A (en) 2023-05-06 2023-05-06 Form processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116702727A true CN116702727A (en) 2023-09-05

Family

ID=87836414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310500574.4A Pending CN116702727A (en) 2023-05-06 2023-05-06 Form processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116702727A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057329A (en) * 2023-10-13 2023-11-14 赞塔(杭州)科技有限公司 Table data processing method and device and computing equipment
CN117252183A (en) * 2023-10-07 2023-12-19 之江实验室 Semantic-based multi-source table automatic matching method, device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252183A (en) * 2023-10-07 2023-12-19 之江实验室 Semantic-based multi-source table automatic matching method, device and storage medium
CN117252183B (en) * 2023-10-07 2024-04-02 之江实验室 Semantic-based multi-source table automatic matching method, device and storage medium
CN117057329A (en) * 2023-10-13 2023-11-14 赞塔(杭州)科技有限公司 Table data processing method and device and computing equipment
CN117057329B (en) * 2023-10-13 2024-01-26 赞塔(杭州)科技有限公司 Table data processing method and device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination