CN111027285A - Method and system for automatically extracting order information from pdf format order - Google Patents
- Publication number
- CN111027285A
- Authority
- CN
- China
- Prior art keywords
- order
- information
- file
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention relates to the technical field of pdf document editing and regular-expression text processing, and discloses a method for automatically extracting order information from an order file in pdf format, which comprises the following steps: parsing a customer order file in pdf format to obtain per-page block information sorted by character string position; merging the per-page block information, line by line, into a plain text file; and capturing and extracting the key order information from the merged plain text file. With this method and system, when an operator at a foreign trade company imports the order information contained in a customer's pdf file into the company's foreign trade information management database, the detailed order information can be extracted and imported programmatically rather than keyed in by hand, which improves the operator's working efficiency, saves time, and improves the user experience.
Description
Technical Field
The invention relates to the technical field of pdf document editing and regular-expression text processing, and in particular to a method and a system for automatically extracting order information from a pdf format order.
Background
The pdf format is widely used internationally. Because the contents of a pdf file cannot be edited at will, and because the format is highly portable and highly standardized, overseas customers in the foreign trade field generally use pdf files in their business activities, especially to transmit important information such as orders. In a foreign trade company or enterprise, a salesperson usually needs to enter the detailed order information from a pdf file, sent by the customer as an email attachment, into the company's foreign trade information management system. Owing to the particular structure of the pdf format, this information has in the past usually been keyed in manually, page by page and field by field, which is time-consuming, laborious, and error-prone. There is therefore a need for a method and system, designed around the structural characteristics of the pdf file format and the information characteristics of order files, that can automatically extract the detailed order information from the customer's pdf file and import it into the company's database, thereby improving the operator's working efficiency when using the foreign trade information management system, saving time, and improving the user experience.
Disclosure of Invention
(I) Technical problem to be solved
The invention provides a method and a system for automatically extracting order information from a pdf format order. Key order information is extracted automatically from the pdf order file sent by a customer and imported automatically into the database of the company's foreign trade information management system, which improves the working efficiency of foreign trade operators and the user experience. This solves the problem that, owing to the particular structure of the pdf format, this step was previously performed by manual keyboard entry, page by page and field by field, which is time-consuming, laborious, and error-prone.
(II) Technical scheme
The invention provides the following technical scheme: a method for automatically extracting order information from an order file in pdf format, comprising the following steps:
S1, parsing the customer order file in pdf format to obtain per-page block information sorted by character string position;
S2, merging the per-page block information sorted by character string position, line by line, into a plain text file;
S3, capturing and extracting the key order information from the merged plain text file by means of regular expressions written according to the characteristics of the pdf file information in the customer order.
Preferably, parsing the customer order file in pdf format in step S1 includes the following steps:
S101, parsing the pdf customer order file page by page and searching it for Tj or TJ labels (the PDF text-showing operators);
S102, obtaining the character string contents and their position information from the Tj or TJ labels;
S103, parsing the pdf customer order file page by page, searching it for l or re labels (the PDF line and rectangle path operators), and obtaining the position information of the drawn lines or rectangles;
S104, deriving the position range of the table block in the order file from the positions of the plurality of drawn lines or rectangles;
S105, judging, by comparison against the position range of the table block, whether each character string obtained from a Tj or TJ label lies inside the table, and thereby dividing the character strings on each page into two types: those that belong to the table block and those that do not.
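Steps S104 and S105 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the `Rect` type and the functions `table_range` and `classify` are hypothetical names, each page is assumed to contain a single table, and every string is assumed to be represented by one anchor point `(text, x, y)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def table_range(shapes: list[Rect]) -> Rect:
    """S104: the union of all drawn lines/rectangles approximates the table block."""
    return Rect(
        min(s.x0 for s in shapes),
        min(s.y0 for s in shapes),
        max(s.x1 for s in shapes),
        max(s.y1 for s in shapes),
    )


def classify(strings, table: Rect):
    """S105: split (text, x, y) tuples into table and non-table strings."""
    inside = [s for s in strings if table.contains(s[1], s[2])]
    outside = [s for s in strings if not table.contains(s[1], s[2])]
    return inside, outside


# Two horizontal rules and one vertical rule, stored as degenerate rectangles.
shapes = [Rect(50, 400, 550, 400), Rect(50, 300, 550, 300), Rect(50, 300, 50, 400)]
strings = [("Order Number: 1234567", 50, 700), ("Item", 60, 380), ("Qty", 300, 380)]

table = table_range(shapes)
in_table, not_in_table = classify(strings, table)
```

A page with several tables would need the shapes clustered into separate ranges first; that step is omitted here.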
Preferably, merging into the plain text file in step S2 includes the following steps:
S201, dividing the character strings on each page that do not belong to the table block into blocks by position and sorting them line by line;
S202, assembling the character strings on each page that belong to the table block into a table expressed as a plurality of row character strings, following the row and column layout of the table, and calculating the starting row position of the table;
S203, inserting the whole table, row by row, among the non-table character strings according to the row position order on each page, to form a plain text page;
S204, merging the plain text pages in page order and outputting them as a plain text file.
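The merging steps S201 to S204 can be illustrated with the short Python sketch below. It is a simplification under assumptions, not the method as claimed: the helper names `group_lines`, `merge_page` and `merge_file` are hypothetical, strings are `(text, x, y)` tuples, rows are grouped with an arbitrary vertical tolerance, and the table is placed by sorting its rows together with the non-table lines rather than by computing a single starting row.

```python
def group_lines(strings, tol=2.0):
    """Group (text, x, y) tuples into lines: top-to-bottom, then left-to-right."""
    lines = []  # each entry: (y, [(x, text), ...])
    for text, x, y in sorted(strings, key=lambda s: -s[2]):
        if lines and abs(lines[-1][0] - y) <= tol:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    return [(y, [t for _, t in sorted(cells)]) for y, cells in lines]


def merge_page(non_table, table_strings, sep="\t"):
    """S201-S203: build one plain-text page with the table rows in position."""
    rows = [(y, " ".join(cells)) for y, cells in group_lines(non_table)]
    rows += [(y, sep.join(cells)) for y, cells in group_lines(table_strings)]
    # PDF user space has its origin at the bottom-left, so larger y comes first.
    rows.sort(key=lambda r: -r[0])
    return "\n".join(text for _, text in rows)


def merge_file(pages):
    """S204: concatenate the plain-text pages in page order."""
    return "\n".join(pages)


non_table = [("Thank you", 50, 100), ("Order Number:", 50, 700), ("1234567", 150, 700)]
table_strings = [("Qty", 300, 380), ("Item", 60, 380), ("A1", 60, 360), ("5", 300, 360)]
page = merge_page(non_table, table_strings)
```

Joining table cells with a tab keeps the column boundaries visible to the regular expressions applied in step S3.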
Preferably, each row of the table expressed as a plurality of row character strings is expressed by one row character string, and the column information of the table is expressed within the row character string by fixed column lengths and column separator characters.
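A row string with fixed column lengths and a column separator might look like the following Python sketch. The function names, the column widths and the `|` separator are assumptions chosen for illustration; the source does not specify them, and cells are assumed to fit within their column width.

```python
def format_row(cells, widths, spacer="|"):
    """Pad each cell to its column's fixed length and join with a spacer."""
    return spacer.join(cell.ljust(width) for cell, width in zip(cells, widths))


def parse_row(line, widths, spacer="|"):
    """Recover the cells from a fixed-length row string."""
    cells, pos = [], 0
    for width in widths:
        cells.append(line[pos:pos + width].rstrip())
        pos += width + len(spacer)
    return cells


widths = [4, 8, 3]
row = format_row(["A1", "Widget", "5"], widths)
cells = parse_row(row, widths)
```

Because every column starts at a known offset, the row can be read back either by slicing, as here, or by a regular expression anchored on the separator.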
Preferably, in step S3 the pdf file information in the customer order is divided into the following two types:
non-table order key information, for which a regular expression corresponding to the format of the order key information is written according to the format the customer uses to describe order information in the pdf file, and the corresponding order key information is captured and extracted from the merged plain text file;
table order key information, for which a regular expression corresponding to the format of the table information is written according to the format the customer uses to describe order information in the pdf file, and the corresponding order key information in the table is captured and extracted from the merged plain text file.
A system for automatically extracting order information from an order file in pdf format, comprising:
a parsing module, configured to parse the customer order file in pdf format to obtain per-page block information sorted by character string position;
a merging module, configured to merge the per-page block information sorted by character string position, line by line, into a plain text file;
a capturing module, configured to capture and extract the key order information from the merged plain text file by means of regular expressions written according to the characteristics of the pdf file information in the customer order.
Preferably, the parsing module includes:
a character analysis module, configured to parse the pdf file page by page, search it for Tj or TJ labels, and obtain the character string contents and their position information from the Tj or TJ labels;
a drawing analysis module, configured to parse the pdf file page by page, search it for l or re labels, obtain the position information of the drawn lines or rectangles, and derive the position range of the table block in the order file from the positions of the plurality of drawn lines or rectangles;
a table analysis module, configured to judge, by comparison against the position range of the table block, whether each character string obtained from a Tj or TJ label lies inside the table, and thereby divide the character strings on each page into two types: those that belong to the table block and those that do not.
Preferably, the merging module includes:
a single-page merging module, configured to divide the character strings on each page that do not belong to the table block, as determined by the table analysis module, into blocks by position and sort them line by line; to assemble the character strings on each page that belong to the table block into a table expressed as a plurality of row character strings, following the row and column layout of the table, and calculate the starting row position of the table; and to insert the whole table, row by row, among the non-table character strings according to the row position order on each page, to form a plain text page;
a multi-page merging module, configured to merge the plain text pages in page order and output them as a plain text file.
Preferably, the capturing module includes:
a non-table capturing module, configured to capture and extract the corresponding order key information from the merged plain text file by means of a regular expression that corresponds to the format of the order key information and is written according to the format the customer uses to describe order information in the pdf file;
a table capturing module, configured to capture and extract the corresponding order key information in the table from the merged plain text file by means of a regular expression that corresponds to the format of the table information and is written according to the format the customer uses to describe order information in the pdf file.
(III) Advantageous effects
The invention has the following beneficial effects:
the method and system for automatically extracting order information from a pdf format order replace the manual entry, by a clerk at a foreign trade company, of the order information in a customer's pdf file into the company's foreign trade information management database. When the clerk imports that order information, the detailed order information in the pdf file sent by the customer is extracted and imported into the company database programmatically, which improves the clerk's working efficiency when using the foreign trade information management system, saves time, and improves the user experience.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of the system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-2, a method for automatically extracting order information from an order file in pdf format includes the following steps:
S1, parsing the customer order file in pdf format to obtain per-page block information sorted by character string position;
S2, merging the per-page block information sorted by character string position, line by line, into a plain text file;
S3, capturing and extracting the key order information from the merged plain text file by means of regular expressions written according to the characteristics of the pdf file information in the customer order.
In step S3, to extract the non-table order key information Order Number from the customer order, the following regular expression (1) may be used:
Order Number:\s+?(\d{7}) (1)
In the regular expression above:
"Order Number:" means that the order information string to be extracted must be preceded by the identifier "Order Number:";
"\s+" means one or more whitespace characters, such as spaces;
"?" following "+" makes the quantifier lazy, so that it matches as few whitespace characters as possible;
"\d{7}" means a character string consisting of exactly 7 digits;
"()" means that the character string matching the condition inside the parentheses is captured and used as the return value.
The character strings in the merged plain text file that match regular expression (1) are the return values obtained by capturing and extracting with regular expression (1).
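As a brief illustration, regular expression (1) can be applied with Python's standard `re` module as sketched below. The sample text and the variable names are invented for the example; only the pattern itself comes from the description.

```python
import re

# Regular expression (1): identifier, one or more whitespace characters
# (lazy), then exactly seven digits captured as group 1.
ORDER_NUMBER = re.compile(r"Order Number:\s+?(\d{7})")

text = "Customer: ACME\nOrder Number:   1234567\nDate: 2019/12/17\n"

match = ORDER_NUMBER.search(text)
order_number = match.group(1) if match else None

# At least one whitespace character is required after the colon.
no_space = ORDER_NUMBER.search("Order Number:1234567")
```

Note that the pattern also matches the first seven digits of a longer number; if that matters, a trailing boundary such as `\b` could be appended, which is a variation not described in the source.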
In step S3, to capture the table order key information on an order page, the following regular expression (2) may be used:
(\w+)\t(\w+)\t([\d\/]+)\t(\d+)\t(\w+)\t(\w+)\t([\d\/]+)\t(\d+)\t(\w+)\t(\w+)\t([\d\/]+)\t(\d+) (2)
In the regular expression above:
"\w" means a word character (a letter, digit, or underscore);
"\t" means a tab character;
"\d" means a digit character;
"+" means one or more of the preceding item;
"[ ]" means any one of the characters listed inside the brackets;
"\/" means the slash character "/".
In this embodiment, parsing the customer order file in pdf format in step S1 includes the following steps:
S101, parsing the pdf customer order file page by page and searching it for Tj or TJ labels (the PDF text-showing operators);
S102, obtaining the character string contents and their position information from the Tj or TJ labels;
S103, parsing the pdf customer order file page by page, searching it for l or re labels (the PDF line and rectangle path operators), and obtaining the position information of the drawn lines or rectangles;
S104, deriving the position range of the table block in the order file from the positions of the plurality of drawn lines or rectangles;
S105, judging, by comparison against the position range of the table block, whether each character string obtained from a Tj or TJ label lies inside the table, and thereby dividing the character strings on each page into two types: those that belong to the table block and those that do not.
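Steps S101 to S103 can be illustrated with a minimal Python sketch that scans an already-decoded page content stream for the Tj/TJ, l and re operators. It rests on several assumptions and is not the patented implementation: the function name `scan_content_stream` and the tuple layouts are hypothetical, the stream is assumed to be uncompressed text with simple literal strings, and only the `BT`, `Td` and `Tm` positioning operators are tracked (no `cm` transforms, font handling, or escape decoding).

```python
import re

# Literal strings, arrays (assumed not to contain "]"), or bare tokens.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|\[[^\]]*\]|[^\s\[\]()]+")


def scan_content_stream(stream: str):
    """Collect (text, x, y) strings and drawn shapes from a content stream."""
    texts, shapes = [], []
    stack = []          # operands waiting for their operator
    x = y = 0.0         # current text line origin
    move_to = None      # last point set by the `m` operator
    for token in TOKEN.findall(stream):
        if token == "BT":                      # begin text: reset position
            x = y = 0.0
        elif token == "Td":                    # move relative to line start
            x, y = x + float(stack[-2]), y + float(stack[-1])
        elif token == "Tm":                    # set text matrix; keep only e, f
            x, y = float(stack[-2]), float(stack[-1])
        elif token == "Tj":                    # show one string
            texts.append((stack[-1][1:-1], x, y))
        elif token == "TJ":                    # show an array of strings
            parts = re.findall(r"\(((?:\\.|[^\\()])*)\)", stack[-1])
            texts.append(("".join(parts), x, y))
        elif token == "m":                     # start a new subpath
            move_to = (float(stack[-2]), float(stack[-1]))
        elif token == "l" and move_to is not None:   # straight line segment
            end = (float(stack[-2]), float(stack[-1]))
            shapes.append(("line", move_to, end))
            move_to = end
        elif token == "re":                    # rectangle: x y width height
            shapes.append(("rect", *map(float, stack[-4:])))
        else:
            stack.append(token)                # operand or untracked operator
            continue
        stack.clear()
    return texts, shapes


stream = """
BT /F1 12 Tf 50 700 Td (Order Number: 1234567) Tj ET
50 300 500 100 re S
50 350 m 550 350 l S
BT 1 0 0 1 60 380 Tm [(It) -20 (em)] TJ ET
"""
texts, shapes = scan_content_stream(stream)
```

Real order files usually store content streams compressed and may encode text through font-specific character maps, so a production parser would typically rely on a PDF library for decoding before a scan of this kind.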
In this technical solution, merging into the plain text file in step S2 includes the following steps:
S201, dividing the character strings on each page that do not belong to the table block into blocks by position and sorting them line by line;
S202, assembling the character strings on each page that belong to the table block into a table expressed as a plurality of row character strings, following the row and column layout of the table, and calculating the starting row position of the table;
S203, inserting the whole table, row by row, among the non-table character strings according to the row position order on each page, to form a plain text page;
S204, merging the plain text pages in page order and outputting them as a plain text file.
In this technical solution, each row of the table expressed as a plurality of row character strings is expressed by one row character string, and the column information of the table is expressed within the row character string by fixed column lengths and column separator characters.
In this technical solution, in step S3 the pdf file information in the customer order is divided into the following two types according to its characteristics:
non-table order key information, for which a regular expression corresponding to the format of the order key information is written according to the format the customer uses to describe order information in the pdf file, and the corresponding order key information is captured and extracted from the merged plain text file;
table order key information, for which a regular expression corresponding to the format of the table information is written according to the format the customer uses to describe order information in the pdf file, and the corresponding order key information in the table is captured and extracted from the merged plain text file.
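For the table type, regular expression (2) from this embodiment can be applied to one tab-separated row of the merged plain text, as in the Python sketch below. The sample row and the grouping into three four-field records are assumptions made for illustration; the source does not name the table columns.

```python
import re

# Regular expression (2): three repetitions of
# word, word, digits-with-slashes, digits, separated by tab characters.
ROW = re.compile(
    r"(\w+)\t(\w+)\t([\d\/]+)\t(\d+)\t"
    r"(\w+)\t(\w+)\t([\d\/]+)\t(\d+)\t"
    r"(\w+)\t(\w+)\t([\d\/]+)\t(\d+)"
)

line = (
    "A100\tRED\t2020/01/15\t10\t"
    "A200\tBLUE\t2020/02/01\t20\t"
    "A300\tGREEN\t2020/03/01\t30"
)

match = ROW.search(line)
fields = match.groups() if match else ()

# Regroup the twelve captured fields into three records of four fields each.
records = [fields[i:i + 4] for i in range(0, len(fields), 4)]
```

Each customer layout would need its own pattern of this kind, which matches the description's point that the expressions are written per customer format.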
A system for automatically extracting order information from an order document in pdf format, comprising:
a parsing module 10, configured to parse the customer order file in pdf format to obtain per-page block information sorted by character string position;
a merging module 20, configured to merge the per-page block information sorted by character string position, line by line, into a plain text file;
a capturing module 30, configured to capture and extract the key order information from the merged plain text file by means of regular expressions written according to the characteristics of the pdf file information in the customer order.
In this technical solution, the parsing module 10 includes:
a character analysis module 101, configured to parse the pdf file page by page, search it for Tj or TJ labels, and obtain the character string contents and their position information from the Tj or TJ labels;
a drawing analysis module 102, configured to parse the pdf file page by page, search it for l or re labels, obtain the position information of the drawn lines or rectangles, and derive the position range of the table block in the order file from the positions of the plurality of drawn lines or rectangles;
a table analysis module 103, configured to judge, by comparison against the position range of the table block, whether each character string obtained from a Tj or TJ label lies inside the table, and thereby divide the character strings on each page into two types: those that belong to the table block and those that do not.
In this technical solution, the merging module 20 includes:
a single-page merging module 201, configured to divide the character strings on each page that do not belong to the table block, as determined by the table analysis module 103, into blocks by position and sort them line by line; to assemble the character strings on each page that belong to the table block into a table expressed as a plurality of row character strings, following the row and column layout of the table, and calculate the starting row position of the table; and to insert the whole table, row by row, among the non-table character strings according to the row position order on each page, to form a plain text page;
a multi-page merging module 202, configured to merge the plain text pages in page order and output them as a plain text file.
In this embodiment, the capturing module 30 includes:
a non-table capturing module 301, configured to capture and extract the corresponding order key information from the merged plain text file by means of a regular expression that corresponds to the format of the order key information and is written according to the format the customer uses to describe order information in the pdf file;
a table capturing module 302, configured to capture and extract the corresponding order key information in the table from the merged plain text file by means of a regular expression that corresponds to the format of the table information and is written according to the format the customer uses to describe order information in the pdf file.
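The way modules 10, 20 and 30 chain together can be sketched in Python as follows. The class name `OrderExtractionSystem`, its constructor parameters and the stub parse and merge callables are hypothetical and stand in for a real PDF parser; the sketch only shows the data flow from parsing to merging to capturing.

```python
import re


class OrderExtractionSystem:
    """Chains the parsing (10), merging (20) and capturing (30) modules."""

    def __init__(self, parse, merge, patterns):
        self.parse = parse        # path -> list of per-page string blocks
        self.merge = merge        # list of pages -> plain text
        self.patterns = patterns  # field name -> compiled regex with one group

    def capture(self, text):
        """Apply every pattern and keep group 1, or None when nothing matches."""
        result = {}
        for name, pattern in self.patterns.items():
            match = pattern.search(text)
            result[name] = match.group(1) if match else None
        return result

    def run(self, pdf_path):
        return self.capture(self.merge(self.parse(pdf_path)))


# Stub modules stand in for a real PDF parser in this illustration.
system = OrderExtractionSystem(
    parse=lambda path: [["Order Number: 1234567"], ["Total: 42"]],
    merge=lambda pages: "\n".join("\n".join(page) for page in pages),
    patterns={
        "order_number": re.compile(r"Order Number:\s+?(\d{7})"),
        "po": re.compile(r"PO:\s+(\d+)"),
    },
)
result = system.run("order.pdf")
```

Keeping the patterns in a dictionary makes it straightforward to hold one pattern set per customer layout, since only the capturing stage depends on the customer's format.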
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A method for automatically extracting order information from an order document in pdf format, comprising the steps of:
S1, parsing the customer order file in pdf format to obtain per-page block information sorted by character string position;
S2, merging the per-page block information sorted by character string position, line by line, into a plain text file;
S3, capturing and extracting the key order information from the merged plain text file by means of regular expressions written according to the characteristics of the pdf file information in the customer order.
2. The method for automatically extracting order information from an order file in pdf format according to claim 1, wherein parsing the customer order file in pdf format in step S1 comprises the following steps:
S101, parsing the pdf customer order file page by page and searching it for Tj or TJ labels;
S102, obtaining the character string contents and their position information from the Tj or TJ labels;
S103, parsing the pdf customer order file page by page, searching it for l or re labels, and obtaining the position information of the drawn lines or rectangles;
S104, deriving the position range of the table block in the order file from the positions of the plurality of drawn lines or rectangles;
S105, judging, by comparison against the position range of the table block, whether each character string obtained from a Tj or TJ label lies inside the table, and thereby dividing the character strings on each page into two types: those that belong to the table block and those that do not.
3. The method for automatically extracting order information from an order file in pdf format according to claim 1, wherein merging into the plain text file in step S2 comprises the following steps:
S201, dividing the character strings on each page that do not belong to the table block into blocks by position and sorting them line by line;
S202, assembling the character strings on each page that belong to the table block into a table expressed as a plurality of row character strings, following the row and column layout of the table, and calculating the starting row position of the table;
S203, inserting the whole table, row by row, among the non-table character strings according to the row position order on each page, to form a plain text page;
S204, merging the plain text pages in page order and outputting them as a plain text file.
4. The method for automatically extracting order information from an order file in pdf format according to claim 3, wherein each row of the table expressed as a plurality of row character strings is expressed by one row character string, and the column information of the table is expressed within the row character string by fixed column lengths and column separator characters.
5. The method for automatically extracting order information from an order file in pdf format according to claim 1, wherein in step S3 the pdf file information in the customer order is divided into the following two types according to its characteristics:
non-table order key information, for which a regular expression corresponding to the format of the order key information is written according to the format the customer uses to describe order information in the pdf file, and the corresponding order key information is captured and extracted from the merged plain text file;
table order key information, for which a regular expression corresponding to the format of the table information is written according to the format the customer uses to describe order information in the pdf file, and the corresponding order key information in the table is captured and extracted from the merged plain text file.
6. A system for automatically extracting order information from an order document in pdf format, comprising:
a parsing module 10, configured to parse the customer order file in pdf format to obtain per-page block information sorted by character string position;
a merging module 20, configured to merge the per-page block information sorted by character string position, line by line, into a plain text file;
a capturing module 30, configured to capture and extract the key order information from the merged plain text file by means of regular expressions written according to the characteristics of the pdf file information in the customer order.
7. The system for automatically extracting order information from an order file in pdf format according to claim 6, wherein said parsing module 10 comprises:
a character analysis module 101, configured to parse the pdf file page by page, search it for Tj or TJ labels, and obtain the character string contents and their position information from the Tj or TJ labels;
a drawing analysis module 102, configured to parse the pdf file page by page, search it for l or re labels, obtain the position information of the drawn lines or rectangles, and derive the position range of the table block in the order file from the positions of the plurality of drawn lines or rectangles;
a table analysis module 103, configured to judge, by comparison against the position range of the table block, whether each character string obtained from a Tj or TJ label lies inside the table, and thereby divide the character strings on each page into two types: those that belong to the table block and those that do not.
8. The system for automatically extracting order information from an order file in pdf format according to claim 6, wherein said merging module 20 comprises:
a single-page merging module 201, configured to divide the character strings on each page that do not belong to the table block as described in claim 7 into blocks by position and sort them line by line; to assemble the character strings on each page that belong to the table block as described in claim 7 into a table expressed as a plurality of row character strings, following the row and column layout of the table, and calculate the starting row position of the table; and to insert the whole table, row by row, among the non-table character strings according to the row position order on each page, to form a plain text page;
a multi-page merging module 202, configured to merge the plain text pages in page order and output them as a plain text file.
9. The system for automatically extracting order information from an order file in pdf format according to claim 6, wherein said capturing module 30 comprises:
a non-table capturing module 301, configured to capture and extract the corresponding order key information from the merged plain text file by means of a regular expression that corresponds to the format of the order key information and is written according to the format the customer uses to describe order information in the pdf file;
a table capturing module 302, configured to capture and extract the corresponding order key information in the table from the merged plain text file by means of a regular expression that corresponds to the format of the table information and is written according to the format the customer uses to describe order information in the pdf file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911297269.XA CN111027285B (en) | 2019-12-17 | 2019-12-17 | Method and system for automatically extracting order information from pdf format order |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911297269.XA CN111027285B (en) | 2019-12-17 | 2019-12-17 | Method and system for automatically extracting order information from pdf format order |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111027285A (en) | 2020-04-17 |
CN111027285B (en) | 2023-06-16 |
Family
ID=70209589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911297269.XA (Active) CN111027285B (en) | Method and system for automatically extracting order information from pdf format order | 2019-12-17 | 2019-12-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027285B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022007593A1 (en) * | 2020-07-10 | 2022-01-13 | 苏宁易购集团股份有限公司 | E-commerce platform shop-opening method and apparatus based on robot process automation |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102081736A (en) * | 2009-11-27 | 2011-06-01 | 株式会社理光 | Equipment and method for extracting enclosing rectangles of characters from portable electronic documents |
US20130066663A1 (en) * | 2011-09-12 | 2013-03-14 | Doco Labs, Lcc | Telecom Profitability Management |
CN103530574A (en) * | 2013-09-23 | 2014-01-22 | 中山大学 | Method for inserting and extracting hidden information based on English PDF document |
US20160055376A1 (en) * | 2014-06-21 | 2016-02-25 | iQG DBA iQGATEWAY LLC | Method and system for identification and extraction of data from structured documents |
CN106331354A (en) * | 2016-08-26 | 2017-01-11 | 商客通尚景科技(上海)股份有限公司 | Short message information extracting and analyzing method |
CN108595402A (en) * | 2018-04-28 | 2018-09-28 | 西安极数宝数据服务有限公司 | A kind of system of extraction PDF form datas |
CN109062874A (en) * | 2018-06-12 | 2018-12-21 | 平安科技(深圳)有限公司 | Acquisition methods, terminal device and the medium of financial data |
CN109614596A (en) * | 2018-12-13 | 2019-04-12 | 税友软件集团股份有限公司 | A kind of electronic note processing method, device and system |
US20190122043A1 (en) * | 2017-10-23 | 2019-04-25 | Education & Career Compass | Electronic document processing |
CN110516048A (en) * | 2019-09-02 | 2019-11-29 | 苏州朗动网络科技有限公司 | The extracting method, equipment and storage medium of list data in pdf document |
Also Published As
Publication number | Publication date |
---|---|
CN111027285B (en) | 2023-06-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109669933B (en) | Transaction data intelligent processing method and device and computer readable storage medium | |
CN107729526B (en) | Text structuring method | |
CN106709032A (en) | Method and device for extracting structured information from spreadsheet document | |
CN108984593A (en) | The method that multi-format text keeps off typing and compares | |
CN110309132B (en) | Quota standardization method for engineering approximate calculation table | |
US7058623B2 (en) | Computer automated system for management of engineering drawings | |
CN110209643A (en) | A kind of data processing method and device | |
CN110728453B (en) | Policy automatic matching analysis system based on big data | |
CN105976302A (en) | Configurable data comparing method and system | |
CN110909123A (en) | Data extraction method and device, terminal equipment and storage medium | |
CN111027285B (en) | Method and system for automatically extracting order information from pdf format order | |
CN107844960B (en) | Investment analysis tool for automatically and intelligently analyzing business plan | |
CN114065719A (en) | Document processing method and device, electronic equipment and computer readable storage medium | |
CN112596851A (en) | Multi-source heterogeneous data batch extraction method and analysis method of simulation platform | |
CN111966640A (en) | Document file identification method and system | |
CN109063063B (en) | Data processing method and device based on multi-source data | |
CN116796707A (en) | Document multi-format data filling and modularized automatic generation method | |
CN107193788A (en) | Construction industry engineering project Excel file data format storage method and system | |
CN114417820A (en) | Content filtering method for target object | |
CN107562462B (en) | Method for constructing complex system project by multiple persons in parallel | |
CN111010331A (en) | E-mail monitoring and summarizing method, system, terminal and storage medium | |
CN106484835A (en) | Approving electronic document handling method and device | |
CN118095794B (en) | Work order information extraction method and system based on regular algorithm | |
CN117520421B (en) | Express sorting method and device, electronic equipment and storage medium | |
CN117973334B (en) | Automatic identification importing method based on file form |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |