CN115168714B - Web API data extraction method and device - Google Patents


Info

Publication number
CN115168714B
Authority
CN
China
Prior art keywords
record
request
web api
template
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210803680.5A
Other languages
Chinese (zh)
Other versions
CN115168714A (en
Inventor
朱立宁
张成成
洪志远
吴政
戴昭鑫
丁康乐
杨霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Academy of Surveying and Mapping
Original Assignee
Chinese Academy of Surveying and Mapping
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Academy of Surveying and Mapping
Priority to CN202210803680.5A
Publication of CN115168714A
Application granted
Publication of CN115168714B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a Web API data extraction method and device. The method comprises: S1, sending a Web API request and returning the request result of the Web API; S2, setting a suitable record extraction rule and extracting records from the request result of the Web API; S3, setting a suitable field extraction rule and extracting fields from the extracted records; S4, reorganizing the extracted fields into structured records; S5, storing the structured records. The advantages are as follows: by using regular expressions and extracting records first and fields second, the required data can be extracted efficiently and conveniently from massive, multi-source, heterogeneous Web APIs. In addition, when the structure of a Web API's returned result changes, the change can be handled quickly, which effectively reduces maintenance cost.

Description

Web API data extraction method and device
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for extracting Web API data.
Background
A Web API (Web Application Programming Interface) is an interface that can be accessed over a network or the Internet and is intended to expose data and functions to public users, authorized users, or enterprise users. Driven by the API economy, government departments and a growing number of enterprises publish data on the Internet in the form of Web APIs, and a large number of Web APIs have emerged, such as APIs that publish air quality in real time and APIs that publish weather forecasts in real time. The information returned by these Web APIs is valuable and contains little redundancy; it is favored by researchers and is an important data source for building spatio-temporal big data platforms, city brains, data warehouses, data lakes, and the like.
Data extraction means extracting the required information from text and structuring it into a table-like form. Web API data extraction means extracting the required information from the string data returned by a Web API and structuring it.
Extracting massive Web API data is important for building urban big data. However, Web APIs come from many industries, so their data structures vary widely. For massive, multi-source, heterogeneous Web APIs, no efficient and convenient extraction method is currently available other than traditional manual programming, which is time-consuming and labor-intensive.
Disclosure of Invention
The invention aims to provide a Web API data extraction method and device, so as to solve the problems in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A Web API data extraction method comprises the following steps:
S1, sending a Web API request and returning the request result of the Web API;
S2, setting a suitable record extraction rule, and extracting records from the request result of the Web API;
S3, setting a suitable field extraction rule, and extracting fields from the extracted records;
S4, reorganizing the extracted fields into structured records;
S5, storing the structured records.
Preferably, in step S1, the request protocol, request method, request address, and request parameters for accessing the Web API are set, a Web API request is sent according to these settings, and the request result of the Web API is returned.
Preferably, when the request result of the Web API returned in step S1 is encrypted, the method further comprises, between step S1 and step S2,
decrypting the encrypted request result of the Web API.
Preferably, step S2 specifically comprises: selecting a suitable template from the set of record-extraction regular expression templates according to the request result of the Web API; setting the keywords required by the template to obtain the record-extraction regular expression; and extracting records from the request result of the Web API using that regular expression.
Preferably, the set of record-extraction regular expression templates comprises:
A1, a JSON object template; applicable when all information of a record is stored in the returned result as a JSON object, i.e. a record is enclosed in a pair of braces { }; the keyword of the JSON object template is the name of the first field of the JSON object;
A2, an XML template without a same-name child node; applicable when all information of a record is stored in the returned result in XML format, i.e. all information of a record is enclosed in a pair of angle-bracket tags or in a single self-closing tag, and none of the child nodes of the XML node has the same name as the record node; the keyword of this template is the node name of the XML object;
A3, an XML template with a same-name child node; applicable when all information of a record is stored in the returned result in XML format, i.e. all information of a record is enclosed in a pair of angle-bracket tags, and one of the child nodes of the XML node has the same name as the record node; the keywords of this template are the node name of the XML object and the name of its last child node;
A4, a front-and-back positioning template; applicable when each record in the returned result has a fixed context; the keywords of this template are the fixed preceding text and the fixed following text;
A5, a full content template; applicable when the returned result contains at most one record; this template has no keyword;
A6, a separator template; applicable when multiple records in the returned result are divided by a fixed separator; the keyword of this template is the separator between records;
A7, a custom template; the record-extraction regular expression is written by hand for the actual situation; applicable when none of the six templates above is suitable.
Preferably, step S3 specifically comprises: setting the fields to be extracted, and selecting, for each field, a suitable template from the set of field-extraction regular expression templates; setting the keywords required by the template to obtain the field-extraction regular expression; and extracting the field from the extracted record using that regular expression.
Preferably, the set of field-extraction regular expression templates comprises:
B1, a JSON template; applicable when the field name and the field value are separated by a colon; the keyword of the JSON template is the name the field uses in the record;
B2, an XML template; applicable when the field is stored in the record in XML format, i.e. delimited by a pair of angle-bracket tags; the keyword of the XML template is the XML node name the field uses in the record;
B3, a front-and-back positioning template; applicable when the field has a fixed context in the record; the keywords of this template are the fixed preceding text and the fixed following text;
B4, a separator template; applicable when multiple fields in the record are divided by a fixed separator; the keyword of this template is the separator used between fields;
B5, a custom template; the field-extraction regular expression is written by hand for the actual situation; applicable when none of the four templates above is suitable.
Preferably, the data extraction method further comprises storing the configuration information of steps S1 to S5, where the configuration information includes the request protocol, request method, request address, and request parameters for accessing the Web API, the record-extraction regular expression, the fields to be extracted, the field-extraction regular expressions, and the information required for storing the structured records.
Preferably, the data extraction method further comprises executing the Web API data extraction task on a schedule that matches the content update frequency of the Web API.
The invention also aims to provide a Web API data extraction device for implementing any of the data extraction methods above, comprising:
a request sending module, used for sending a Web API request and returning the request result of the Web API;
a returned content module, used for displaying the request result of the Web API as a reference for the user's subsequent operations;
a record extraction module, used for setting a suitable record extraction rule and extracting records from the request result of the Web API;
a field extraction module, used for setting a suitable field extraction rule and extracting fields from the extracted records;
a result preview module, used for reorganizing the extracted fields into structured records;
a result storage module, used for storing the structured records.
The data extraction device further comprises a decryption module and/or a configuration information storage module and/or a timing execution module:
a decryption module, used for decrypting the encrypted request result of the Web API;
a configuration information storage module, used for storing user configuration information, including the request protocol, request method, request address, and request parameters for accessing the Web API, the record-extraction regular expression, the fields to be extracted, the field-extraction regular expressions, and the information required for storing the structured records;
a timing execution module, used for executing the Web API data extraction task on a schedule that matches the content update frequency of the Web API.
The beneficial effects of the invention are as follows: the method and device use regular expressions and extract records first and fields second, so that the required data can be extracted efficiently and conveniently from massive, multi-source, heterogeneous Web APIs. In addition, when the structure of a Web API's returned result changes, the change can be handled quickly, which effectively reduces maintenance cost.
Drawings
FIG. 1 is a flow chart of a Web API data extraction method in an embodiment of the invention;
FIG. 2 is a flow chart of extracting records from a Web API request result in an embodiment of the invention;
FIG. 3 is a flow chart of extracting fields from an extracted record in an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a Web API data extraction device in an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.
Example 1
As shown in FIG. 1, this embodiment provides a Web API data extraction method comprising the following steps:
S1, sending a Web API request and returning the request result of the Web API;
S2, setting a suitable record extraction rule, and extracting records from the request result of the Web API;
S3, setting a suitable field extraction rule, and extracting fields from the extracted records;
S4, reorganizing the extracted fields into structured records;
S5, storing the structured records.
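Steps S2 to S5 can be sketched end to end as follows. This is a minimal illustration and not the patented implementation: the canned response stands in for the S1 result, and the payload, the two regular expressions, the table name `air_quality`, and the choice of SQLite for storage are all assumptions introduced for the example.

```python
import re
import sqlite3

# Hypothetical S1 result: a canned response instead of a live Web API call.
RESPONSE = '{"code":0,"data":[{"station":"A","aqi":52},{"station":"B","aqi":87}]}'

# S2: record-extraction regular expression (assumes flat JSON objects whose
# first field is "station").
RECORD_RE = re.compile(r'\{\s*"station"\s*:[^{}]*\}')

# S3: one field-extraction regular expression per field to be extracted.
FIELD_RES = {
    "station": re.compile(r'"station"\s*:\s*"([^"]*)"'),
    "aqi": re.compile(r'"aqi"\s*:\s*(\d+)'),
}


def extract(response: str) -> list:
    """Extract records, then fields, and reorganize them into dict rows."""
    rows = []
    for record in RECORD_RE.findall(response):        # S2: records
        row = {}
        for name, pattern in FIELD_RES.items():       # S3: fields
            match = pattern.search(record)
            row[name] = match.group(1) if match else None
        rows.append(row)                              # S4: structured record
    return rows


def store(rows: list, conn: sqlite3.Connection) -> None:
    """S5: store the structured records in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS air_quality (station TEXT, aqi INTEGER)"
    )
    conn.executemany("INSERT INTO air_quality VALUES (:station, :aqi)", rows)
    conn.commit()
```

Calling `store(extract(RESPONSE), sqlite3.connect(":memory:"))` writes two rows. The extracted values are strings; SQLite's INTEGER column affinity converts the `aqi` text to an integer on insert.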
In this embodiment, step S1 specifically comprises setting the request protocol, request method, request address, and request parameters for accessing the Web API, sending a Web API request according to these settings, and returning the request result of the Web API.
When the data extraction method is executed, the request information of the Web API is set first, including the request protocol used (e.g. HTTP or SOAP), the request method used (e.g. GET or POST), the request address, and the request parameters; the Web API request is then sent and its returned result is obtained.
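A minimal sketch of step S1 using only the Python standard library is shown below. The `RequestConfig` class, its default values, and the address `api.example.com/air-quality` are hypothetical; the sketch covers plain HTTP(S) GET and form-encoded POST only (SOAP envelopes are not handled) and assumes the response body is UTF-8 text.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode
from urllib.request import Request, urlopen


@dataclass
class RequestConfig:
    """The four request settings named in step S1 (hypothetical defaults)."""
    protocol: str = "https"
    method: str = "GET"
    address: str = "api.example.com/air-quality"
    params: dict = field(default_factory=dict)


def build_request(cfg: RequestConfig) -> Request:
    """Turn the stored settings into a urllib Request object."""
    query = urlencode(cfg.params)
    url = f"{cfg.protocol}://{cfg.address}"
    method = cfg.method.upper()
    if method == "GET":
        # GET parameters travel in the query string.
        if query:
            url = f"{url}?{query}"
        return Request(url, method="GET")
    # Other methods send the parameters as a form-encoded body.
    return Request(url, data=query.encode("utf-8"), method=method)


def send_request(cfg: RequestConfig, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_request(cfg), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping `build_request` separate from `send_request` lets the settings be checked without network access.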
In practice, if the request result of the Web API returned in step S1 is encrypted, it must be decrypted before step S2 is executed.
As shown in FIG. 2, in this embodiment step S2 specifically comprises:
S21, selecting a suitable template from the set of record-extraction regular expression templates according to the request result of the Web API;
S22, setting the keywords required by the template to obtain the record-extraction regular expression;
S23, extracting records from the request result of the Web API using the record-extraction regular expression.
The set of record-extraction regular expression templates comprises seven templates: a JSON object template, an XML template without a same-name child node, an XML template with a same-name child node, a front-and-back positioning template, a full content template, a separator template, and a custom template. The applicable situation and keywords of each template are as follows:
(1) JSON object template; applicable when all information of a record is stored in the returned result as a JSON object, i.e. a record is enclosed in a pair of braces { }; the one keyword of this template is the name of the first field of the JSON object;
(2) XML template without a same-name child node; applicable when all information of a record is stored in the returned result in XML format, i.e. all information of a record is enclosed in a pair of angle-bracket tags or in a single self-closing tag, and none of the child nodes of the XML node has the same name as the record node; the one keyword of this template is the node name of the XML object;
(3) XML template with a same-name child node; applicable when all information of a record is stored in the returned result in XML format, i.e. all information of a record is enclosed in a pair of angle-bracket tags, and one of the child nodes of the XML node has the same name as the record node; the two keywords of this template are the node name of the XML object and the name of its last child node;
(4) Front-and-back positioning template; applicable when each record in the returned result has a fixed context; the two keywords of this template are the fixed preceding text and the fixed following text;
(5) Full content template; applicable when the returned result contains at most one record; this template has no keyword;
(6) Separator template; applicable when multiple records in the returned result are divided by a fixed separator (such as a comma, semicolon, or vertical bar); the one keyword of this template is the separator between records;
(7) Custom template; the record-extraction regular expression is written by hand for the actual situation; applicable when none of the six templates above is suitable.
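The text does not disclose the regular expressions behind these templates, so the following is only one possible reading, sketched in Python. Each function takes the template's keywords and returns a pattern string; all function names and patterns are assumptions. The JSON pattern assumes flat objects with no nested braces, and the XML patterns assume well-formed, non-pathological markup.

```python
import re


def json_object_template(first_field: str) -> str:
    """Template (1): a flat { ... } object starting with the given field."""
    return r'\{\s*"' + re.escape(first_field) + r'"\s*:[^{}]*\}'


def xml_without_same_name_child_template(node: str) -> str:
    """Template (2): a self-closing tag, or the shortest <node>...</node>."""
    n = re.escape(node)
    # The self-closing alternative comes first so it cannot run on into
    # the closing tag of a later record.
    return rf"<{n}\b[^>]*?/>|<{n}\b[^>]*>.*?</{n}>"


def xml_with_same_name_child_template(node: str, last_child: str) -> str:
    """Template (3): the record ends where its last child closes."""
    n, c = re.escape(node), re.escape(last_child)
    return rf"<{n}\b[^>]*>.*?</{c}>\s*</{n}>"


def front_back_template(before: str, after: str) -> str:
    """Template (4): text between a fixed prefix and a fixed suffix."""
    return "(?<=" + re.escape(before) + ").*?(?=" + re.escape(after) + ")"


def full_content_template() -> str:
    """Template (5): the whole returned result is the single record."""
    return r".+"


def separator_template(separator: str) -> str:
    """Template (6): maximal runs that do not contain the separator."""
    return "(?:(?!" + re.escape(separator) + ").)+"


def extract_records(text: str, pattern: str) -> list:
    """S23: apply a record-extraction pattern; '.' also matches newlines."""
    return [m.group(0) for m in re.finditer(pattern, text, re.DOTALL)]
```

For example, `extract_records('a,1|b,2|c,3', separator_template('|'))` yields the three records `a,1`, `b,2`, and `c,3`. A custom template (7) is simply any hand-written pattern passed to `extract_records`.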
As shown in FIG. 3, in this embodiment step S3 specifically comprises:
S31, setting the fields to be extracted;
S32, selecting, for each field, a suitable template from the set of field-extraction regular expression templates;
S33, setting the keywords required by the template to obtain the field-extraction regular expression;
S34, extracting the field from the extracted record using the field-extraction regular expression.
The set of field-extraction regular expression templates comprises five templates: a JSON template, an XML template, a front-and-back positioning template, a separator template, and a custom template. The applicable situation and keywords of each template are as follows:
(1) JSON template; applicable when the field name and the field value are separated by a colon; the one keyword of this template is the name the field uses in the record;
(2) XML template; applicable when the field is stored in the record in XML format, i.e. delimited by a pair of angle-bracket tags; the one keyword of this template is the XML node name the field uses in the record;
(3) Front-and-back positioning template; applicable when the field has a fixed context in the record; the two keywords of this template are the fixed preceding text and the fixed following text;
(4) Separator template; applicable when multiple fields in the record are divided by a fixed separator (such as a comma, semicolon, or vertical bar); the one keyword of this template is the separator used between fields;
(5) Custom template; the field-extraction regular expression is written by hand for the actual situation; applicable when none of the four templates above is suitable.
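As with the record templates, the concrete patterns are not given in the text, so the sketch below is an assumed interpretation. Each function returns a pattern whose first capture group is the field value, and `extract_fields` reorganizes the captured values into one structured record (step S4). The `index` parameter of `separator_field_template` is a hypothetical addition, since the text names only the separator as a keyword; the JSON pattern assumes values that contain no commas or quotes.

```python
import re


def json_field_template(name: str) -> str:
    """Template (1): value after "name": with optional surrounding quotes."""
    return '"' + re.escape(name) + r'"\s*:\s*"?([^",}\]]*)"?'


def xml_field_template(node: str) -> str:
    """Template (2): text between <node ...> and </node>."""
    n = re.escape(node)
    return rf"<{n}\b[^>]*>(.*?)</{n}>"


def front_back_field_template(before: str, after: str) -> str:
    """Template (3): text between a fixed prefix and a fixed suffix."""
    return re.escape(before) + "(.*?)" + re.escape(after)


def separator_field_template(separator: str, index: int) -> str:
    """Template (4): the index-th (0-based) separator-delimited cell."""
    sep = re.escape(separator)
    cell = "(?:(?!" + sep + ").)*"
    # Skip `index` cells together with their separators, capture the next.
    return "^(?:" + cell + sep + "){" + str(index) + "}(" + cell + ")"


def extract_fields(record: str, rules: dict) -> dict:
    """S34 and S4: apply one pattern per field and build a structured row."""
    row = {}
    for field_name, pattern in rules.items():
        match = re.search(pattern, record, re.DOTALL)
        row[field_name] = match.group(1) if match else None
    return row
```

A field whose pattern does not match is stored as `None`, so a structural change in the returned result shows up as missing values in the row and does not raise an exception.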
In this embodiment, in practice, the data extraction method further comprises storing the configuration information of steps S1 to S5. The configuration information specifically includes the request protocol, request method, request address, and request parameters for accessing the Web API, the record-extraction regular expression, the fields to be extracted, the field-extraction regular expressions, and the information required for storing the structured records.
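One way to persist this configuration is as a JSON document, sketched below. The key names, the example values, and the SQLite storage settings are all hypothetical; the text only lists which items the configuration contains, not how they are serialized.

```python
import json
import re

# Hypothetical configuration covering the items listed for steps S1 to S5.
EXAMPLE_CONFIG = {
    "request": {
        "protocol": "https",
        "method": "GET",
        "address": "api.example.com/air-quality",
        "params": {"city": "beijing"},
    },
    "record_regex": r'\{\s*"station"\s*:[^{}]*\}',
    "fields": ["station", "aqi"],
    "field_regexes": {
        "station": r'"station"\s*:\s*"([^"]*)"',
        "aqi": r'"aqi"\s*:\s*(\d+)',
    },
    "storage": {"kind": "sqlite", "path": "air_quality.db", "table": "air_quality"},
}


def dump_config(config: dict) -> str:
    """Serialize the configuration to JSON text."""
    return json.dumps(config, ensure_ascii=False, indent=2)


def load_config(text: str) -> dict:
    """Parse the configuration and validate its regular expressions."""
    config = json.loads(text)
    # Compile eagerly so a malformed pattern fails at load time and not
    # in the middle of a scheduled extraction run.
    re.compile(config["record_regex"])
    for pattern in config["field_regexes"].values():
        re.compile(pattern)
    return config
```

Because the extraction rules live in data and not in code, adapting to a changed response structure means editing a pattern in the stored configuration.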
In this embodiment, in practice, the data extraction method further comprises executing the Web API data extraction task on a schedule that matches the content update frequency of the Web API.
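A minimal sketch of such timed execution follows. The function name, the `max_runs` bound, and the injectable `sleep` parameter are assumptions made so the loop terminates and can be exercised without waiting; a production scheduler would more likely rely on cron or a job queue.

```python
import time
from typing import Callable


def run_periodically(
    task: Callable[[], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `task` every `interval_seconds`; return how many runs succeeded."""
    completed = 0
    for run in range(max_runs):
        started = time.monotonic()
        try:
            task()
            completed += 1
        except Exception as exc:
            # One failed extraction should not stop later scheduled runs.
            print(f"extraction run {run} failed: {exc}")
        if run < max_runs - 1:
            # Subtract the task's own duration so runs stay on schedule.
            elapsed = time.monotonic() - started
            sleep(max(0.0, interval_seconds - elapsed))
    return completed
```

The interval would be chosen from the API's update frequency, for example 3600 seconds for an hourly air quality feed.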
The data extraction method uses regular expressions and extracts records first and fields second, so that the required data can be extracted efficiently and conveniently from massive, multi-source, heterogeneous Web APIs. In addition, when the structure of a Web API's returned result changes, the change can be handled quickly, which effectively reduces maintenance cost.
Example 2
As shown in FIG. 4, this embodiment provides a Web API data extraction device for implementing the data extraction method described above. The data extraction device comprises:
a request sending module 401, used for sending a Web API request and returning the request result of the Web API. The request sending module 401 works as follows: the request protocol, request method, request address, and request parameters for accessing the Web API are set, a Web API request is sent according to these settings, and the request result of the Web API is returned;
a returned content module 402, used for displaying the request result of the Web API as a reference for the user's subsequent operations;
a record extraction module 403, used for setting a suitable record extraction rule and extracting records from the request result of the Web API. The record extraction module 403 works as follows: a suitable template is selected from the set of record-extraction regular expression templates according to the request result of the Web API, the keywords required by the template are set to obtain the record-extraction regular expression, and records are extracted from the request result of the Web API using that regular expression. For the set of record-extraction regular expression templates, see Example 1;
a field extraction module 404, used for setting a suitable field extraction rule and extracting fields from the extracted records. The field extraction module 404 works as follows: the fields to be extracted are set, a suitable template is selected for each field from the set of field-extraction regular expression templates, the keywords required by the template are set to obtain the field-extraction regular expression, and the field is extracted from the extracted record using that regular expression. For the set of field-extraction regular expression templates, see Example 1;
a result preview module 405, used for reorganizing the extracted fields into structured records;
a result storage module 406, used for storing the structured records.
In this embodiment, the data extraction device may additionally be provided with a decryption module and/or a configuration information storage module and/or a timing execution module according to actual requirements, wherein:
a decryption module 407 is used for decrypting the encrypted request result of the Web API;
a configuration information storage module 408 is used for storing user configuration information, i.e. the information set by the user in the request sending module 401, the record extraction module 403, the field extraction module 404, the result storage module 406, and the decryption module 407 (this information is needed throughout the process, must be stored separately, and serves the timing execution module 409);
a timing execution module 409 is used for executing the Web API data extraction task on a schedule that matches the content update frequency of the Web API.
In this embodiment, the data extraction device uses regular expressions and extracts records first and fields second, so that the required data can be extracted efficiently and conveniently from massive, multi-source, heterogeneous Web APIs. In addition, when the structure of a Web API's returned result changes, the change can be handled quickly, which effectively reduces maintenance cost.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
The invention provides a Web API data extraction method and device that use regular expressions and extract records first and fields second, so that the required data can be extracted efficiently and conveniently from massive, multi-source, heterogeneous Web APIs. In addition, when the structure of a Web API's returned result changes, the change can be handled quickly, which effectively reduces maintenance cost.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims (2)

1. A Web API data extraction method is characterized in that: comprises the following steps of the method,
s1, sending a Web API request and returning a request result of the Web API; step S1, a request protocol, a request mode, a request address and a request parameter for accessing the Web API are set, a Web API request is sent according to the set information, and a request result of the Web API is returned;
when the request result of the Web API returned in step S1 is encrypted, there is also a space between step S1 and step S2,
decrypting the request result of the encrypted Web API;
s2, setting a proper record extraction rule, and extracting records from a request result of the Web API; step S2, selecting a proper template from the extracted and recorded regular expression template set according to the request result of the Web API; setting keywords required by a template, and acquiring a regular expression of an extraction record; extracting records from the request results of the Web API by using the regular expressions of the extracted records;
the regular expression template set of extraction records includes,
a1, a json object template; all information applicable to a record stored in the returned result in json object format, i.e. a record is contained within brackets { }; one keyword of the json object template is the name of the first field of the json object;
a2, xml does not contain a child node template with the same name; all information suitable for recording is stored in the returned result in xml format, namely, all information of a record is contained in a pair of brackets or a self-closing bracket, and no child node with the same name as the recording node exists in the child nodes of the xml node; one keyword of the xml without the homonym child node template is the node name of the xml object;
a3, xml containing a homonymous child node template; all information suitable for recording is stored in the returned result in the form of xml, namely a record is contained in a pair of angle brackets, and the child node with the same name as the recording node is arranged in the child nodes of the xml node; two keywords of the xml containing the same name child node template are the node name of the xml object and the name of the last child node;
a4, positioning templates front and back; adapted to the case where the record in the returned result has a fixed context; two keywords of the front and rear positioning templates are fixed upper information and fixed lower information;
a5, all content templates; the method is suitable for the condition that the returned result contains at most one record; the whole content template has no keyword information;
a6, separating Fu Moban; the method is suitable for the situation that a plurality of records are divided by using fixed segmenters in the returned result; one keyword of the separator template is used as a separator between records;
a7, customizing a template; the regular expression of the extraction record is written according to the actual situation, and the method is suitable for the situation that all the six templates are not applicable;
s3, setting a proper field extraction rule, and extracting a field from the extraction record; step S3, setting a field to be extracted, and selecting a proper template from a regular expression template set for extracting the field according to the field; setting keywords required by a template, and acquiring a regular expression of an extraction field; extracting the fields from the extraction record using a regular expression of the extraction fields;
the regular expression template set of extraction fields includes,
b1, json template; the method is suitable for the situation of dividing the field name and the field value by a colon; one keyword of the json template is a field name used in the record for a field;
b2, an xml template; the fields are adapted to be stored in the record in xml format, i.e. a pair of brackets are indicated; one keyword of the xml template is an xml node name used in the record for a field;
b3, positioning templates front and back; adapted to the case where a field has a fixed context in the record; two keywords of the front and rear positioning templates are fixed upper information and fixed lower information;
b4, partition Fu Moban; the method is suitable for the situation that a plurality of fields are divided by using fixed segmenters in the record; one keyword of the separator template is used as the separator information used among a plurality of fields;
b5, customizing a template; namely, writing a regular expression of the extracted field according to the actual situation, and being applicable to the situation that all the four templates are not applicable;
S4, reorganizing the extracted fields into structured records;
S5, storing the structured records;
the data extraction method further comprises storing the configuration information of steps S1 to S5, wherein the configuration information comprises the request protocol, request mode, request address and request parameters for accessing the Web API, the regular expression for extracting records, the fields to be extracted, the regular expressions for extracting fields, and the information required for storing the structured records;
the data extraction method further comprises executing the Web API data extraction task on a schedule according to the content update frequency of the Web API.
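Steps S4 and S5 might look like the following Python sketch. The claim does not name a storage format, so CSV is used here purely as an assumption; the function names and the sample values are hypothetical.

```python
import csv
import io

def to_structured_records(field_values: list[dict], field_names: list[str]) -> list[dict]:
    """S4: give every record the same fields in the same order; missing fields become None."""
    return [{name: values.get(name) for name in field_names} for values in field_values]

def store_as_csv(records: list[dict], field_names: list[str], stream) -> None:
    """S5: write the structured records to a text stream as CSV (None is written as empty)."""
    writer = csv.DictWriter(stream, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

# Hypothetical per-record field values produced by step S3.
extracted = [
    {"id": "1", "name": "Beijing"},
    {"name": "Shanghai", "id": "2"},
    {"id": "3"},  # the name field was not found in this record
]
field_names = ["id", "name"]

structured = to_structured_records(extracted, field_names)
buffer = io.StringIO()
store_as_csv(structured, field_names, buffer)
csv_text = buffer.getvalue()
```

Normalizing every record to the same field list before storage keeps the output tabular even when a field is absent from some records.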
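The configuration storage and timed execution described above can be sketched as follows. The `ExtractionConfig` class, its attribute names, the JSON serialization, the example address, and the simple sleep-based loop are all assumptions made for illustration; the claim only lists which items of information are stored.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class ExtractionConfig:
    protocol: str            # request protocol, e.g. "https"
    method: str              # request mode, e.g. "GET"
    address: str             # request address
    params: dict             # request parameters
    record_regex: str        # regular expression for extracting records
    field_regexes: dict      # field to be extracted -> regular expression
    storage: dict            # information required for storing the structured records
    interval_seconds: float  # derived from the content update frequency of the Web API

def save_config(config: ExtractionConfig) -> str:
    return json.dumps(asdict(config), ensure_ascii=False, indent=2)

def load_config(text: str) -> ExtractionConfig:
    return ExtractionConfig(**json.loads(text))

def run_periodically(task, interval_seconds: float, runs: int) -> None:
    """Run the task a fixed number of times, waiting between consecutive runs."""
    for i in range(runs):
        task()
        if i < runs - 1:
            time.sleep(interval_seconds)

config = ExtractionConfig(
    protocol="https", method="GET", address="example.org/api/poi",
    params={"page": 1}, record_regex=r"\{[^{}]*\}",
    field_regexes={"name": r'"name"\s*:\s*"([^"]*)"'},
    storage={"format": "csv", "path": "poi.csv"}, interval_seconds=3600,
)
restored = load_config(save_config(config))

run_log = []
# An interval of 0 is used only so that this demonstration finishes immediately.
run_periodically(lambda: run_log.append(restored.address), interval_seconds=0, runs=2)
```

In practice the interval would come from `restored.interval_seconds`, and a system scheduler could replace the loop.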
2. A Web API data extraction device, characterized in that: the data extraction device is used for implementing the data extraction method according to claim 1, and comprises,
a request sending module; used for sending a Web API request and returning the request result of the Web API;
a returned content module; used for displaying the request result of the Web API to provide a reference for the user's subsequent operations;
a record extraction module; used for setting an appropriate record extraction rule and extracting records from the request result of the Web API;
a field extraction module; used for setting appropriate field extraction rules and extracting fields from the extracted records;
a result preview module; used for reorganizing the extracted fields into structured records;
a result storage module; used for storing the structured records;
the data extraction device further comprises a decryption module and/or a configuration information storage module and/or a timing execution module;
a decryption module; used for decrypting the encrypted request result of the Web API;
a configuration information storage module; used for storing user configuration information, including the request protocol, request mode, request address and request parameters for accessing the Web API, the regular expression for extracting records, the fields to be extracted, the regular expressions for extracting fields, and the information required for storing the structured records;
a timing execution module; used for executing the Web API data extraction task on a schedule according to the content update frequency of the Web API.
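One possible way to compose the modules of the device is sketched below in Python. Everything here is hypothetical: the `Extractor` class, its attribute names, the stubbed request (no network call is made), and the use of Base64 decoding as a stand-in for the decryption module. Base64 is an encoding, not encryption; it is used only so that the optional decryption step has something to undo.

```python
import base64
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Extractor:
    send_request: Callable[[], str]                 # request sending module
    record_pattern: re.Pattern                      # record extraction module
    field_patterns: dict                            # field extraction module: name -> regex
    decrypt: Optional[Callable[[str], str]] = None  # optional decryption module
    stored: list = field(default_factory=list)      # result storage module (in-memory stand-in)

    def run(self) -> list:
        result = self.send_request()
        if self.decrypt is not None:
            result = self.decrypt(result)
        structured = []
        for record in self.record_pattern.finditer(result):
            row = {}
            for name, pattern in self.field_patterns.items():
                match = pattern.search(record.group(0))
                row[name] = match.group(1) if match else None
            structured.append(row)  # result preview: one structured record per match
        self.stored.extend(structured)
        return structured

# Stubbed Web API result, encoded to stand in for an encrypted response.
payload = '[{"id": 1, "name": "Beijing"}, {"id": 2, "name": "Shanghai"}]'
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")

extractor = Extractor(
    send_request=lambda: encoded,
    record_pattern=re.compile(r"\{[^{}]*\}"),
    field_patterns={
        "id": re.compile(r'"id"\s*:\s*(\d+)'),
        "name": re.compile(r'"name"\s*:\s*"([^"]*)"'),
    },
    decrypt=lambda text: base64.b64decode(text).decode("utf-8"),
)
rows = extractor.run()
```

Passing each module in as a callable or pattern keeps the optional modules optional: leaving `decrypt` as `None` skips that step without changing the rest of the pipeline.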
CN202210803680.5A 2022-07-07 2022-07-07 Web API data extraction method and device Active CN115168714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210803680.5A CN115168714B (en) 2022-07-07 2022-07-07 Web API data extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210803680.5A CN115168714B (en) 2022-07-07 2022-07-07 Web API data extraction method and device

Publications (2)

Publication Number Publication Date
CN115168714A (en) 2022-10-11
CN115168714B (en) 2023-11-10

Family

ID=83494016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210803680.5A Active CN115168714B (en) 2022-07-07 2022-07-07 Web API data extraction method and device

Country Status (1)

Country Link
CN (1) CN115168714B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515287A (en) * 2009-03-24 2009-08-26 崔志明 Automatic generating method of wrapper of complex page
CN102004777A (en) * 2010-11-19 2011-04-06 中国科学院软件研究所 Customizable Web information integration method and system
CN105138561A (en) * 2015-07-23 2015-12-09 中国测绘科学研究院 Deep web space data acquisition method and apparatus
CN106845092A (en) * 2017-01-03 2017-06-13 青岛海信医疗设备股份有限公司 A kind of system docking method and device
CN107704539A (en) * 2017-09-22 2018-02-16 清华大学 The method and device of extensive text message batch structuring
CN110262904A (en) * 2019-05-17 2019-09-20 北京达佳互联信息技术有限公司 Collecting method and device
CN113204706A (en) * 2021-05-24 2021-08-03 北京明略软件系统有限公司 Data screening and extracting method and system based on MapReduce

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100083095A1 (en) * 2008-09-29 2010-04-01 Nikovski Daniel N Method for Extracting Data from Web Pages
US8745639B2 (en) * 2009-12-31 2014-06-03 Cbs Interactive Inc. Controller and method to build a combined web page using data retrieved from multiple APIS
WO2016138067A1 (en) * 2015-02-24 2016-09-01 Cloudlock, Inc. System and method for securing an enterprise computing environment
WO2017190153A1 (en) * 2016-04-29 2017-11-02 Unifi Software Automatic generation of structured data from semi-structured data
US11681710B2 (en) * 2018-12-23 2023-06-20 Microsoft Technology Licensing, Llc Entity extraction rules harvesting and performance
US20200242634A1 (en) * 2019-01-29 2020-07-30 Salesforce.Com, Inc. Method and system for automatically identifying candidates from a plurality of different websites, determining which candidates correspond to company executives for a company profile, and generating an executive profile for the company profile
US10942732B1 (en) * 2019-08-19 2021-03-09 Sap Se Integration test framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
API数据提取 (API Data Extraction); mysimplebook; https://www.jianshu.com/p/026b30f45d59; 2019-11-07; web page, pp. 1-3 *

Also Published As

Publication number Publication date
CN115168714A (en) 2022-10-11

Similar Documents

Publication Publication Date Title
US11741309B2 (en) Templated rule-based data augmentation for intent extraction
CN106096056B (en) A distributed method and system for real-time collection of public opinion data
CN106575166B (en) Method for processing hand input character, splitting and merging data and processing encoding and decoding
US11520992B2 (en) Hybrid learning system for natural language understanding
CN102193953B (en) System and method for migrating desktop applications
JP5570608B2 (en) Excel-based analysis report creation system and method
JP5756386B2 (en) Method, apparatus, and program for supporting generation and management of metadata for correcting problems of dynamic web application
AU2019201531A1 (en) An in-app conversational question answering assistant for product help
US20130145241A1 (en) Automated augmentation of text, web and physical environments using multimedia content
US20120290292A1 (en) Unstructured data support with automatic rule generation
JPS6356726A (en) Method for converting knowledge base to data base
WO2020091863A1 (en) Systems and methods for identifying documents with topic vectors
CN110532309A (en) A method for generating a library user portrait system
CN116360879A (en) Method and device for creating multi-level information framework
CN115168714B (en) Web API data extraction method and device
KR20220168062A (en) Article writing soulution using artificial intelligence and device using the same
CN102804174B (en) Sequential layout builder architecture
US20220229998A1 (en) Lookup source framework for a natural language understanding (nlu) framework
CN116306506A (en) Intelligent mail template method based on content identification
CN104407875B (en) A method for producing dynamically updated website content
CN110311819A (en) Automatic production of HTML page and MIBs table generating method, management method, equipment end and management system based on page configuration file
JP2005284417A (en) Random access method for xml document of table format, and its program
JP5163662B2 (en) File generation system and file generation method
Madan et al. Social network wrappers (SNWs): An approach used for exploiting and mining social media platforms
JPH096602A (en) Information storage method of integrated er model and software development process management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant