CN116244486A - Crawling data processing method and system based on data stream - Google Patents
Crawling data processing method and system based on data stream
- Publication number
- CN116244486A (application CN202310244348.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- item
- crawling
- items
- pipeline
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a data-flow-based method and system for processing crawled data. The method comprises: crawling target data based on key information bars to generate a plurality of data Items, and transmitting the data Items to a first data pipeline; receiving the data Items through the first data pipeline, inputting each data Item into the data cleaning function corresponding to its category for cleaning, and transmitting the cleaned data Items that meet the requirements to a second data pipeline; creating a plurality of data warehousing functions of different types, receiving the data Items through the second data pipeline, inputting each data Item into the data warehousing function corresponding to its category for warehousing query processing, and updating the database according to the query processing result. This processing approach has a clear logical structure, is easy to extend, allows a data acquisition project to be built quickly, and is less error-prone when qualifying data is stored in the database.
Description
Technical Field
The present invention relates to the field of crawling data processing technologies, and in particular, to a crawling data processing method and system based on data flow.
Background
With the development of artificial intelligence technology, more and more functions require large amounts of data as support, and a significant portion of enterprise users employ crawler tools to collect data and then analyze it with big-data techniques. Crawler technology captures data from web pages, device information, and other sources according to defined rules and methods. However, the quality of the data collected by a crawler tool is usually far from usable, so the data must go through extensive cleaning and warehousing (database storage) procedures. An enterprise or project often needs to collect data across tens or hundreds of dimensions. In the prior art, a multitask data collection program therefore generally adopts a parallel processing mode: the processing modules (such as cleaning and warehousing) are independent of one another, each is communicatively connected to the database, and each places its output in the database after completing its own task. For example, the cleaning module takes data out of the database, cleans it, and puts the data that meets the requirements back into the database; the warehousing module then extracts from the database the data that the cleaning module put there and processes it. With this traditional approach, when there are many data types, building the framework for data acquisition and processing is a large amount of work, management is difficult, the logic easily becomes confused, and the same data is easily stored repeatedly.
Disclosure of Invention
The invention aims to provide a data-flow-based method and system for processing crawled data that allow a data acquisition and processing framework to be built quickly, with a clear logical structure that is less error-prone.
To achieve the above object, the present invention discloses a crawling data processing method based on data flow, which includes:
creating a plurality of key information bars, each belonging to a different dimension, for crawling data;
using a crawler tool to crawl target data based on the key information bars to generate a plurality of data Items, wherein each data Item contains one item of target data, and transmitting the data Items to a first data pipeline;
creating a plurality of data cleaning functions of different types;
receiving the data Items through the first data pipeline, inputting each data Item into the data cleaning function corresponding to its category for cleaning, and transmitting the cleaned data Items that meet the requirements to a second data pipeline;
creating a plurality of data warehousing functions of different types;
and receiving the data Items through the second data pipeline, inputting each data Item into the data warehousing function corresponding to its category for warehousing query processing, and updating a database according to the query processing result.
Preferably, each key information bar includes a plurality of data fields, data fields that represent the same content in key information bars of different dimensions have the same field name, and the table name and table unique index corresponding to each key information bar are collected in a single information table for unified management.
Preferably, when a data Item is newly added to the database, all data Items in the database of the same category as the newly added data Item are re-sorted as a whole.
Preferably, the method for re-sorting the data Items as a whole comprises the following steps:
when a data Item is processed by the data warehousing function, transmitting the data Item that meets the warehousing condition to a third data pipeline;
receiving the data Item from the third data pipeline with a data marking function, marking the data Item, and writing the feature name of the data Item into Redis;
and reading the corresponding feature name from Redis with a data sorting function, and re-sorting the marks of the data Items of the same category in the database based on the feature name that was read.
The invention also discloses a crawling data processing system based on data flow, which comprises:
a data preparation module, configured to create a plurality of key information bars, each belonging to a different dimension, for crawling data;
a data acquisition module, configured to use a crawler tool to crawl target data based on the key information bars to generate a plurality of data Items, wherein each data Item contains one item of target data, and to transmit the data Items to a first data pipeline;
a data cleaning module, configured to create a plurality of data cleaning functions of different types, receive the data Items through the first data pipeline, input each data Item into the data cleaning function corresponding to its category for cleaning, and transmit the cleaned data Items that meet the requirements to a second data pipeline;
and a data warehousing module, configured to create a plurality of data warehousing functions of different types, receive the data Items through the second data pipeline, input each data Item into the data warehousing function corresponding to its category for warehousing query processing, and update the database according to the query processing result.
Preferably, each key information bar includes a plurality of data fields, data fields that represent the same content in key information bars of different dimensions have the same field name, and the data preparation module further collects the table name and table unique index corresponding to each key information bar in a single information table for unified management.
Preferably, the system further comprises a data post-processing module, which is configured to re-sort as a whole all data Items in the database of the same category as a newly added data Item when that data Item is added to the database.
Preferably, the data post-processing module comprises a marking module and a sorting module. The marking module is configured to receive the data Item from a third data pipeline with a data marking function, mark the data Item, and write the feature name of the data Item into Redis; the third data pipeline receives the data Items that meet the warehousing condition. The sorting module is configured to read the corresponding feature name from Redis with a data sorting function and to re-sort the marks of the data Items of the same category in the database based on the feature name that was read.
The invention also discloses another crawling data processing system based on the data stream, which comprises:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the data stream based crawling data processing method as described above.
The invention also discloses a computer readable storage medium comprising a computer program executable by a processor to perform a data stream based crawling data processing method as described above.
Compared with the prior art, the technical scheme of the invention designs the framework of the program that processes crawled data around the idea of data flow: the processing stages are connected in series, and only the final warehousing stage is communicatively connected to the database. The data to be processed therefore flows sequentially from the acquisition stage into the cleaning stage and from the cleaning stage into the warehousing stage, where the data that meets the requirements is finally written to the database. As a result, this processing approach has a clear logical structure, is easy to extend, allows a data acquisition project to be built quickly, and is less error-prone when qualifying data is stored in the database.
Drawings
Fig. 1 is a schematic diagram of a crawling data processing method in an embodiment of the present invention.
FIG. 2 is a flowchart of a method for crawling data processing in an embodiment of the present invention.
Detailed Description
In order to describe the technical content, the constructional features, the achieved objects and effects of the present invention in detail, the following description is made in connection with the embodiments and the accompanying drawings.
This embodiment discloses a crawling data processing method based on data flow, which is used to process data crawled by a crawler tool from web pages, other devices, or other sources. Specifically, as shown in Fig. 1 and Fig. 2, the data processing method includes:
S1, data preparation stage: according to project requirements, creating a plurality of key information bars, each belonging to a different dimension, for crawling data;
S2, data acquisition stage: using a crawler tool to crawl target data from a target web page or other device based on the key information bars to generate a plurality of data Items (that is, data containers), wherein each data Item contains one item of target data, and transmitting the data Items to a first data pipeline;
S3, data cleaning stage: first, creating a plurality of data cleaning functions of different types;
S4, receiving the data Items through the first data pipeline, inputting each data Item into the data cleaning function corresponding to its category for cleaning, and transmitting the cleaned data Items that meet the requirements to a second data pipeline;
S5, data warehousing stage: first, creating a plurality of data warehousing functions of different types;
S6, receiving the data Items through the second data pipeline, inputting each data Item into the data warehousing function corresponding to its category for warehousing query processing, and updating a database according to the query processing result. That is, the database is queried for an object identical to the data in the current data Item. If none exists, the creation time and update time are initialized for the current data and a data insertion operation is performed. If the object exists, the field values of the new and old data are compared: if they are consistent, the data Item is skipped directly; otherwise, an update dictionary containing the changed data is generated and a data update operation is performed.
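The insert-or-update decision in S6 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the reading that unchanged records are skipped and changed records are updated, it uses a plain dictionary as a stand-in for the database, and the names `store_item`, `created_at`, and `updated_at` are hypothetical. Refreshing the update time on every update is likewise an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def store_item(db: Dict[Any, Dict[str, Any]], key_field: str, item: Dict[str, Any]) -> str:
    """Insert or update one cleaned item; return 'inserted', 'skipped' or 'updated'."""
    now = datetime.now(timezone.utc).isoformat()
    key = item[key_field]          # value of the table's unique index (assumed single field)
    existing = db.get(key)

    if existing is None:
        # No matching object: initialize both timestamps, then insert.
        db[key] = {**item, "created_at": now, "updated_at": now}
        return "inserted"

    # Matching object found: build the update dictionary from changed fields only.
    changes = {field: value for field, value in item.items() if existing.get(field) != value}
    if not changes:
        return "skipped"           # all field values are consistent, nothing to write

    changes["updated_at"] = now    # assumption: the update time is refreshed here
    existing.update(changes)
    return "updated"
```

Because only changed fields reach the update dictionary, feeding the same data Item twice leaves the stored record untouched, which is the behavior that guards against repeated warehousing.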
In the data processing method of this embodiment, the framework of the program that processes crawled data is designed around the idea of data flow. As shown in Fig. 1, the processing stages are connected in series, and only the final warehousing stage is communicatively connected to the database, so the data to be processed flows from the acquisition stage into the cleaning stage and from the cleaning stage into the warehousing stage, where the data that meets the requirements is finally written to the database. As a result, this processing approach has a clear logical structure, is easy to extend, allows a data acquisition project to be built quickly, and is less error-prone when qualifying data is stored in the database.
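The serial flow described above can be sketched with two in-process queues standing in for the first and second data pipelines. Everything named here is an assumption made for illustration: the categories `article` and `component`, the cleaning rules, and the dictionary used as a database are not taken from the patent. The point of the sketch is the shape: each stage dispatches on the data Item's category, and only the warehousing stage touches the database.

```python
from queue import Queue
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]
Database = Dict[str, List[Item]]


def clean_article(item: Item) -> Optional[Item]:
    # Hypothetical rule: an article needs a non-empty title.
    title = item.get("title", "").strip()
    return {**item, "title": title} if title else None


def clean_component(item: Item) -> Optional[Item]:
    # Hypothetical rule: a component needs a non-empty version.
    version = item.get("version", "").strip()
    return {**item, "version": version} if version else None


def store_article(database: Database, item: Item) -> None:
    database.setdefault("articles", []).append(item)


def store_component(database: Database, item: Item) -> None:
    database.setdefault("components", []).append(item)


# One cleaning function and one warehousing function per category.
CLEANERS: Dict[str, Callable[[Item], Optional[Item]]] = {
    "article": clean_article,
    "component": clean_component,
}
STORERS: Dict[str, Callable[[Database, Item], None]] = {
    "article": store_article,
    "component": store_component,
}


def cleaning_stage(first_pipeline: Queue, second_pipeline: Queue) -> None:
    """Drain the first pipeline; forward only items that pass cleaning."""
    while not first_pipeline.empty():
        item = first_pipeline.get()
        cleaner = CLEANERS.get(item.get("category", ""))
        cleaned = cleaner(item) if cleaner else None
        if cleaned is not None:
            second_pipeline.put(cleaned)


def warehousing_stage(second_pipeline: Queue, database: Database) -> None:
    """Drain the second pipeline; this is the only stage that writes to the database."""
    while not second_pipeline.empty():
        item = second_pipeline.get()
        STORERS[item["category"]](database, item)


def run(items: List[Item]) -> Database:
    first_pipeline: Queue = Queue()
    second_pipeline: Queue = Queue()
    for item in items:              # stands in for the acquisition stage
        first_pipeline.put(item)
    cleaning_stage(first_pipeline, second_pipeline)
    database: Database = {}
    warehousing_stage(second_pipeline, database)
    return database
```

Adding a new data category in this layout means registering one more entry in each dispatch table, without touching the stages themselves.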
Further, each key information bar includes a plurality of data fields, and data fields that represent the same content (such as release time) in key information bars of different dimensions have the same field name. The table name and table unique index corresponding to each key information bar are collected in a single information table, so that they can be managed uniformly and called uniformly in later stages.
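One way to picture the unified information table is a single registry that maps each dimension to its table name, unique index, and field names. The sketch below is an assumption-laden illustration: the dimensions `news` and `component`, the table names, and the field names other than the shared release-time idea are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class KeyInfo:
    """One key information bar: where its data is stored and how rows are identified."""
    table_name: str
    unique_index: Tuple[str, ...]
    fields: Tuple[str, ...]


# The single information table. Fields with the same meaning share one name
# across dimensions ("release_time" here), so later stages can treat them alike.
INFO_TABLE: Dict[str, KeyInfo] = {
    "news": KeyInfo("news_articles", ("url",), ("url", "title", "release_time")),
    "component": KeyInfo("components", ("name", "version"), ("name", "version", "release_time")),
}


def table_for(dimension: str) -> str:
    """Look up the storage table for a dimension in one place."""
    return INFO_TABLE[dimension].table_name


def shared_fields() -> Set[str]:
    """Field names that every dimension has in common."""
    field_sets = [set(info.fields) for info in INFO_TABLE.values()]
    return set.intersection(*field_sets)
```

With this layout, a warehousing function can obtain both the target table and the unique index from `INFO_TABLE` instead of hard-coding them per dimension.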
Furthermore, the data processing method in this embodiment also includes a data post-processing stage: when a new data Item is added to the database, all data Items in the database of the same category as the newly added data Item are re-sorted as a whole, to facilitate later calls.
Specifically, the method for re-sorting the data Items as a whole includes:
first, when a data Item is processed by the data warehousing function, transmitting the data Item that meets the warehousing condition to a third data pipeline;
then, receiving the data Item from the third data pipeline with a data marking function, marking the data Item, and writing the feature name of the data Item into Redis (Remote Dictionary Server, an open-source, in-memory key-value database written in ANSI C that supports networking, optional persistence, and client APIs for many languages); for example, if the data stored in the database is component A with version number 1.0, the feature name "component A" is written into Redis;
and then, reading the corresponding feature name from Redis with a data sorting function, and re-sorting the marks of the data Items of the same category in the database based on the feature name that was read. For example, if "component A" is read, all data for component A is queried in the database. Suppose three records are found: component A (version 1.0), component A (version 2.0), and component A (version 3.0), of which versions 1.0 and 2.0 already existed. Before version 3.0 was added, the mark of component A (version 1.0) had sequence number 2 and the mark of component A (version 2.0) had sequence number 1 (representing the latest). After version 3.0 is added and the data sorting function has run, the mark of component A (version 1.0) has sequence number 3, the mark of component A (version 2.0) has sequence number 2, and the mark of component A (version 3.0) has sequence number 1 (representing the latest).
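The marking and re-sorting steps can be sketched as below, reproducing the component A example. To keep the sketch runnable without a server, a Python `set` stands in for the Redis key that holds pending feature names; a real deployment might use Redis set commands such as SADD and SPOP instead. Ordering by dotted version number is an assumption drawn from the example, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.10' into (2, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def mark_item(pending: Set[str], item: Row) -> None:
    """Marking step: record the item's feature name for later re-sorting."""
    pending.add(str(item["name"]))   # the set stands in for Redis


def resort_marks(pending: Set[str], database: List[Row]) -> None:
    """Sorting step: for each pending feature name, renumber all matching rows."""
    while pending:
        name = pending.pop()
        same = [row for row in database if row["name"] == name]
        same.sort(key=lambda row: version_key(str(row["version"])), reverse=True)
        for rank, row in enumerate(same, start=1):
            row["mark"] = rank       # 1 marks the latest version


# Usage: versions 1.0 and 2.0 already exist; version 3.0 has just been stored.
database: List[Row] = [
    {"name": "component A", "version": "1.0", "mark": 2},
    {"name": "component A", "version": "2.0", "mark": 1},
]
new_item: Row = {"name": "component A", "version": "3.0"}
database.append(new_item)

pending: Set[str] = set()
mark_item(pending, new_item)
resort_marks(pending, database)
```

Writing only the feature name to the shared store, rather than the whole item, keeps the marking step cheap and lets the sorting step batch all renumbering for one feature name into a single pass.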
Another preferred embodiment of the present invention discloses a crawling data processing system based on data flow, which includes the following functional modules:
a data preparation module, configured to create a plurality of key information bars, each belonging to a different dimension, for crawling data;
a data acquisition module, configured to use a crawler tool to crawl target data based on the key information bars to generate a plurality of data Items, wherein each data Item contains one item of target data, and to transmit the data Items to a first data pipeline;
a data cleaning module, configured to create a plurality of data cleaning functions of different types, receive the data Items through the first data pipeline, input each data Item into the data cleaning function corresponding to its category for cleaning, and transmit the cleaned data Items that meet the requirements to a second data pipeline;
and a data warehousing module, configured to create a plurality of data warehousing functions of different types, receive the data Items through the second data pipeline, input each data Item into the data warehousing function corresponding to its category for warehousing query processing, and update the database according to the query processing result.
Further, each key information bar includes a plurality of data fields, data fields that represent the same content in key information bars of different dimensions have the same field name, and the data preparation module collects the table name and table unique index corresponding to each key information bar in a single information table for unified management.
Furthermore, the processing system in this embodiment also includes a data post-processing module, which is configured to re-sort as a whole all data Items in the database of the same category as a newly added data Item when that data Item is added to the database.
Specifically, the data post-processing module comprises a marking module and a sorting module. The marking module is configured to receive the data Item from a third data pipeline with a data marking function, mark the data Item, and write the feature name of the data Item into Redis; the third data pipeline receives the data Items that meet the warehousing condition. The sorting module is configured to read the corresponding feature name from Redis with a data sorting function and to re-sort the marks of the data Items of the same category in the database based on the feature name that was read.
The present invention also discloses another data stream based crawling data processing system comprising one or more processors, a memory and one or more programs, wherein one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the data stream based crawling data processing method as described above. The processor may employ a general-purpose central processing unit (Central Processing Unit, CPU), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits for executing associated programs to perform the functions required to be performed by the modules in the data flow based crawling data processing system of the embodiments of the present application or to perform the data flow based crawling data processing method of the embodiments of the present application.
The invention also discloses a computer readable storage medium comprising a computer program executable by a processor to perform a data stream based crawling data processing method as described above. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a read-only memory (ROM), or a random-access memory (random access memory, RAM), or a magnetic medium, for example, a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, for example, a digital versatile disk (digital versatile disc, DVD), or a semiconductor medium, for example, a Solid State Disk (SSD), or the like.
The present application also discloses a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the electronic device to perform the data stream based crawling data processing method described above.
The foregoing description of the preferred embodiments of the present invention is not intended to limit its scope, which is defined by the claims that follow.
Claims (10)
1. A crawling data processing method based on data flow, comprising:
creating a plurality of key information bars, each belonging to a different dimension, for crawling data;
using a crawler tool to crawl target data based on the key information bars to generate a plurality of data Items, wherein each data Item contains one item of target data, and transmitting the data Items to a first data pipeline;
creating a plurality of data cleaning functions of different types;
receiving the data Items through the first data pipeline, inputting each data Item into the data cleaning function corresponding to its category for cleaning, and transmitting the cleaned data Items that meet the requirements to a second data pipeline;
creating a plurality of data warehousing functions of different types;
and receiving the data Items through the second data pipeline, inputting each data Item into the data warehousing function corresponding to its category for warehousing query processing, and updating a database according to the query processing result.
2. The crawling data processing method based on data flow according to claim 1, wherein each key information bar includes a plurality of data fields, data fields that represent the same content in key information bars of different dimensions have the same field name, and the table name and table unique index corresponding to each key information bar are collected in a single information table for unified management.
3. The crawling data processing method based on data flow according to claim 1, wherein when a data Item is newly added to the database, all data Items in the database of the same category as the newly added data Item are re-sorted as a whole.
4. The crawling data processing method based on data flow according to claim 3, wherein the method for re-sorting the data Items as a whole comprises:
when a data Item is processed by the data warehousing function, transmitting the data Item that meets the warehousing condition to a third data pipeline;
receiving the data Item from the third data pipeline with a data marking function, marking the data Item, and writing the feature name of the data Item into Redis;
and reading the corresponding feature name from Redis with a data sorting function, and re-sorting the marks of the data Items of the same category in the database based on the feature name that was read.
5. A crawling data processing system based on data flow, comprising:
a data preparation module, configured to create a plurality of key information bars, each belonging to a different dimension, for crawling data;
a data acquisition module, configured to use a crawler tool to crawl target data based on the key information bars to generate a plurality of data Items, wherein each data Item contains one item of target data, and to transmit the data Items to a first data pipeline;
a data cleaning module, configured to create a plurality of data cleaning functions of different types, receive the data Items through the first data pipeline, input each data Item into the data cleaning function corresponding to its category for cleaning, and transmit the cleaned data Items that meet the requirements to a second data pipeline;
and a data warehousing module, configured to create a plurality of data warehousing functions of different types, receive the data Items through the second data pipeline, input each data Item into the data warehousing function corresponding to its category for warehousing query processing, and update the database according to the query processing result.
6. The crawling data processing system based on data flow according to claim 5, wherein each key information bar includes a plurality of data fields, data fields that represent the same content in key information bars of different dimensions have the same field name, and the data preparation module further collects the table name and table unique index corresponding to each key information bar in a single information table for unified management.
7. The crawling data processing system based on data flow according to claim 5, further comprising a data post-processing module, wherein the data post-processing module is configured to re-sort as a whole all data Items in the database of the same category as a newly added data Item when that data Item is added to the database.
8. The crawling data processing system based on data flow according to claim 7, wherein the data post-processing module comprises a marking module and a sorting module; the marking module is configured to receive the data Item from a third data pipeline with a data marking function, mark the data Item, and write the feature name of the data Item into Redis; the third data pipeline receives the data Items that meet the warehousing condition; and the sorting module is configured to read the corresponding feature name from Redis with a data sorting function and to re-sort the marks of the data Items of the same category in the database based on the feature name that was read.
9. A data flow based crawling data processing system, comprising:
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the data flow based crawling data processing method of any of claims 1-4.
10. A computer readable storage medium comprising a computer program executable by a processor to perform the data stream based crawling data processing method of any of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310244348.4A CN116244486A (en) | 2023-03-06 | 2023-03-06 | Crawling data processing method and system based on data stream |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310244348.4A CN116244486A (en) | 2023-03-06 | 2023-03-06 | Crawling data processing method and system based on data stream |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116244486A (en) | 2023-06-09 |
Family
ID=86633063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310244348.4A Pending CN116244486A (en) | 2023-03-06 | 2023-03-06 | Crawling data processing method and system based on data stream |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116244486A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110251878A1 (en) * | 2010-04-13 | 2011-10-13 | Yahoo! Inc. | System for processing large amounts of data |
CN105069117A (en) * | 2015-08-11 | 2015-11-18 | 国网技术学院 | Data flow efficiency improving method based on storage process |
CN110781368A (en) * | 2019-10-22 | 2020-02-11 | 北京赛时科技有限公司 | Information crawling system and method for specified experts |
CN112597373A (en) * | 2020-12-29 | 2021-04-02 | 科技谷(厦门)信息技术有限公司 | Data acquisition method based on distributed crawler engine |
- 2023-03-06: CN application CN202310244348.4A (publication CN116244486A), status: active, Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109669933B (en) | Transaction data intelligent processing method and device and computer readable storage medium | |
CN105138312B (en) | Table generation method and device | |
CN104391978A (en) | Method and device for storing and processing web pages of browsers | |
CN115269515B (en) | Processing method for searching specified target document data | |
CN104346331A (en) | Retrieval method and system for XML database | |
CN111627552A (en) | Medical streaming data lineage analysis and storage method and device | |
CN112307191A (en) | Multi-system interactive log query method, device, equipment and storage medium | |
KR20170115109A (en) | Text-Mining Application Technique for Productive Construction Document Management | |
CN111143370B (en) | Method, apparatus and computer-readable storage medium for analyzing relationships between a plurality of data tables | |
CN110765402A (en) | Visual acquisition system and method based on network resources | |
CN105677723A (en) | Method for establishing and searching data labels for industrial signal source | |
CN117076742A (en) | Data lineage tracking method and device and electronic equipment | |
US20180060404A1 (en) | Schema abstraction in data ecosystems | |
WO2016206395A1 (en) | Weekly report information processing method and device | |
CN116244486A (en) | Crawling data processing method and system based on data stream | |
CN116450664A (en) | Data processing method, device, equipment and storage medium | |
CN109948015B (en) | Meta search list result extraction method and system | |
CN114882242A (en) | Violation image identification method and system based on computer vision | |
CN112131215B (en) | Bottom-up database information acquisition method and device | |
JP5444071B2 (en) | Fault information collection system, method and program | |
CN111352824A (en) | Test method and device and computer equipment | |
CN112925856B (en) | Entity relationship analysis method, entity relationship analysis device, entity relationship analysis equipment and computer storage medium | |
CN111931502B (en) | Word segmentation processing method and system and word segmentation searching method | |
CN110020050B (en) | Method for realizing intelligent capture rule configuration technology based on standard documents | |
Azeroual | A text and data analytics approach to enrich the quality of unstructured research information |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||