CN117851487A - Data acquisition method, device, electronic equipment and storage medium - Google Patents

Data acquisition method, device, electronic equipment and storage medium

Info

Publication number
CN117851487A
Authority
CN
China
Prior art keywords
acquisition
data
template
target data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410010013.0A
Other languages
Chinese (zh)
Inventor
王守任
付龙
刘磊
叶佳蕊
薛茜
陈润
吕琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Bond Financial Valuation Center Co ltd
Original Assignee
China Bond Financial Valuation Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Bond Financial Valuation Center Co ltd
Priority to CN202410010013.0A
Publication of CN117851487A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data acquisition method, a data acquisition device, an electronic device and a storage medium, belonging to the technical field of data acquisition. The method comprises: determining at least one acquisition task; calling the acquisition template corresponding to each acquisition task; for each acquisition template, acquiring target data from the data source corresponding to the acquisition template and filling the target data into the acquisition template; and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing the target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal. By calling the acquisition template corresponding to each acquisition task, automatically acquiring target data for each template from its data source, and storing the target data in the acquisition database only after the rechecking person has approved it, the method automates data acquisition and reduces personnel management cost.

Description

Data acquisition method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data acquisition technologies, and in particular, to a data acquisition method, a data acquisition device, an electronic device, and a storage medium.
Background
In recent years, as data has been put to increasingly varied uses, the requirements for the accuracy and timeliness of data acquisition have also risen.
The traditional data acquisition mode relies on a large amount of manual work. Obtaining files, reading them, copying information and checking quality each have to be carried out by hand across several posts, so acquisition efficiency is low. The many tedious links in the acquisition process carry considerable operational risk, which directly affects the accuracy of the acquisition results and the usefulness of downstream data applications. Ensuring the quality of the acquisition work and managing the manual organization both come at a high cost.
Disclosure of Invention
The invention provides a data acquisition method, a data acquisition device, electronic equipment and a storage medium, which are used for solving the defects of low acquisition efficiency and high management cost in the prior art and realizing automatic acquisition of data, thereby improving the acquisition efficiency and reducing the personnel management cost.
The invention provides a data acquisition method, which comprises the following steps:
determining at least one acquisition task;
calling an acquisition template corresponding to each acquisition task, wherein the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task;
For each acquisition template, acquiring target data from a data source corresponding to the acquisition template, and filling the target data into the acquisition template;
and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal.
According to the data acquisition method provided by the invention, the acquisition of target data is carried out from the data source corresponding to the acquisition template, and the target data is filled into the acquisition template, and the method comprises the following steps:
collecting target data from the data source, performing primary verification on the target data, and determining whether the target data accords with preset field information of each field in the collection template, wherein the preset field information at least comprises field type information and field unique information;
and filling the target data into corresponding fields in the acquisition template when the target data accords with the preset field information.
According to the data acquisition method provided by the invention, the acquisition of target data from the data source comprises the following steps:
And when the target data at the data source is unstructured data, calling an identification model corresponding to the unstructured data to identify and grab the target data so as to acquire the target data.
According to the data acquisition method provided by the invention, before the filled acquisition template is sent to the client terminal corresponding to the rechecking personnel, the method further comprises the following steps:
and determining whether fields needing manual acquisition exist in the acquisition template, and filling corresponding fields based on operation instructions of acquisition personnel when the fields exist.
According to the data acquisition method provided by the invention, the filled acquisition template is sent to the client terminal corresponding to the rechecking personnel, and the method comprises the following steps:
and carrying out secondary verification on the filled acquisition template based on a preset rule, and after the secondary verification is passed, sending the filled acquisition template to the client terminal, wherein the preset rule is preset based on the acquisition requirement of acquisition items.
According to the data acquisition method provided by the invention, the acquisition database comprises an initial library table, a copy table and a target table, wherein the initial library table is used for storing initial acquired target data, the copy table is used for storing target data to be checked, and the target table is used for storing the checked target data;
The duplicate table and the target table are generated by mapping when the initial library table is stored, the initial library table is generated based on the configuration of a user, and the field information in the duplicate table and the field information in the target table are the same as the field information in the initial library table;
the method further comprises the steps of:
when it is determined that the target table needs to be synchronized to a data warehouse, the target table is synchronized to the data warehouse with the data storage of the fields of the target table completed.
According to the data acquisition method provided by the invention, before determining at least one acquisition task, the method further comprises the following steps:
outputting a configuration interface, the configuration interface comprising a plurality of configuration components;
and responding to a configuration instruction, and determining target configuration content, wherein the target configuration content comprises the acquisition template, the preset rule and the initial library table.
The invention also provides a data acquisition device, comprising:
the determining module is used for determining at least one acquisition task;
the calling module is used for calling the acquisition templates corresponding to the acquisition tasks, wherein the acquisition templates are generated based on template configuration instructions or are generated based on the acquisition items corresponding to the acquisition tasks;
The acquisition module is used for acquiring target data from a data source corresponding to each acquisition template and filling the target data into the acquisition templates;
the storage module is used for sending the filled collection template to a client terminal corresponding to a rechecking person, and storing target data in the filled collection template into the collection database after receiving a rechecking pass instruction sent by the client terminal.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the data acquisition method as described above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data acquisition method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a data acquisition method as described in any one of the above.
According to the data acquisition method, the device, the electronic equipment and the storage medium, the acquisition templates corresponding to the acquisition tasks are called through determining the acquisition tasks, automatic acquisition of target data is carried out on each acquisition template from the corresponding data source, the target data are filled in the acquisition templates after acquisition, and the target data in the filled acquisition templates are stored in the acquisition database after rechecking of a rechecking staff passes, so that automatic acquisition of the data is realized, and personnel management cost is reduced.
Drawings
In order to illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a data acquisition method provided by the invention;
FIG. 2 is a schematic diagram of a data acquisition device according to the present invention;
fig. 3 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The traditional data acquisition mode relies on a large amount of manual work. Obtaining files, reading them, copying information and checking quality each have to be carried out by hand across several posts, so acquisition efficiency is low and operational risk is high, and both the quality of the acquisition work and the management of the manual organization come at a high cost.
In view of the foregoing, the present invention provides a data acquisition method. FIG. 1 is a schematic flow chart of the data acquisition method provided in an embodiment of the present invention. As shown in FIG. 1, the data acquisition method includes:
step 110: at least one acquisition task is determined.
Specifically, at least one acquisition task is determined. An acquisition task may be generated when triggered by an acquisition data source or when a data source is uploaded manually, which is not specifically limited herein. For each acquisition item, staff can configure in advance, through a data source configuration component, the acquisition data sources corresponding to that item. The data source configuration component supports FTP (File Transfer Protocol) and SFTP (Secure File Transfer Protocol) access, bulletin library docking, the data acquisition platform, the data warehouse, the Python platform and the like. After a data source type has been selected and configured, the data source can trigger the generation of acquisition tasks.
FTP/SFTP access: the configuration items include the acquisition item to which the source belongs, the FTP/SFTP server address, the port, the login user name and password, the file name, the directory, the file type and the file update frequency. If the accessed file is a csv or excel file in a fixed format, a mapping relation can be established so that file fields are mapped to fields of the acquisition template.
Bulletin library docking: generation of an acquisition task is triggered either by searching the bulletins periodically or by having a message sent whenever a bulletin of a specific type is added. The configuration information of the bulletin library includes the acquisition item, the acquisition template, the bulletin source, the bulletin type, bulletin title keywords, incremental/full information, bulletin update reminders and the like. The bulletin library can be configured in advance based on this configuration information according to the data acquisition requirement.
Data acquisition platform: the capture of page element information is set up through the acquisition configuration.
Data warehouse: the data warehouse may be queried by SQL (Structured Query Language) to generate acquisition tasks. The data warehouse may output a fetch result at regular intervals, where the fetch result is the result of logic that the data warehouse runs based on a pre-configuration, and an acquisition task may be triggered based on the fetch result. The configuration information of the data warehouse includes the data warehouse layer, the associated initial library table, the query SQL, the mapping relation from the query result to the acquisition template, the query frequency, the call-up time and the like. The data warehouse can be configured in advance based on this configuration information according to data acquisition requirements.
Python platform: after the Python platform has processed and generated a file, it can call the information update interface and write the file-id. On receiving this information from the Python platform, the method calls the file download interface to download the file, thereby generating the acquisition task.
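The FTP/SFTP configuration items and the file-to-template field mapping described above can be pictured with a minimal sketch. Everything named here (`SftpSourceConfig`, `map_csv_rows`, the host, the column names) is hypothetical and chosen only for illustration; the source does not specify a data model, and no network access is performed.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SftpSourceConfig:
    """Hypothetical record mirroring the FTP/SFTP configuration items."""
    collection_item: str
    host: str
    port: int
    username: str
    password: str
    directory: str
    file_name: str
    file_type: str          # e.g. "csv" or "excel"
    update_frequency: str   # e.g. "daily"
    # Mapping relation: file column name -> acquisition template field name.
    field_mapping: dict = field(default_factory=dict)


def map_csv_rows(csv_text, config):
    """Map each row of a fixed-format csv file onto template field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {template_field: row[file_column]
         for file_column, template_field in config.field_mapping.items()}
        for row in reader
    ]


config = SftpSourceConfig(
    collection_item="bond_announcements", host="sftp.example.invalid", port=22,
    username="collector", password="secret", directory="/incoming",
    file_name="bonds.csv", file_type="csv", update_frequency="daily",
    field_mapping={"code": "bond_code", "px": "price"},
)
rows = map_csv_rows("code,px\nA1,100.5\n", config)
```

Keeping the mapping inside the data source configuration means a change in the file layout only requires a configuration change, not a change to the acquisition template.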
Step 120: and calling an acquisition template corresponding to each acquisition task, wherein the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task.
Specifically, each acquisition task corresponds to an acquisition item and to one or more acquisition templates. When an acquisition task is generated, a piece of task data is generated that contains all the acquired data under that task. The task information includes the task ID, the acquisition item to which the task belongs, the task state, the task generation time, the number of data records, the acquisition accuracy rate and the like.
An acquisition template can be generated in two ways. In the first, the template is generated based on a template configuration instruction: for an acquisition item, a configurator enters a template configuration instruction on the template configuration page, and the corresponding acquisition template is generated from that instruction. For example, the template may be generated by parsing all or some of the fields of a bulletin type that the configurator has selected within the acquisition item. In the second, the template is generated based on the acquisition item corresponding to the acquisition task: content such as bulletins in the acquisition item is parsed automatically, and the acquisition template is generated automatically from the parsed fields. It can be understood that an automatically generated acquisition template needs to be checked and adjusted manually to ensure its accuracy.
The method can also output the generated acquisition tasks on a display screen. An acquisition task can be deleted based on a user operation and becomes invalid once deleted. Acquisition tasks can also be queried by task ID, acquisition item, generation time and similar information; after a query, the user can click a piece of task data to view all the acquired data under that acquisition task.
Step 130: and for each acquisition template, acquiring target data from a data source corresponding to the acquisition template, and filling the target data into the acquisition template.
Specifically, each acquisition template corresponds to a data source. Target data is acquired from the data source corresponding to the acquisition template, and the acquisition template is filled in once the target data has been acquired.
When data has to be acquired from a data source by a crawler, the relevant acquisition information needs to be configured in advance. The configuration information can include: the acquisition item, the data source address, the acquisition title, the maximum acquisition depth, whether access to the parent directory is allowed, ports that may be accessed, ports that may not be accessed, screenshot capture, basic authentication information, frequency, proxy, task generation, page interception, field matching rules, deduplication, filter conditions and the like. The user can make the corresponding settings in the configuration interface based on this configuration information.
The acquisition template includes a plurality of fields and carries both template information and field information. The template information includes the template name, the acquisition item name, the flow name, the name of the corresponding initial library table, and whether to synchronize to the data warehouse. The field information includes the Chinese and English names of the field, the field type (e.g., character, link or timestamp), whether the field is a primary key (i.e., whether it is unique), whether it is nullable (i.e., whether the field may be empty), and a default value (i.e., the value filled in by default when the field is empty).
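As an illustration only, the template information and field information listed above could be held in a structure like the following. The class and attribute names (`FieldInfo`, `AcquisitionTemplate`, `empty_record`) are assumptions made for this sketch and do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldInfo:
    """Field information of one template field (names are hypothetical)."""
    name_cn: str
    name_en: str
    field_type: str                 # e.g. "string", "link", "timestamp"
    is_primary_key: bool = False    # whether the field must be unique
    nullable: bool = True           # whether the field may be empty
    default: Optional[str] = None   # value used when the field is empty


@dataclass
class AcquisitionTemplate:
    """Template information plus the list of field definitions."""
    template_name: str
    collection_item: str
    flow_name: str
    initial_table: str
    sync_to_warehouse: bool
    fields: list = field(default_factory=list)

    def empty_record(self):
        """Return an unfilled record, pre-populated with default values."""
        return {f.name_en: f.default for f in self.fields}


template = AcquisitionTemplate(
    template_name="bond_issue",
    collection_item="bond_announcements",
    flow_name="double_collection_single_review",
    initial_table="bond_issue_init",
    sync_to_warehouse=True,
    fields=[
        FieldInfo("债券代码", "bond_code", "string", is_primary_key=True, nullable=False),
        FieldInfo("币种", "currency", "string", default="CNY"),
    ],
)
record = template.empty_record()
```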
Step 140: and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal.
Specifically, after every field of the acquisition template has been acquired and written, the filled acquisition template is sent to the client terminal corresponding to a rechecking person, who can recheck the template by logging in to the corresponding client on that terminal. Once the recheck has passed, the rechecking person sends a rechecking pass instruction. Receipt of this instruction indicates that the target data in the acquisition template has been rechecked and found correct, and the target data in the filled acquisition template is then stored in the acquisition database.
The rechecking person sends the rechecking pass instruction only when the target data of every field in the acquisition template has been checked and all of it is correct, which indicates that the acquisition template has passed manual rechecking.
According to the data acquisition method provided by the invention, the acquisition task is determined, the acquisition template corresponding to the acquisition task is called, the automatic acquisition of the target data is carried out from the corresponding data source for each acquisition template, the target data is filled in the acquisition template after the acquisition, and the target data in the filled acquisition template is stored in the acquisition database after the review personnel review, so that the automatic acquisition of the data is realized, and the personnel management cost is reduced.
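Steps 110 to 140 can be summarized in a short sketch. This is a simplification under stated assumptions, not the patented implementation: the function `run_acquisition`, the dictionary layouts, and the `collect` and `review_passes` callables (standing in for the data source and for the rechecking person's decision) are all hypothetical.

```python
def run_acquisition(tasks, templates, collect, review_passes):
    """Steps 110-140 in miniature; returns the rows stored in the database."""
    acquisition_database = []
    for task in tasks:                                 # step 110: determined tasks
        field_names = templates[task["template"]]      # step 120: call the template
        collected = collect(task["source"])            # step 130: acquire target data
        # Fill only the fields the template defines; missing ones stay None.
        filled = {name: collected.get(name) for name in field_names}
        if review_passes(filled):                      # step 140: recheck, then store
            acquisition_database.append(filled)
    return acquisition_database


tasks = [{"template": "bond_announcement", "source": "announcement-001"}]
templates = {"bond_announcement": ["bond_code", "coupon_rate"]}
stored = run_acquisition(
    tasks,
    templates,
    collect=lambda source: {"bond_code": "A1", "coupon_rate": 3.2, "extra": "ignored"},
    review_passes=lambda filled: all(value is not None for value in filled.values()),
)
```

Nothing reaches the acquisition database unless the review callable approves the filled template, which mirrors the role of the rechecking pass instruction.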
In one embodiment, the collecting the target data from the data source corresponding to the collecting template and filling the target data into the collecting template includes:
collecting target data from the data source, performing primary verification on the target data, and determining whether the target data accords with preset field information of each field in the collection template, wherein the preset field information at least comprises field type information and field unique information;
and filling the target data into corresponding fields in the acquisition template when the target data accords with the preset field information.
Specifically, target data is acquired from the data source and then subjected to primary verification; it can be filled into the acquisition template only after the verification succeeds. The primary verification determines whether the acquired target data conforms to the preset field information of the corresponding field in the acquisition template. If it does not, the corresponding field cannot be filled.
The preset field information includes field type information and whether the field is unique, and may also include other information, which is not specifically limited herein. For example, if the field type of a field in the acquisition template is numeric and the target data acquired for that field is a date, the acquired target data is wrong, so the primary verification fails and the data cannot be filled into the field. As another example, if a field in the acquisition template is a unique field (i.e., a given value of that field may appear in only one row of records) and the target data acquired for it already appears in another row of the acquisition template, the target data is wrong, so the primary verification fails and the data cannot be filled into the field.
According to the data acquisition method provided by the embodiment, after the target data is acquired, the target data is subjected to primary verification based on the preset field information of each field, so that the accuracy of the target data written into the acquisition template is improved.
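The two examples above (a date arriving for a numeric field, and a duplicate value for a unique field) can be sketched as follows. The type names, `TYPE_CHECKS`, `primary_check` and `fill_field` are assumptions for illustration; the source only states that the check covers field type and uniqueness.

```python
from datetime import date

# Hypothetical type predicates keyed by the field type named in the template.
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "date": lambda v: isinstance(v, date),
}


def primary_check(value, field_type, unique, existing_values):
    """Return True only if the value matches the field type and uniqueness."""
    if not TYPE_CHECKS[field_type](value):
        return False
    if unique and value in existing_values:
        return False
    return True


def fill_field(record, rows, name, value, field_type, unique=False):
    """Fill record[name] only when the primary verification passes."""
    existing = {row.get(name) for row in rows}
    if primary_check(value, field_type, unique, existing):
        record[name] = value
        return True
    return False


rows = [{"bond_code": "A1"}]   # rows already present in the template
record = {}
wrong_type = fill_field(record, rows, "price", date(2024, 1, 1), "number")
duplicate = fill_field(record, rows, "bond_code", "A1", "string", unique=True)
accepted = fill_field(record, rows, "bond_code", "A2", "string", unique=True)
```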
In one embodiment, the collecting target data from the data source includes:
and when the target data at the data source is unstructured data, calling an identification model corresponding to the unstructured data to identify and grab the target data so as to acquire the target data.
Specifically, unstructured data is data whose structure is irregular or incomplete, which has no predefined data model, and which is inconvenient to represent in a two-dimensional logical table of a database. It includes office documents of all formats, text, pictures, HTML, various types of reports, and image and audio/video information.
When the target data at the data source is unstructured data, an identification model corresponding to the unstructured data, that is, a model capable of recognizing that kind of data, can be called to identify and capture the target data. For example, when the target data has to be acquired from a bulletin, the bulletin is usually in PDF format and is therefore unstructured, and the identification model is called to identify and capture the target data from the bulletin.
Illustratively, an artificial intelligence model (i.e., the identification model) is built for the various bulletins and picture files in the managed data, based on machine learning algorithms, image recognition technology and natural language processing technology, so as to automatically identify and extract the information elements needed for subsequent acquisition.
After the identification model has identified and captured the target data, the captured target data is filled directly into the corresponding field of the acquisition template if it conforms to the preset field information. If the captured target data does not conform to the preset field information of a field, or the captured target data is wrong, the identification model needs to be retrained. After multiple rounds of data acquisition, the cases in which automatic recognition failed can be collected centrally, and the staff who train identification models can carry out a second round of iterative training on the models with the most problems. During this training, the corresponding bulletins can be gathered from the extraction logs to tune the identification model until it meets expectations. Specifically, samples can be labeled with a general labeling method; a model developer loads the corresponding identification model and the labeled sample data locally, adjusts the relevant parameters of the model and trains it iteratively. After training is complete, the model is released online to replace the original identification model, and continues to support automatic identification and extraction of unstructured data during data acquisition.
In one embodiment, before the step of sending the filled collection template to the client terminal corresponding to the recheck personnel, the method further includes:
and determining whether fields needing manual acquisition exist in the acquisition template, and filling corresponding fields based on operation instructions of acquisition personnel when the fields exist.
Specifically, after target data has been acquired automatically, some fields may remain unacquired. Therefore, before the acquisition template is sent to the client terminal corresponding to the rechecking person, it is determined whether the acquisition template contains fields that need manual acquisition, and if so, the corresponding fields are filled based on operation instructions from acquisition personnel.
When a field that needs manual acquisition exists, an acquisition instruction is sent to the client terminal of an acquisition person to instruct that person to acquire the field manually. The acquisition person can log in to the client on the client terminal and receive the acquisition instruction there. After receiving the instruction, the acquisition person acquires the field manually. For example, if the field has to be acquired from a bulletin, the acquisition person queries the bulletin library for the corresponding bulletin, the bulletin library outputs the bulletin matching the query instruction, and the acquisition person extracts the target data for the field from the bulletin and fills it into the corresponding field of the acquisition template, completing the manual acquisition.
It can be appreciated that the acquisition flow may be preconfigured and may include, for example, double collection with single review, double collection with double review, and the like. The acquisition flow can be configured separately for each acquisition item: for example, some acquisition items may use double collection with single review, and others single collection with single review. Configuration managers can configure the acquisition flow of each acquisition item based on the requirements of that item, which is not specifically limited herein.
Based on the foregoing, when an acquisition template contains a field that needs manual acquisition, the acquisition instruction is sent to the client terminals of acquisition personnel according to the configured acquisition flow. For example, when the acquisition flow configured for an acquisition item is double collection and an acquisition template of that item contains a field that needs manual acquisition, the acquisition instruction is sent to the client terminals of the two acquisition persons assigned to the item, that is, both of them need to acquire the field manually.
According to the data acquisition method provided by the embodiment, on the basis of automatic acquisition, the function of manual acquisition supplement is provided for the fields which are difficult to automatically acquire, and the integrity of data acquisition is improved.
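The dispatch of manual acquisition instructions according to the configured flow might look like the sketch below. It assumes that an unfilled field (value `None`) is one that needs manual acquisition, and the flow names, the `COLLECTORS_PER_FLOW` table and the instruction format are hypothetical.

```python
# Hypothetical mapping from configured flow to number of acquisition persons.
COLLECTORS_PER_FLOW = {
    "single_collection_single_review": 1,
    "double_collection_single_review": 2,
    "double_collection_double_review": 2,
}


def dispatch_manual_collection(filled, flow, collectors):
    """Build one acquisition instruction per required acquisition person."""
    missing = [name for name, value in filled.items() if value is None]
    if not missing:
        return []  # nothing to acquire manually
    needed = COLLECTORS_PER_FLOW[flow]
    if len(collectors) < needed:
        raise ValueError(f"flow {flow!r} needs {needed} acquisition persons")
    return [{"collector": person, "fields": missing} for person in collectors[:needed]]


filled = {"bond_code": "A1", "issuer": None}
instructions = dispatch_manual_collection(
    filled, "double_collection_single_review", ["collector_a", "collector_b"]
)
```

Under double collection both persons receive the same field list, so their independent entries can later be compared during review.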
In one embodiment, the sending the filled-in collection template to the client terminal corresponding to the recheck personnel includes:
and carrying out secondary verification on the filled acquisition template based on a preset rule, and after the secondary verification is passed, sending the filled acquisition template to the client terminal, wherein the preset rule is preset based on the acquisition requirement of acquisition items.
Specifically, after target data is filled in an acquisition template, carrying out secondary verification on the filled acquisition template based on a preset rule, and after the secondary verification is passed, sending the filled acquisition template to a client terminal of a rechecking person for rechecking.
The preset rules are configured in advance and can be set flexibly, as required, through the verification configuration for the secondary verification of target data in the acquisition template. The configuration information of a preset rule includes the check SQL, a transposition symbol, a relational operator, an expected value, a check level, prompt information and the like. A preset rule can be configured in SQL: for example, external parameters can be passed in through the transposition symbol, the result can be compared with the expected value using the relational operator, and when the result does not meet the expected value, the prompt information is displayed to indicate that the filled acquisition template contains a field that failed the secondary verification.
According to the data acquisition method provided by the embodiment, after the target data are filled in the acquisition template, secondary verification is carried out on the target data in the acquisition template, and error target data in the acquisition template can be further checked through the secondary verification, so that the time for subsequent manual review is saved, the workload of a review person is reduced, and the review efficiency of the review person is further improved.
In one embodiment, the collection database includes an initial library table, a copy table and a target table, wherein the initial library table is used for storing initial collected target data, the copy table is used for storing target data to be checked, and the target table is used for storing the checked target data;
the copy table and the target table are generated by mapping when the initial library table is saved, the initial library table is generated based on the configuration of a user, and the field information in the copy table and the field information in the target table are the same as the field information in the initial library table;
the method further comprises the steps of:
when it is determined that the target table needs to be synchronized to a data warehouse, the target table is synchronized to the data warehouse once the data of the fields of the target table has been stored.
Specifically, the acquisition database comprises an initial library table, a copy table and a target table, wherein the initial library table is used for storing initially acquired target data, the copy table is used for storing target data to be checked, and the target table is used for storing the checked target data.
The target table and the copy table may be generated from a self-built initial library table. That is, an initial library table can be built as required for each acquisition item on a table-building page; when the built table is saved to the acquisition database, it is automatically mapped into a target table and a copy table, which are distinguished by a suffix appended to the table name. The number of copy tables produced by the mapping is determined by the pre-configured acquisition flow: for example, one copy table is mapped for single-person acquisition, and two copy tables are mapped for two-person acquisition.
Furthermore, when there are two or more copy tables for the same collection item, the copy tables can be checked against one another, that is, merged. When the corresponding fields in the copy tables are identical, the collected data is more likely to be accurate; when they are not all identical, the collected data contains an error. After the copy tables are compared, the comparison result can be output to the rechecking person as a prompt, which improves the efficiency of manual review.
The information of a self-built initial library table includes, for example: the Chinese and English name of each field; the type of each field (for example, character, link, or timestamp, which is not specifically limited herein); whether the field is a primary key (that is, whether the field is unique); whether the field may be empty; a default value (that is, whether a default value is filled in automatically when no corresponding target data is collected for the field); and the state of the initial library table (enabled or disabled). The state of the initial library table is consistent with the state of its corresponding collection item: the table is enabled when the collection item is enabled and disabled when the collection item is disabled. It will be appreciated that one initial library table may correspond to one or more collection templates, and the fields in the collection templates must correspond one-to-one with the fields in the initial library table, so that every field in the initial library table can be collected through the collection templates.
According to the data acquisition method provided by the embodiment, the initial library table is generated based on the configuration of the user. Compared with a library table developed by a developer on request, the initial library table can be configured and modified by the user according to the needs of the acquisition item, which is more flexible. Moreover, a library table developed on request requires a test period and takes a long time, whereas an initial library table generated by user configuration can be used directly, which saves further time.
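The mapping step can be sketched as follows, assuming SQLite as a stand-in for the acquisition database. The suffixes `_target` and `_copy1`, the flow names `"single"` and `"double"`, and the function `map_tables` are hypothetical; the source states only that a suffix distinguishes the tables and that the acquisition flow determines how many copy tables are created.

```python
import sqlite3

# Number of copy tables per acquisition flow (flow names are assumptions).
COPY_COUNT = {"single": 1, "double": 2}

def map_tables(conn, initial_table, flow):
    """Create one target table and the copy table(s) with the initial table's fields."""
    names = [f"{initial_table}_target"]
    names += [f"{initial_table}_copy{i}" for i in range(1, COPY_COUNT[flow] + 1)]
    for name in names:
        # "WHERE 0" copies the column names and declared types but no rows.
        # Constraints such as primary keys are not copied by this statement.
        conn.execute(
            f'CREATE TABLE "{name}" AS SELECT * FROM "{initial_table}" WHERE 0')
    return names

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bond_info (code TEXT, rate REAL)")  # hypothetical initial table
created = map_tables(conn, "bond_info", "double")
```

With the two-person flow, the sketch creates `bond_info_target`, `bond_info_copy1`, and `bond_info_copy2`, each carrying the same fields as the initial table.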
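A minimal sketch of the mutual comparison follows, assuming each copy table has been read into a list of dictionaries and that one field serves as the record key. The function `compare_copies` and the sample field names are hypothetical.

```python
def compare_copies(copy_a, copy_b, key):
    """Return (key value, field, value in A, value in B) for every mismatch."""
    rows_b = {row[key]: row for row in copy_b}
    keys_a = {row[key] for row in copy_a}
    diffs = []
    for row_a in copy_a:
        row_b = rows_b.get(row_a[key])
        if row_b is None:
            # The record was collected by the first person only.
            diffs.append((row_a[key], None, "present", "missing"))
            continue
        for field, value in row_a.items():
            if row_b.get(field) != value:
                diffs.append((row_a[key], field, value, row_b.get(field)))
    for k in rows_b:
        if k not in keys_a:
            # The record was collected by the second person only.
            diffs.append((k, None, "missing", "present"))
    return diffs

# Hypothetical data entered by two collection personnel for the same item.
first = [{"code": "A001", "rate": 3.2}, {"code": "A002", "rate": 2.5}]
second = [{"code": "A001", "rate": 3.2}, {"code": "A002", "rate": 2.8}]
differences = compare_copies(first, second, "code")
```

An empty result means the two copies agree field by field; any entries in the result can be shown to the rechecking person as the comparison prompt.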
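The field information listed above could be turned into a table definition roughly as sketched below. This is only an illustration under assumptions: `FieldDef`, `build_create_table`, the table `bond_info`, and the sample fields are hypothetical, and SQLite stands in for the acquisition database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class FieldDef:
    en_name: str                   # English name, used as the column name
    cn_name: str                   # Chinese name, kept as display metadata only
    sql_type: str                  # field type, e.g. TEXT or REAL
    primary_key: bool = False      # whether the field is unique
    nullable: bool = True          # whether the field may be empty
    default: Optional[str] = None  # default value as an SQL literal, if any

def build_create_table(table, fields):
    """Build a CREATE TABLE statement from user-configured field definitions."""
    cols = []
    for f in fields:
        parts = ['"' + f.en_name + '"', f.sql_type]
        if not f.nullable:
            parts.append("NOT NULL")
        if f.default is not None:
            parts.append("DEFAULT " + f.default)
        cols.append(" ".join(parts))
    pk = ['"' + f.en_name + '"' for f in fields if f.primary_key]
    if pk:
        cols.append("PRIMARY KEY (" + ", ".join(pk) + ")")
    return 'CREATE TABLE "' + table + '" (' + ", ".join(cols) + ")"

fields = [
    FieldDef("bond_code", "债券代码", "TEXT", primary_key=True, nullable=False),
    FieldDef("coupon_rate", "票面利率", "REAL", default="0"),
]
ddl = build_create_table("bond_info", fields)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the generated statement is executed as-is
```

Generating the definition from configuration rather than from hand-written DDL mirrors the point made in the text that the user, not a developer, defines and modifies the initial library table.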
In one embodiment, before the determining at least one acquisition task, the method further includes:
outputting a configuration interface, the configuration interface comprising a plurality of configuration components;
and responding to a configuration instruction, and determining target configuration content, wherein the target configuration content comprises the acquisition template, the preset rule and the initial library table.
Specifically, the method also supports configuring the data acquisition process. For example, based on a configuration instruction input by a configuration manager, a configuration interface can be output to the configuration manager. The configuration interface includes a plurality of configuration components, which may include, for example, an IMIX/FTP/FIX/SQL data source configuration component, an announcement library configuration component, a data warehouse query scope configuration component, a data acquisition configuration component, a collection item configuration component, an initial library table configuration component, a collection template configuration component, a preset rule configuration component, a collection flow configuration component, and the like, which are not specifically limited herein.
The configuration manager can perform the corresponding configuration using the configuration components on the configuration interface, and the target configuration content is determined in response to the configuration instructions input by the configuration manager.
It can be understood that each collection item must first be configured on the configuration interface, where collection items can be added, modified, deleted, enabled, or disabled. A disabled collection item requires no further data acquisition, whereas an enabled collection item requires data acquisition. The specific information of a collection item includes: the collection data source (that is, the data source), the collection template, the preset rule, whether the data is to be synchronized, the collection flow, the collection item state, and the like.
The data acquisition device provided by the present invention is described below; the data acquisition device described below and the data acquisition method described above may be cross-referenced.
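The collection item information listed above can be pictured as a simple configuration record. The sketch below is an assumption-laden illustration: `CollectionItem`, its attribute names, the sample values, and `items_to_collect` are hypothetical and only mirror the listed information.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionItem:
    name: str
    data_source: str                 # collection data source
    templates: list                  # names of the collection templates
    preset_rules: list = field(default_factory=list)  # secondary verification rules
    sync_to_warehouse: bool = False  # whether data is to be synchronized
    flow: str = "single"             # collection flow, e.g. "single" or "double"
    enabled: bool = True             # collection item state

def items_to_collect(items):
    """Only enabled collection items require data acquisition."""
    return [item for item in items if item.enabled]

items = [
    CollectionItem("bond_basic_info", "sql", ["bond_template"], flow="double"),
    CollectionItem("old_announcements", "ftp", ["notice_template"], enabled=False),
]
active = items_to_collect(items)
```

Disabling an item simply removes it from the set of items for which acquisition tasks are determined, without deleting its configuration.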
Fig. 2 is a schematic structural diagram of a data acquisition device according to an embodiment of the present invention, and as shown in fig. 2, the data acquisition device 200 includes: the device comprises a determining module 210, a calling module 220, an acquisition module 230 and a saving module 240;
a determining module 210, configured to determine at least one acquisition task;
the calling module 220 is configured to call an acquisition template corresponding to each acquisition task, where the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task;
the acquisition module 230 is configured to acquire, for each acquisition template, target data from a data source corresponding to the acquisition template, and fill the target data into the acquisition template;
and the saving module 240 is configured to send the filled acquisition template to a client terminal corresponding to a rechecking person, and save the target data in the filled acquisition template to an acquisition database after receiving a recheck pass instruction sent by the client terminal.
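The cooperation of the four modules can be sketched in a few lines. This is a simplified illustration, not the device itself: the function `run_acquisition`, the dictionary-based templates, the per-task source callables, and the `review` callback standing in for the recheck pass instruction are all assumptions.

```python
def run_acquisition(tasks, templates, sources, review, database):
    """Walk each task through template lookup, filling, review, and saving."""
    for task in tasks:                        # determining module: the tasks to run
        template = dict(templates[task])      # calling module: copy the blank template
        for field_name in template:           # acquisition module: fill each field
            template[field_name] = sources[task](field_name)
        if review(task, template):            # saving module: save only after review passes
            database.setdefault(task, []).append(template)
    return database

# Hypothetical configuration with a single task and an in-memory data source.
tasks = ["bond"]
templates = {"bond": {"code": None, "rate": None}}
source_rows = {"code": "A001", "rate": 3.2}
sources = {"bond": lambda field_name: source_rows[field_name]}

saved = run_acquisition(tasks, templates, sources, lambda task, tpl: True, {})
```

If the review callback returns a falsy value, nothing is written for that task, which corresponds to saving only after a recheck pass instruction is received.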
In one embodiment, the acquisition module 230 is specifically configured to:
collecting target data from the data source, performing primary verification on the target data, and determining whether the target data accords with preset field information of each field in the collection template, wherein the preset field information at least comprises field type information and field unique information;
and filling the target data into corresponding fields in the acquisition template when the target data accords with the preset field information.
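The primary verification can be sketched as a check of each collected record against the preset field information before the record is filled into the template. The sketch assumes that field type information is expressed as a Python type and that field unique information is a flag; `FieldInfo` and `fill_template` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class FieldInfo:
    name: str
    py_type: type         # field type information
    unique: bool = False  # field unique information

def fill_template(records, fields):
    """Return (filled rows, rejected entries as (row index, field name, reason))."""
    seen = {f.name: set() for f in fields if f.unique}
    filled, rejected = [], []
    for idx, record in enumerate(records):
        problems = []
        for f in fields:
            value = record.get(f.name)
            if not isinstance(value, f.py_type):
                problems.append((idx, f.name, "type"))
            elif f.unique and value in seen[f.name]:
                problems.append((idx, f.name, "duplicate"))
        if problems:
            rejected.extend(problems)
            continue
        # Only records that pass every check are filled into the template.
        for f in fields:
            if f.unique:
                seen[f.name].add(record[f.name])
        filled.append({f.name: record[f.name] for f in fields})
    return filled, rejected

fields = [FieldInfo("code", str, unique=True), FieldInfo("rate", float)]
records = [
    {"code": "A001", "rate": 3.2},
    {"code": "A001", "rate": 2.0},    # repeats a value of a unique field
    {"code": "A002", "rate": "n/a"},  # wrong type for a numeric field
]
filled, rejected = fill_template(records, fields)
```

Rejected entries record which field failed and why, so they can be reported or routed to manual collection instead of being silently dropped.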
In one embodiment, the acquisition module 230 is specifically configured to:
and when the target data at the data source is unstructured data, calling an identification model corresponding to the unstructured data to identify and grab the target data so as to acquire the target data.
In one embodiment, before the filled-in acquisition template is sent to the client terminal corresponding to the recheck person, the acquisition module 230 is further configured to:
and determining whether fields needing manual acquisition exist in the acquisition template, and filling corresponding fields based on operation instructions of acquisition personnel when the fields exist.
In one embodiment, the saving module 240 is specifically configured to:
and carrying out secondary verification on the filled acquisition template based on a preset rule, and after the secondary verification is passed, sending the filled acquisition template to the client terminal, wherein the preset rule is preset based on the acquisition requirement of acquisition items.
In one embodiment, the collection database includes an initial library table, a copy table and a target table, wherein the initial library table is used for storing initial collected target data, the copy table is used for storing target data to be checked, and the target table is used for storing the checked target data;
the copy table and the target table are generated by mapping when the initial library table is saved, the initial library table is generated based on the configuration of a user, and the field information in the copy table and the field information in the target table are the same as the field information in the initial library table;
The saving module 240 is further configured to:
when it is determined that the target table needs to be synchronized to a data warehouse, the target table is synchronized to the data warehouse once the data of the fields of the target table has been stored.
In one embodiment, the data acquisition device further comprises a configuration module for:
outputting a configuration interface, the configuration interface comprising a plurality of configuration components;
and responding to a configuration instruction, and determining target configuration content, wherein the target configuration content comprises the acquisition template, the preset rule and the initial library table.
According to the data acquisition device provided by the invention, the acquisition task is determined, the acquisition template corresponding to the acquisition task is called, the automatic acquisition of the target data is carried out on each acquisition template from the corresponding data source, the target data is filled in the acquisition template after the acquisition, and the target data in the filled acquisition template is stored in the acquisition database after the review personnel review, so that the automatic acquisition of the data is realized, and the personnel management cost is reduced.
Fig. 3 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a communication bus 340, where the processor 310, the communication interface 320, and the memory 330 communicate with one another through the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a data acquisition method comprising:
Determining at least one acquisition task;
calling an acquisition template corresponding to each acquisition task, wherein the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task;
for each acquisition template, acquiring target data from a data source corresponding to the acquisition template, and filling the target data into the acquisition template;
and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium; when executed by a processor, the computer program is capable of performing the data acquisition method provided by the method embodiments described above, the method comprising:
determining at least one acquisition task;
calling an acquisition template corresponding to each acquisition task, wherein the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task;
for each acquisition template, acquiring target data from a data source corresponding to the acquisition template, and filling the target data into the acquisition template;
and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the data acquisition method provided by the method embodiments described above, the method comprising:
Determining at least one acquisition task;
calling an acquisition template corresponding to each acquisition task, wherein the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task;
for each acquisition template, acquiring target data from a data source corresponding to the acquisition template, and filling the target data into the acquisition template;
and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of data acquisition, comprising:
determining at least one acquisition task;
calling an acquisition template corresponding to each acquisition task, wherein the acquisition template is generated based on a template configuration instruction or is generated based on an acquisition item corresponding to the acquisition task;
for each acquisition template, acquiring target data from a data source corresponding to the acquisition template, and filling the target data into the acquisition template;
and sending the filled acquisition template to a client terminal corresponding to a rechecking person, and storing target data in the filled acquisition template into an acquisition database after receiving a rechecking pass instruction sent by the client terminal.
2. The data collection method according to claim 1, wherein the collecting target data from the data source corresponding to the collection template and filling the target data into the collection template includes:
collecting target data from the data source, performing primary verification on the target data, and determining whether the target data accords with preset field information of each field in the collection template, wherein the preset field information at least comprises field type information and field unique information;
And filling the target data into corresponding fields in the acquisition template when the target data accords with the preset field information.
3. The data acquisition method of claim 2, wherein the acquisition of target data from the data source comprises:
and when the target data at the data source is unstructured data, calling an identification model corresponding to the unstructured data to identify and grab the target data so as to acquire the target data.
4. The method for collecting data according to claim 1, wherein before the step of transmitting the filled-in collection template to the client terminal corresponding to the recheck person, the method further comprises:
and determining whether fields needing manual acquisition exist in the acquisition template, and filling corresponding fields based on operation instructions of acquisition personnel when the fields exist.
5. The data collection method according to claim 1, wherein the sending the filled collection template to the client terminal corresponding to the recheck staff includes:
and carrying out secondary verification on the filled acquisition template based on a preset rule, and after the secondary verification is passed, sending the filled acquisition template to the client terminal, wherein the preset rule is preset based on the acquisition requirement of acquisition items.
6. The data acquisition method according to claim 5, wherein the acquisition database comprises an initial library table, a copy table and a target table, the initial library table is used for storing initial acquired target data, the copy table is used for storing target data to be checked, and the target table is used for storing target data after being checked;
the copy table and the target table are generated by mapping when the initial library table is saved, the initial library table is generated based on the configuration of a user, and the field information in the copy table and the field information in the target table are the same as the field information in the initial library table;
the method further comprises the steps of:
when it is determined that the target table needs to be synchronized to a data warehouse, the target table is synchronized to the data warehouse once the data of the fields of the target table has been stored.
7. The method of claim 6, wherein prior to determining the at least one acquisition task, further comprising:
outputting a configuration interface, the configuration interface comprising a plurality of configuration components;
and responding to a configuration instruction, and determining target configuration content, wherein the target configuration content comprises the acquisition template, the preset rule and the initial library table.
8. A data acquisition device, comprising:
the determining module is used for determining at least one acquisition task;
the calling module is used for calling the acquisition templates corresponding to the acquisition tasks, wherein the acquisition templates are generated based on template configuration instructions or are generated based on the acquisition items corresponding to the acquisition tasks;
the acquisition module is used for acquiring target data from a data source corresponding to each acquisition template and filling the target data into the acquisition templates;
the storage module is used for sending the filled collection template to a client terminal corresponding to a rechecking person, and storing target data in the filled collection template into the collection database after receiving a rechecking pass instruction sent by the client terminal.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data acquisition method of any one of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the data acquisition method according to any one of claims 1 to 7.
CN202410010013.0A 2024-01-03 2024-01-03 Data acquisition method, device, electronic equipment and storage medium Pending CN117851487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410010013.0A CN117851487A (en) 2024-01-03 2024-01-03 Data acquisition method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410010013.0A CN117851487A (en) 2024-01-03 2024-01-03 Data acquisition method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117851487A true CN117851487A (en) 2024-04-09

Family

ID=90541383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410010013.0A Pending CN117851487A (en) 2024-01-03 2024-01-03 Data acquisition method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117851487A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination