CN112115145A - Data acquisition method and device, electronic equipment and storage medium - Google Patents

Publication number
CN112115145A
Authority
CN
China
Prior art keywords
data
field
data acquisition
value
acquisition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010966723.2A
Other languages
Chinese (zh)
Inventor
熊志国
张冕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Finance Technology Co Ltd
Priority to CN202010966723.2A
Publication of CN112115145A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to the field of big data, and discloses a data acquisition method, which comprises the following steps: acquiring an original data set, carrying out preprocessing operation on the original data set to generate a standard data set, and storing the standard data set into a preset database; constructing a corresponding data acquisition model according to a data table in the preset database; configuring a target data acquisition model of the data acquisition model according to the field set of the data to be acquired and the acquisition timestamp; and acquiring standard data corresponding to the data to be acquired from the preset database within the time of the acquisition timestamp based on the target data acquisition model to obtain a target data set. The invention also provides a data acquisition device, an electronic device and a computer readable storage medium. The invention can improve the efficiency of data acquisition.

Description

Data acquisition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of big data, and in particular, to a data acquisition method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the continuing development of big data, data acquisition has become an indispensable process. At present, data acquisition is commonly performed with acquisition tools built on a database language. Such tools require screening conditions to be configured in order to acquire the corresponding data, which easily leads to database statements that are wrapped in multiple layers. As a result, all of the data is pulled first and then screened layer by layer, so the data acquisition efficiency is extremely low.
Disclosure of Invention
The invention provides a data acquisition method, a data acquisition device, electronic equipment and a computer-readable storage medium, and mainly aims to avoid the problem of system resource waste during data acquisition and improve the data acquisition efficiency.
In order to achieve the above object, the present invention provides a data acquisition method, including:
acquiring an original data set, carrying out preprocessing operation on the original data set to generate a standard data set, and storing the standard data set into a preset database;
constructing a corresponding data acquisition model according to a data table in the preset database;
acquiring a field set of data to be acquired, calculating a matching value between the field set and a field in the data acquisition model, and selecting the data acquisition model with the matching value larger than a preset threshold value;
acquiring an acquisition time stamp of the data to be acquired, and filling the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model;
and acquiring standard data corresponding to the data to be acquired from the preset database within the acquisition time stamp time based on the target data acquisition model to obtain a target data set.
Optionally, the performing a preprocessing operation on the original data set to generate a standard data set includes:
carrying out a deduplication operation on the original data set, and detecting whether a data missing value exists in the deduplicated original data set;
if no data missing value exists, the original data set after the duplication removal is used as a standard data set;
and if the data missing value exists, filling the data missing value to obtain a standard data set.
Optionally, the populating the data missing value includes:
filling the data missing value by the following method:
$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$
wherein L(θ) represents a filled data missing value, x_i represents the i-th data missing value, θ represents the probability parameter corresponding to the filled data missing value, n represents the number of the original data sets after the duplication removal, and p(x_i | θ) represents the probability of the filled data missing value.
Optionally, the constructing a corresponding data acquisition model according to the data table in the preset database includes:
acquiring all data tables in the preset database, and clustering the data tables of the same type to obtain one or more initial data table sets;
creating a data table matrix of the initial set of data tables;
calculating the expected value of each initial data table in the data table matrix;
and taking the initial data table with the same expected value as a data acquisition model.
Optionally, the calculating an expected value of each initial data table in the data table matrix includes:
calculating an expected value for each initial data table in the data table matrix using:
Figure BDA0002682602460000022
wherein C_i represents the expected value of the i-th initial data table in the data table matrix, E_i represents the eigenvectors of the i-th initial data table in the data table matrix, the symbol shown in Figure BDA0002682602460000023 represents the eigenvector covariance of the i-th initial data table in the data table matrix, and trace() represents the spatial filter function.
Optionally, the calculating matching values of the field set and the fields in the data acquisition model includes:
acquiring the same field of the field set and the field in the data acquisition model, and identifying the same field to obtain a target field set;
summarizing the field length of each field in the field set to obtain a first field length value, summarizing the field length of each field in the data acquisition model to obtain a second field length value, and summarizing the field length of each field in the target field set to obtain a third field length value;
calculating the ratio of the length value of the third field to the length value of the first field to obtain a first ratio, and calculating the ratio of the length value of the third field to the length value of the second field to obtain a second ratio;
and calculating the matching value of the field set and the field in the data acquisition model according to the first ratio and the second ratio.
Optionally, the filling the collection timestamp into the selected data collection model to obtain a target data collection model includes:
creating an object receiving script in the selected data acquisition model, and receiving the timestamp of the report to be generated by using the object receiving script to obtain a data acquisition object;
and transmitting the data acquisition object to the selected SQL configuration statement of the data acquisition model to obtain a target data acquisition model.
In order to solve the above problems, the present invention also provides a data acquisition apparatus comprising:
the preprocessing module is used for acquiring an original data set, carrying out preprocessing operation on the original data set to generate a standard data set, and storing the standard data set into a preset database;
the construction module is used for constructing a corresponding data acquisition model according to a data table in the preset database;
the calculation module is used for acquiring a field set of data to be acquired, calculating a matching value between the field set and a field in the data acquisition model, and selecting the data acquisition model with the matching value larger than a preset threshold value;
the filling module is used for acquiring the acquisition time stamp of the data to be acquired, and filling the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model;
and the acquisition module is used for acquiring standard data corresponding to the data to be acquired from the preset database within the acquisition time stamp time based on the target data acquisition model to obtain a target data set.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to implement the data acquisition method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in an electronic device to implement the data acquisition method described above.
The method comprises the steps of firstly preprocessing an acquired original data set to generate a standard data set, storing the standard data set into a preset database, ensuring the accuracy of data in the acquired standard data set and supporting the input of batch data; secondly, according to the data table in the preset database, the embodiment of the invention constructs the corresponding data acquisition model, thereby avoiding the configuration of screening conditions during the subsequent data acquisition and improving the efficiency of data acquisition; further, according to the field set of the data to be acquired and the acquisition timestamp, the embodiment of the invention configures the target data acquisition model of the data acquisition model to acquire the standard data corresponding to the data to be acquired from the preset database to obtain the target data set. Therefore, the data acquisition method, the data acquisition device, the electronic equipment and the storage medium can improve the data acquisition efficiency.
Drawings
FIG. 1 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 2 is a detailed schematic flow chart of one step of the data acquisition method provided in FIG. 1;
FIG. 3 is a detailed schematic flow chart of another step of the data acquisition method provided in FIG. 1;
FIG. 4 is a schematic block diagram of a data acquisition device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an internal structure of an electronic device implementing a data acquisition method according to an embodiment of the present invention.
The implementation, functional features, and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a data acquisition method. The execution subject of the data acquisition method includes, but is not limited to, at least one electronic device that can be configured to execute the method provided by the embodiments of the present application, such as a server or a terminal. In other words, the data acquisition method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server, or a cloud server cluster.
Referring to fig. 1, a schematic flow chart of a data acquisition method according to an embodiment of the present invention is shown, in an embodiment of the present invention, the data acquisition method includes:
S1, acquiring an original data set, preprocessing the original data set to generate a standard data set, and storing the standard data set in a preset database.
In at least one embodiment of the present invention, the raw data set includes, but is not limited to: business data, financial data, and product data, among others. For example, in the insurance field, the service data may include: number of applications, amount of applications, type of applications, etc., the financial data may include: the fund proportion of the application, the profit of the application, the return rate of the application and the like, and the product data can comprise: a life insurance category, a car insurance category, an accident insurance category, and the like.
In the preferred embodiment of the present invention, the preprocessing operation includes deduplication and missing value padding.
Preferably, in the embodiment of the present invention, the original data set is subjected to a deduplication operation by using a distance formula, where the distance formula includes:
Figure BDA0002682602460000051
where d represents the distance value of any two data in the original data set, and w_{1j} and w_{2j} represent any two data in the original data set. When the distance value is smaller than the preset distance value, either one of the two data is deleted; if the distance value is not smaller than the preset distance value, both data are retained. Preferably, the preset distance value may be 0.1.
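The deduplication rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the distance formula image is not reproduced in the text, so the Euclidean distance used here is an assumption, and the function names `distance` and `deduplicate` as well as the sample records are hypothetical. Only the threshold of 0.1 comes from the description.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length numeric records.

    Assumption: the source's distance formula is not reproduced in the text,
    so Euclidean distance is used here purely for illustration.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deduplicate(records, threshold=0.1):
    """Keep a record only if it is at least `threshold` away from every kept record."""
    kept = []
    for rec in records:
        if all(distance(rec, k) >= threshold for k in kept):
            kept.append(rec)
    return kept


# Hypothetical sample data: the second and fourth records are near or exact duplicates.
raw = [(1.0, 2.0), (1.0, 2.05), (3.0, 4.0), (3.0, 4.0)]
standard = deduplicate(raw)
print(standard)
```

When two records fall within the threshold, the sketch keeps the one seen first; the description only requires that either one of the pair be deleted.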
In an embodiment of the present invention, the missing values include: missing completely at random, missing at random, and missing not at random. In detail, missing completely at random means that whether a variable's value is missing is entirely random and does not depend on any other factor; missing at random means that the missingness of a variable is related to other variables but not to the value of the variable itself; missing not at random means that the missingness of a variable is related to the value of the variable itself.
Further, the invention detects whether the deduplicated original data set has a data missing value by using a missmap function. If no data missing value is detected, the deduplicated original data set is not processed further and is used as the standard data set.
If a data missing value is detected in the deduplicated original data set, preferably, the data missing value is filled through a preset filling algorithm to obtain the standard data set. In detail, the preset filling algorithm includes:
$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$
wherein L(θ) represents a filled data missing value, x_i represents the i-th data missing value, θ represents the probability parameter corresponding to the filled data missing value, n represents the number of the original data sets after the duplication removal, and p(x_i | θ) represents the probability of the filled data missing value.
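One possible reading of the filling step is a maximum-likelihood fill: estimate θ from the observed values and use it to fill the gaps. The sketch below assumes, purely for illustration, that each column is Gaussian, in which case the likelihood-maximizing mean is the sample mean of the observed values. The distribution, the function names `has_missing` and `fill_missing`, and the sample column are hypothetical and not taken from the source.

```python
def has_missing(column):
    """Detect whether a column contains any missing value (represented as None)."""
    return any(v is None for v in column)


def fill_missing(column):
    """Fill missing entries with the Gaussian maximum-likelihood mean.

    Assumption: values are modelled as Gaussian, so the mean that maximizes
    the likelihood of the observed values is their sample mean.
    """
    if not has_missing(column):
        return list(column)  # nothing to fill: the column is already standard
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot estimate a fill value from an empty column")
    mle_mean = sum(observed) / len(observed)
    return [mle_mean if v is None else v for v in column]


# Hypothetical column with two missing entries.
premiums = [100.0, None, 140.0, 120.0, None]
filled = fill_missing(premiums)
print(filled)
```

A different distributional assumption would give a different fill value; the source does not say which family of p(x_i | θ) is intended.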
Based on the above embodiment, the method and the device provided by the invention can ensure the accuracy of the data in the obtained standard data set and support the input of batch data after preprocessing the original data set.
It is to be noted that, in order to increase the acquisition speed of the raw data, the embodiment of the present invention stores the standard data set in a preset database. Preferably, the preset database may be an Oracle database.
S2, constructing a corresponding data acquisition model according to the data table in the preset database.
In at least one embodiment of the invention, the data collection model is used for querying the database to improve the efficiency of data collection.
In detail, referring to fig. 2, the S2 includes:
S20, acquiring all data tables in the preset database, and clustering the data tables of the same type by using a placeholder identification algorithm to obtain one or more initial data table sets;
S21, creating a data table matrix of the initial data table set;
S22, calculating the expected value of each initial data table in the data table matrix;
S23, taking the initial data tables with the same expected value as a data acquisition model.
In an alternative embodiment, the placeholder identification algorithm may be a k-means algorithm.
In the embodiment of the present invention, the expected value may be understood as an importance value of each initial data table in a corresponding data table matrix, and the expected value of each initial data table in the data table matrix is calculated by using the following method:
Figure BDA0002682602460000061
wherein C_i represents the expected value of the i-th initial data table in the data table matrix, E_i represents the eigenvectors of the i-th initial data table in the data table matrix, the symbol shown in Figure BDA0002682602460000062 represents the eigenvector covariance of the i-th initial data table in the data table matrix, and trace() represents the spatial filter function.
Constructing the data acquisition model based on placeholders avoids configuring screening conditions during subsequent data acquisition: the data to be acquired is filled directly into the corresponding data acquisition model to obtain the corresponding target data.
S3, acquiring a field set of the data to be acquired, calculating a matching value between the field set and a field in the data acquisition model, and selecting the data acquisition model with the matching value larger than a preset threshold value.
In at least one embodiment of the present invention, the field set of the data to be collected includes, but is not limited to, id, date, password, username, and data. For example, if the data to be collected is the above-mentioned service data, the field set included in the data to be collected may be: number, time, username, etc.
In detail, referring to fig. 3, the calculating the matching values of the field set and the fields in the data collection model includes:
S30, obtaining the same field of the field set and the field in the data acquisition model, and identifying the same field to obtain a target field set;
S31, summarizing the field length of each field in the field set to obtain a first field length value, summarizing the field length of each field in the data acquisition model to obtain a second field length value, and summarizing the field length of each field in the target field set to obtain a third field length value;
S32, calculating the ratio of the third field length value to the first field length value to obtain a first ratio, and calculating the ratio of the third field length value to the second field length value to obtain a second ratio;
S33, calculating the matching value of the field set and the field in the data acquisition model according to the first ratio and the second ratio.
The fields shared by the field set and the data acquisition model are obtained through SQL query statements. For example, if the field set contains the field number, an SQL query statement is used to query the fields of the data acquisition model, and if the field number is found, it is identified.
In an embodiment of the present invention, the field length refers to the number of characters contained in the corresponding field. For example, if the field student contains 7 characters, the field length of the field student is 7.
Optionally, in the embodiment of the present invention, the ratio of the length value of the third field to the length value of the first field is calculated by the following method:
$$P_1 = \frac{m}{n}$$
where P_1 denotes the first ratio, m denotes the third field length value, and n denotes the first field length value.
Optionally, in the embodiment of the present invention, the ratio of the length value of the third field to the length value of the second field is calculated by the following method:
$$P_2 = \frac{m}{t}$$
where P_2 denotes the second ratio, m denotes the third field length value, and t denotes the second field length value.
In one embodiment of the present invention, calculating matching values of the field set and fields in the data acquisition model according to the first and second ratios comprises: and taking the average value of the first ratio and the second ratio as a matching value of the field set and a field in the data acquisition model.
Optionally, in the embodiment of the present invention, the average value of the first ratio and the second ratio is calculated by using the following method:
$$P = \frac{P_1 + P_2}{2}$$
wherein P represents the average of the first ratio and the second ratio.
Further, in a preferred embodiment of the present invention, a data acquisition model with the matching value greater than a preset threshold is selected, and optionally, the preset threshold is 0.6.
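Steps S30 to S33 and the threshold selection can be sketched in a few lines. The sketch follows the description: the matching value is the average of the two length ratios, and a model is selected when the value exceeds 0.6. The function names `total_length` and `matching_value` and the example field lists are hypothetical.

```python
def total_length(fields):
    """Sum of the character counts of all field names."""
    return sum(len(f) for f in fields)


def matching_value(requested_fields, model_fields):
    """Average of the two ratios of shared field length to each side's total length."""
    model_set = set(model_fields)
    shared = [f for f in requested_fields if f in model_set]  # target field set
    n = total_length(requested_fields)  # first field length value
    t = total_length(model_fields)      # second field length value
    m = total_length(shared)            # third field length value
    if n == 0 or t == 0:
        return 0.0  # no fields on one side, so nothing can match
    p1 = m / n  # first ratio
    p2 = m / t  # second ratio
    return (p1 + p2) / 2


# Hypothetical field sets: the model has one field the request does not ask for.
requested = ["id", "date", "username"]
model = ["id", "date", "username", "password"]
score = matching_value(requested, model)
selected = score > 0.6  # preset threshold from the description
print(round(score, 4), selected)
```

Here n = 14, t = 22 and m = 14, so the first ratio is 1, the second is 14/22, and the model is selected.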
S4, acquiring the acquisition time stamp of the data to be acquired, and filling the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model.
In at least one embodiment of the present invention, the acquisition timestamp differs according to the requirement. For example, if user A requires monthly service data, the corresponding acquisition timestamp runs from the beginning of the month to the end of the month. Further, in the embodiment of the present invention, the acquisition timestamp is filled in the selected data acquisition model to obtain a target data acquisition model, so as to directly perform corresponding data acquisition in the preset database.
In detail, the step of filling the collection timestamp into the selected data collection model to obtain a target data collection model includes:
creating an object receiving script in the selected data acquisition model, and receiving the timestamp of the report to be generated by using the object receiving script to obtain a data acquisition object; and transmitting the data acquisition object to the selected SQL configuration statement of the data acquisition model by using the sheet object to obtain the target data acquisition model.
In a preferred embodiment, the object receiving script is created using JavaScript technology.
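The idea of filling a timestamp into a placeholder-based model can be illustrated as follows. The source creates the object receiving script with JavaScript; the sketch below uses Python only to show the principle and is not the patented script. The SQL text, the table name `business_data`, the column `biz_date`, the placeholder names and the function `build_target_model` are all hypothetical.

```python
from datetime import datetime

# Hypothetical data acquisition model: an SQL statement with named placeholders
# in place of hand-configured screening conditions.
MODEL_SQL = (
    "SELECT id, biz_date, username FROM business_data "
    "WHERE biz_date >= :start AND biz_date < :end"
)


def build_target_model(model_sql, start, end):
    """Bundle the model SQL with the acquisition time range.

    The timestamps are kept as bind parameters rather than interpolated
    into the SQL text, so the statement itself never changes.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "sql": model_sql,
        "params": {"start": start.isoformat(), "end": end.isoformat()},
    }


# Monthly service data: from the beginning of the month up to the next month.
target = build_target_model(MODEL_SQL, datetime(2020, 9, 1), datetime(2020, 10, 1))
print(target["params"])
```

Keeping the timestamps as bind parameters is a design choice of this sketch; it lets one model be reused for any time range without rebuilding the statement.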
S5, acquiring standard data corresponding to the data to be acquired from the preset database within the acquisition time stamp time based on the target data acquisition model to obtain a target data set.
In an embodiment of the present invention, the acquiring, based on the target data acquisition model, standard data corresponding to the data to be acquired from the preset database within the acquisition timestamp time to obtain a target data set includes:
running the SQL configuration statement of the target data acquisition model, and querying the standard data corresponding to the data to be acquired in the preset database within the acquisition timestamp time to obtain a target data set.
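Running the configured statement against the database might look like the sketch below. The description prefers an Oracle database; an in-memory SQLite database from the Python standard library is used here as a stand-in so that the example is self-contained. The table `business_data`, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Stand-in for the preset database (assumption: SQLite instead of Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_data (id INTEGER, biz_date TEXT, username TEXT)")
conn.executemany(
    "INSERT INTO business_data VALUES (?, ?, ?)",
    [
        (1, "2020-08-31", "alice"),
        (2, "2020-09-15", "bob"),
        (3, "2020-10-01", "carol"),
    ],
)

# Target data acquisition model: SQL with the acquisition time range bound in.
sql = (
    "SELECT id, biz_date, username FROM business_data "
    "WHERE biz_date >= :start AND biz_date < :end ORDER BY id"
)
params = {"start": "2020-09-01", "end": "2020-10-01"}

# Only rows inside the time range are read; nothing is pulled and filtered afterwards.
target_data_set = conn.execute(sql, params).fetchall()
conn.close()
print(target_data_set)
```

Because the time range is part of the statement, the database returns only the matching rows, which is the efficiency argument the description makes against pulling all data first.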
In summary, in the embodiment of the present invention, firstly, a preprocessing operation is performed on an acquired original data set to generate a standard data set, and the standard data set is stored in a preset database, so that the accuracy of the data in the obtained standard data set is ensured and batch data entry can be supported; secondly, according to the data table in the preset database, the embodiment of the invention constructs the corresponding data acquisition model, thereby avoiding the configuration of screening conditions during the subsequent data acquisition and improving the efficiency of data acquisition; further, according to the field set of the data to be acquired and the acquisition timestamp, the embodiment of the invention configures the target data acquisition model of the data acquisition model to acquire the standard data corresponding to the data to be acquired from the preset database to obtain the target data set. Therefore, the data acquisition method provided by the invention can improve the data acquisition efficiency.
Fig. 4 is a functional block diagram of the data acquisition device according to the present invention.
The data acquisition device 100 of the present invention can be installed in an electronic device. Depending on the functionality implemented, the data acquisition apparatus may include a preprocessing module 101, a construction module 102, a calculation module 103, a filling module 104, and an acquisition module 105. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
The preprocessing module 101 is configured to obtain an original data set, perform a preprocessing operation on the original data set, generate a standard data set, and store the standard data set in a preset database.
In at least one embodiment of the present invention, the raw data set includes, but is not limited to: business data, financial data, and product data, among others. For example, in the insurance field, the service data may include: number of applications, amount of applications, type of applications, etc., the financial data may include: the fund proportion of the application, the profit of the application, the return rate of the application and the like, and the product data can comprise: a life insurance category, a car insurance category, an accident insurance category, and the like.
In the preferred embodiment of the present invention, the preprocessing operation includes deduplication and missing value padding.
Preferably, in the embodiment of the present invention, the preprocessing module 101 performs a deduplication operation on the original data set through a distance formula, where the distance formula includes:
Figure BDA0002682602460000091
where d represents the distance value of any two data in the original data set, and w_{1j} and w_{2j} represent any two data in the original data set. When the distance value is smaller than the preset distance value, either one of the two data is deleted; if the distance value is not smaller than the preset distance value, both data are retained. Preferably, the preset distance value may be 0.1.
In an embodiment of the present invention, the missing values include: missing completely at random, missing at random, and missing not at random. In detail, missing completely at random means that whether a variable's value is missing is entirely random and does not depend on any other factor; missing at random means that the missingness of a variable is related to other variables but not to the value of the variable itself; missing not at random means that the missingness of a variable is related to the value of the variable itself.
Further, the present invention detects whether the deduplicated original data set has a data missing value by using a missmap function. If no data missing value is detected, no further processing is performed, and the preprocessing module 101 uses the deduplicated original data set as the standard data set.
If a data missing value occurs in the original data set after the duplication removal is detected, the preprocessing module 101 performs padding on the data missing value through a preset padding algorithm to obtain the standard data set. In detail, the preset filling algorithm includes:
$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$
wherein L(θ) represents a filled data missing value, x_i represents the i-th data missing value, θ represents the probability parameter corresponding to the filled data missing value, n represents the number of the original data sets after the duplication removal, and p(x_i | θ) represents the probability of the filled data missing value.
Based on the above embodiment, the method and the device provided by the invention can ensure the accuracy of the data in the obtained standard data set and support the input of batch data after preprocessing the original data set.
It is to be noted that, in order to increase the acquisition speed of the raw data, the embodiment of the present invention stores the standard data set in a preset database. Preferably, the preset database may be an Oracle database.
Further, it is emphasized that the standard data set may also be stored in a blockchain node in order to ensure privacy and security of the standard data set.
The building module 102 is configured to build a corresponding data acquisition model according to a data table in the preset database.
In at least one embodiment of the invention, the data collection model is used for querying the database to improve the efficiency of data collection.
In detail, the construction module 102 constructs the data collection model in the following manner:
step A, acquiring all data tables in the preset database, and clustering the data tables of the same type by using a placeholder identification algorithm to obtain one or more initial data table sets;
step B, creating a data table matrix of the initial data table set;
step C, calculating the expected value of each initial data table in the data table matrix;
step D, taking the initial data tables having the same expected value as a data acquisition model.
In an alternative embodiment, the placeholder identification algorithm may be a k-means algorithm.
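As an illustration of step A only, the sketch below clusters data tables with a small hand-written k-means. The patent does not say how a table is turned into a feature vector, so the `features` function (column count and number of date-like columns) is a hypothetical choice, as are the table names and the initialisation rule (the first k distinct points).

```python
def features(columns):
    """Hypothetical table signature: (column count, date-like column count)."""
    return (len(columns), sum("date" in c.lower() for c in columns))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k distinct points."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


def cluster_tables(tables, k):
    """Group table names into initial data table sets."""
    names = list(tables)
    labels = kmeans([features(tables[name]) for name in names], k)
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return list(groups.values())


tables = {
    "orders_2019": ["id", "order_date", "amount"],
    "orders_2020": ["id", "order_date", "amount"],
    "users": ["id", "username", "password", "email", "phone", "address"],
    "users_archive": ["id", "username", "password", "email", "phone", "address"],
}
initial_sets = cluster_tables(tables, k=2)
```

Steps B to D depend on the expected-value formula, which is given only as an image, so they are not sketched here.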
In this embodiment of the present invention, the expected value may be understood as an importance value of each initial data table in a corresponding data table matrix, and preferably, the building module 102 calculates the expected value of each initial data table in the data table matrix by using the following method:
Figure BDA0002682602460000101
wherein $C_i$ represents the expected value of the i-th initial data table in the data table matrix, $E_i$ represents the eigenvector of the i-th initial data table in the data table matrix,
Figure BDA0002682602460000102
represents the eigenvector covariance of the ith initial data table in the data table matrix, and trace () represents the spatial filter function.
Because the data acquisition model is constructed based on placeholders, screening conditions do not need to be configured in the subsequent data acquisition process; the data to be acquired can be filled directly into the corresponding data acquisition model to obtain the corresponding target data.
The calculation module 103 is configured to obtain a field set of data to be acquired, calculate a matching value between the field set and a field in the data acquisition model, and select the data acquisition model with the matching value being greater than a preset threshold.
In at least one embodiment of the present invention, the field set of the data to be collected includes, but is not limited to, id, date, password, username, and data. For example, if the data to be collected is the above-mentioned service data, the field set included in the data to be collected may be: number, time, username, etc.
In detail, the calculation module 103 calculates matching values of the field set and the fields in the data collection model by adopting the following ways:
step a, acquiring the fields that the field set and the data acquisition model have in common, and identifying these shared fields to obtain a target field set;
step b, summarizing the field length of each field in the field set to obtain a first field length value, summarizing the field length of each field in the data acquisition model to obtain a second field length value, and summarizing the field length of each field in the target field set to obtain a third field length value;
step c, calculating the ratio of the third field length value to the first field length value to obtain a first ratio, and calculating the ratio of the third field length value to the second field length value to obtain a second ratio;
step d, calculating the matching value of the field set and the fields in the data acquisition model according to the first ratio and the second ratio.
The fields shared by the field set and the data acquisition model are obtained through SQL query statements. For example, if the field set contains the field number, an SQL query statement is used to query the fields of the data acquisition model, and if the field number is found, the field number is identified.
In an embodiment of the present invention, the field length refers to the number of characters contained in the corresponding field. For example, the field student contains 7 characters, so the field length of the field student is 7.
Optionally, the calculation module 103 calculates the ratio of the third field length value to the first field length value by using the following method:
$$P_1 = \frac{m}{n}$$
where $P_1$ denotes the first ratio, $m$ denotes the third field length value, and $n$ denotes the first field length value.
Optionally, the calculation module 103 calculates the ratio of the third field length value to the second field length value by using the following method:
$$P_2 = \frac{m}{t}$$
where $P_2$ denotes the second ratio, $m$ denotes the third field length value, and $t$ denotes the second field length value.
In an embodiment of the present invention, the calculation module 103 calculates the matching value of the field set and the fields in the data acquisition model according to the first ratio and the second ratio by taking the average value of the first ratio and the second ratio as the matching value.
Optionally, the calculation module 103 calculates the average value of the first ratio and the second ratio by using the following method:
$$P = \frac{P_1 + P_2}{2}$$
wherein $P$ represents the average of the first ratio $P_1$ and the second ratio $P_2$.
Further, in a preferred embodiment of the present invention, a data acquisition model with the matching value greater than a preset threshold is selected, and optionally, the preset threshold is 0.6.
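The matching-value calculation of steps a to d can be sketched as follows, taking the first ratio as m/n, the second ratio as m/t, and the matching value as their average, as the text describes. The function names, the example field lists, and the handling of empty field sets (returning 0.0) are assumptions made for illustration.

```python
def matching_value(requested_fields, model_fields):
    """Average of the two length ratios described in steps a to d."""
    requested = set(requested_fields)
    model = set(model_fields)
    target = requested & model                 # step a: shared fields

    n = sum(len(name) for name in requested)   # first field length value
    t = sum(len(name) for name in model)       # second field length value
    m = sum(len(name) for name in target)      # third field length value
    if n == 0 or t == 0:
        return 0.0                             # assumption: empty sets never match

    p1 = m / n                                 # first ratio
    p2 = m / t                                 # second ratio
    return (p1 + p2) / 2                       # matching value


def select_models(requested_fields, models, threshold=0.6):
    """Return the names of models whose matching value exceeds the threshold."""
    return [
        name for name, fields in models.items()
        if matching_value(requested_fields, fields) > threshold
    ]


models = {
    "service_model": ["id", "date", "amount"],
    "account_model": ["username", "password"],
}
selected = select_models(["id", "date"], models)
```

With the requested fields id and date, the service model scores (6/6 + 6/12) / 2 = 0.75 and is selected, while the account model shares no field and scores 0.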
The filling module 104 is configured to obtain a collection timestamp of the data to be collected, and fill the collection timestamp into the selected data collection model to obtain a target data collection model.
In at least one embodiment of the present invention, the acquisition timestamp differs according to the requirement. For example, if user A requires monthly service data, the corresponding acquisition timestamp spans from the beginning of the month to the end of the month. Further, in the embodiment of the present invention, the acquisition timestamp is filled into the selected data acquisition model to obtain a target data acquisition model, so that the corresponding data can be acquired directly from the preset database.
In detail, the filling module 104 fills the acquisition timestamp into the selected data acquisition model to obtain the target data acquisition model by executing the following steps:
creating an object receiving script in the selected data acquisition model, and receiving the timestamp of the report to be generated by using the object receiving script to obtain a data acquisition object; and transmitting the data acquisition object, by using the sheet object, to the SQL configuration statement of the selected data acquisition model to obtain the target data acquisition model.
In a preferred embodiment, the object receiving script is created using JavaScript technology.
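The filling step can be pictured as binding a time range to placeholders in the model's SQL configuration statement. The sketch below uses Python rather than the JavaScript object receiving script named in the text, and the `AcquisitionModel` class, the `:start` and `:end` placeholder names, and the table and column names are all hypothetical.

```python
import calendar
from dataclasses import dataclass, field, replace
from datetime import date


@dataclass(frozen=True)
class AcquisitionModel:
    """A data acquisition model: an SQL statement with named placeholders."""
    sql: str
    params: dict = field(default_factory=dict)


def month_range(year, month):
    """Acquisition timestamp for monthly data: first to last day of the month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def fill_timestamp(model, start, end):
    """Return a target model with the acquisition timestamp bound to it."""
    bound = {**model.params, "start": start.isoformat(), "end": end.isoformat()}
    return replace(model, params=bound)


selected_model = AcquisitionModel(
    sql="SELECT id, record_date, username FROM service_data "
        "WHERE record_date BETWEEN :start AND :end"
)
start, end = month_range(2020, 9)
target_model = fill_timestamp(selected_model, start, end)
```

Keeping the SQL text unchanged and carrying the timestamp as bound parameters means the same selected model can be reused for other time ranges.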
The acquisition module 105 is configured to acquire, based on the target data acquisition model, the standard data corresponding to the data to be acquired from the preset database within the time range of the acquisition timestamp to obtain a target data set.
In an embodiment of the present invention, the acquisition module 105 obtains the target data set by executing the following step:
running the SQL configuration statement of the target data acquisition model, and querying, in the preset database, the standard data corresponding to the data to be acquired within the time range of the acquisition timestamp to obtain the target data set.
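A minimal sketch of this acquisition step is shown below. It uses an in-memory SQLite database from the Python standard library as a stand-in for the preset (preferably Oracle) database; the table name, column names, sample rows, and the `collect` helper are assumptions for illustration.

```python
import sqlite3


def collect(conn, sql, params):
    """Run the model's SQL configuration statement with the bound timestamp."""
    return conn.execute(sql, params).fetchall()


# Stand-in for the preset database holding the standard data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_data (id INTEGER, record_date TEXT, username TEXT)"
)
conn.executemany(
    "INSERT INTO service_data VALUES (?, ?, ?)",
    [
        (1, "2020-08-31", "alice"),
        (2, "2020-09-01", "bob"),
        (3, "2020-09-30", "carol"),
        (4, "2020-10-01", "dave"),
    ],
)

# SQL configuration statement of a hypothetical target data acquisition model.
sql = (
    "SELECT id, record_date, username FROM service_data "
    "WHERE record_date BETWEEN :start AND :end ORDER BY id"
)
target_data_set = collect(conn, sql, {"start": "2020-09-01", "end": "2020-09-30"})
```

Only the rows dated within September 2020 are returned; ISO-formatted date strings compare correctly as text, which is why `BETWEEN` works on the TEXT column here.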
In summary, the embodiment of the present invention first performs a preprocessing operation on the acquired original data set to generate a standard data set and stores the standard data set in a preset database, which ensures the accuracy of the data in the standard data set and supports batch data entry. Second, the embodiment of the present invention constructs a corresponding data acquisition model according to the data tables in the preset database, which avoids configuring screening conditions during subsequent data acquisition and improves the efficiency of data acquisition. Further, according to the field set of the data to be acquired and the acquisition timestamp, the embodiment of the present invention derives the target data acquisition model from the data acquisition model, so as to acquire the standard data corresponding to the data to be acquired from the preset database and obtain the target data set. Therefore, the data acquisition device provided by the present invention can improve data acquisition efficiency.
Fig. 5 is a schematic structural diagram of an electronic device implementing the data acquisition method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a data acquisition program 12, stored in the memory 11 and operable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes for data collection, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (for example, executing data acquisition, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 5 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 5 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data acquisition program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed by the processor 10, implement the following:
acquiring an original data set, carrying out preprocessing operation on the original data set to generate a standard data set, and storing the standard data set into a preset database;
constructing a corresponding data acquisition model according to a data table in the preset database;
acquiring a field set of data to be acquired, calculating a matching value between the field set and a field in the data acquisition model, and selecting the data acquisition model with the matching value larger than a preset threshold value;
acquiring an acquisition time stamp of the data to be acquired, and filling the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model;
and acquiring standard data corresponding to the data to be acquired from the preset database within the acquisition time stamp time based on the target data acquisition model to obtain a target data set.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-volatile computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
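The idea that each data block is cryptographically linked to the previous one can be illustrated with a short hash-chain sketch. This is a simplified illustration using SHA-256 from the Python standard library, not a description of any particular blockchain platform; the block layout and function names are assumptions, and consensus and networking are omitted.

```python
import hashlib
import json


def block_hash(index, records, prev_hash):
    """SHA-256 digest over a canonical JSON encoding of the block body."""
    body = {"index": index, "records": records, "prev_hash": prev_hash}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(index, records, prev_hash):
    """Create a block whose hash covers its records and its predecessor's hash."""
    return {
        "index": index,
        "records": records,
        "prev_hash": prev_hash,
        "hash": block_hash(index, records, prev_hash),
    }


def is_valid(chain):
    """Check each block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["records"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = [make_block(0, ["genesis"], "0")]
chain.append(make_block(1, [{"id": 1, "amount": 10.0}], chain[0]["hash"]))
```

Changing the records of an earlier block changes its recomputed hash, so `is_valid` reports the tampering; this is the property that makes stored data tamper-evident.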
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of data acquisition, the method comprising:
acquiring an original data set, carrying out preprocessing operation on the original data set to generate a standard data set, and storing the standard data set into a preset database;
constructing a corresponding data acquisition model according to a data table in the preset database;
acquiring a field set of data to be acquired, calculating a matching value between the field set and a field in the data acquisition model, and selecting the data acquisition model with the matching value larger than a preset threshold value;
acquiring an acquisition time stamp of the data to be acquired, and filling the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model;
and acquiring standard data corresponding to the data to be acquired from the preset database within the acquisition time stamp time based on the target data acquisition model to obtain a target data set.
2. The data acquisition method as set forth in claim 1, wherein said preprocessing the original data set to generate a standard data set comprises:
carrying out a deduplication operation on the original data set, and detecting whether a data missing value exists in the deduplicated original data set;
if no data missing value exists, the original data set after the duplication removal is used as a standard data set;
and if the data missing value exists, filling the data missing value to obtain a standard data set.
3. The data acquisition method as set forth in claim 2, wherein said filling the data missing value comprises:
filling the data missing value by the following method:
Figure FDA0002682602450000011
wherein $L(\theta)$ represents the filled data missing value, $x_i$ represents the i-th data missing value, $\theta$ represents the probability parameter corresponding to the filled data missing value, $n$ represents the number of the original data sets after deduplication, and $p(x_i \mid \theta)$ represents the probability of the filled data missing value.
4. The data acquisition method as claimed in claim 1, wherein the constructing of the corresponding data acquisition model according to the data table in the preset database comprises:
acquiring all data tables in the preset database, and clustering the data tables of the same type to obtain one or more initial data table sets;
creating a data table matrix of the initial set of data tables;
calculating the expected value of each initial data table in the data table matrix;
and taking the initial data table with the same expected value as a data acquisition model.
5. The data collection method of claim 4, wherein said calculating an expected value for each initial data table in said matrix of data tables comprises:
calculating an expected value for each initial data table in the data table matrix using:
Figure FDA0002682602450000021
wherein $C_i$ represents the expected value of the i-th initial data table in the data table matrix, $E_i$ represents the eigenvector of the i-th initial data table in the data table matrix,
Figure FDA0002682602450000022
represents the eigenvector covariance of the ith initial data table in the data table matrix, and trace () represents the spatial filter function.
6. The data acquisition method as set forth in claim 1, wherein said calculating matching values for the set of fields to fields in the data acquisition model comprises:
acquiring the same field of the field set and the field in the data acquisition model, and identifying the same field to obtain a target field set;
summarizing the field length of each field in the field set to obtain a first field length value, summarizing the field length of each field in the data acquisition model to obtain a second field length value, and summarizing the field length of each field in the target field set to obtain a third field length value;
calculating the ratio of the third field length value to the first field length value to obtain a first ratio, and calculating the ratio of the third field length value to the second field length value to obtain a second ratio;
and calculating the matching value of the field set and the field in the data acquisition model according to the first ratio and the second ratio.
7. The data acquisition method as claimed in any one of claims 1 to 6, wherein the populating the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model comprises:
creating an object receiving script in the selected data acquisition model, and receiving the timestamp of the report to be generated by using the object receiving script to obtain a data acquisition object;
and transmitting the data acquisition object to the selected SQL configuration statement of the data acquisition model to obtain a target data acquisition model.
8. A data acquisition device, the device comprising:
the preprocessing module is used for acquiring an original data set, carrying out preprocessing operation on the original data set to generate a standard data set, and storing the standard data set into a preset database;
the construction module is used for constructing a corresponding data acquisition model according to a data table in the preset database;
the calculation module is used for acquiring a field set of data to be acquired, calculating a matching value between the field set and a field in the data acquisition model, and selecting the data acquisition model with the matching value larger than a preset threshold value;
the filling module is used for acquiring the acquisition time stamp of the data to be acquired, and filling the acquisition time stamp into the selected data acquisition model to obtain a target data acquisition model;
and the acquisition module is used for acquiring standard data corresponding to the data to be acquired from the preset database within the acquisition time stamp time based on the target data acquisition model to obtain a target data set.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform a data acquisition method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a data acquisition method as claimed in any one of claims 1 to 7.
CN202010966723.2A 2020-09-15 2020-09-15 Data acquisition method and device, electronic equipment and storage medium Withdrawn CN112115145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010966723.2A CN112115145A (en) 2020-09-15 2020-09-15 Data acquisition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010966723.2A CN112115145A (en) 2020-09-15 2020-09-15 Data acquisition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112115145A true CN112115145A (en) 2020-12-22

Family

ID=73802099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010966723.2A Withdrawn CN112115145A (en) 2020-09-15 2020-09-15 Data acquisition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112115145A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560789A (en) * 2020-12-28 2021-03-26 平安银行股份有限公司 Image object detection method and device, electronic equipment and storage medium
CN112685384A (en) * 2020-12-30 2021-04-20 平安普惠企业管理有限公司 Data migration method and device, electronic equipment and storage medium
CN113325797A (en) * 2021-06-11 2021-08-31 中山凯旋真空科技股份有限公司 Data acquisition method and device for control equipment, storage medium and electronic equipment
CN113360723A (en) * 2021-07-08 2021-09-07 北京智思迪科技有限公司 Data acquisition method and device
CN114328302A (en) * 2021-12-28 2022-04-12 威创集团股份有限公司 Multi-host input control method, system, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560789A (en) * 2020-12-28 2021-03-26 平安银行股份有限公司 Image object detection method and device, electronic equipment and storage medium
CN112560789B (en) * 2020-12-28 2024-06-04 平安银行股份有限公司 Image object detection method, device, electronic equipment and storage medium
CN112685384A (en) * 2020-12-30 2021-04-20 平安普惠企业管理有限公司 Data migration method and device, electronic equipment and storage medium
CN113325797A (en) * 2021-06-11 2021-08-31 中山凯旋真空科技股份有限公司 Data acquisition method and device for control equipment, storage medium and electronic equipment
CN113325797B (en) * 2021-06-11 2022-07-12 中山凯旋真空科技股份有限公司 Data acquisition method and device for control equipment, storage medium and electronic equipment
CN113360723A (en) * 2021-07-08 2021-09-07 北京智思迪科技有限公司 Data acquisition method and device
CN114328302A (en) * 2021-12-28 2022-04-12 威创集团股份有限公司 Multi-host input control method, system, equipment and storage medium
CN114328302B (en) * 2021-12-28 2023-10-10 威创集团股份有限公司 Multi-host input control method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112115145A (en) Data acquisition method and device, electronic equipment and storage medium
CN112052370A (en) Message generation method and device, electronic equipment and computer readable storage medium
CN112115143A (en) Automatic data updating and synchronizing method and device, electronic equipment and storage medium
CN113434674A (en) Data analysis method and device, electronic equipment and readable storage medium
CN112115152A (en) Data increment updating and querying method and device, electronic equipment and storage medium
CN112883042A (en) Data updating and displaying method and device, electronic equipment and storage medium
CN114491047A (en) Multi-label text classification method and device, electronic equipment and storage medium
CN112580079A (en) Authority configuration method and device, electronic equipment and readable storage medium
CN114979120A (en) Data uploading method, device, equipment and storage medium
CN112579621A (en) Data display method and device, electronic equipment and computer storage medium
CN114185895A (en) Data import and export method and device, electronic equipment and storage medium
CN112699142A (en) Cold and hot data processing method and device, electronic equipment and storage medium
CN113327136A (en) Attribution analysis method and device, electronic equipment and storage medium
CN112256783A (en) Data export method and device, electronic equipment and storage medium
CN112949278A (en) Data checking method and device, electronic equipment and readable storage medium
CN113434542A (en) Data relation identification method and device, electronic equipment and storage medium
CN112948380A (en) Data storage method and device based on big data, electronic equipment and storage medium
CN112541688A (en) Service data checking method and device, electronic equipment and computer storage medium
CN112685384A (en) Data migration method and device, electronic equipment and storage medium
CN112637341A (en) File uploading method and device, electronic equipment and storage medium
CN115033605A (en) Data query method and device, electronic equipment and storage medium
CN114547011A (en) Data extraction method and device, electronic equipment and storage medium
CN114185588A (en) Incremental package generation method, device, equipment and storage medium
CN114611477A (en) Design recommendation method and device for data table, electronic equipment and medium
CN113343103A (en) Report form pushing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20201222