CN113094393A

CN113094393A - Data aggregation method and device and electronic equipment

Info

Publication number: CN113094393A
Application number: CN202110281432.4A
Authority: CN
Inventors: 张洪彬; 褚占峰; 叶姣荣
Original assignee: Hangzhou Dt Dream Technology Co Ltd
Current assignee: Hangzhou Dt Dream Technology Co Ltd
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2021-07-09
Anticipated expiration: 2041-03-16
Also published as: CN113094393B

Abstract

The embodiments of this application provide a data aggregation method and device, and an electronic device. The method includes: receiving a data aggregation request sent by a first client, the data aggregation request including aggregation parameters; in response to the data aggregation request, generating a data directory and a data standard based on the aggregation parameters, creating, in the data directory, a data table for storing aggregated data, and generating a data aggregation task corresponding to a data provider; acquiring to-be-aggregated data that is uploaded by a second client corresponding to the data provider, is determined based on the data aggregation task, and conforms to the data standard; and establishing a data aggregation channel with the to-be-aggregated data as the source end and the data table as the destination end, and writing the to-be-aggregated data into the data table through the data aggregation channel.

Description

Data aggregation method and device and electronic equipment

Technical Field

The embodiments of this application relate to the field of Internet technology, and in particular to a data aggregation method and device, and an electronic device.

Background

With the wide application of big data, data increasingly needs to be transferred and exchanged between different departments. In practice, it is common for a higher-level department or unit to collect data from several subordinate departments or units, or to collect data horizontally from other departments at the same level. These different departments or units are generally referred to as different data domains.

Because data is isolated between data domains, data in one domain cannot be acquired directly from another by conventional means, and data from different data domains cannot simply be pooled together.

In the related art, data exchange between data domains is generally handled by a data sharing and exchange platform. When a data demander needs data owned by another party, it must log in to the platform, find the portal (such as the homepage) of the data provider, search for the data in that portal, and apply for it; the data demander can use the data only after the data provider authorizes it. When the data demander needs data from several departments, it must search for and apply for data in each department's portal separately and then pool the data obtained from all of them. Moreover, different departments may adopt different data standards, so the data obtained from each department cannot be used directly: data following different standards must first be converted to a single standard before aggregation, which makes data aggregation inefficient.

Disclosure of Invention

The embodiments of this specification provide a data aggregation method and device, and an electronic device.

According to a first aspect of the embodiments of this specification, a data aggregation method is provided, the method including:

receiving a data aggregation request sent by a first client, wherein the data aggregation request includes aggregation parameters;

in response to the data aggregation request, generating a data directory and a data standard based on the aggregation parameters, creating, in the data directory, a data table for storing aggregated data, and generating a data aggregation task corresponding to a data provider;

acquiring to-be-aggregated data that is uploaded by a second client corresponding to the data provider, is determined based on the data aggregation task, and conforms to the data standard; and

establishing a data aggregation channel with the to-be-aggregated data as the source end and the data table as the destination end, and writing the to-be-aggregated data into the data table through the data aggregation channel.

Optionally, the data provider includes a data provider specified in the aggregation parameters;

generating the data aggregation task corresponding to the data provider includes:

generating a corresponding data aggregation task for each data provider specified in the aggregation parameters.

Optionally, the method further includes:

pushing, based on a task distribution mechanism, the corresponding data aggregation task to the data provider, so that the data provider determines the to-be-aggregated data from its local data based on the pushed data aggregation task.

Optionally, the method further includes:

if no task distribution mechanism exists, the data provider logs in to the portal of the data demander and queries whether the data directory contains a data aggregation task to be executed; after finding such a task, the data provider applies to the data demander for it; and after the data demander approves the application, the data provider determines the to-be-aggregated data from its local data based on that task.
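The push path and the portal-based pull path described above can be sketched as follows. This is a minimal illustration under stated assumptions: `TaskHub`, `AggregationTask`, `can_execute`, the status values, and the in-memory portal list are hypothetical names chosen for this example, not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AggregationTask:
    task_id: str
    provider: str
    # Hypothetical lifecycle: "pushed" (delivered by the distribution mechanism),
    # or "pending" -> "applied" -> "approved" (pull path via the demander's portal).
    status: str = "pending"


class TaskHub:
    """Hypothetical task hub: push if the provider has a channel, else publish for pull."""

    def __init__(self) -> None:
        self.channels: Dict[str, Callable[[AggregationTask], None]] = {}
        self.portal: List[AggregationTask] = []  # tasks listed under the data directory

    def register_channel(self, provider: str, deliver: Callable[[AggregationTask], None]) -> None:
        self.channels[provider] = deliver

    def distribute(self, task: AggregationTask) -> None:
        deliver = self.channels.get(task.provider)
        if deliver is not None:
            task.status = "pushed"
            deliver(task)
        else:
            self.portal.append(task)  # no distribution mechanism: provider must query

    def query_pending(self, provider: str) -> List[AggregationTask]:
        return [t for t in self.portal if t.provider == provider and t.status == "pending"]

    def apply(self, task: AggregationTask) -> None:
        task.status = "applied"  # provider applies to the data demander

    def approve(self, task: AggregationTask) -> None:
        if task.status == "applied":
            task.status = "approved"  # data demander approves the application


def can_execute(task: AggregationTask) -> bool:
    """A provider may select its local data only for pushed or approved tasks."""
    return task.status in ("pushed", "approved")
```

Keeping both paths behind one `distribute` call means the demander side does not need to know in advance whether a given provider supports push delivery.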

Optionally, generating the data directory and the data standard based on the aggregation parameters includes:

generating the data directory based on directory attributes in the aggregation parameters;

generating the data standard based on standard fields in the aggregation parameters.

Optionally, creating, in the data directory, the data table for storing aggregated data includes:

configuring a database associated with the data directory based on a database address specified in the aggregation parameters;

creating, in the data directory, a data table of the database based on the standard fields in the aggregation parameters, wherein the fields of the data table are the standard fields.

Optionally, the to-be-aggregated data conforming to the data standard consists of the standard fields and the local data values corresponding to the standard fields;

writing the to-be-aggregated data into the data table through the data aggregation channel includes:

writing, through the data aggregation channel, the local data corresponding to each standard field in the to-be-aggregated data into the same standard field of the data table.

Optionally, before acquiring the to-be-aggregated data that is uploaded by the second client corresponding to the data provider, is determined based on the data aggregation task, and conforms to the data standard, the method further includes:

receiving source data uploaded by the second client corresponding to the data provider, wherein the source data includes local fields of the data provider;

checking the local fields of the source data against the standard fields of the data standard; and

after the check passes, issuing an aggregation instruction to the second client, so that the second client determines, from its local data and based on the data aggregation task, the to-be-aggregated data conforming to the data standard.

Optionally, checking the local fields of the source data against the standard fields of the data standard includes:

establishing, through a field matching algorithm, a mapping between each standard field and the local field with the same meaning;

determining that the check passes once all local fields have been mapped to standard fields;

correspondingly, writing the to-be-aggregated data into the data table includes writing, through the data aggregation channel, the data corresponding to each local field in the to-be-aggregated data into the standard field of the data table to which that local field is mapped.
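One possible field matching algorithm is sketched below. The source does not specify the algorithm, so this is an assumption: it normalizes names, consults a hypothetical synonym table (`SYNONYMS`), and falls back to fuzzy string matching with Python's standard `difflib.get_close_matches`. The check passes only when every local field is mapped, and the resulting mapping is then used to rename a record's keys to standard fields before writing.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical synonym table (normalized local name -> normalized standard name).
SYNONYMS = {"xm": "name", "full_name": "name", "nl": "age", "gender": "sex", "addr": "address"}


def normalize(field: str) -> str:
    return field.strip().lower().replace("-", "_").replace(" ", "_")


def match_fields(local_fields: List[str], standard_fields: List[str],
                 cutoff: float = 0.8) -> Dict[str, str]:
    """Map each local field to the standard field judged to have the same meaning."""
    std_by_norm = {normalize(s): s for s in standard_fields}
    mapping: Dict[str, str] = {}
    for local in local_fields:
        key = normalize(local)
        key = SYNONYMS.get(key, key)
        if key in std_by_norm:
            mapping[local] = std_by_norm[key]
            continue
        # Fuzzy fallback for spelling variants such as "Adress".
        close = difflib.get_close_matches(key, list(std_by_norm), n=1, cutoff=cutoff)
        if close:
            mapping[local] = std_by_norm[close[0]]
    return mapping


def check_fields(local_fields: List[str], standard_fields: List[str]) -> Tuple[bool, Dict[str, str]]:
    """The check passes only if every local field received a mapping."""
    mapping = match_fields(local_fields, standard_fields)
    return len(mapping) == len(local_fields), mapping


def remap_record(record: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename local keys to the standard fields they map to before writing."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}
```

The `cutoff` value trades recall for precision; a lower value maps more spelling variants but risks mapping unrelated fields, which is why the synonym lookup is tried first.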

Optionally, before establishing the data aggregation channel with the to-be-aggregated data as the source end and the data table as the destination end and writing the to-be-aggregated data into the data table through that channel, the method further includes:

if the carrier type of the to-be-aggregated data differs from the carrier type of the data table, converting the carrier type of the to-be-aggregated data into that of the data table using a data conversion tool.
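As an illustration of carrier conversion, the sketch below loads file-carried data (CSV text) into a database-table carrier. The CSV format, the SQLite target, and the function name `csv_to_table` are assumptions for this example; the source does not name a specific data conversion tool.

```python
import csv
import io
import sqlite3
from typing import List


def csv_to_table(csv_text: str, conn: sqlite3.Connection, table: str, columns: List[str]) -> int:
    """Convert file-carried CSV rows into rows of an existing database table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(record.get(col) for col in columns) for record in reader]
    # Identifiers come from the data standard (trusted); values use placeholders.
    col_list = ", ".join(f'"{col}"' for col in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

Relying on SQLite's column type affinity to turn numeric strings such as "30" into numbers is a convenience of this sketch; a production tool would more likely coerce types explicitly against the data standard.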

Optionally, writing the to-be-aggregated data into the data table through the data aggregation channel includes:

while writing the to-be-aggregated data into the data table, checking it against the value range corresponding to each standard field; and

writing only the to-be-aggregated data that falls within the value range into the data table.
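The value-range check can be sketched as a filter applied before the write. The ranges in `VALUE_RANGES` and the choice to let missing or null values pass are assumptions for illustration; the source only states that data within the value range is written.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical inclusive ranges per standard field.
VALUE_RANGES: Dict[str, Tuple[float, float]] = {"Age": (0, 150), "Height": (30, 250)}


def in_range(row: Row, ranges: Dict[str, Tuple[float, float]]) -> bool:
    for field, (low, high) in ranges.items():
        value = row.get(field)
        if value is None:
            continue  # absent or null values are not range-checked in this sketch
        if not low <= value <= high:
            return False
    return True


def split_by_range(rows: List[Row], ranges: Dict[str, Tuple[float, float]] = VALUE_RANGES
                   ) -> Tuple[List[Row], List[Row]]:
    """Return (rows to write into the data table, rows rejected by the check)."""
    accepted = [r for r in rows if in_range(r, ranges)]
    rejected = [r for r in rows if not in_range(r, ranges)]
    return accepted, rejected
```

Returning the rejected rows rather than silently dropping them leaves room to report them back to the data provider.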

Optionally, the method further includes:

in response to an aggregation task query request initiated by the first client or the second client,

returning progress information of the aggregation task to the first client or the second client.

Optionally, the aggregation parameters are generated after the data demander corresponding to the first client triggers a shortcut option displayed in an operation interface.
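A progress query might be answered from per-provider task records, as sketched below. The `TaskProgress` fields, the status names, and the rule that a provider sees only its own task while the demander sees all tasks are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskProgress:
    provider: str
    status: str = "distributed"  # hypothetical: distributed -> uploading -> completed
    rows_received: int = 0
    rows_written: int = 0


def progress_report(tasks: List[TaskProgress], provider: Optional[str] = None) -> Dict[str, object]:
    """Summarize aggregation progress; pass `provider` to scope the view to one provider."""
    visible = [t for t in tasks if provider is None or t.provider == provider]
    done = sum(1 for t in visible if t.status == "completed")
    percent = round(100 * done / len(visible), 1) if visible else 0.0
    return {
        "completed": done,
        "total": len(visible),
        "percent": percent,
        # (status, rows written, rows received) per provider
        "per_provider": {t.provider: (t.status, t.rows_written, t.rows_received) for t in visible},
    }
```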

According to a second aspect of the embodiments of this specification, a data aggregation device is provided, the device including:

a receiving unit, configured to receive a data aggregation request sent by a first client, wherein the data aggregation request includes aggregation parameters;

a response unit, configured to, in response to the data aggregation request, generate a data directory and a data standard based on the aggregation parameters, create, in the data directory, a data table for storing aggregated data, and generate a data aggregation task corresponding to a data provider;

an acquisition unit, configured to acquire to-be-aggregated data that is uploaded by a second client corresponding to the data provider, is determined based on the data aggregation task, and conforms to the data standard; and

an aggregation unit, configured to establish a data aggregation channel with the to-be-aggregated data as the source end and the data table as the destination end, and write the to-be-aggregated data into the data table through the data aggregation channel.

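Optionally, in the response unit, generating the data aggregation task corresponding to the data provider includes: generating a corresponding data aggregation task for each data provider specified in the aggregation parameters.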

Optionally, the device further includes:

a processing unit, configured to push, based on a task distribution mechanism, the corresponding data aggregation task to the data provider, so that the data provider determines the to-be-aggregated data from its local data based on the pushed data aggregation task.

Optionally, the processing unit is further configured such that, if no task distribution mechanism exists, the data provider logs in to the portal of the data demander and queries whether the data directory contains a data aggregation task to be executed; after finding such a task, the data provider applies to the data demander for it; and after the data demander approves the application, the data provider determines the to-be-aggregated data from its local data based on that task.

Optionally, in the response unit, generating the data directory and the data standard based on the aggregation parameters includes: generating the data directory based on directory attributes in the aggregation parameters, and generating the data standard based on standard fields in the aggregation parameters.

Optionally, in the response unit, creating, in the data directory, the data table for storing aggregated data includes:

configuring a database associated with the data directory based on a database address specified in the aggregation parameters; and creating, in the data directory, a data table of the database based on the standard fields in the aggregation parameters, wherein the fields of the data table are the standard fields.

Optionally, in the aggregation unit, writing the to-be-aggregated data into the data table through the data aggregation channel includes: writing, through the data aggregation channel, the local data corresponding to each standard field in the to-be-aggregated data into the same standard field of the data table.

Optionally, the device further includes, operating before the acquisition unit:

a receiving subunit, configured to receive source data uploaded by the second client corresponding to the data provider, wherein the source data includes local fields of the data provider;

a checking subunit, configured to check the local fields of the source data against the standard fields of the data standard; and

an issuing subunit, configured to issue, after the check passes, an aggregation instruction to the second client, so that the second client determines, from its local data and based on the data aggregation task, the to-be-aggregated data conforming to the data standard.

Optionally, the checking subunit is configured to:

establish, through a field matching algorithm, a mapping between each standard field and the local field with the same meaning; and determine that the check passes once all local fields have been mapped to standard fields.

Optionally, the device further includes, operating before the aggregation unit:

a conversion unit, configured to convert, using a data conversion tool, the carrier type of the to-be-aggregated data into the carrier type of the data table if the two carrier types differ.

Optionally, the aggregation unit is further configured to:

check the to-be-aggregated data against the value range corresponding to each standard field while writing it into the data table, and write only the data that falls within the value range into the data table.

Optionally, the device further includes:

a query unit, configured to respond to an aggregation task query request initiated by the first client or the second client by returning progress information of the aggregation task to that client.

According to a third aspect of the embodiments of this specification, an electronic device is provided, including:

a processor; and

a memory for storing processor-executable instructions;

wherein the processor is configured to perform any one of the data aggregation methods described above.

The embodiments of this specification provide a data aggregation scheme that uses data directory technology to let the data demander define its own data standard. Given that standard, each data provider supplies to-be-aggregated data that conforms to it. Because all to-be-aggregated data conforms to the same data standard before aggregation, data quality is higher and no additional data conversion is needed, which improves data aggregation efficiency.

In addition, a quick aggregation setup is provided for the data demander: simply by selecting shortcut options in the operation interface, the data demander can configure the data directory, define the data standard, and designate the data providers. The aggregation service system then performs data aggregation fully automatically, which greatly improves data aggregation efficiency.

Drawings

FIG. 1 is a schematic diagram of a data sharing and exchange platform according to an embodiment of this specification;

FIG. 2 is a schematic diagram of data aggregation across multiple departments according to an embodiment of this specification;

FIG. 3 is a schematic diagram of a data aggregation service system according to an embodiment of this specification;

FIG. 4 is a flowchart of a data aggregation method involving multi-party interaction according to an embodiment of this specification;

FIG. 5 is a schematic diagram of an operation interface according to an embodiment of this specification;

FIG. 6 is a flowchart of a method executed by a server according to an embodiment of this specification;

FIG. 7 is a hardware structure diagram of a data aggregation device according to an embodiment of this specification;

FIG. 8 is a block diagram of a data aggregation device according to an embodiment of this specification.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.

The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used herein to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

As mentioned above, with the wide application of big data, data increasingly needs to be transferred and exchanged between different departments. In practice, it is common for a higher-level department or unit to collect data from several subordinate departments or units, or to collect data horizontally from other departments at the same level. These different departments or units are generally referred to as different data domains.

FIG. 1 is a schematic diagram of a data sharing and exchange platform. When a data demander needs data owned by another party, it must log in to the data sharing and exchange platform, find the portal (such as the homepage) of the data provider, and search for and apply for the data in that portal; the data provider reviews the application, publishes the requested data to the platform once the review passes, and the platform then sends the data to the data demander.

When a data demander needs data from multiple data providers, data aggregation across multiple data domains is required. FIG. 2 is a schematic diagram of data aggregation across multiple departments. When the data demander needs data from several departments (three in FIG. 2: departments A, B, and C), it must search and apply for data in each department's portal and then pool the data obtained from all of them. The data demander not only has to visit portals and submit applications repeatedly, but also has to find the required data among the large volume of data in each portal. All of this involves a great deal of offline work, so data aggregation is cumbersome and error-prone for the data demander, and its efficiency is low.

In addition, different departments may adopt different data standards. For example, in FIG. 2, department A stores data in database tables while departments B and C record data in files, and the data fields stored by the departments also differ. The data obtained from each department therefore cannot be used directly, and the quality of the pooled data is poor; data following different standards must first be converted to a single standard before aggregation, which lowers data aggregation efficiency.

To solve the above problems, this application provides a data aggregation scheme based on a data directory. Using data directory technology, the scheme lets the data demander define its own data standard; given that standard, each data provider supplies to-be-aggregated data that conforms to it. Because all to-be-aggregated data conforms to the same data standard before aggregation, data quality is higher and no additional data conversion is needed, which improves data aggregation efficiency.

Using a database requires professional skills; for example, one must learn a database language to operate it, so ordinary users face a technical threshold. A data directory is a visualization technique that presents a database in the form of a directory (for example, folders). The data tables in a data directory correspond to the database tables in the database, and the fields of those data tables are the fields of the database tables. A data directory lets users quickly understand what is stored in a database: for example, which database tables it contains, which fields each table has and what they mean, and what data each field holds.
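To make the directory idea concrete, the sketch below builds a folder-like listing of a database's tables and fields. SQLite, with its `sqlite_master` catalog and `PRAGMA table_info`, is used only as a stand-in database; `build_directory` and `render` are illustrative names, not part of the described system.

```python
import sqlite3
from typing import Dict, List


def build_directory(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Return {table name: [field names]}, a folder-like view of the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables}


def render(directory: Dict[str, List[str]]) -> str:
    """Render tables as folders and fields as entries, for users who do not write SQL."""
    lines: List[str] = []
    for table, fields in directory.items():
        lines.append(f"[{table}]")
        lines.extend(f"  - {field}" for field in fields)
    return "\n".join(lines)
```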

The aggregation service system creates a data directory associated with the database, publishes a data standard, creates a data table according to that standard, and generates a data aggregation task for each data provider based on the aggregation parameters of the data demander.

Each data provider then uploads, according to its data aggregation task, the to-be-aggregated data in its local data that conforms to the data standard, and the aggregation service system automatically maps the fields of the to-be-aggregated data one-to-one onto the data table.

In addition, before a data provider uploads its to-be-aggregated data, the aggregation service system can pre-check the fields of the provider's local data to ensure that the to-be-aggregated data will be mapped to the correct fields of the data table.

FIG. 3 is a schematic diagram of the aggregation service system. When the data demander needs data from multiple data providers (for example, departments A, B, and C in FIG. 3), it only needs to designate these three departments as data providers in the aggregation parameters it submits to the aggregation service system. The aggregation service system then automatically distributes the corresponding data aggregation tasks to the three departments, each department submits its data, and finally the aggregation service system aggregates the data uploaded by each department into a data table. This data table is also created by the aggregation service system, at the data destination specified by the data demander.

As a result, the data demander no longer needs to do a large amount of offline work, such as visiting each data provider's portal and searching it for the required data. It only needs to submit the aggregation parameters, and the aggregation service system completes the rest. This greatly reduces the work required of the data demander, and automatic execution of the aggregation service greatly improves data aggregation efficiency.

FIG. 4 is a flowchart of a data aggregation method involving multi-party interaction. The parties may include a server, a first client, and a second client; the server is connected to the first client and the second client for data exchange.

The server is the server of the aggregation service system and provides the data aggregation service. This service may include creating a data directory for the data demander, defining data standards, distributing data aggregation tasks, and aggregating the to-be-aggregated data uploaded by multiple data providers.

The server may also provide a data pre-check service to each data provider, helping it determine in advance whether its to-be-aggregated data conforms to the data standard, and may convert the different data carriers uploaded by the data providers into a uniform carrier so that the data can be aggregated.

It may further provide an aggregation progress query service to data providers and the data demander, so that the progress of data aggregation is displayed visually, and it may score the aggregation results of the data providers and rank them, giving a visual indication of each provider's service status.

The server may be a single server, a server cluster, or a cloud platform built on a server cluster.

The first client may be a terminal device of the data demander on which a software program corresponding to the server's data aggregation service is installed; the data demander can request data aggregation through this program.

The second client may be a terminal device of a data provider on which a software program corresponding to the server's data aggregation service is also installed; the data provider can supply the data required for aggregation through this program.

Specifically, the method shown in FIG. 4 may include the following steps:

Step 110: The first client obtains the aggregation parameters determined by the data demander.

Generally, the data demander can log in to the aggregation service system and edit the aggregation parameters in the operation interface of the aggregation service; see the schematic diagram of the operation interface in FIG. 5.

In the data aggregation operation interface, the data demander can edit the aggregation parameters and enter the relevant content for each one.

As shown in FIG. 5, the aggregation parameters displayed under the "online edit" function may include:

Information resource name: describes the characteristics of the data required by the data demander, to make the required data easier to search for, locate, and obtain.

Information resource English name: the English counterpart of the information resource name, used to search for, locate, and obtain the required data in English.

Category: describes the classification of the data directory.

Data source: identifies the source of the required data.

Data provider: specifies the owner of the required data.

Data resource abstract: presents summary information about the required data.

Information resource format classification: describes the format classification of the required data, such as file, database, or API service.

Information resource format type: describes the format type of the required data, such as JSON or KV.

Release date: describes when the data directory is released.

Sharing type: describes how the shared data is shared, such as conditional sharing, unconditional sharing, or no sharing.

Sharing condition: describes the constraints on sharing the shared data.

Sharing mode classification: describes the classification of the sharing mode, such as via the sharing platform, by email or physical media, or by file.

Sharing mode type: describes the type of sharing mode, such as database or interface.

Open to the public: describes whether the data is visible externally.

Opening condition: describes the conditions under which the data is visible externally.

The aggregation parameters shown in FIG. 5 are only examples and do not limit the aggregation parameters.

Notably, as shown in FIG. 5, to make the interface easier to use, the operation interface may display shortcut options, and the data demander determines the corresponding aggregation parameters by triggering them, selecting or filling in values from drop-down lists. After the data demander triggers a shortcut option, the first client automatically generates the corresponding aggregation parameters. The contents of the drop-down shortcut options may be configured in advance by system developers.

Take the data provider shortcut option as an example. Developers can register accounts in the aggregation service system for departments with data aggregation needs (alternatively, each department may register its own account), and, for departments in a superior-subordinate relationship, establish an association between the superior department and its subordinate departments. Then, when a superior department acting as the data demander opens the interface in FIG. 5 and clicks the data provider shortcut option, the subordinate departments associated with it are shown in a drop-down list.
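The superior-subordinate association behind the data provider shortcut option can be sketched as a simple registry. `DepartmentRegistry` and its methods are hypothetical; whether the drop-down lists only direct subordinates or the whole subtree is a design choice the source leaves open, so the sketch supports both.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set


class DepartmentRegistry:
    """Hypothetical registry of department accounts and superior-subordinate links."""

    def __init__(self) -> None:
        self.accounts: Set[str] = set()
        self.children: DefaultDict[str, List[str]] = defaultdict(list)

    def register(self, dept: str, superior: Optional[str] = None) -> None:
        self.accounts.add(dept)
        if superior is not None:
            self.children[superior].append(dept)

    def provider_options(self, demander: str, recursive: bool = False) -> List[str]:
        """Options shown in the data provider drop-down for this demander."""
        options: List[str] = []
        for child in self.children.get(demander, []):
            options.append(child)
            if recursive:
                options.extend(self.provider_options(child, recursive=True))
        return options
```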

Generally, the data provider shortcut option supports multiple selection, and the selected data providers may be stored in a data list.

Notably, in some scenarios the data demander may also enter a data provider with which it has no association, for example in cross-department aggregation between departments that have no superior-subordinate relationship.

Step 112: The first client sends a data aggregation request to the server; the data aggregation request includes the aggregation parameters.

In response to a send action by the data demander (for example, clicking the "Submit" button in FIG. 5), the first client may add all aggregation parameters in the operation interface to the data aggregation request and send the request to the server.
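A minimal sketch of how the first client might package the form contents into a request body. The JSON layout and every key name (`request_type`, `aggregation_parameters`, `data_providers`, and so on) are hypothetical; the source does not specify a wire format.

```python
import json
from typing import Any, Dict


def build_aggregation_request(form: Dict[str, Any]) -> str:
    """Copy every aggregation parameter on the form into a JSON request body."""
    params = dict(form)
    providers = params.get("data_providers", [])
    # The shortcut option supports multiple selection, so always send a list.
    params["data_providers"] = [providers] if isinstance(providers, str) else list(providers)
    body = {"request_type": "data_aggregation", "aggregation_parameters": params}
    return json.dumps(body, ensure_ascii=False, sort_keys=True)
```

`ensure_ascii=False` keeps non-ASCII parameter values (such as Chinese resource names) readable in the serialized body.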

Step 120: The server generates a data directory and a data standard.

The data demander can also define a data standard in the visual operation interface in the form of data items. For example, the data items may include the contents shown in Table 1 below:

Data item name	English name	Data type	Primary key	Data length	Description
Name	Name	String (C)	No	10	Name
Age	Age	Numeric (N)	No	10	Age
Sex	Sex	String (C)	No	--	Sex
Body weight	Weight	Numeric (N)	No	--	Body weight
Height	Height	Numeric (N)	No	--	Height
Home address	Address	String (C)	No	10	Home address

Table 1

As shown in Table 1, each data item may include information such as the data item name, the English name of the data item (that is, the field), the data type, whether it is a primary key, the data length, and a description. By defining a data standard, the data demander allows data providers to prepare data that conforms to it.
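The data standard in Table 1 can be represented as a list of data items and used to check a provider's records, as sketched below. The class `DataItem`, the reading of type codes `C`/`N` as string/numeric, the `violations` checker, and treating the data length as a maximum character count are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataItem:
    name: str                  # data item name
    std_field: str             # English name, used as the standard field
    type_code: str             # "C" = string, "N" = numeric
    primary_key: bool
    max_length: Optional[int]  # None where Table 1 shows "--"
    description: str


# Table 1 expressed as a data standard.
STANDARD: List[DataItem] = [
    DataItem("Name", "Name", "C", False, 10, "Name"),
    DataItem("Age", "Age", "N", False, 10, "Age"),
    DataItem("Sex", "Sex", "C", False, None, "Sex"),
    DataItem("Body weight", "Weight", "N", False, None, "Body weight"),
    DataItem("Height", "Height", "N", False, None, "Height"),
    DataItem("Home address", "Address", "C", False, 10, "Home address"),
]


def violations(record: Dict[str, object], standard: List[DataItem] = STANDARD) -> List[str]:
    """List the ways a provider record deviates from the data standard (empty if none)."""
    by_field = {item.std_field: item for item in standard}
    problems = [f"unknown field {key}" for key in record if key not in by_field]
    for key, value in record.items():
        item = by_field.get(key)
        if item is None or value is None:
            continue
        if item.type_code == "C" and not isinstance(value, str):
            problems.append(f"{key}: expected string")
        elif item.type_code == "N" and (isinstance(value, bool) or not isinstance(value, (int, float))):
            problems.append(f"{key}: expected number")
        elif item.max_length is not None and len(str(value)) > item.max_length:
            problems.append(f"{key}: longer than {item.max_length}")
    return problems
```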

The data item can also be used as a convergence parameter to be sent to the server along with the data convergence request.

Correspondingly, after receiving the data aggregation request, the server responds to the data aggregation request and generates a data directory based on the aggregation parameters. Specifically, the server may create a data directory of the directory name at the directory address based on the directory name and the directory address specified in the aggregation parameter.

Further, after creating the data directory, the server may configure a database associated with the data directory based on the database address specified in the aggregation parameters. This establishes the association between the data directory and the database in preparation for the subsequent steps.

In addition, the server may generate the data standard based on the standard fields defined by the data items in the aggregation parameters.

Step 122: the server creates a data table.

The server may create a data table of the database in the data directory according to the data standard. The table name of the data table may be the English abbreviation of the data directory, and the fields of the data table are the standard fields; for example, the fields corresponding to Table 1 above are created in the database.
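As a minimal sketch of how a data standard like Table 1 could be turned into a data table, the Python below models each data item and emits a CREATE TABLE statement. The `DataItem` class, the type-code mapping (C to VARCHAR/TEXT, N to NUMERIC) and the table name `rsd` are illustrative assumptions, not part of the described method; an in-memory SQLite database stands in for whatever database the data directory is associated with.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataItem:
    """One data item of the data standard, mirroring the columns of Table 1."""
    name: str              # data item name, e.g. "Home address"
    field: str             # English name, used as the standard field
    data_type: str         # "C" = character string, "N" = numeric
    is_key: bool           # whether the field is (part of) the primary key
    length: Optional[int]  # None when no length is given ("--")
    description: str = ""


def column_type(item: DataItem) -> str:
    # Hypothetical mapping from the standard's type codes to SQL column types.
    if item.data_type == "C":
        return f"VARCHAR({item.length})" if item.length else "TEXT"
    if item.data_type == "N":
        return f"NUMERIC({item.length})" if item.length else "NUMERIC"
    raise ValueError(f"unknown data type code: {item.data_type!r}")


def create_table_ddl(table_name: str, items: list[DataItem]) -> str:
    # Every standard field becomes one column of the data table.
    columns = [f"{item.field} {column_type(item)}" for item in items]
    keys = [item.field for item in items if item.is_key]
    if keys:
        columns.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


# The data standard of Table 1 (no field is marked as a primary key).
standard = [
    DataItem("Name", "Name", "C", False, 10, "Name"),
    DataItem("Age", "Age", "N", False, 10, "Age"),
    DataItem("Sex", "Sex", "C", False, None, "Sex"),
    DataItem("Body weight", "Weight", "N", False, None, "Body weight"),
    DataItem("Height", "Height", "N", False, None, "Height"),
    DataItem("Home address", "Address", "C", False, 10, "Home address"),
]
ddl = create_table_ddl("rsd", standard)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the data table now exists in the stand-in database
```

Primary keys are emitted as a table-level constraint so that several data items marked as keys can form one composite key.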

Because the data directory is associated with the database, the data directory is a visual, directory-style (e.g., folder) representation of that database. Creating a data table in the data directory is therefore equivalent to creating it in the database; the table actually resides in the database.

Step 124: The server generates a data aggregation task corresponding to the data provider.

As described above, the aggregation parameters may include data providers specified by the data demander, and the server generates a corresponding data aggregation task for each data provider specified in the aggregation parameters.

It is worth noting that generating data aggregation tasks requires the data providers to be known. If the aggregation parameters do not include any data provider (i.e., the data demander has not specified one), the server needs to obtain the data providers from the data demander before step 124 is executed.
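The rule that tasks can only be generated once the data providers are known could be sketched as follows. The parameter keys `providers`, `directory_name` and `table_name`, and the `AggregationTask` and `MissingProviderError` types, are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationTask:
    task_id: int
    provider: str   # data provider that executes the task
    directory: str  # data directory created for the request
    table: str      # destination data table in that directory


class MissingProviderError(ValueError):
    """No data provider was specified; the server must ask the data demander first."""


def generate_tasks(params: dict) -> list[AggregationTask]:
    # "providers", "directory_name" and "table_name" are hypothetical parameter keys.
    providers = params.get("providers") or []
    if not providers:
        raise MissingProviderError("no data provider in the aggregation parameters")
    # One task per specified data provider, numbered in the order given.
    return [
        AggregationTask(i, provider, params["directory_name"], params["table_name"])
        for i, provider in enumerate(providers, start=1)
    ]


tasks = generate_tasks({
    "providers": ["district_a", "district_b"],
    "directory_name": "residents",
    "table_name": "rsd",
})
```

Raising an error rather than silently returning no tasks makes the missing-provider case explicit, so the server knows to go back to the data demander.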

Further, if a task distribution mechanism exists, the server may push the corresponding data aggregation task to each data provider, so that the data provider determines the data to be aggregated from its local data based on the pushed task.

If no task distribution mechanism exists, the data provider can log in to the data demander's portal to check whether the data directory has a pending data aggregation task. Upon finding one, the data provider submits an application to the data demander; once the data demander approves the application, the data provider determines the data to be aggregated from its local data based on the pending task.

The aggregation service system allows both the data demander and the data provider to log in with their own accounts, so the data provider can log in to the data demander's portal to check whether the data directory has a pending data aggregation task.

The data in each data domain concerns the data security of the entity that owns it; in particular, data leaving a lower-level department's domain must be reviewed by the upper-level department. Therefore, without a task distribution mechanism, even if a data provider finds a data aggregation task that concerns it, the provider cannot upload data on its own initiative, because it has not been authorized by the upper-level department. For this purpose, the data provider may submit an application to the data demander, which audits the application and thereby authorizes the data provider to export data from its data domain.

After the data demander approves the application, the data provider can execute the pending data aggregation task to determine the data to be aggregated from its local data, and then upload that data to the aggregation service system.
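The apply-and-audit flow for the case without a task distribution mechanism can be viewed as a small state machine. This sketch assumes four states (`PENDING`, `APPLIED`, `APPROVED`, `REJECTED`) and a hypothetical `PortalTask` class; the point it illustrates is that the provider may upload only after the data demander's approval.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"    # task visible in the data demander's portal
    APPLIED = "applied"    # data provider has applied to execute it
    APPROVED = "approved"  # data demander audited and authorized the export
    REJECTED = "rejected"  # data demander refused the application


# Allowed transitions when there is no task distribution mechanism.
_TRANSITIONS = {
    TaskState.PENDING: {TaskState.APPLIED},
    TaskState.APPLIED: {TaskState.APPROVED, TaskState.REJECTED},
}


class PortalTask:
    def __init__(self, provider: str) -> None:
        self.provider = provider
        self.state = TaskState.PENDING

    def _move(self, target: TaskState) -> None:
        if target not in _TRANSITIONS.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state.value} to {target.value}")
        self.state = target

    def apply(self) -> None:
        # Called by the data provider after finding the task in the portal.
        self._move(TaskState.APPLIED)

    def audit(self, approved: bool) -> None:
        # Called by the data demander when reviewing the application.
        self._move(TaskState.APPROVED if approved else TaskState.REJECTED)

    def may_upload(self) -> bool:
        # Data may leave the provider's domain only after approval.
        return self.state is TaskState.APPROVED


task = PortalTask("district_a")
task.apply()
task.audit(approved=True)
```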

Step 130: The second client uploads the source data to the server.

Before uploading the data to be aggregated, the data provider can use the data pre-check service provided by the aggregation service system to check whether the local fields of its locally stored data correspond one-to-one to the standard fields. Specifically, the data provider may upload the local fields of its local data to the server as the source data.

Step 132: The server checks the local fields of the source data based on the data standard.

The server can use a field matching algorithm to establish a mapping between each standard field and the local field that has the same meaning.

For example, standard fields and local fields with the same meaning may be identified through exact matching, fuzzy matching, similarity matching, and similar algorithms, and each such pair is then mapped to each other.

Local fields for which no mapping is found are screened out and returned to the data provider, which reorganizes them, for example by adding a description of each field's meaning, or by renaming a local field to the issued standard field with the same meaning. The data provider then uploads the source data again for another field check. The check is considered passed once every local field has been mapped to a standard field.
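The field check of step 132 can be sketched with exact matching plus fuzzy matching from Python's standard `difflib` module. This is an illustration under assumptions: the 0.8 cutoff is arbitrary, the function names are hypothetical, and spelling similarity is only a stand-in for the meaning-based similarity matching the text mentions.

```python
import difflib


def match_fields(standard_fields: list[str], local_fields: list[str], cutoff: float = 0.8):
    """Map local fields to standard fields; return (mapping, unmatched local fields)."""
    by_lower = {s.lower(): s for s in standard_fields}
    mapping: dict[str, str] = {}
    unmatched: list[str] = []
    for local in local_fields:
        key = local.lower()
        if key in by_lower:
            # Exact match, ignoring case.
            mapping[local] = by_lower[key]
            continue
        # Fuzzy match on spelling similarity (difflib ratio >= cutoff).
        close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
        if close:
            mapping[local] = by_lower[close[0]]
        else:
            unmatched.append(local)  # returned to the data provider for reorganization
    return mapping, unmatched


def check_passes(unmatched: list[str]) -> bool:
    # The check passes only when every local field has been mapped.
    return not unmatched


standard = ["Name", "Age", "Sex", "Weight", "Height", "Address"]
mapping, unmatched = match_fields(standard, ["name", "AGE", "Adress", "hometown"])
```

Spelling similarity is not meaning: Weight and Height differ by a single letter, so a misspelled local field could land on the wrong standard field. In practice the returned mapping would be a candidate that the data provider confirms.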

It is worth noting that data providers may store their data in different kinds of carriers: some record data in a database, others in files. The source data uploaded by data providers may therefore also come in different carriers.

When data is recorded in a file, it is usually laid out as a two-dimensional grid of rows and columns, with field names along one row and field values down the columns, or field names down one column and field values along the rows. Fields therefore exist in file carriers as well, so it is technically feasible to check the fields of source data stored in a file.

In this embodiment, the server may use a data conversion tool to convert the carrier type of the source data into the carrier type of the data table. Once the source data has the same carrier type as the data table, its fields can be checked more efficiently.

The data conversion tool may include an ETL (Extract-Transform-Load) tool, among others. ETL converts data from its current type into another specified type in three stages: extraction reads the data from its current carrier, transformation converts the extracted data into the specified data type, and loading stores the converted data in a carrier of the specified type.
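A minimal ETL sketch, assuming the source carrier is a CSV file whose first row holds the field names and the target carrier is a SQLite table. The function names, the `numeric_fields` parameter and the table `rsd` are hypothetical; a production ETL tool would do considerably more.

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict]:
    # Extract: read rows from a file carrier whose first row holds the field names.
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict], numeric_fields: set[str]) -> list[dict]:
    # Transform: CSV values are all strings; convert numeric fields, empty -> NULL.
    def convert(field: str, value: str):
        if field not in numeric_fields:
            return value
        return float(value) if value.strip() else None
    return [{f: convert(f, v) for f, v in row.items()} for row in rows]


def load(conn: sqlite3.Connection, table: str, rows: list[dict]) -> int:
    # Load: insert into the database table. Field names are assumed to be
    # already-verified standard fields, because they are interpolated into SQL.
    if not rows:
        return 0
    fields = list(rows[0])
    placeholders = ", ".join("?" for _ in fields)
    sql = f"INSERT INTO {table} ({', '.join(fields)}) VALUES ({placeholders})"
    conn.executemany(sql, [tuple(row[f] for f in fields) for row in rows])
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rsd (Name TEXT, Age NUMERIC)")
csv_text = "Name,Age\nAlice,30\nBob,\n"
loaded = load(conn, "rsd", transform(extract(csv_text), {"Age"}))
```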

After the check passes, the server can issue an aggregation instruction to the second client, so that the second client determines the data to be aggregated from its local data based on the data aggregation task.

Step 140: The second client determines the data to be aggregated from its local data based on the data aggregation task.

After receiving the aggregation instruction issued by the server, the second client determines the data to be aggregated from its local data based on the data aggregation task and uploads that data to the server.

Step 150: The server writes the data to be aggregated into the data table.

After receiving the data to be aggregated from each data provider, the server may establish a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and write the data to be aggregated into the data table through that channel.

The data to be aggregated is written into the data table through the data aggregation channel according to the established mapping between local fields and standard fields.

In this way, the data to be aggregated is written, record by record, into the data table designated by the data demander through the aggregation channel. While writing, the server can also check the data to be aggregated against the value range of each standard field: records that fall outside the value range are recorded separately instead of being written to the data table, and only records within the value range are written to the database.

For example, the gender field usually takes one of two values, male or female; suppose 0 represents male and 1 represents female, so its value range is {0, 1}. If the gender field of a record has the value 2 during aggregation, the existing definition of the gender field cannot say what 2 represents, so the record is considered invalid.
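The value-range check could be sketched as a set of per-field rules. The helpers `one_of` and `between`, the function `split_by_value_range`, and the age range used in the usage lines are illustrative assumptions; only the gender range {0, 1} comes from the example in the text.

```python
from typing import Any, Callable

Rule = Callable[[Any], bool]


def one_of(*allowed: Any) -> Rule:
    # Discrete value range, e.g. the gender codes {0, 1}.
    return lambda value: value in allowed


def between(low: float, high: float) -> Rule:
    # Continuous value range for numeric fields.
    return lambda value: isinstance(value, (int, float)) and low <= value <= high


def split_by_value_range(records: list[dict], rules: dict[str, Rule]):
    """Return (records to write, [(rejected record, offending fields)])."""
    valid, invalid = [], []
    for record in records:
        # Fields absent from a record are not checked here.
        bad = [f for f, rule in rules.items() if f in record and not rule(record[f])]
        if bad:
            invalid.append((record, bad))  # recorded separately, not written to the table
        else:
            valid.append(record)
    return valid, invalid


rules = {"Sex": one_of(0, 1), "Age": between(0, 150)}  # age range is hypothetical
records = [
    {"Name": "A", "Sex": 0, "Age": 30},
    {"Name": "B", "Sex": 2, "Age": 41},  # 2 has no defined meaning for Sex
]
valid, invalid = split_by_value_range(records, rules)
```

Keeping the offending field names alongside each rejected record makes it easy to report invalid data back to the data provider.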

Checking value ranges screens out records that do not meet the requirements, which improves the quality of the aggregated data.

It is worth noting that data providers may also store the data to be aggregated in different kinds of carriers: some in a database, others in files. The data uploaded by data providers may therefore also come in different carriers.

For example, a data provider may use a different database, or a different version of the same database, from the database associated with the data directory; either case results in a different data carrier.

File carriers, likewise, usually lay data out as a two-dimensional grid of rows and columns, with field names along one row and field values down the columns, or field names down one column and field values along the rows.

Because data cannot be written accurately between different data carriers, the server can use a data conversion tool, such as the ETL tool described above, to convert the carrier type of the data to be aggregated into the carrier type of the data table.

Once the data to be aggregated has the same carrier type as the data table, it can be aggregated more efficiently.

Step 152: The server monitors the progress of the aggregation task.

The server can provide the data provider and the data demander with a service for querying and viewing the progress of aggregation tasks.

When the first client or the second client initiates an aggregation task query request, the server responds by returning progress information for the aggregation task, such as the number of records aggregated, the percentage of data aggregated, the number of invalid records, the elapsed time, and the remaining time.
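One possible shape for the progress information returned by the query service is sketched below. `TaskProgress` and its field names are hypothetical, and the remaining time is a simple linear extrapolation from the throughput so far, which is an assumption rather than something the method prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskProgress:
    total: int        # records the data provider is expected to deliver
    aggregated: int   # records written to the data table
    invalid: int      # records rejected by the value-range check
    elapsed_s: float  # seconds since the task started

    @property
    def processed(self) -> int:
        return self.aggregated + self.invalid

    @property
    def percent_aggregated(self) -> float:
        return 100.0 * self.aggregated / self.total if self.total else 100.0

    @property
    def remaining_s(self) -> Optional[float]:
        # Linear extrapolation from throughput so far; None before any record is processed.
        if self.processed == 0:
            return None
        return self.elapsed_s * (self.total - self.processed) / self.processed

    def to_response(self) -> dict:
        # Payload returned for an aggregation task query request.
        return {
            "aggregated": self.aggregated,
            "percent_aggregated": round(self.percent_aggregated, 1),
            "invalid": self.invalid,
            "elapsed_s": self.elapsed_s,
            "remaining_s": self.remaining_s,
        }


progress = TaskProgress(total=1000, aggregated=450, invalid=50, elapsed_s=120.0)
response = progress.to_response()
```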

Progress monitoring shows the status of each aggregation task at a glance, which helps users plan subsequent aggregation work.

Step 154: The server scores the data providers corresponding to the aggregation task.

After the aggregation task is completed, each data provider is scored against several preset indexes, such as the completion progress of the aggregation task, data quality, and data volume. Different indexes may carry different weights.

In general, every index contributes positively: the more complete the aggregation task, the higher the data quality, and the larger the data volume, the higher the score, and vice versa.
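A weighted score over the three indexes could be computed as below. The weights, the normalization of data volume against the largest provider, the provider names, and the function names are all assumptions for illustration; each index is taken to lie in [0, 1] with higher meaning better, matching the monotonic relationships described in the text.

```python
def normalize_volume(volumes: dict[str, int]) -> dict[str, float]:
    # Scale each provider's record count to [0, 1] relative to the largest provider.
    top = max(volumes.values(), default=0)
    return {p: (v / top if top else 0.0) for p, v in volumes.items()}


def score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted average of indexes in [0, 1], scaled to 0-100; higher is better.
    total_weight = sum(weights.values())
    return 100.0 * sum(weights[k] * metrics[k] for k in weights) / total_weight


def rank(providers: dict[str, dict[str, float]],
         weights: dict[str, float]) -> list[tuple[str, float]]:
    # Sort providers by score, best first.
    scores = {p: score(m, weights) for p, m in providers.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


weights = {"progress": 2, "quality": 2, "volume": 1}  # hypothetical weights
volume = normalize_volume({"district_a": 8000, "district_b": 2000})
providers = {
    "district_a": {"progress": 1.0, "quality": 0.9, "volume": volume["district_a"]},
    "district_b": {"progress": 0.5, "quality": 0.8, "volume": volume["district_b"]},
}
ranking = rank(providers, weights)
```

Data quality could, for instance, be measured as the share of records that passed the value-range check, but that choice is left open here.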

Scoring the data providers and ranking them by score summarizes how each data provider performed on the aggregation tasks.

Further, better-performing data providers may be rewarded and poorer-performing ones penalized. This motivates data providers to engage with aggregation tasks: to complete them faster and more proactively, and to provide higher-quality, more complete, and more comprehensive data, which ultimately improves the aggregation service as a whole.

Step 160: The first client obtains the aggregation result.

The first client may browse completed aggregation tasks in the portal it has logged in to and view their aggregation results. Alternatively, the server may push the aggregation result to the first client after the aggregation task is completed.

An embodiment of the method executed by the server is described below with reference to fig. 6; it corresponds to the embodiment of fig. 4 described above, and features shared with that embodiment can be found there. Because this embodiment takes the server as the executing entity, steps performed by other parties are mentioned only briefly (see the embodiment of fig. 4 for details), and the steps performed by the server are described in detail below.

Step 210: receiving a data aggregation request sent by a first client; wherein the data aggregation request comprises an aggregation parameter.

In an embodiment, as shown in fig. 4, the aggregation parameter is generated after the data demander corresponding to the first client triggers the shortcut option displayed in the operation interface.

Step 220: In response to the data aggregation request, generating a data directory and a data standard based on the aggregation parameters, creating a data table for storing aggregated data in the data directory, and generating a data aggregation task corresponding to a data provider.

The server may generate the data directory based on the directory attributes specified in the aggregation parameters, e.g., by creating a data directory with the specified directory name at the specified directory address.

In an embodiment, the server may further generate a data standard based on the standard fields in the aggregation parameters, which may be the standard fields defined by the data items described above.

In an embodiment, after the data standard is generated, a database associated with the data directory may further be configured based on the database address specified in the aggregation parameters, and a data table of the database is created in the data directory based on the standard fields defined by the data items in the aggregation parameters; the fields of the data table are the standard fields.

As described for fig. 4, the data providers include the data providers specified in the aggregation parameters, and generating the data aggregation task corresponding to a data provider comprises generating a task for each data provider specified in the aggregation parameters.

If a task distribution mechanism exists, the server side can push a corresponding data aggregation task to the data provider based on the task distribution mechanism; so that the data provider determines the data to be aggregated from the local data based on the pushed data aggregation task.

If no task distribution mechanism exists, the data provider applies to the data demander as described for fig. 4. After the data demander approves the application, the data provider can execute the pending data aggregation task to determine the data to be aggregated from its local data, and then upload that data to the aggregation service system.

Step 230: Acquiring the data to be aggregated that is uploaded by the second client corresponding to the data provider, is determined based on the data aggregation task, and conforms to the data standard.

The second client determines, based on the data aggregation task, the data to be aggregated that conforms to the data standard, and uploads it; this data consists of the standard fields and the local data values corresponding to them.

In this example, the second client may locally organize the data to be aggregated, which is composed of the standard field and the local data value corresponding to the standard field, according to the standard field defined by the data demander.

As described for fig. 4, the server may provide a data quality pre-check service for the data provider.

Before uploading the data to be aggregated, the data provider can check whether the local fields of the locally stored data can be in one-to-one correspondence with the standard fields through a data pre-check service provided by the aggregation service system.

Specifically, the data provider may upload the local fields of its local data to the server as the source data. Correspondingly, the server receives the source data, which includes the local fields of the data provider, from the second client corresponding to the data provider, and performs a field check on the local fields of the source data based on the standard fields of the data standard;

after the check passes, the server issues an aggregation instruction to the second client, so that the second client determines the data to be aggregated from its local data based on the data aggregation task.

The field check on the local fields of the source data based on the standard fields of the data standard includes:

establishing, through a field matching algorithm, a mapping between each standard field and the local field with the same meaning; and determining that the check has passed once every local field has been mapped to a standard field.

Through the data quality pre-check, the data provider can upload data to be aggregated that follows the data standard, so data aggregation can be performed more efficiently.

Step 240: Establishing a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and writing the data to be aggregated into the data table based on the data aggregation channel.

When no data pre-check is performed, that is, when the data to be aggregated uploaded by the second client consists of the standard fields and the local data values corresponding to them, step 240 may include:

establishing a data aggregation channel which takes the data to be aggregated as a source end and the data table as a destination end, and writing local data corresponding to a standard field in the data to be aggregated into a field value of the same standard field in the data table based on the data aggregation channel.

Since the data to be aggregated uses the same standard fields as the data table, the local data can be written directly into the field values of the corresponding standard fields in the data table.

When the data pre-check is performed, that is, when the data to be aggregated uploaded by the second client consists of the local fields and the local data corresponding to them, step 240 may include:

establishing a data aggregation channel which takes data to be aggregated as a source end and the data table as a destination end, and writing data corresponding to a local field in the data to be aggregated into a field value of a standard field mapped in the data table by the local field based on the data aggregation channel.

Since the mapping between local fields and standard fields was already established during the data pre-check, the local data can be written into the field values of the mapped standard fields in the data table.
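Writing through the mapping could be sketched as follows, with SQLite standing in for the destination database. `write_mapped` and the local field names `full_name` and `years` are hypothetical. Passing an identity mapping covers the mode without a pre-check, in which the uploaded data is already keyed by standard fields.

```python
import sqlite3


def write_mapped(conn: sqlite3.Connection, table: str,
                 records: list[dict], mapping: dict[str, str]) -> int:
    """Write records keyed by local fields into the standard-field columns they map to.

    `mapping` is local field -> standard field, as established by the pre-check.
    Standard field names are interpolated into SQL, so they must be trusted.
    """
    written = 0
    for record in records:
        row = {mapping[f]: v for f, v in record.items() if f in mapping}
        if not row:
            continue  # nothing in this record maps to the data table
        columns = ", ".join(row)
        marks = ", ".join("?" * len(row))
        conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({marks})", tuple(row.values()))
        written += 1
    conn.commit()
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rsd (Name TEXT, Age NUMERIC)")
mapping = {"full_name": "Name", "years": "Age"}  # hypothetical local fields
written = write_mapped(conn, "rsd", [{"full_name": "Alice", "years": 30}], mapping)
```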

In an embodiment, while the data to be aggregated is being written into the data table, it is checked against the value range of each standard field, and only data within the value range is written into the data table.

In an embodiment, before step 240, the method further comprises:

converting, through a data conversion tool, the carrier type of the data to be aggregated into the carrier type of the data table. Because data cannot be written accurately between different data carriers, giving the data to be aggregated the same carrier type as the data table allows data aggregation to proceed more efficiently.

In an embodiment, the method further comprises:

Responding to an aggregation task query request initiated by the first client or the second client by returning progress information of the aggregation task to that client. Progress monitoring shows the status of each aggregation task at a glance, which helps users plan subsequent aggregation work.

In an embodiment, the method further comprises:

Scoring the data providers corresponding to the aggregation task.

Scoring the data providers and ranking them by score summarizes how each data provider performed on the aggregation tasks. Further, better-performing data providers may be rewarded and poorer-performing ones penalized. This motivates data providers to engage with aggregation tasks: to complete them faster and more proactively, and to provide higher-quality, more complete, and more comprehensive data, which ultimately improves the aggregation service as a whole.

In summary, according to the embodiments of this specification, first, data directory technology is used to let the data demander customize a data standard. With that standard, data providers can supply data to be aggregated that conforms to it; because all data to be aggregated follows the same standard before aggregation, data quality is higher and no additional data conversion is needed, which improves aggregation efficiency.

Second, the quick aggregation setup lets the data demander configure the data directory, define the data standard, and designate the data providers simply by selecting shortcut options in the operation interface. The aggregation service system then performs data aggregation fully automatically, further improving efficiency.

The aggregation service system creates a data directory associated with a database, issues the data standard, creates a data table according to the data standard, and generates a data aggregation task for each data provider based on the data demander's aggregation parameters. Each data provider then uploads the local data that conforms to the data standard according to its data aggregation task, and the aggregation service system automatically maps that data, record by record, into the data table.

Third, before a data provider uploads the data to be aggregated, the aggregation service system can pre-check the fields of the provider's local data, ensuring that the data to be aggregated maps to the correct fields of the data table.

Corresponding to the foregoing data aggregation method embodiments, this specification also provides embodiments of a data aggregation apparatus. The apparatus embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, fig. 7 shows a hardware structure diagram of the device in which the data aggregation apparatus of this specification is located; in addition to the processor, network interface, memory, and non-volatile memory shown in fig. 7, the device may include other hardware depending on its actual data aggregation function, which is not described again here.

Referring to fig. 8, a block diagram of a data aggregation apparatus according to an embodiment of the present disclosure is provided, where the apparatus corresponds to the embodiment shown in fig. 6, and the apparatus includes:

a receiving unit 310, configured to receive a data aggregation request sent by a first client; wherein the data aggregation request comprises an aggregation parameter;

a response unit 320, configured to generate a data directory and a data standard based on the aggregation parameter in response to the data aggregation request, create a data table for storing aggregated data in the data directory, and generate a data aggregation task corresponding to a data provider;

the acquiring unit 330 is configured to acquire to-be-aggregated data which is uploaded by a second client corresponding to the data provider and is determined based on the data aggregation task and meets the data standard;

the aggregation unit 340, configured to establish a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and to write the data to be aggregated into the data table based on the data aggregation channel.

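Optionally, in the response unit 320, generating a data aggregation task corresponding to a data provider includes generating a data aggregation task for each data provider specified in the aggregation parameters.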

Optionally, the apparatus further comprises: the processing unit is used for pushing a corresponding data aggregation task to the data provider based on a task distribution mechanism; so that the data provider determines the data to be aggregated from the local data based on the pushed data aggregation task.

Optionally, in the response unit 320, generating a data directory and a data standard based on the aggregation parameters includes: generating a data directory based on the directory attributes in the aggregation parameters; and generating a data standard based on the standard fields in the aggregation parameters.

Optionally, in the response unit 320, creating a data table for storing aggregated data in the data directory includes: configuring a database associated with the data directory based on the database address specified in the aggregation parameters; and creating a data table of the database in the data directory based on the standard fields in the aggregation parameters, the fields of the data table being the standard fields.

Optionally, in the aggregation unit 340, writing the data to be aggregated into the data table based on the data aggregation channel includes: writing, based on the data aggregation channel, the local data corresponding to each standard field in the data to be aggregated into the field value of the same standard field in the data table.

Optionally, the apparatus further includes, operating before the acquiring unit 330: a receiving subunit, configured to receive source data uploaded by the second client corresponding to the data provider, the source data including the local fields of the data provider; a verifying subunit, configured to perform a field check on the local fields of the source data based on the standard fields of the data standard;

and an issuing subunit, configured to issue an aggregation instruction to the second client after the check passes, so that the second client determines, from its local data and based on the data aggregation task, the data to be aggregated that conforms to the data standard.

Optionally, the verifying subunit is configured to: establish, through a field matching algorithm, a mapping between each standard field and the local field with the same meaning; and determine that the check has passed once every local field has been mapped to a standard field;

in the aggregation unit 340, writing the data to be aggregated into the data table based on the data aggregation channel includes: writing, based on the data aggregation channel, the data corresponding to each local field in the data to be aggregated into the field value of the standard field in the data table to which that local field is mapped.

Optionally, the apparatus further includes, operating before the aggregation unit 340: a conversion unit, configured to convert the carrier type of the data to be aggregated into the carrier type of the data table through a data conversion tool if the two carrier types differ.

Optionally, the aggregation unit 340 is further configured to: while writing the data to be aggregated into the data table, check the data against the value range of each standard field, and write only data within the value range into the data table.

Optionally, the apparatus further includes a query unit, configured to respond to an aggregation task query request initiated by the first client or the second client by returning progress information of the aggregation task to that client.

The system, apparatus, module or unit of the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again. For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.

In the above embodiments of the electronic device, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. The general-purpose processor may be a microprocessor or any conventional processor. The aforementioned memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or a solid-state disk. The steps of the methods disclosed in connection with the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiment of the electronic device, since it is substantially similar to the embodiment of the method, the description is simple, and for the relevant points, reference may be made to part of the description of the embodiment of the method.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims. It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

Claims

1. A method for data aggregation, the method comprising:

2. The method of claim 1, wherein the data provider comprises a data provider specified in the aggregation parameter;

3. The method of claim 1, further comprising:

4. The method of claim 3, further comprising:

5. The method of claim 1, wherein generating a data directory and a data standard based on the aggregation parameter comprises:

6. The method of claim 4, wherein creating a data table in the data directory for storing aggregated data comprises:

7. The method according to claim 5, wherein the data to be aggregated conforming to the data standard is composed of the standard field and local data corresponding to the standard field;

8. The method according to claim 5, wherein before obtaining the data to be aggregated that is uploaded by the second client corresponding to the data provider, is determined based on the data aggregation task, and conforms to the data standard, the method further comprises:

9. The method of claim 8, wherein performing a field check on the local fields of the source data based on the standard fields of the data standard comprises:

10. The method of claim 1, wherein before establishing a data aggregation channel that takes the data to be aggregated as the source end and the data table as the destination end, and writing the data to be aggregated into the data table based on the data aggregation channel, the method further comprises:

11. The method according to claim 1, wherein writing the data to be aggregated into the data table based on the data aggregation channel comprises:

and writing the aggregated data that falls within the value range into the data table.

12. The method of claim 1, further comprising:

13. The method according to claim 1, wherein the aggregation parameter is generated after a data demander corresponding to the first client triggers a shortcut option displayed in an operation interface.

14. A data aggregation device, the device comprising:

15. The apparatus of claim 14, wherein the data provider comprises a data provider specified in the aggregation parameter;

16. The apparatus of claim 14, further comprising:

17. The apparatus of claim 16, wherein the processing unit is further configured such that: if the task distribution mechanism does not exist, the data provider logs in to a portal of the data demander to query whether a data aggregation task to be executed exists in the data directory; after finding a data aggregation task to be executed, the data provider submits an application to the data demander; and after the application is approved by the data demander, the data provider determines the data to be aggregated from its local data based on the data aggregation task to be executed.

18. The apparatus of claim 14, wherein the response unit generating a data directory and a data standard based on the aggregation parameter comprises:

19. The apparatus of claim 18, wherein the response unit creating a data table in the data directory for storing the aggregated data comprises:

20. The apparatus according to claim 18, wherein the data to be aggregated conforming to the data standard is composed of the standard field and local data corresponding to the standard field;

21. The apparatus of claim 18, wherein prior to the obtaining unit, the apparatus further comprises:

22. The apparatus of claim 21, wherein the check unit comprises:

23. The apparatus of claim 14, wherein prior to the convergence unit, the apparatus further comprises:

24. The apparatus of claim 14, wherein the convergence unit further comprises:

and writing the aggregated data that falls within the value range into the data table.

25. The apparatus of claim 14, further comprising:

26. The apparatus according to claim 14, wherein the aggregation parameter is generated after a data demander corresponding to the first client triggers a shortcut option displayed in an operation interface.

27. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to implement the method of any one of claims 1 to 13.
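The aggregation flow of claim 1, as summarized in the abstract, can be illustrated with a minimal in-memory sketch. Several parts of it are assumptions and do not come from the patent. All names (`DataStandard`, `DataTable`, `create_table`, `field_check`, `in_range`, `aggregate`) are hypothetical. The data standard is modeled as a mapping from standard field names to an inclusive value range, and the data directory is modeled as a plain dict. The field check and value-range filter loosely follow claims 9 and 11, whose full bodies are not reproduced here. This is not the claimed implementation.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical model (not specified by the patent): a data standard maps each
# standard field name to an inclusive (low, high) value range.
@dataclass
class DataStandard:
    ranges: dict[str, tuple[float, float]]


@dataclass
class DataTable:
    """Destination end of the aggregation channel, kept in memory here."""
    columns: list[str]
    rows: list[dict[str, Any]] = field(default_factory=list)


def create_table(directory: dict[str, DataTable], name: str,
                 standard: DataStandard) -> DataTable:
    """Create a table in the (assumed dict-based) data directory whose
    columns are the standard fields."""
    table = DataTable(columns=list(standard.ranges))
    directory[name] = table
    return table


def field_check(record: dict[str, Any], standard: DataStandard) -> bool:
    """Assumed field check: the record must contain every standard field."""
    return all(name in record for name in standard.ranges)


def in_range(row: dict[str, Any], standard: DataStandard) -> bool:
    """Assumed value-range check against the standard's (low, high) bounds."""
    return all(lo <= row[name] <= hi
               for name, (lo, hi) in standard.ranges.items())


def aggregate(source: list[dict[str, Any]], standard: DataStandard,
              table: DataTable) -> int:
    """Write field-checked, in-range rows from the source end to the
    destination table; return the number of rows written."""
    if not all(field_check(r, standard) for r in source):
        raise ValueError("source data is missing a standard field")
    written = 0
    for record in source:
        # Keep only the standard fields and their local values (cf. claim 7).
        row = {name: record[name] for name in table.columns}
        if in_range(row, standard):
            table.rows.append(row)
            written += 1
    return written
```

For example, with `standard = DataStandard({"age": (0, 150)})` and a table created via `create_table({}, "population", standard)`, aggregating `[{"age": 30, "name": "a"}, {"age": 200, "name": "b"}]` writes one row, `{"age": 30}`. In this sketch, out-of-range rows are skipped silently. A practical system might instead log them or return them to the data provider.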