CN113094393B

CN113094393B - Data aggregation method and device and electronic equipment

Info

Publication number: CN113094393B
Application number: CN202110281432.4A
Authority: CN
Inventors: 张洪彬; 褚占峰; 叶姣荣
Original assignee: Hangzhou Dt Dream Technology Co Ltd
Current assignee: Hangzhou Dt Dream Technology Co Ltd
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2023-07-14
Anticipated expiration: 2041-03-16
Also published as: CN113094393A

Abstract

The embodiments of the application provide a data aggregation method and device and electronic equipment. The method comprises: receiving a data aggregation request sent by a first client, wherein the data aggregation request includes aggregation parameters; in response to the data aggregation request, generating a data catalog and a data standard based on the aggregation parameters, creating, in the data catalog, a data table for storing aggregated data, and generating a data aggregation task corresponding to a data provider; acquiring data to be aggregated that is uploaded by a second client corresponding to the data provider, the data to be aggregated being determined based on the data aggregation task and meeting the data standard; and establishing a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and writing the data to be aggregated into the data table through the data aggregation channel.

Description

Data aggregation method and device and electronic equipment

Technical Field

The embodiments of the application relate to the field of Internet technology, and in particular to a data aggregation method and device and electronic equipment.

Background

With the wide application of big data, there is a growing need to transfer and exchange data among different departments. In practice, an upper-level department or unit often gathers data from multiple lower-level departments or units, or gathers data laterally from other departments. Such different departments or units are commonly referred to as different data domains.

Because of the data isolation between the different data domains, the data cannot be directly acquired from the different data domains by conventional means, and the data of the different data domains cannot be directly gathered together.

In the related art, data exchange between different data domains is generally handled through a data sharing exchange platform. When a data demander needs to use the data of another party, it has to log in to the data sharing exchange platform and find the portal (such as the homepage) of the data provider on that platform, then retrieve and apply for the data in the portal; the data becomes available to the data demander only after the data provider authorizes the application. When the data demander needs data from multiple departments, it has to retrieve and apply for data in the portal of each department and then aggregate the data obtained from them. Because different departments may adopt different data standards, the data obtained from each department cannot be used directly; data under different data standards must first be converted to a single data standard before aggregation, which lowers data aggregation efficiency.

Disclosure of Invention

The embodiment of the specification provides a data aggregation method and device and electronic equipment:

according to a first aspect of embodiments of the present specification, there is provided a data aggregation method, the method comprising:

receiving a data aggregation request sent by a first client, wherein the data aggregation request includes an aggregation parameter;

in response to the data aggregation request, generating a data catalog and a data standard based on the aggregation parameter, creating, in the data catalog, a data table for storing aggregated data, and generating a data aggregation task corresponding to a data provider;

acquiring data to be aggregated that is uploaded by a second client corresponding to the data provider, is determined based on the data aggregation task, and meets the data standard;

establishing a data aggregation channel taking the data to be aggregated as a source end and the data table as a destination end, and writing the data to be aggregated into the data table based on the data aggregation channel.

Optionally, the data provider includes a data provider specified in the aggregation parameter;

the generating the data convergence task corresponding to the data provider comprises the following steps:

generating a corresponding data aggregation task for the data provider specified in the aggregation parameter.

Optionally, the method further comprises:

pushing the corresponding data aggregation task to the data provider based on a task distribution mechanism, so that the data provider determines the data to be aggregated from its local data based on the pushed data aggregation task.

Optionally, the method further comprises:

if no task distribution mechanism exists, the data provider logs in to the portal of the data demander and queries whether the data catalog contains a data aggregation task to be executed; after finding such a task, the data provider submits an application to the data demander; and after the data demander approves the application, the data provider determines the data to be aggregated from its local data based on the data aggregation task to be executed.

Optionally, the generating the data directory and the data standard based on the aggregation parameter includes:

generating a data directory based on directory attributes in the aggregation parameters;

generating a data standard based on the standard field in the aggregation parameter.

Optionally, the creating a data table for storing the aggregated data in the data directory includes:

configuring a database associated with the data directory based on the database address specified in the aggregation parameter;

creating a data table of the database in the data directory based on the standard fields in the aggregation parameter, wherein the fields in the data table are the standard fields.

Optionally, the data to be aggregated meeting the data standard is composed of the standard field and a local data value corresponding to the standard field;

the writing the data to be aggregated into the data table based on the data aggregation channel comprises the following steps:

and writing local data corresponding to the standard field in the data to be aggregated into the field value of the same standard field in the data table based on the data aggregation channel.
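The same-field write described above can be sketched as follows. This is only an illustration under assumptions: SQLite stands in for the destination database, and the table name `resident`, the field list `STANDARD_FIELDS`, and the helper `write_records` are hypothetical names that do not come from the application.

```python
import sqlite3

# Hypothetical standard fields defined by the data demander.
STANDARD_FIELDS = ["Name", "Age"]


def write_records(conn, table, records):
    """Write each record's values into the columns named after the same standard fields."""
    columns = ", ".join(f'"{f}"' for f in STANDARD_FIELDS)
    placeholders = ", ".join("?" for _ in STANDARD_FIELDS)
    sql = f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})'
    # Values are looked up by field name, so the key order inside a record does not matter.
    rows = [tuple(rec[f] for f in STANDARD_FIELDS) for rec in records]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "resident" ("Name" TEXT, "Age" INTEGER)')
written = write_records(
    conn,
    "resident",
    [{"Name": "Zhang", "Age": 30}, {"Age": 41, "Name": "Li"}],
)
stored = conn.execute('SELECT "Name", "Age" FROM "resident" ORDER BY "Age"').fetchall()
```

Because the uploaded records are already keyed by the standard fields, no per-provider conversion step is needed before the write.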

Optionally, before the acquiring of the data to be aggregated that is uploaded by the second client corresponding to the data provider, is determined based on the data aggregation task, and meets the data standard, the method further includes:

receiving source data uploaded by a second client corresponding to the data provider, wherein the source data comprises a local field of the data provider;

performing field verification on the local field of the source data based on the standard field of the data standard;

and after the verification passes, sending an aggregation instruction to the second client, so that the second client determines, from the local data and based on the data aggregation task, the data to be aggregated that meets the data standard.

Optionally, the performing field verification on the local field of the source data based on the standard field of the data standard includes:

establishing a mapping relation between standard fields and local fields with the same meaning through a field matching algorithm;

after all local fields and standard fields establish a mapping relation, determining that the verification passes;

and, based on the data aggregation channel, writing the data corresponding to each local field in the data to be aggregated into the field value of the standard field to which that local field is mapped in the data table.
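The application does not name a particular field matching algorithm. As a minimal sketch, assuming matching by normalized field name plus a small alias table, the verification and the subsequent remapping could look like the following; `ALIASES`, `match_fields`, and `remap_record` are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical alias table: normalized local spellings accepted for each standard field.
ALIASES: Dict[str, set] = {
    "Name": {"fullname"},
    "Age": {"years"},
    "Address": {"homeaddress", "addr"},
}


def _normalize(field: str) -> str:
    """Lower-case a field name and drop separators such as '_' or spaces."""
    return "".join(ch for ch in field.lower() if ch.isalnum())


def match_fields(standard_fields: List[str], local_fields: List[str]) -> Optional[Dict[str, str]]:
    """Return {local field: standard field}, or None if verification fails."""
    mapping: Dict[str, str] = {}
    for local in local_fields:
        key = _normalize(local)
        target = None
        for std in standard_fields:
            accepted = ALIASES.get(std, set()) | {_normalize(std)}
            if key in accepted:
                target = std
                break
        if target is None or target in mapping.values():
            return None  # unmatched local field, or two local fields for one standard field
        mapping[local] = target
    # Verification passes only when every standard field is covered as well.
    if set(mapping.values()) != set(standard_fields):
        return None
    return mapping


def remap_record(record: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename the keys of one local record to the mapped standard fields."""
    return {mapping[local]: value for local, value in record.items()}


mapping = match_fields(["Name", "Age", "Address"], ["full_name", "AGE", "home address"])
failed = match_fields(["Name", "Age"], ["full_name", "shoe_size"])
row = remap_record({"full_name": "Zhang", "AGE": 30, "home address": "Hangzhou"}, mapping)
```

A production matcher would likely need semantic matching rather than a fixed alias table; the sketch only shows the all-fields-mapped pass condition and the mapped write.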

Optionally, before the establishing a data aggregation channel with the data to be aggregated as a source end and the data table as a destination end, and writing the data to be aggregated into the data table based on the data aggregation channel, the method further includes:

and if the carrier type of the data to be aggregated is different from the carrier type of the data table, converting the carrier type of the data to be aggregated into the carrier type of the data table based on a data conversion tool.
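As a rough illustration of carrier type conversion, the sketch below assumes two carrier types, `"file"` (CSV text) and `"table"` (records already in row form), and converts both into row tuples that a database table can accept. The function `to_table_rows` and the carrier type labels are assumptions; the application does not specify the data conversion tool.

```python
import csv
import io
from typing import List, Sequence, Tuple


def to_table_rows(payload, carrier_type: str, fields: Sequence[str]) -> List[Tuple]:
    """Convert uploaded data into row tuples ordered by `fields`."""
    if carrier_type == "table":
        # Already row-shaped: payload is a list of dicts keyed by field name.
        return [tuple(rec[f] for f in fields) for rec in payload]
    if carrier_type == "file":
        # payload is CSV text whose header row names the fields.
        reader = csv.DictReader(io.StringIO(payload))
        return [tuple(rec[f] for f in fields) for rec in reader]
    raise ValueError(f"unsupported carrier type: {carrier_type}")


csv_upload = "Name,Age\nZhang,30\nLi,41\n"
rows_from_file = to_table_rows(csv_upload, "file", ["Name", "Age"])
rows_from_table = to_table_rows([{"Name": "Wang", "Age": 25}], "table", ["Name", "Age"])
```

Note that values read from a CSV file arrive as strings; casting them to the data types of the standard fields would be a further step that this sketch leaves out.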

Optionally, the writing the data to be aggregated into the data table based on the data aggregation channel includes:

in the process of writing the data to be aggregated into the data table, checking the data to be aggregated against the value range corresponding to each standard field;

and writing the data to be aggregated that conforms to the value range into the data table.
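The value range check can be sketched as a filter applied just before the write. The ranges in `VALUE_RANGES` and the function names are hypothetical, and the text does not say what happens to records outside the range, so the sketch simply returns them separately.

```python
from typing import Dict, List, Tuple

# Hypothetical inclusive value ranges for numeric standard fields.
VALUE_RANGES: Dict[str, Tuple[float, float]] = {"Age": (0, 150), "Weight": (0, 500)}


def in_range(record: Dict[str, object]) -> bool:
    """Return True when every range-checked field holds a number inside its range."""
    for field, (low, high) in VALUE_RANGES.items():
        value = record.get(field)
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            return False
    return True


def split_by_range(records: List[Dict[str, object]]):
    """Separate records that may be written from records that fail the check."""
    accepted = [r for r in records if in_range(r)]
    rejected = [r for r in records if not in_range(r)]
    return accepted, rejected


accepted, rejected = split_by_range(
    [
        {"Name": "Zhang", "Age": 30, "Weight": 70},
        {"Name": "Li", "Age": 230, "Weight": 65},
        {"Name": "Wang", "Age": 41},
    ]
)
```

Only the `accepted` records would then be passed on to the write into the data table.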

Optionally, the method further comprises:

responding to an aggregate task query request initiated by the first client or the second client;

and returning the progress information of the convergence task to the first client or the second client.

Optionally, the aggregation parameter is generated after the data demander corresponding to the first client triggers the shortcut option displayed in the operation interface.

According to a second aspect of embodiments of the present specification, there is provided a data aggregation apparatus, the apparatus comprising:

the receiving unit receives a data aggregation request sent by a first client; wherein the data aggregation request includes an aggregation parameter;

the response unit is used for responding to the data convergence request, generating a data catalog and a data standard based on the convergence parameters, creating a data table for storing convergence data in the data catalog and generating a data convergence task corresponding to a data provider;

the acquisition unit acquires data to be aggregated that is uploaded by a second client corresponding to the data provider, is determined based on the data aggregation task, and meets the data standard;

the aggregation unit establishes a data aggregation channel taking the data to be aggregated as a source end and the data table as a destination end, and writes the data to be aggregated into the data table based on the data aggregation channel.

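Optionally, the data provider includes a data provider specified in the aggregation parameter; in the response unit, generating the data aggregation task corresponding to the data provider includes: generating a corresponding data aggregation task for the data provider specified in the aggregation parameter.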

Optionally, the apparatus further includes:

the processing unit is used for pushing corresponding data convergence tasks to the data provider based on a task distribution mechanism; and the data provider determines data to be aggregated from the local data based on the pushed data aggregation task.

Optionally, the processing unit is further configured such that: if no task distribution mechanism exists, the data provider logs in to the portal of the data demander and queries whether the data catalog contains a data aggregation task to be executed; after finding such a task, the data provider submits an application to the data demander; and after the data demander approves the application, the data provider determines the data to be aggregated from its local data based on the data aggregation task to be executed.

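Optionally, in the response unit, generating the data directory and the data standard based on the aggregation parameter includes: generating a data directory based on directory attributes in the aggregation parameter; and generating a data standard based on the standard field in the aggregation parameter.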

Optionally, in the response unit, creating a data table for storing aggregated data in the data directory, including:

configuring a database associated with the data directory based on the database address specified in the aggregation parameter; creating a data table of the database in the data directory based on standard fields in the aggregation parameters; wherein, the fields in the data table are the standard fields.

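Optionally, the data to be aggregated meeting the data standard is composed of the standard field and a local data value corresponding to the standard field; in the aggregation unit, writing the data to be aggregated into the data table based on the data aggregation channel includes: writing the local data corresponding to the standard field in the data to be aggregated into the field value of the same standard field in the data table based on the data aggregation channel.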

Optionally, before the acquiring unit, the apparatus further includes:

A receiving subunit, configured to receive source data uploaded by a second client corresponding to the data provider, where the source data includes a local field of the data provider;

a verification subunit, configured to perform field verification on the local field of the source data based on the standard field of the data standard;

and the issuing subunit issues an aggregation instruction to the second client after the verification is passed, so that the second client determines data to be aggregated, which accords with the data standard, from the local data based on the data aggregation task.

Optionally, the verification subunit includes:

establishing a mapping relation between standard fields and local fields with the same meaning through a field matching algorithm; after all local fields and standard fields establish a mapping relation, determining that the verification passes;

Optionally, before the aggregation unit, the apparatus further includes:

a conversion unit, configured to convert the carrier type of the data to be aggregated into the carrier type of the data table based on a data conversion tool if the carrier type of the data to be aggregated is different from the carrier type of the data table.

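Optionally, the aggregation unit is further configured to: in the process of writing the data to be aggregated into the data table, check the data to be aggregated against the value range corresponding to each standard field; and write the data to be aggregated that conforms to the value range into the data table.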

Optionally, the apparatus further includes:

the query unit responds to an aggregate task query request initiated by the first client or the second client; and returning the progress information of the convergence task to the first client or the second client.

According to a third aspect of embodiments of the present specification, there is provided an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform any of the data aggregation methods described above.

The embodiments of this specification provide a data aggregation scheme that uses data directory technology to give the data demander the ability to customize a data standard. Each data provider can then supply data to be aggregated that conforms to the data standard defined by the data demander. Because all data to be aggregated already conforms to the same data standard before aggregation, data quality is higher and no additional data conversion is needed, which improves data aggregation efficiency.

In addition, by providing shortcut aggregation settings, the scheme lets the data demander complete the configuration of the data catalog, the definition of the data standard, and the designation of data providers simply by selecting shortcut options in the operation interface. The aggregation service system then performs the data aggregation fully automatically, which greatly improves data aggregation efficiency.

Drawings

FIG. 1 is a schematic diagram of a data sharing exchange platform according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of data aggregation for multiple departments provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a data convergence service system according to an embodiment of the disclosure;

FIG. 4 is a flow chart of a method for data aggregation for multiparty interactions provided in an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of an operation interface according to an embodiment of the present disclosure;

FIG. 6 is a flowchart of a method for server-side execution according to an embodiment of the present disclosure;

FIG. 7 is a hardware configuration diagram of a data aggregation device according to an embodiment of the present disclosure;

FIG. 8 is a schematic block diagram of a data aggregation device according to an embodiment of the present disclosure.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.

The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present description. The word "if" as used herein may be interpreted as "when …", "upon …", or "in response to a determination", depending on the context.

As described above, with the wide application of big data, the demand for transferring and exchanging data between different departments keeps increasing. In practice, an upper-level department or unit often gathers data from multiple lower-level departments or units, or gathers data laterally from other departments. Such different departments or units are commonly referred to as different data domains.

A schematic diagram of a data sharing exchange platform is shown in FIG. 1. When a data demander needs to use the data of another party, it has to log in to the data sharing exchange platform and find the portal (such as the homepage) of the data provider on the platform, search for the data in the portal, and apply for it. The data provider reviews the application and, once it is approved, releases the requested data to the data sharing exchange platform, which transmits the data to the data demander.

When a data demander needs to use data from multiple data providers, a data aggregation operation across multiple data domains is involved. FIG. 2 shows a schematic diagram of data aggregation for multiple departments. When the data demander needs the data of several departments (the three departments A, B and C in FIG. 2), it has to retrieve and apply for the data in the portal of each department and then aggregate the data obtained from them. The data demander not only has to query the portals and submit applications multiple times, but also has to search the mass data of each portal for the data it needs. These operations involve a large amount of offline work, so for the data demander, data aggregation is cumbersome and error-prone, and its efficiency is low.

In addition, different departments may employ different data standards. For example, department A in FIG. 2 stores data in database tables, while departments B and C record data in files; the data fields stored by different departments also differ. The data obtained from each department therefore cannot be used directly, and the quality of the aggregated data is poor. Data under different data standards has to be converted to a single data standard before aggregation, which lowers data aggregation efficiency.

To solve the above problems, the present application provides a data aggregation scheme based on a data directory. The scheme uses data directory technology to give the data demander the ability to customize a data standard. Each data provider can then supply data to be aggregated that conforms to the data standard defined by the data demander. Because all data to be aggregated already conforms to the same data standard before aggregation, data quality is higher and no additional data conversion is needed, which improves data aggregation efficiency.

Using a database requires professional skills; for example, a database language has to be learned in order to operate it. Ordinary users therefore face a technical threshold when using a database. A data catalog is a visualization technique that presents a database in the form of a catalog (for example, folders). The data tables in the data directory correspond to database tables in the database, and the fields in the data tables are the fields of those database tables. Through the data directory, a user can quickly learn what data is stored in the database: for example, which database tables the database has, what each field in each database table means, and what data is stored under each field.
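To make the idea of a catalog view over a database concrete, the sketch below lists the tables and fields of a database so they could be shown as folders and entries. It assumes SQLite as the database (the application does not name one) and uses a hypothetical helper `describe_catalog`.

```python
import sqlite3
from typing import Dict, List


def describe_catalog(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Return {table name: [field names]} for every table in the database."""
    catalog: Dict[str, List[str]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        catalog[table] = [col[1] for col in columns]
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "resident" ("Name" TEXT, "Age" INTEGER)')
conn.execute('CREATE TABLE "address_book" ("Name" TEXT, "Address" TEXT)')
catalog = describe_catalog(conn)
```

A user browsing `catalog` sees tables and fields without writing any query, which is the role the data directory plays in the scheme.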

The data aggregation service system creates a data catalog associated with the database based on the aggregation parameters of the data demander, issues data standards, creates a data table according to the data standards, and generates data aggregation tasks corresponding to all the data providers.

Each data provider uploads the data to be aggregated from its local data according to its data aggregation task, and the aggregation service system automatically maps the data to be aggregated, item by item, into the data table.

In addition, before a data provider uploads the data to be aggregated, the aggregation service system can pre-check the fields of the data provider's local data, to ensure that the data to be aggregated can be mapped to the correct fields of the data table.

FIG. 3 shows a schematic diagram of an aggregation service system. When the data demander needs to use data from multiple data providers (for example, the three departments A, B and C shown in FIG. 3), it only needs to designate these three departments as data providers in the aggregation parameters submitted to the aggregation service system. The aggregation service system then automatically distributes the corresponding data aggregation tasks to the three departments; each department actively submits its data, and the aggregation service system finally aggregates the data uploaded by each department into one data table. This data table is likewise generated by the aggregation service system in the data catalog specified by the data demander.

It follows that the data demander does not need to do a large amount of offline work: it does not need to access the portal of each data provider, nor retrieve the required data in those portals. It only submits the aggregation parameters, and the aggregation service system completes the rest. This greatly reduces the work required of the data demander, and the automatic execution of the aggregation service greatly improves data aggregation efficiency.

Reference is now made to the flowchart of the multi-party interactive data aggregation method shown in FIG. 4. The parties may include a server, a first client, and a second client; the server is coupled to the first client and the second client for data interaction.

The service end is the service end of the convergence service system and is used for providing data convergence service. The data aggregation service may include creating a data directory for a data demander, defining a data standard, distributing a data aggregation task, and aggregating data to be aggregated uploaded by a plurality of data providers.

The server provides each data provider with a data pre-check service, to help the data provider judge in advance whether its data to be aggregated meets the data standard, and converts the different data carriers uploaded by the data providers into a unified data carrier so that the data can be aggregated.

The server also provides an aggregation progress query service, so that a data provider or data demander can view the progress of data aggregation visually. It can further score the aggregation results of the data providers and rank the data providers, to show the service status of each data provider intuitively.

The server may refer to a server, a server cluster, or a cloud platform constructed by the server cluster.

The first client may be a terminal device of the data demander, on which a software program corresponding to the data convergence service provided by the server is installed, through which the data demander may request data convergence.

The second client may be a terminal device of a data provider, and a software program corresponding to the data aggregation service provided by the service end is also installed on the terminal device, and the provider may provide data required by data aggregation through the software program.

Specifically, the method shown in fig. 4 may include the following steps:

Step 110: the first client acquires the aggregation parameters determined by the data demander.

In general, the data demander can log in to the convergence service system and edit the convergence parameters in the operation interface of the convergence service. Reference is made to the schematic illustration of the operation interface shown in fig. 5.

The data demand party can edit the aggregation parameters in the data aggregation operation interface, and input the related content of each aggregation parameter.

As shown in fig. 5, under the "online editing" function, the displayed "convergence parameters" may include:

Information resource name: describes the characteristics of the data required by the data demander, to facilitate retrieving, locating, and acquiring the required data.

English name of the information resource: the English name corresponding to the information resource name, for retrieving, locating, and acquiring the required data in English.

Category: describes the classification of the data directory.

Data source: identifies the source of the required data.

Data provider: specifies the owner of the required data.

Data resource abstract: shows summary information about the required data.

Information resource format classification: describes the format classification of the required data, such as file, database, or API service.

Information resource format type: describes the format type of the required data, such as json or KV.

Release date: describes the release time of the data catalog.

Sharing type: describes the sharing type of the shared data, such as conditional sharing, unconditional sharing, or no sharing.

Sharing condition: describes the constraints on sharing the shared data.

Sharing mode classification: describes the sharing mode classification of the shared data, such as sharing platform, mail or media, or file.

Sharing mode type: describes the sharing mode type of the shared data, such as database or interface.

Open to the outside: describes whether the data is visible externally.

Open conditions: describes the conditions under which the data is externally visible.

The convergence parameters shown in fig. 5 are only one example and are not particularly limited thereto.

It should be noted that, as shown in FIG. 5, to make operation easier for the user, shortcut options may be displayed in the operation interface; the data demander triggers a shortcut option to determine the corresponding aggregation parameter. The data demander can either select an aggregation parameter from a pull-down list or fill it in. After the data demander triggers the shortcut options, the first client automatically generates the corresponding aggregation parameters. The option content of the pull-down shortcut options can be preconfigured by a system developer.

Take the shortcut option for the data provider as an example. A developer can register accounts on the aggregation service system for departments with data aggregation requirements (each department can, of course, also register itself); for departments with a superior-subordinate relationship, the association between the superior and subordinate departments can be established. Then, after a superior department enters the interface of FIG. 5 as the data demander and clicks the data provider shortcut option, the subordinate departments associated with it are displayed as pull-down options.

In general, the shortcut option for the data provider may support multiple selection. The selected data providers may be stored in a data list.

It should be noted that, in some scenarios, the data demander may fill in a data provider that has no association with it, for example in cross-department data aggregation without a superior-subordinate relationship.

Step 112: the first client sends a data aggregation request to the server; the data aggregation request includes the aggregation parameter.

The first client may add all aggregation parameters in the operation interface to the data aggregation request in response to the sending action of the data demander (e.g. clicking the "submit" button in fig. 5), and send the data aggregation request to the server.
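As an illustration of Step 112, the sketch below packs a few of the aggregation parameters into one request body. The field names, the JSON encoding, and the `type` label are all assumptions made for the example; the application does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AggregationParameters:
    # Hypothetical field names that loosely mirror some entries shown in FIG. 5.
    resource_name: str
    resource_english_name: str
    category: str
    data_providers: List[str] = field(default_factory=list)
    format_classification: str = "database"
    sharing_type: str = "conditional sharing"


def build_aggregation_request(params: AggregationParameters) -> str:
    """Serialize every parameter from the operation interface into one request body."""
    return json.dumps(
        {"type": "data_aggregation_request", "aggregation_parameters": asdict(params)}
    )


params = AggregationParameters(
    "Resident information",
    "resident_info",
    "population",
    ["Department A", "Department B"],
)
request_body = build_aggregation_request(params)
decoded = json.loads(request_body)
```

The list-valued `data_providers` field reflects the multiple-selection shortcut option for data providers described earlier.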

Step 120: the server generates a data catalog and a data standard.

The data demander may also define the data standard in the form of data items in the visual operation interface. By way of example, the data items may include the contents shown in Table 1 below:

Data item name	English name	Data type	Primary key	Data length	Description
Name	Name	String (C)	No	10	Name
Age	Age	Numeric (N)	No	10	Age
Sex	Sex	String (C)	No	--	Sex
Weight	Weight	Numeric (N)	No	--	Weight
Height	Height	Numeric (N)	No	--	Height
Household address	Address	String (C)	No	10	Household address

TABLE 1

As shown in Table 1, a data item may include a data item name, an English name (i.e., the field), a data type, whether it is a primary key, a data length, a description, and so on. Defining the data standard allows the data providers to prepare data that conforms to it.

The data item can also be used as a convergence parameter to be sent to the server side along with the data convergence request.

Correspondingly, after receiving the data aggregation request, the server responds to it and generates a data catalog based on the aggregation parameters. Specifically, the server may create a data directory with the specified directory name at the specified directory address, both taken from the aggregation parameters.

Further, after the server creates the data directory, the database associated with the data directory may be configured based on the database address specified in the aggregation parameter. Thus, the association relation between the data directory and the database is established, and preparation is made for the subsequent steps.

In addition, the server may generate a data standard based on standard fields defined by the data items in the aggregation parameter.

Step 122: the server creates a data table.

The server may create, in the data catalog, a data table of the database according to the data standard. The table name may be an English abbreviation of the data catalog name, and the fields of the data table are the standard fields; for example, the fields listed in Table 1 above are created in the database.
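As a non-limiting sketch of this step, the table could be created from the standard fields as follows. SQLite is used here only because it is self-contained; the embodiment does not name a database product. The table name `population`, the type mapping `TYPE_MAP` and the function `create_table_sql` are hypothetical.

```python
import sqlite3

# Standard fields and type codes taken from Table 1 (C = string, N = numeric).
STANDARD = [
    ("Name", "C"), ("Age", "N"), ("Sex", "C"),
    ("Weight", "N"), ("Height", "N"), ("Address", "C"),
]

# Assumed mapping from the type codes of the data standard to SQLite types.
TYPE_MAP = {"C": "TEXT", "N": "NUMERIC"}


def create_table_sql(table_name, standard):
    """Build a CREATE TABLE statement whose columns are the standard fields."""
    columns = ", ".join(
        f'"{field}" {TYPE_MAP[type_code]}' for field, type_code in standard
    )
    return f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})'


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("population", STANDARD))

# Read the column names back to confirm they are the standard fields.
columns = [row[1] for row in conn.execute("PRAGMA table_info(population)")]
```

The table and field names are interpolated into the statement, so in this sketch they are assumed to come from a trusted data standard rather than from free-form user input.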

Because the data catalog is associated with the database, the catalog is in effect a visual presentation of the database in catalog form (such as a folder). A data table created in the data catalog is actually located in the database, so creating a data table in the data catalog is equivalent to creating a data table in the database.

Step 124: the server generates a data aggregation task corresponding to the data provider.

As previously described, the aggregation parameters may include a data provider specified by the data demander. The server generates a corresponding data aggregation task for each data provider specified in the aggregation parameters.

It should be noted that a data provider must be known before a data aggregation task can be generated. If the aforementioned aggregation parameters contain no data provider (i.e. the data demander has not specified one), the specified data provider must be obtained from the data demander before step 124 is executed.
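The task generation of step 124 can be sketched as follows, purely as an illustration. The class `AggregationTask`, the function `generate_tasks` and the parameter keys `"providers"` and `"table"` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AggregationTask:
    """A data aggregation task for one data provider (hypothetical structure)."""
    task_id: int
    provider: str
    table: str
    status: str = "pending"


def generate_tasks(params):
    """Create one task per data provider named in the aggregation parameters.

    Raises ValueError when no provider is specified, mirroring the requirement
    that the provider be obtained from the data demander before step 124.
    """
    providers = params.get("providers")
    if not providers:
        raise ValueError("no data provider specified; obtain one from the data demander")
    return [
        AggregationTask(task_id=i, provider=provider, table=params["table"])
        for i, provider in enumerate(providers, start=1)
    ]


tasks = generate_tasks({"table": "population", "providers": ["dept_a", "dept_b"]})
```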

Further, if a task distribution mechanism exists, the server may push the corresponding data aggregation task to the data provider, and the data provider determines the data to be aggregated from its local data based on the pushed task.

If no task distribution mechanism exists, the data provider may log in to the portal of the data demander and query whether the data catalog contains a data aggregation task to be executed. After finding such a task, the data provider submits an application to the data demander; once the data demander approves the application, the data provider determines the data to be aggregated from its local data based on the data aggregation task to be executed.

Both the data demander and the data provider can log in to and access the aggregation service system with their respective accounts, which is how the data provider can log in to the portal of the data demander and query the data catalog for data aggregation tasks to be executed.

The data in each data domain bears on the data security of the corresponding entity. In particular, between superior and subordinate departments, exporting a subordinate department's data out of its domain requires review by the superior department. Thus, without a task distribution mechanism, even if the data provider finds a data aggregation task associated with itself, it cannot upload data on its own initiative because it lacks authorization from the superior department. For this reason, the data provider may submit an application to the data demander; once the data demander has reviewed and approved it, the data provider is authorized to export data from its data domain.

After the data demander approves the application, the data provider can execute the pending data aggregation task, determine the data to be aggregated from its local data, and upload that data to the aggregation service system.

Step 130: the second client uploads the source data to the server.

Before uploading the data to be aggregated, the data provider can use the data pre-check service provided by the aggregation service system to check whether the local fields of its locally stored data correspond one-to-one with the standard fields. Specifically, the data provider may upload the local fields of the local data to the server as the source data.

Step 132: the server performs field verification on the local field of the source data based on the data standard.

The server can establish the mapping relation between the standard field and the local field with the same meaning through a field matching algorithm.

For example, standard fields and local fields with the same meaning are identified through algorithms such as exact matching, fuzzy matching and synonym similarity matching, and each such pair is then mapped to one another.

Local fields for which no mapping can be established are filtered out and returned to the data provider. The data provider reorganizes these local fields, for example by adding field meanings to them or by renaming them to the published standard fields with the same meaning, and then uploads the fields again for verification. The verification is considered passed only once a mapping has been established between every local field and a standard field.
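The field verification described above can be illustrated with the following sketch. It covers only case-insensitive exact matching and a simple fuzzy match based on `difflib.SequenceMatcher` from the Python standard library; synonym similarity matching is not shown. The function names and the `threshold` value of 0.8 are assumptions made for the sketch, and the sketch does not enforce that two local fields never map to the same standard field.

```python
from difflib import SequenceMatcher


def match_fields(standard_fields, local_fields, threshold=0.8):
    """Map local fields to standard fields; return (mapping, unmatched)."""
    by_lower = {s.lower(): s for s in standard_fields}
    mapping, unmatched = {}, []
    for local in local_fields:
        key = local.strip().lower()
        if key in by_lower:                      # exact match, ignoring case
            mapping[local] = by_lower[key]
            continue
        # Fuzzy match: pick the most similar standard field.
        best, best_score = None, 0.0
        for standard in standard_fields:
            score = SequenceMatcher(None, key, standard.lower()).ratio()
            if score > best_score:
                best, best_score = standard, score
        if best_score >= threshold:
            mapping[local] = best
        else:
            unmatched.append(local)              # returned to the data provider
    return mapping, unmatched


def verification_passed(unmatched):
    """Verification passes only when every local field has been mapped."""
    return not unmatched


mapping, unmatched = match_fields(["Name", "Age", "Address"], ["name", "Addres", "xb"])
```

In this usage example the local field `xb` has no counterpart among the standard fields, so it would be returned to the data provider for reorganization before the fields are uploaded again.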

It is worth mentioning that the data sources of data providers may reside on several types of carrier: some data providers record data in databases, while others record data in files. The data carriers of the source data uploaded by data providers may therefore also vary.

A file usually uses a two-dimensional layout of rows and columns, for example with rows as fields and columns as field values, or with rows as field values and columns as fields. The concept of a field therefore also exists when the data carrier is a file, so it is technically feasible to perform field verification on source data held in a file.

In this embodiment, the server may convert the carrier type of the source data into the carrier type of the data table based on a data conversion tool. Thus, when the carrier type of the source data and the carrier type of the data table are the same, the verification of the field can be more efficiently performed.

The data conversion tool may include an ETL (Extract-Transform-Load) tool, among others. ETL converts data from its current data type into another specified data type, and its working process is divided into extraction, transformation and loading: extraction takes the data out of the current data carrier, transformation converts the data type of the extracted data into the specified data type, and loading stores the converted data in a data carrier of the specified data type.
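The three ETL phases can be sketched as below for the case where the source carrier is a file and the destination carrier is a database table. This is an illustrative stand-in, not the ETL tool of the embodiment: the CSV text, the table `person` and the functions `extract`, `transform` and `load` are all hypothetical.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extraction: read the records out of the file-type carrier."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transformation: convert the extracted values to the destination types."""
    return [(row["Name"], int(row["Age"])) for row in rows]


def load(conn, records):
    """Loading: store the converted records in the database-type carrier."""
    conn.execute("CREATE TABLE IF NOT EXISTS person (Name TEXT, Age INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", records)
    conn.commit()


source_file = "Name,Age\nZhang,30\nLi,25\n"   # stands in for an uploaded file
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(source_file)))
loaded = conn.execute("SELECT Name, Age FROM person ORDER BY Age").fetchall()
```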

After the verification is passed, the server side can issue a convergence instruction to the second client side, so that the second client side determines data to be converged from the local data based on a data convergence task.

Step 140: the second client determines data to be aggregated from the local data based on the data aggregation task.

After receiving the convergence instruction issued by the server, the second client can determine the data to be converged from the local data based on the data convergence task, and upload the data to be converged to the server.

Step 150: the server writes the data to be aggregated into the data table.

After receiving the data to be aggregated from each data provider, the server can establish a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and write the data to be aggregated into the data table through that channel.

The writing of the data to be aggregated into the data table is performed based on the data aggregation channel, specifically according to the above-mentioned established mapping relationship:

Thus, the data to be aggregated can be written, one by one, through the aggregation channel into the data table specified by the data demander. While writing the data to be aggregated into the data table, the server may also check the data against the value range corresponding to each standard field. Data that falls outside the value range may be recorded separately and not written into the data table; only data that conforms to the value range is written into the database.

For example, the gender field generally takes two values, male and female. Assuming that 0 represents male and 1 represents female, its value range is {0, 1}. If, during aggregation, the gender field of a record has the value 2, the existing definition of the gender field gives no way to determine what 2 represents, so the record is treated as invalid data.

Value range checking thus screens out data that does not meet the requirements, which improves the quality of the aggregated data.
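The value range check can be illustrated with the following sketch, which uses the gender example above (0 for male, 1 for female). The names `VALUE_RANGES` and `split_by_value_range` are hypothetical; a record that lacks a checked field is treated as invalid, which is an assumption of the sketch.

```python
# Value ranges per standard field; only the gender field is constrained here.
VALUE_RANGES = {"Sex": {0, 1}}


def split_by_value_range(records, value_ranges):
    """Separate records into those to write and those to record separately."""
    valid, invalid = [], []
    for record in records:
        in_range = all(
            record.get(field) in allowed
            for field, allowed in value_ranges.items()
        )
        (valid if in_range else invalid).append(record)
    return valid, invalid


records = [
    {"Name": "A", "Sex": 0},
    {"Name": "B", "Sex": 2},   # 2 is outside {0, 1}, so this record is invalid
    {"Name": "C", "Sex": 1},
]
valid, invalid = split_by_value_range(records, VALUE_RANGES)
```

Only the records in `valid` would be written into the data table; those in `invalid` would be recorded separately.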

It is worth mentioning that the data to be aggregated may likewise reside on several types of carrier: some data providers record data in databases, while others record data in files. The data carriers of the data to be aggregated uploaded by data providers may therefore also vary.

For example, the data provider may employ a different database than the database associated with the data catalog, or different versions of the same database; these all lead to differences in the data carriers.

A file usually uses a two-dimensional layout of rows and columns, for example with rows as fields and columns as field values, or with rows as field values and columns as fields.

When the data carriers differ, data cannot be written accurately. The server may therefore use the data conversion tool to convert the carrier type of the data to be aggregated into the carrier type of the data table. The data conversion tool may include the aforementioned ETL tool.

Thus, when the carrier type of the data to be aggregated is the same as the carrier type of the data table, the data aggregation can be performed more efficiently.

Step 152: the server monitors the progress of the aggregation task.

The server may provide the data provider and the data demander with a service for querying and viewing the progress of aggregation tasks.

After the first client or the second client initiates an aggregation task query request, the server responds to the request and returns progress information for the aggregation task, such as the number of records already aggregated, the percentage of data already aggregated, the number of invalid records, the elapsed time, and the remaining time.
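One possible way of deriving the progress information listed above is sketched below. The class `Progress` and its attributes are hypothetical, and the remaining time is estimated by linear extrapolation from the elapsed time, which is an assumption of the sketch rather than something stated in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Progress of one aggregation task (hypothetical structure)."""
    total: int              # records the task is expected to aggregate
    written: int            # records already written into the data table
    invalid: int            # records rejected, e.g. by the value range check
    elapsed_seconds: float

    @property
    def processed(self):
        return self.written + self.invalid

    @property
    def percent(self):
        return 100.0 * self.processed / self.total if self.total else 0.0

    @property
    def remaining_seconds(self):
        """Linear estimate; None until at least one record has been processed."""
        if self.processed == 0:
            return None
        return self.elapsed_seconds * (self.total - self.processed) / self.processed


progress = Progress(total=1000, written=240, invalid=10, elapsed_seconds=50.0)
```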

The progress condition of the convergence task can be intuitively known through progress monitoring, and a user can conveniently plan a subsequent convergence plan.

Step 154: the server scores the data provider corresponding to the aggregation task.

After the aggregation task is completed, each data provider is scored against a number of preset indicators, such as the completion progress of the aggregation task, the data quality and the data volume. Different indicators may carry different weights.

Generally, the score rises with the completion progress of the aggregation task, with the data quality and with the data volume, and falls as any of them declines.
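A weighted score of this kind can be sketched as follows. The weights, the indicator names and the assumption that each indicator has been normalized to the range 0 to 1 are all hypothetical; the embodiment states only that indicators may have corresponding weights.

```python
# Assumed weights for the three indicators named in the text.
WEIGHTS = {"completion": 0.4, "quality": 0.4, "volume": 0.2}


def score(metrics, weights=WEIGHTS):
    """Weighted sum of indicators, each assumed to be normalized to 0..1."""
    return sum(weights[name] * metrics[name] for name in weights)


def rank(providers):
    """Return provider names ordered from highest to lowest score."""
    return sorted(providers, key=lambda name: score(providers[name]), reverse=True)


providers = {
    "dept_a": {"completion": 1.0, "quality": 0.9, "volume": 0.5},
    "dept_b": {"completion": 0.6, "quality": 0.7, "volume": 1.0},
}
ranking = rank(providers)
```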

By scoring the data providers and ordering the scores of the data providers, a summary of the performance of each data provider on the aggregate task can be made.

Further, data providers that perform well may be rewarded, while those that perform poorly may be penalized. This motivates data providers to complete aggregation tasks more quickly and proactively and to supply higher-quality, more complete and comprehensive data, ultimately steering the aggregation service toward a better state.

Step 160: the first client acquires an aggregation result.

The first client can browse the completed convergence tasks in the logged-in portal, so as to view convergence results of the convergence tasks. Or, the server may push the convergence result to the first client after the convergence task is completed.

An embodiment of the method of the present disclosure with the server as the execution body is described below with reference to fig. 6; this embodiment may correspond to fig. 4. It is similar to the embodiment of fig. 4 described above, and for the technical features they share, reference may be made to that embodiment. Because this embodiment takes the server as the execution body, steps performed by parties other than the server are described only briefly (for details, see the embodiment of fig. 4), while the steps performed by the server are described in detail below.

Step 210: receiving a data aggregation request sent by a first client; wherein the data aggregation request includes an aggregation parameter.

In an embodiment, as shown in fig. 4, the aggregation parameter is generated after the shortcut option displayed in the operation interface is triggered by the data demander corresponding to the first client.

Step 220: responding to the data convergence request, generating a data catalog and a data standard based on the convergence parameters, creating a data table for storing convergence data in the data catalog, and generating a data convergence task corresponding to a data provider;

The server may generate a data directory based on the directory attributes specified in the aggregation parameter, e.g., based on the directory names and directory addresses specified in the aggregation parameter, creating a data directory for the directory names at the directory addresses.

In an embodiment, the server may further generate a data standard based on the standard field in the aggregation parameter. Wherein the standard field may be a standard field defined by the aforementioned data item.

In an embodiment, after the data standard is generated, a database associated with the data directory may be further configured based on a database address specified in the aggregation parameter;

creating a data table of the database in the data directory based on standard fields defined by data items in the aggregation parameters; wherein, the fields in the data table are the standard fields.

As previously described in fig. 4, the data provider includes the data provider specified in the aggregation parameter; the generating the data convergence task corresponding to the data provider comprises the following steps:

If a task distribution mechanism exists, the server can push the corresponding data convergence task to the data provider based on the task distribution mechanism; and the data provider determines data to be aggregated from the local data based on the pushed data aggregation task.

After the data demander passes the audit, the data provider can execute the data aggregation task to be executed to determine the data to be aggregated from the local data, and then upload the data to be aggregated to the aggregation service system.

Step 230: acquiring the data to be aggregated uploaded by the second client corresponding to the data provider, the data being determined based on the data aggregation task and conforming to the data standard.

The data to be aggregated that the second client uploads is determined based on the data aggregation task and conforms to the data standard; it consists of the standard fields and the local data values corresponding to those standard fields.

In this example, the second client may locally organize, according to the standard field defined by the data demander, data to be aggregated that is formed by the standard field and the local data value corresponding to the standard field.

As described in fig. 4, the service side may provide a data quality pre-check service for the data provider.

Before uploading the data to be aggregated, the data provider can check whether the local field of the local storage data can be in one-to-one correspondence with the standard field through the data pre-checking service provided by the aggregation service system.

Specifically, the data provider may upload the local field of the local data as the source data to the server. Correspondingly, the server receives source data uploaded by a second client corresponding to the data provider, wherein the source data comprises a local field of the data provider;

and after the verification is passed, sending a convergence instruction to the second client so that the second client can determine the data to be converged from the local data based on the data convergence task.

Wherein the verifying the local field of the source data based on the standard field of the data standard includes:

and after all the local fields and the standard fields establish a mapping relation, determining that the verification passes.

Through the data quality pre-inspection, the data provider can upload the data to be aggregated according to the data standard, so that the data aggregation is more efficiently performed.

Step 240: establishing a data convergence channel taking data to be converged as a source end and the data table as a destination end, and writing the data to be converged into the data table based on the data convergence channel.

Where no data pre-check is performed, that is, where the data to be aggregated uploaded by the second client consists of the standard fields and the local data values corresponding to the standard fields, step 240 may include:

establishing a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and, through that channel, writing the local data corresponding to each standard field in the data to be aggregated into the field value of the same standard field in the data table.

Since the fields in the data to be aggregated are the same standard fields as the data table, the local data can be directly written into the field values corresponding to the standard fields in the data table.

Where the data pre-check is performed, that is, where the data to be aggregated uploaded by the second client consists of the local fields and the local data corresponding to the local fields, step 240 may include:

establishing a data aggregation channel with the data to be aggregated as the source end and the data table as the destination end, and, through that channel, writing the data corresponding to each local field in the data to be aggregated into the field value of the standard field in the data table to which that local field is mapped.

Because the mapping between local fields and standard fields was established during the data pre-check, the local data can be written into the field values of the standard fields in the data table based on that mapping.
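The mapping-based write can be sketched as follows, again using SQLite only for self-containment. The function `write_with_mapping`, the table `person` and the local field names `xm` and `nl` are hypothetical; table and column names are interpolated into the statement and are assumed to come from the trusted data standard.

```python
import sqlite3


def write_with_mapping(conn, table, records, mapping):
    """Write each record into the table, renaming local fields to standard fields."""
    written = 0
    for record in records:
        # Keep only the local fields that have a mapped standard field.
        row = {mapping[local]: value for local, value in record.items() if local in mapping}
        if not row:
            continue
        columns = ", ".join(f'"{name}"' for name in row)
        placeholders = ", ".join("?" for _ in row)
        conn.execute(
            f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
            list(row.values()),
        )
        written += 1
    conn.commit()
    return written


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE person ("Name" TEXT, "Age" INTEGER)')

field_mapping = {"xm": "Name", "nl": "Age"}           # local field -> standard field
provider_records = [{"xm": "Zhang", "nl": 30}, {"xm": "Li", "nl": 25}]
written = write_with_mapping(conn, "person", provider_records, field_mapping)
```

Where no pre-check is performed and the uploaded data already uses the standard fields, the same function applies with an identity mapping from each standard field to itself.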

In an embodiment, in the process of writing the data to be aggregated into the data table, detecting the data to be aggregated according to a value range corresponding to a standard field; and writing the converged data conforming to the value range into the data table.

In an embodiment, before the step 240, the method further includes:

Accurate data writing is not possible due to the different data carriers. The carrier type of the data to be aggregated is converted into the carrier type of the data table through the data conversion tool, so that the carrier type of the data to be aggregated is the same as the carrier type of the data table, and the data aggregation can be performed more efficiently.

In an embodiment, the method further comprises:

responding to an aggregate task query request initiated by the first client or the second client; and returning the progress information of the convergence task to the first client or the second client. The progress condition of the convergence task can be intuitively known through progress monitoring, and a user can conveniently plan a subsequent convergence plan.

In an embodiment, the method further comprises:

And scoring the data provider corresponding to the convergence task.

By scoring the data providers and ranking them by score, the performance of each data provider on the aggregation task can be summarized. Further, data providers that perform well may be rewarded, while those that perform poorly may be penalized. This motivates data providers to complete aggregation tasks more quickly and proactively and to supply higher-quality, more complete and comprehensive data, ultimately steering the aggregation service toward a better state.

In summary, in the embodiments of the present disclosure, on the one hand, data catalog technology is used to give the data demander the ability to define a custom data standard, and the data provider can supply data to be aggregated that conforms to the data standard defined by the data demander. Because all data to be aggregated conforms to the same data standard before aggregation, the data quality is higher and no additional data conversion is needed, which improves the efficiency of data aggregation.

On the other hand, by providing shortcut convergence setting for the data demander, the data demander can complete configuration of the data catalog, definition of the data standard and assignment of the data provider only by selecting shortcut options in the operation interface. And then, the data aggregation is fully automatically executed by the aggregation service system, so that the data aggregation efficiency is improved.

The data aggregation service system creates a data catalog associated with the database based on the aggregation parameters of the data demander, issues data standards, creates a data table according to the data standards, and generates data aggregation tasks corresponding to all the data providers. And uploading the data to be aggregated in the local data according to the data aggregation task by each data provider, and automatically mapping the data to be aggregated into a data table one by the aggregation service system.

In yet another aspect, the aggregation service system may pre-check fields of data local to the data provider before uploading the data to be aggregated by the data provider to ensure that the data to be aggregated may be mapped to the correct data table fields.

Corresponding to the foregoing embodiments of the data aggregation method, the present disclosure further provides an embodiment of a data aggregation apparatus. The apparatus embodiment may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the apparatus in the logical sense is formed by the processor of the device in which it resides reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, fig. 7 shows a hardware structure diagram of the device in which the data aggregation apparatus of this specification resides; in addition to the processor, network interface, memory and non-volatile memory shown in fig. 7, the device may generally include other hardware according to its actual data aggregation function, which is not described further here.

Referring to fig. 8, a block diagram of a data aggregation apparatus according to an embodiment of the present disclosure corresponds to the embodiment shown in fig. 6, and the apparatus includes:

a receiving unit 310, configured to receive a data aggregation request sent by a first client; wherein the data aggregation request includes an aggregation parameter;

a response unit 320, responsive to the data aggregation request, generating a data directory and a data standard based on the aggregation parameter, and creating a data table for storing the aggregated data in the data directory and generating a data aggregation task corresponding to a data provider;

an obtaining unit 330, configured to obtain data to be aggregated, which is determined based on the data aggregation task and meets the data standard and is uploaded by a second client corresponding to the data provider;

and the aggregation unit 340 establishes a data aggregation channel taking the data to be aggregated as a source end and the data table as a destination end, and writes the data to be aggregated into the data table based on the data aggregation channel.

in the response unit 320, a data aggregation task corresponding to the data provider is generated, including:

Optionally, the apparatus further includes: the processing unit is used for pushing corresponding data convergence tasks to the data provider based on a task distribution mechanism; and the data provider determines data to be aggregated from the local data based on the pushed data aggregation task.

Optionally, in the response unit 320, generating a data directory and a data standard based on the aggregation parameter includes: generating a data directory based on directory attributes in the aggregation parameters; generating a data standard based on the standard field in the aggregation parameter.

Optionally, in the response unit 320, a data table for storing the aggregate data is created in the data directory, including: configuring a database associated with the data directory based on the database address specified in the aggregation parameter; creating a data table of the database in the data directory based on standard fields in the aggregation parameters; wherein, the fields in the data table are the standard fields.

the writing, in the aggregation unit 340, the data to be aggregated into the data table based on the data aggregation channel includes: and writing local data corresponding to the standard field in the data to be aggregated into the field value of the same standard field in the data table based on the data aggregation channel.

Optionally, before the acquiring unit 330, the apparatus further includes: a receiving subunit, configured to receive source data uploaded by a second client corresponding to the data provider, where the source data includes a local field of the data provider;

Optionally, the verification subunit includes: establishing a mapping relation between standard fields and local fields with the same meaning through a field matching algorithm; after all local fields and standard fields establish a mapping relation, determining that the verification passes;

The writing, in the aggregation unit 340, the data to be aggregated into the data table based on the data aggregation channel includes: and writing the data corresponding to the local field in the data to be aggregated into a field value of a standard field of which the local field is mapped in the data table based on the data aggregation channel.

Optionally, before the aggregation unit 340, the apparatus further includes: and the conversion unit is used for converting the carrier type of the data to be aggregated into the carrier type of the data table based on a data conversion tool if the carrier type of the data to be aggregated is different from the carrier type of the data table.

Optionally, the aggregation unit 340 further includes: in the process of writing the data to be aggregated into the data table, detecting the data to be aggregated according to a value range corresponding to a standard field; and writing the converged data conforming to the value range into the data table.

Optionally, the apparatus further includes: the query unit responds to an aggregate task query request initiated by the first client or the second client; and returning the progress information of the convergence task to the first client or the second client.

The system, apparatus, module or unit of the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. Typical implementation devices are computers, which may be in the form of personal computers, laptop computers, cellular telephones, smart phones, personal digital assistants, media players, navigation devices, email devices, game consoles, tablet computers, wearable devices, or a combination of any of several of these devices.

The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again. For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present description. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

In the above embodiment of the electronic device, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor, and the aforementioned memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or a solid state disk. The steps of a method disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the electronic device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims. It is to be understood that the present description is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

Claims

1. A method of data aggregation, the method comprising:

responding to the data aggregation request, generating a data catalog and a data standard based on the aggregation parameters, creating a data table for storing aggregated data in the data catalog, and generating a data aggregation task corresponding to a data provider; wherein the data catalog is a visualization technique that presents a database in the form of a directory, and a data table in the data catalog corresponds to a database table in the database;

after the mapping relation between the local field and the standard field is established, performing verification;

after the verification passes, issuing an aggregation instruction to the second client, so that the second client determines, from local data and based on the data aggregation task, data to be aggregated that meets the data standard;

2. The method of claim 1, wherein the data provider comprises a data provider specified in the aggregation parameter;

3. The method of claim 1, wherein the method further comprises:

4. A method according to claim 3, characterized in that the method further comprises:

if the task distribution mechanism does not exist, the data provider logs in to the portal of the data demander and queries whether a data aggregation task to be executed exists in the data catalog; after finding a data aggregation task to be executed, the data provider submits an application to the data demander; after the data demander approves the application, the data provider determines data to be aggregated from local data based on the data aggregation task to be executed.

5. The method of claim 1, wherein the generating a data catalog and a data standard based on the aggregation parameters comprises:

6. The method of claim 4, wherein creating a data table in the data directory for storing aggregated data comprises:

7. The method according to claim 5, wherein the data to be aggregated according to the data standard is composed of the standard field and local data corresponding to the standard field;

8. The method according to claim 1, wherein before the establishing a data aggregation channel with data to be aggregated as a source end and the data table as a destination end, and writing the data to be aggregated into the data table based on the data aggregation channel, the method further comprises:

9. The method of claim 1, wherein the writing the data to be aggregated into the data table based on the data aggregation channel comprises:

10. The method according to claim 1, wherein the method further comprises:

11. The method of claim 1, wherein the aggregation parameter is generated after a shortcut option displayed in an operation interface is triggered by a data demander corresponding to the first client.

12. A data aggregation apparatus, the apparatus comprising:

the response unit, configured to respond to the data aggregation request, generate a data catalog and a data standard based on the aggregation parameters, create a data table for storing aggregated data in the data catalog, and generate a data aggregation task corresponding to a data provider; wherein the data catalog is a visualization technique that presents a database in the form of a directory, and a data table in the data catalog corresponds to a database table in the database;

the verification subunit establishes, through a field matching algorithm, a mapping relation between standard fields and local fields that have the same meaning, and performs verification after the mapping relation between the local field and the standard field is established;

the issuing subunit issues an aggregation instruction to the second client after the verification is passed, so that the second client determines data to be aggregated, which accords with the data standard, from local data based on a data aggregation task;

the aggregation unit establishes a data aggregation channel taking the data to be aggregated as a source end and the data table as a destination end, and, based on the data aggregation channel, writes the data corresponding to each local field in the data to be aggregated into the field value of the standard field in the data table to which that local field is mapped.

13. The apparatus of claim 12, wherein the data provider comprises a data provider specified in the aggregation parameter;

14. The apparatus of claim 12, wherein the apparatus further comprises:

15. The apparatus of claim 14, wherein the processing unit further comprises: if the task distribution mechanism does not exist, the data provider logs in to the portal of the data demander and queries whether a data aggregation task to be executed exists in the data catalog; after finding a data aggregation task to be executed, the data provider submits an application to the data demander; after the data demander approves the application, the data provider determines data to be aggregated from local data based on the data aggregation task to be executed.

16. The apparatus of claim 12, wherein the means for generating, in the response unit, a data directory and a data standard based on the aggregation parameter comprises:

17. The apparatus of claim 16, wherein creating, in the response unit, a data table for storing aggregated data in the data directory comprises:

18. The apparatus of claim 16, wherein the data to be aggregated that meets the data standard is composed of the standard field and local data corresponding to the standard field;

19. The apparatus of claim 12, wherein prior to the aggregation unit, the apparatus further comprises:

20. The apparatus of claim 12, wherein the aggregation unit further comprises:

21. The apparatus of claim 12, wherein the apparatus further comprises:

22. The apparatus of claim 12, wherein the aggregation parameter is generated after a shortcut option displayed in an operation interface is triggered by a data demander corresponding to the first client.

23. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method of any one of the preceding claims 1-11.
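
By way of non-limiting illustration only, the field mapping, verification, and writing steps recited above might be sketched as follows. The claims do not specify the field matching algorithm or the content of the verification, so this sketch assumes a simple string-similarity matcher and assumes that verification means every standard field is mapped from exactly one local field. All function names, the threshold value, and the sample fields and data are hypothetical.

```python
from difflib import SequenceMatcher


def match_fields(local_fields, standard_fields, threshold=0.6):
    """Map each local field to its most similar standard field.

    String similarity stands in for the unspecified field matching
    algorithm; the threshold value is an assumption.
    """
    mapping = {}
    for local in local_fields:
        scored = [
            (SequenceMatcher(None, local.lower(), std.lower()).ratio(), std)
            for std in standard_fields
        ]
        score, best = max(scored)
        if score >= threshold:
            mapping[local] = best  # local fields below the threshold stay unmapped
    return mapping


def verify_mapping(mapping, standard_fields):
    """Assumed check: each standard field is mapped from exactly one local field."""
    targets = list(mapping.values())
    return all(targets.count(std) == 1 for std in standard_fields)


def write_to_table(local_rows, mapping, data_table):
    """Write each local value into the standard field its local field maps to."""
    for row in local_rows:
        data_table.append({std: row[local] for local, std in mapping.items()})
    return data_table


# Hypothetical data standard (destination end) and provider data (source end).
standard_fields = ["name", "id_number", "address"]
local_rows = [
    {"Name": "Zhang", "ID_Number": "001", "addr": "Hangzhou", "internal_flag": 1},
]

mapping = match_fields(local_rows[0].keys(), standard_fields)
data_table = []
if verify_mapping(mapping, standard_fields):  # write only after verification passes
    write_to_table(local_rows, mapping, data_table)
```

In this sketch, a local field with no sufficiently similar standard field (here `internal_flag`) is left out of the mapping and is therefore not written to the data table, so only data that meets the data standard reaches the destination end.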