CN113505147A

CN113505147A - Data processing method and device, electronic equipment and readable storage medium

Info

Publication number: CN113505147A
Application number: CN202110853991.8A
Authority: CN
Inventors: 丁佳; 张丽春; 梅玉婷; 蔡莉莉
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2021-10-15

Abstract

The disclosure provides a data processing method, a data processing device, an electronic device and a readable storage medium. The disclosure mainly relates to the technical field of big data and can be used in the financial field or other fields. The data processing method comprises the following steps: collecting original data, wherein the original data has a plurality of types, and the different types of original data have an association relationship; constructing a plurality of data models corresponding to the types of the original data, each data model having a plurality of data categories, the data categories including metadata categories corresponding to the data models; establishing association rules among different data models according to the association relationship, and associating the same data categories among the different data models through the association rules to generate a new data model; acquiring a plurality of pieces of key information of the original data, and establishing a mapping relation of the plurality of pieces of key information according to the association rules and the data models; and extracting the plurality of pieces of key information, and establishing a data network of the key information according to the mapping relation.

Description

Data processing method and device, electronic equipment and readable storage medium

Technical Field

The present disclosure relates to the field of big data technologies, and more particularly, to a data processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

Background

With the development of information technology, the types and kinds of data used by different enterprises have become inconsistent. Because of this inconsistency, data association barriers exist within and between data systems when data is managed, acquired, and retrieved across enterprises. The association relationships between different data cannot be acquired effectively and accurately, so associated data cannot be accurately obtained across different systems or data types.

Disclosure of Invention

In view of the above, the present disclosure provides a data processing method, apparatus, electronic device, computer storage medium, and computer program product that can process different types and kinds of data.

A first aspect of the present disclosure provides a data processing method, including but not limited to: collecting original data, wherein the original data has a plurality of types, and the different types of the original data have an association relationship; building a plurality of data models corresponding to the types of the original data, each of the data models having a plurality of data categories, the data categories including metadata categories corresponding to the data models; establishing association rules among different data models according to the association relationship, and associating the same data categories among different data models through the association rules to generate a new data model; acquiring a plurality of pieces of key information of the original data, and establishing a mapping relation of the plurality of pieces of key information according to the association rule and the data model; and extracting the plurality of pieces of key information, and establishing a data network of the original data corresponding to the key information according to the mapping relation.

In an embodiment of the present disclosure, the establishing a mapping relationship of the plurality of pieces of key information according to the association rule and the data model includes: establishing a mapping relation between the key information and the data model according to the association rule; and establishing a mapping relation between the key information and the data category according to the association rule.

In an embodiment of the present disclosure, after the establishing a mapping relationship of the plurality of pieces of key information according to the association rule and the data model, the method further includes: and checking the key information of other data models according to the metadata type of the data model.

In an embodiment of the present disclosure, the verifying the key information of the other data models according to the metadata category of the data model includes: comparing key information of the same data type and from the same original data in different data models, and if the data type of the data model where the key information is located is a metadata type, determining the key information as correct information; and updating other key information from the same original data in other data models that are inconsistent and belong to the same data category as the correct information.

In an embodiment of the present disclosure, the data network includes a first data subnet and a second data subnet, the first data subnet is connected to the second data subnet, the extracting the plurality of pieces of key information, and the establishing the data network of the original data corresponding to the key information according to the mapping relationship includes: establishing a first data subnet of the original data according to the mapping relation between the extracted key information and different data models; and establishing a second data subnet of the original data according to the mapping relation between the extracted plurality of key information and the data model.

In an embodiment of the present disclosure, the association rule includes a transmission rule and an inheritance rule; the associating the same data category between different data models through the association rule to generate a new data model includes: performing intersection calculation on the same data categories among different data models through the transmission rule to generate an intersection data model; and performing union calculation on the intersection data model and other data models through the inheritance rule to generate a union data model.

In an embodiment of the present disclosure, the acquiring the plurality of pieces of key information of the raw data includes: decomposing each original data to generate original information corresponding to the type of the original data; and acquiring a plurality of pieces of key information in the original information, wherein the key information corresponds to the data categories.

In the embodiment of the present disclosure, the method further includes displaying the key information in the data network through a data display unit.

In the embodiment of the present disclosure, after the mapping relationship of the plurality of pieces of key information is established according to the association rule and the data model, the method further includes storing the key information, the plurality of data models, and the mapping relationship of the plurality of pieces of key information in a structured data manner.

In the embodiment of the present disclosure, after the data network of the original data corresponding to the key information is established according to the mapping relationship, the method further includes storing the original data in a data network form in an unstructured data manner.

In an embodiment of the present disclosure, the raw data includes at least one of project information, agreement information, contract information, supplier information, cost benefit information.

A second aspect of the present disclosure provides a data processing apparatus, including but not limited to: a collecting module configured to collect original data, wherein the original data has a plurality of types, and the different types of the original data have an association relationship; a building module configured to build a plurality of data models corresponding to the types of the original data, each of the data models having a plurality of data categories, the data categories including metadata categories corresponding to the data models; a processing module configured to establish association rules among different data models, and associate the same data categories among different data models through the association rules to generate a new data model; a mapping module configured to acquire a plurality of pieces of key information of the original data and establish a mapping relation of the plurality of pieces of key information according to the association rule and the data model; and a data network module configured to extract the plurality of pieces of key information and establish a data network of the original data corresponding to the key information according to the mapping relation.

According to an embodiment of the present disclosure, the mapping module includes a mapping sub-module configured to establish a mapping relationship between the key information and the data model according to the association rule; and establishing a mapping relation between the key information and the data category according to the association rule.

According to an embodiment of the present disclosure, the data processing apparatus further includes a checking module configured to check the key information of other data models according to a metadata category of the data model.

According to an embodiment of the present disclosure, the check module includes a first check submodule and a second check submodule. The first checking submodule is configured to compare key information of the same data type and from the same original data in different data models, and if the data type of the data model where the key information is located is a metadata type, the key information is determined to be correct information. The second check-up submodule is configured to update other key information from the same original data in other data models that are inconsistent with the correct information and belong to the same data category.

According to an embodiment of the present disclosure, the data network comprises a first data subnetwork and a second data subnetwork, and the data network module comprises a first data subnetwork module and a second data subnetwork module. The first data subnet module is configured to establish a first data subnet of the original data according to the mapping relation between the extracted key information and the different data models. And the second data subnet module is configured to establish a second data subnet of the original data according to the mapping relation between the extracted plurality of key information and one data model.

According to an embodiment of the present disclosure, the processing module includes an intersection processing module and a union processing module. The intersection processing module is configured to perform intersection calculation on the same data categories among different data models through the transmission rule to generate an intersection data model. And the union processing module is configured to perform union calculation on the intersection data model and other data models through the inheritance rule to generate a union data model.

According to the embodiment of the present disclosure, the mapping module further includes an obtaining sub-module, where the obtaining sub-module is configured to decompose each of the original data to generate original information corresponding to a type of the original data; and acquiring a plurality of pieces of key information in the original information, wherein the key information corresponds to the data categories.

According to the embodiment of the present disclosure, the data processing apparatus further includes a display module configured to display the key information in the data network through a data display unit.

According to an embodiment of the present disclosure, the data processing apparatus further includes a storage module, where the storage module is configured to store the key information, the data models, and the mapping relationships of the key information in a structured data manner after the mapping relationships of the key information are established according to the association rules and the data models, and store the original data in a non-structured data manner in a data network form after a data network of the original data corresponding to the key information is established according to the mapping relationships.

A third aspect of the present disclosure provides an electronic device, including but not limited to: one or more processors; storage means for storing executable instructions which, when executed by the processor, implement the data processing method according to the above.

A fourth aspect of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, implement a data processing method according to the above.

A fifth aspect of the disclosure provides a computer program product, wherein the product stores a computer program, which when executed is capable of implementing the data processing method according to the above.

According to the embodiment of the disclosure, data models are built according to the types of the original data, association rules among the data models are built based on the association relation existing in the original data, and a data network of the original data corresponding to the key information is built according to the association rules and the data models. The data network of the original information can represent the incidence relation among different original data, so that a user can quickly acquire other information associated with target information to be acquired when the user retrieves data or calls the data, and the retrieval or acquisition efficiency of the related information is improved.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario in which the data processing method and apparatus may be applied according to an embodiment of the present disclosure;

FIG. 2 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a diagram of establishing association rules between different data models according to a data processing method of an embodiment of the present disclosure;

FIG. 4 schematically shows a process diagram for establishing a data network according to a data processing method of an embodiment of the present disclosure;

FIG. 5 schematically shows a schematic structural diagram of a data network of a data processing method according to an embodiment of the present disclosure;

FIG. 6 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 7 schematically shows a block diagram of an electronic device adapted to implement the data processing method of the present disclosure, according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibility of "A" or "B", or "A and B".

The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features.

In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related users all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.

The term "metadata category" refers to a data category in a data model that corresponds specifically to that data model; a data model may have one or more metadata categories. Data in a metadata category has passed a specific check and verification process, so the data in that category is treated as accurate. Across different data models, data that belongs to the same raw data is more accurate in a metadata category than in other data categories. When data records conflict, the data in the metadata category may be considered the correct data.

The term "structured data" refers to data that is highly organized and well-formatted, such as the type of data placed in tables and spreadsheets. It is information that can be represented by data or a uniform structure, such as numbers and symbols. Typical structured data includes, for example: number, date, amount, address, product name, etc.

The term "unstructured data" means everything other than structured data, including, for example, text files, emails, websites, images, and the like.

The embodiment of the disclosure provides a data processing method and device. The data processing method comprises the following steps: the method comprises the steps of collecting raw data, wherein the raw data has a plurality of types, and the different types of raw data have an association relation. A plurality of data models corresponding to the categories of the raw data are constructed, each data model having a plurality of data categories, the data categories including metadata categories corresponding to the data models. And establishing association rules among different data models according to the association relationship, and associating the same data categories among different data models through the association rules to generate a new data model. Acquiring a plurality of key information of the original data, and establishing a mapping relation of the plurality of key information according to the association rule and the data model. Extracting a plurality of key information, and establishing a data network of the original data corresponding to the key information according to the mapping relation.

Fig. 1 schematically shows an application scenario of a data processing method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a data processing method and apparatus to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. It should be noted that the data processing method and apparatus provided by the embodiment of the present disclosure may be used in the related aspects of data processing in the field of big data technology and the financial field, and may also be used in various fields other than the financial field.

As shown in FIG. 1, an exemplary system architecture 100 to which the data processing method of the embodiments of the present disclosure may be applied may include terminal devices 101, 102, 103, 104, a network 105, and a server 106. The network 105 serves as a medium for providing communication links between the terminal devices 101, 102, 103, 104 and the server 106. The network 105 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 101, 102, 103, 104 to interact with the server 106 via the network 105, e.g. to receive or transmit information, including text information, video, audio, image information, etc. Various client applications may be installed on the terminal devices 101, 102, 103, 104, such as an image capture-type application, a data scan-type application, a search-type application, a web browser application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).

The terminal devices 101, 102, 103, 104 may be various electronic devices having an information entry function or an information collection function and supporting data retrieval, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 106 may be a server providing various services, such as a background management server (for example only) providing support for information retrieved or entered by users using the terminal devices 101, 102, 103, 104. The background management server may analyze and otherwise process the data or control instructions received from the client, and feed back a processing result (e.g., an analysis result or data obtained according to a user request) to the terminal device.

The user uses the terminal devices 101, 102, 103, 104 to collect and obtain raw data 107, wherein the raw data may be one or more of project information, agreement information, contract information, supplier information, cost benefit information, and the like. The raw data 107 acquired by the terminal device may be stored, for example, in the form of text or pictures, or in any other feasible storage manner.

It should be noted that the data processing method provided by the embodiment of the present disclosure may be generally executed by the server 106. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 106. The data processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster different from the server 106 and capable of communicating with the terminal devices 101, 102, 103, 104 and/or the server 106. Accordingly, the data processing apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 106 and capable of communicating with the terminal devices 101, 102, 103, 104 and/or the server 106.

It should be understood that the number of terminal devices, networks, and servers in the embodiments of the disclosure are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Fig. 2 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure.

As shown in FIG. 2, a flow 200 of the data processing method of the embodiment of the present disclosure includes operations S201 to S205.

In operation S201, raw data is collected, the raw data having a plurality of kinds, and different kinds of raw data have an association relationship.

For example, the raw data has a plurality of kinds and may be at least one of project information, agreement information, contract information, supplier information, and cost benefit information.

The project information corresponds to the project kind. It may be, for example, basic project information acquired from a data system such as an enterprise-internal project management platform, including but not limited to the project name, project number, project signing authority, project initiation time, amount, and related suppliers.

The agreement information corresponds to the agreement kind. It may be, for example, basic agreement information acquired from a data system such as an enterprise-internal centralized procurement management platform, including but not limited to the agreement name, agreement number, agreement year, agreement commodity, agreement price, and agreement validity period.

The contract information corresponds to the contract kind. It may be, for example, basic contract information acquired from data systems such as an enterprise-internal financial expense sharing management platform or an electronic contract signing platform, including but not limited to the contract number, contract name, contract year, contract amount, contract first party, contract second party, and contract commodity.

The supplier information corresponds to the supplier kind. It may be, for example, basic supplier information acquired from a data system such as an enterprise-internal centralized procurement management platform, including but not limited to the supplier name, supplier code, supplier qualification, and supplier service scope.

The cost benefit information corresponds to the cost benefit kind. It may be, for example, basic cost benefit information acquired from a data system such as an enterprise-internal cost allocation system, including but not limited to the cost and benefit of a project, the cost investment of a contract, and the cost allocation of a supplier.

The raw data can be collected by manual input. For example, data such as a project number and project contents are entered manually. The raw data can also be collected by a machine device, for example by scanning with a scanning device or by photographing with a photographing device.

According to an embodiment of the present disclosure, there is an association relationship between different kinds of raw data. For example, the project information includes the contract number involved in the project and information about the related supplier. The contract information also includes a contract number. The project information and the contract information are therefore associated through the contract number. As another example, the agreement information includes information such as a supplier, so the agreement information and the project information in the raw data are associated through the supplier.
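The association relationship described above can be illustrated with a minimal Python sketch. The record layout, the field names, and the helper `shared_links` are hypothetical and chosen only for illustration; the disclosure does not prescribe any particular representation.

```python
# Hypothetical raw-data records; field names and values are illustrative only.
project = {
    "project number": "P-001",
    "contract number": "00011",
    "supplier": "supplier 1",
}
contract = {
    "contract number": "00011",
    "contract name": "example contract",
    "supplier": "supplier 1",
}
agreement = {
    "agreement number": "A-7",
    "supplier": "supplier 1",
}


def shared_links(left: dict, right: dict) -> dict:
    """Return the fields that both records carry with the same value."""
    return {k: v for k, v in left.items() if k in right and right[k] == v}


# Project and contract are linked by contract number (and by supplier).
print(shared_links(project, contract))
# Project and agreement are linked by supplier only.
print(shared_links(project, agreement))
```

In this sketch, two records count as associated when they share at least one field with an equal value, which is one simple reading of "the same information exists in the original data".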

In operation S202, a plurality of data models corresponding to the kinds of the original data are constructed, each data model having a plurality of data categories including metadata categories corresponding to the data models.

For example, there are a plurality of types of raw data, and different data models are constructed according to the types of raw data. For example, a project information data model corresponding to project information is constructed from the project information in the raw data. Each data model has a plurality of data categories therein, the data categories being associated with specific content of the project information. For example, the data categories that may be included in the project information data model include categories of project number, project name, project amount, supplier 1, supplier 2, supplier 3, and so on.

For example, a contract information data model corresponding to contract information is constructed from the contract information in the raw data. The data categories that may be included in the contract information data model include contract number, contract name, project number, contract amount, supplier 1, supplier 4, supplier 5.

For example, a supplier information data model corresponding to the supplier information is constructed from the supplier information in the raw data. The data categories that may be included in the supplier information data model include supplier 1, supplier 2, supplier 3, supplier 4, supplier 5, supplier number.
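As a rough sketch, the three example data models could be represented as named sets of data categories. The class `DataModel` and its fields are assumptions made for illustration. The contract model's metadata categories follow the description, while the metadata categories chosen for the project and supplier models are placeholders, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataModel:
    """A data model: a named set of data categories, some marked as metadata."""
    name: str
    categories: frozenset
    metadata_categories: frozenset = frozenset()

    def __post_init__(self):
        # Metadata categories must be among the model's own data categories.
        if not self.metadata_categories <= self.categories:
            raise ValueError(f"{self.name}: metadata must be a subset of categories")


project_model = DataModel(
    name="project information",
    categories=frozenset({"project number", "project name", "project amount",
                          "supplier 1", "supplier 2", "supplier 3"}),
    # Assumption: not stated in the text.
    metadata_categories=frozenset({"project number", "project name"}),
)
contract_model = DataModel(
    name="contract information",
    categories=frozenset({"contract number", "contract name", "project number",
                          "contract amount", "supplier 1", "supplier 4", "supplier 5"}),
    metadata_categories=frozenset({"contract number", "contract name", "contract amount"}),
)
supplier_model = DataModel(
    name="supplier information",
    categories=frozenset({"supplier 1", "supplier 2", "supplier 3",
                          "supplier 4", "supplier 5", "supplier number"}),
    # Assumption: not stated in the text.
    metadata_categories=frozenset({"supplier number"}),
)

# Data categories shared by the project and contract models.
print(sorted(project_model.categories & contract_model.categories))
```

Using sets makes the later intersection and union operations between models direct, which is why this sketch prefers them over ordered lists.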

In an embodiment of the present disclosure, a metadata category is a specific data category of the data model, and a data model may have one metadata category or several. When data in a metadata category is acquired, it passes through the verification and check process of a specific program, so the data in that category is more accurate. For example, the data categories in the contract information data model include contract number, contract name, contract amount, project number, supplier 1, supplier 4, and supplier 5, of which contract number, contract name, and contract amount are the metadata categories. If the contract number, contract name, or contract amount recorded in another data model for the same original data conflicts with the contract information data model, the data in the metadata categories of the contract information data model prevails. For example, a project information data model belonging to the same original data records the contract number 00012, while the contract information data model records the contract number 00011, so the two contract numbers conflict. The correct contract number is then determined to be 00011, as recorded in the metadata category of the contract information data model.
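The conflict-resolution idea in the contract number example can be sketched as follows. The function `resolve_conflicts`, the dictionary layout, and the sample values other than the two contract numbers are hypothetical. The sketch also assumes that each data category is a metadata category in at most one data model.

```python
def resolve_conflicts(records: dict, metadata_categories: dict) -> dict:
    """Overwrite conflicting values with the value held in the metadata category.

    records: model name -> {data category: value}, all from the same raw data.
    metadata_categories: model name -> set of that model's metadata categories.
    Returns a new dict; the input is left unchanged.
    """
    # Collect the authoritative value for every metadata category.
    correct = {}
    for model, categories in metadata_categories.items():
        for category in categories:
            if category in records.get(model, {}):
                correct[category] = records[model][category]

    # Replace any value with the authoritative one where it exists.
    return {
        model: {cat: correct.get(cat, value) for cat, value in record.items()}
        for model, record in records.items()
    }


records = {
    "project information": {"project number": "P-001", "contract number": "00012"},
    "contract information": {"contract number": "00011", "contract amount": "1000"},
}
metadata = {
    "contract information": {"contract number", "contract name", "contract amount"},
}

resolved = resolve_conflicts(records, metadata)
print(resolved["project information"]["contract number"])
```

Here the project information record is updated to contract number 00011, while categories that have no metadata counterpart, such as the project number, are left as recorded.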

In the embodiment of the disclosure, the data model is constructed according to the type of the original data, so that information can be acquired aiming at different original data, and the information acquisition efficiency is improved.

In operation S203, association rules between different data models are established according to the association relationship, and the same data categories between different data models are associated through the association rules to generate a new data model.

In the embodiment of the present disclosure, the association relationship indicates that the same information exists in different original data. When the data models are constructed, the association rules between different data models are established according to the association relationship, so that the established data models are associated with one another. When data retrieval or data acquisition is carried out, information in other data models can be acquired according to the association rules among different data models, which effectively improves the efficiency of data retrieval and data acquisition.

In an embodiment of the present disclosure, the association rules include transmission rules and inheritance rules. Associating the same data categories between different data models through the association rules to generate a new data model comprises: performing intersection calculation on the same data categories among different data models through the transmission rule to generate an intersection data model; and performing union calculation on the intersection data model and other data models through the inheritance rule to generate a union data model.

For example, taking a simplified data model as an example:

the data categories included in data model A are { a1 | a2 | a3 | c1 }, and a1, a2, a3 are the metadata categories of data model A;

the data categories included in data model B are { b1 | b2 | b3 | a1 | c1 }, and b1, b2, b3 are the metadata categories of data model B;

the data categories included in data model C are { c1 | c2 }, and c1, c2 are the metadata categories of data model C.

An association rule between data model A and data model B is established through the association relationship: the data categories a1 and c1 exist in both data model A and data model B. The data category c1 exists in data model A, data model B, and data model C. The same data categories among data model A, data model B and data model C are associated through the association rules.

In the embodiment of the disclosure, the intersection data model is generated by performing intersection calculation on the same data categories among different data models through the transmission rule. For example, data model A and data model B are subjected to intersection calculation to generate an intersection data model AB, and the data categories included in the intersection data model AB are { a1 | c1 }. An intersection data model ABC among data model A, data model B and data model C is likewise generated through the intersection operation, and the data category included in the intersection data model ABC is { c1 }.

In the embodiment of the present disclosure, other data models having an association relationship with a given data model may be obtained according to the intersection data model. For example, when metadata a2 in data model A is obtained, the metadata a2 can be located to data model A, related to data model B through metadata a1 in data model A, and the other metadata b1, b2, b3 and the like in data model B can then be obtained according to the business class of data model B. Similarly, by intersecting the data models, when metadata c2 of data model C is obtained, it can be related to data model A and data model B through the data category c1 contained in data model C.
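The lookup described above, from one data category to the other data models reachable through shared categories, can be sketched as follows. This is an illustrative sketch only: the names `build_category_index` and `related_models` are hypothetical, and category names are written in lowercase.

```python
from collections import defaultdict

# Simplified data models from the example: model name -> data categories.
models = {
    "A": {"a1", "a2", "a3", "c1"},
    "B": {"b1", "b2", "b3", "a1", "c1"},
    "C": {"c1", "c2"},
}


def build_category_index(models):
    """Map each data category to the set of models that contain it."""
    index = defaultdict(set)
    for name, categories in models.items():
        for category in categories:
            index[category].add(name)
    return index


def related_models(category, models, index):
    """For every model containing `category`, find the other models that
    share at least one data category with it.

    Returns {other model name: set of shared categories}.
    """
    related = {}
    for home in index.get(category, ()):
        for shared in models[home]:
            for other in index[shared]:
                if other != home:
                    related.setdefault(other, set()).add(shared)
    return related


index = build_category_index(models)
print(related_models("a2", models, index))  # B via a1 and c1, C via c1
print(related_models("c2", models, index))  # A and B via c1
```

Once a related model is known, its remaining categories (for example b1, b2, b3 of data model B) can be read directly from `models`.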

In the embodiment of the disclosure, the intersection data model and other data models are subjected to union calculation through inheritance rules, and a union data model is generated. For example, the generated intersection data model AB is subjected to union calculation with the data model C, and a union data model AB ∪ C is generated, where the data categories of the union data model AB ∪ C are { a1 | c1 | c2 }.

In the embodiment of the present disclosure, the inheritance relationship may be that, when two data models themselves have a transmission relationship, one of the data models needs to extend the metadata of the other data model into a supplementary attribute of itself, that is, a transmission rule. For example, data model A and data model B generate the data model AB through the transmission rule, which has the metadata categories a1 and c1; attribute expansion is then performed with data model C, finally obtaining the metadata categories a1, c1 and c2.
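The intersection and union calculations in the simplified example can be reproduced with ordinary set operations. The sketch below assumes each data model is represented only by its set of data categories (written in lowercase); the function names are hypothetical.

```python
model_a = {"a1", "a2", "a3", "c1"}
model_b = {"b1", "b2", "b3", "a1", "c1"}
model_c = {"c1", "c2"}


def intersect_models(first, *others):
    """Transmission rule in this sketch: keep only categories common to all models."""
    result = set(first)
    for model in others:
        result &= model
    return result


def union_models(first, *others):
    """Inheritance rule in this sketch: extend a model with the categories of others."""
    result = set(first)
    for model in others:
        result |= model
    return result


model_ab = intersect_models(model_a, model_b)            # {a1, c1}
model_abc = intersect_models(model_a, model_b, model_c)  # {c1}
model_ab_union_c = union_models(model_ab, model_c)       # {a1, c1, c2}

print(sorted(model_ab), sorted(model_abc), sorted(model_ab_union_c))
```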

The data models are established by analyzing the acquired original data, establishing a plurality of data models, then decomposing the metadata (attributes or data information) of the data models, finding the association relationships, and finally determining composite business models generated from these data models through a transmission rule or an inheritance rule, such as the data model AB and the data model AB ∪ C.

The metadata categories a1, a2, a3 and the like of data model A can be obtained by manual derivation from experience (for complex business scenarios), by automatic keyword retrieval and splitting (for example, recognizing common names such as contract number, amount and Party A in product information), by semantic analysis with artificial intelligence algorithms (for example, recognizing grammatical semantics in product information, such as subjects, object names and the nouns following common verbs), and the like.

Fig. 3 schematically shows a schematic diagram of establishing association rules between different data models according to a data processing method of an embodiment of the present disclosure.

As shown in fig. 3, the project information data model 310 includes a plurality of data categories, such as project name, project amount, project number, supplier 1, supplier 2, and supplier 3. The contract information data model 320 includes a plurality of data categories, such as contract number, contract name, contract amount, project number, supplier 1, supplier 4, supplier 5. The intersection of the project information data model 310 and the contract information data model 320 is calculated to obtain a new data model 330, which includes the project number and the data category of the supplier 1.

In operation S204, a plurality of pieces of key information of the original data are obtained, and a mapping relationship of the plurality of pieces of key information is established according to the association rule and the data model.

According to an embodiment of the present disclosure, the key information may be information in the raw data corresponding to data categories of the constructed data models, one data category of each data model corresponding to one key information. For example, as for the contract information, a contract number, a contract name, and the like may be extracted key information.

For example, the original data includes various data, and it is necessary to decompose the original data to generate original information corresponding to the kind of the original data. For example, the collected raw data includes project information, agreement information, contract information, cost benefit information, supplier information, and the like, and different information types are decomposed, and more detailed information can be acquired from each type of information. For example, information such as a contract number, a contract name, and a contract amount is extracted from the contract information, and the extracted information is provided as key information. The key information corresponds to a data category, e.g., the detailed contract number extracted from the original contract corresponds to the contract number data category in the contract information data model.
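As a rough illustration of extracting key information from contract information by keyword retrieval, the sketch below applies regular expressions to labelled text. The label wording ("Contract No.", "Contract Name", "Contract Amount"), the sample text and the function name are all assumptions; real original data would need its own extraction rules.

```python
import re

# Hypothetical label patterns, one per data category of the contract model.
PATTERNS = {
    "contract number": re.compile(r"Contract No\.?:\s*(\S+)"),
    "contract name": re.compile(r"Contract Name:\s*(.+)"),
    "contract amount": re.compile(r"Contract Amount:\s*([\d,]+(?:\.\d+)?)"),
}


def extract_key_information(text):
    """Return {data category: extracted value} for every pattern that matches."""
    found = {}
    for category, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[category] = match.group(1).strip()
    return found


sample = """Contract No.: 00011
Contract Name: Server maintenance
Contract Amount: 120,000.00
"""

key_information = extract_key_information(sample)
print(key_information)
```

Each key of the returned dictionary names the data category to which the extracted value corresponds.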

In an embodiment of the present disclosure, establishing a mapping relationship of the plurality of pieces of key information according to the association rule and the data model includes establishing a mapping relationship of the pieces of key information and the data model according to the association rule; and establishing a mapping relation between the key information and the data category according to the association rule.

Fig. 4 schematically shows a process diagram of establishing a data network according to the data processing method of the embodiment of the present disclosure.

For example, as shown in fig. 4, after the key information is acquired, the data model corresponding to the original data corresponding to the key information is determined, so as to establish a mapping relationship between the key information and the data model, thereby constructing a mapping relationship network 410. For example, after key information in a certain contract is obtained, it is determined that a data model corresponding to original information from which the key information comes is a contract information data model, and a mapping relationship between the key information and the contract information data model is established. For another example, after the key information in a certain contract is obtained, it is determined that the key information is from a certain type of data category in the data model corresponding to the original information (e.g., the contract number data category in the data model corresponding to the contract information), and then a mapping relationship between the key information and the contract number data category in the contract information data model is established.
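The two mapping relationships described above (key information to data model, and key information to data category) can be captured in a single record. The `Mapping` class, the `source_id` field and the sample values below are hypothetical and serve only to illustrate one possible representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source_id: str  # identifies the piece of original data (assumed identifier)
    model: str      # data model the key information maps to
    category: str   # data category within that model
    value: str      # the key information itself


def map_key_information(source_id, model, model_categories, key_information):
    """Create one Mapping per key information item whose category exists in the model."""
    return [
        Mapping(source_id, model, category, value)
        for category, value in key_information.items()
        if category in model_categories
    ]


contract_categories = {"contract number", "contract name", "contract amount",
                       "project number", "supplier 1", "supplier 4", "supplier 5"}
extracted = {
    "contract number": "00011",
    "contract name": "Server maintenance",
    "signatory": "unknown",  # not a category of the model, so it is skipped
}

mappings = map_key_information("contract-001", "contract information",
                               contract_categories, extracted)
for mapping in mappings:
    print(mapping)
```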

According to the embodiment of the disclosure, the mapping relationship is established by acquiring a plurality of pieces of key information of the original data, so that the associated content can be quickly retrieved according to different information.

In the embodiment of the disclosure, after the mapping relationship of the plurality of pieces of key information is established according to the association rule and the data model, the method further includes storing the plurality of pieces of key information, the plurality of data models, and the mapping relationship of the plurality of pieces of key information as structured data.

For example, the data model corresponds to a certain kind of original data, and for a certain key information of the original data, a mapping relation to the data type of the data model is formed. According to the key information, the data model corresponding to the key information and the data type of the data model can be obtained.

In embodiments of the present disclosure, retrieval of information is facilitated by storing in structured data.
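One way to store the data models, data categories and key information as structured data is a small relational schema. The sketch below uses an in-memory SQLite database from the Python standard library; the table and column names are assumptions, not part of the disclosure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_model (name TEXT PRIMARY KEY);
CREATE TABLE data_category (
    model       TEXT NOT NULL REFERENCES data_model(name),
    category    TEXT NOT NULL,
    is_metadata INTEGER NOT NULL,
    PRIMARY KEY (model, category)
);
CREATE TABLE key_information (
    source_id TEXT NOT NULL,
    model     TEXT NOT NULL,
    category  TEXT NOT NULL,
    value     TEXT NOT NULL,
    FOREIGN KEY (model, category) REFERENCES data_category(model, category)
);
""")

conn.execute("INSERT INTO data_model VALUES (?)", ("contract information",))
conn.executemany(
    "INSERT INTO data_category VALUES (?, ?, ?)",
    [("contract information", "contract number", 1),
     ("contract information", "project number", 0)],
)
conn.execute(
    "INSERT INTO key_information VALUES (?, ?, ?, ?)",
    ("contract-001", "contract information", "contract number", "00011"),
)


def lookup(conn, value):
    """Return (model, category, is_metadata) rows for a piece of key information."""
    return conn.execute(
        "SELECT k.model, k.category, c.is_metadata "
        "FROM key_information k "
        "JOIN data_category c ON c.model = k.model AND c.category = k.category "
        "WHERE k.value = ?",
        (value,),
    ).fetchall()


print(lookup(conn, "00011"))
```

Given a piece of key information, the query returns the data model and data category it maps to, which mirrors the retrieval described above.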

In operation S205, a plurality of pieces of key information are extracted, and a data network of original data corresponding to the key information is established according to the mapping relationship.

For example, as shown in fig. 4, a data network 420 of raw data is established according to the mapping relationship of a plurality of key information. The data network may be established according to the collected original data, and according to the association rule between the data models, the association relationship between the original data may be calculated, that is, the key information obtained from the original data has a mapping relationship with the plurality of data models and the data categories of the data models, and the data network is further established according to the mapping relationship. According to the constructed data network, when a user inputs original data, the metadata type associated with the original data can be judged according to the original data, the data model is further determined according to the metadata type, and the information of other related original data is pushed out according to the association rule of the data model and other data models.

For example, in a data network of the original data established according to the mapping relationship, different original information is associated through the mapping relationship. In the constructed data network, a user who inputs key information can be led to the data information of other associated key information on the data network.
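A minimal sketch of such a data network is a pair of indexes: one from each (data category, value) attribute to the original data that carries it, and one from each piece of original data to its attributes. All identifiers and sample values below are hypothetical.

```python
from collections import defaultdict

# Mapping records: (source_id, data model, data category, value).
mappings = [
    ("project-7", "project information", "project number", "P-100"),
    ("project-7", "project information", "supplier 1", "Acme"),
    ("contract-001", "contract information", "contract number", "00011"),
    ("contract-001", "contract information", "project number", "P-100"),
    ("contract-001", "contract information", "supplier 1", "Acme"),
    ("supplier-3", "supplier information", "supplier 1", "Acme"),
]


def build_data_network(mappings):
    """Index the mappings by attribute and by original data item."""
    by_attribute = defaultdict(set)  # (category, value) -> source ids
    by_entity = defaultdict(set)     # source id -> (category, value) pairs
    for source_id, _model, category, value in mappings:
        by_attribute[(category, value)].add(source_id)
        by_entity[source_id].add((category, value))
    return by_attribute, by_entity


def associated_entities(category, value, by_attribute, by_entity):
    """Return the items holding the attribute plus the items that share
    any attribute with one of those holders (one hop on the network)."""
    holders = by_attribute.get((category, value), set())
    result = set(holders)
    for entity in holders:
        for attribute in by_entity[entity]:
            result |= by_attribute[attribute]
    return result


by_attribute, by_entity = build_data_network(mappings)
print(sorted(associated_entities("contract number", "00011", by_attribute, by_entity)))
```

Here a single contract number leads to the contract itself and, through the shared project number and supplier, to the related project and supplier records.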

Fig. 5 schematically shows a schematic structural diagram of a data network of a data processing method according to an embodiment of the present disclosure.

As shown in fig. 5, the data models are constructed according to the kinds of the original data. An entity in fig. 5 refers to the original data corresponding to a data model; for example, an entity in the contract information data model represents the details of one contract. The attributes in fig. 5 represent the corresponding key information in each piece of original data. For example, the content of each contract includes the contract number, the contract name, the contract amount, and the like, and this key information may be the attributes in fig. 5. Taking fig. 5 as an example, when a user inputs a single attribute, such as attribute 1 of entity 3 (for example, the contract number of a specific contract), entities 1 and 2, which also have attribute 1 on the data network, can be associated (for example, the related project initiation information and supplier information). When the user additionally inputs attribute 6 of entity 4, data paths from attribute 6 to data model A and data model B are generated at the same time. A new data link can then be formed among entities 2, 3 and 4, and entity 4 is associated to entity 1 through the bridging entity 3 (entities 2 and 3 share the common attribute 1 with entity 1), forming a data path that is easily overlooked or omitted when relying on manual experience. When the number of data entities is very large, retrieving the relationships between data information in the traditional way becomes almost impossible, whereas constructing a data network allows the data information of business products to be covered as completely as possible.
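The bridging behaviour described for fig. 5 can be sketched as a breadth-first search over entities that share attributes. The attribute assignments below are an assumption made for illustration (in particular, that entity 3 shares attribute 6 with entity 4); they are not taken from the figure itself.

```python
from collections import deque

# Assumed attribute sets: entities 1, 2, 3 share attribute 1,
# and entities 3 and 4 share attribute 6.
entity_attributes = {
    "entity 1": {"attribute 1"},
    "entity 2": {"attribute 1"},
    "entity 3": {"attribute 1", "attribute 6"},
    "entity 4": {"attribute 6"},
}


def neighbours(entity, entity_attributes):
    """Entities that share at least one attribute with `entity`."""
    own = entity_attributes[entity]
    return sorted(
        other for other, attributes in entity_attributes.items()
        if other != entity and own & attributes
    )


def find_path(start, goal, entity_attributes):
    """Breadth-first search for a shortest chain of entities linking start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1], entity_attributes):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of shared attributes connects the two entities


print(find_path("entity 4", "entity 1", entity_attributes))
# ['entity 4', 'entity 3', 'entity 1']
```

Entity 4 and entity 1 share no attribute directly, so the path necessarily passes through entity 3, which is the bridging role described above.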

In the embodiment of the disclosure, after the data network of the original data corresponding to the key information is established according to the mapping relationship, the original data is stored in a data network form in an unstructured data manner.

For example, the constructed data network of the original data is stored in an unstructured data format, for example in a Hadoop database, and database technologies such as a data warehouse are established. Combined with technical schemes such as data crawling, this provides users with an efficient and flexible way of retrieving information from the data network.

In the embodiment of the disclosure, batch processing of the constructed data network requires recursion and various rule operations on a large amount of entity data (the collected original data) and data models, and the data volume of a large enterprise often exceeds the ten-million level. Distributed computing can therefore be adopted for scenarios with a large volume of accessed data information, so as to improve the processing timeliness of the system.

In the embodiment of the disclosure, after the data network of the original data is established, the key information in the data network is displayed through the data display unit.

For example, a professional system in an enterprise provides user interaction scenarios such as display and query for the data information results of the data network, and a channel interface connected with the system provides a data interface offering display and data query services to external clients or other professional terminals. The device for presentation may be, for example, an electronic device with a display screen capable of displaying information, including a smart phone, a tablet, a computer, a television, and so on.

In an embodiment of the present disclosure, after the mapping relationship of the plurality of pieces of key information is established according to the association rule and the data model, the method further includes: and checking the key information of other data models according to the metadata type of the data models.

For example, the obtained original data is not completely correct, and after a plurality of pieces of key information in the original data are obtained, other pieces of key information need to be checked, so that the accuracy of the data is improved.

In an embodiment of the present disclosure, verifying key information of other data models according to metadata categories of the data models includes: comparing key information that is of the same data category and comes from the same original data in different data models, and if the data category of the data model where the key information is located is a metadata category, determining that key information to be the correct information; and updating, to the correct information, other key information that comes from the same original data in other data models, belongs to the same data category, and is inconsistent with the correct information.

For example, taking the contract information data model as an example, the metadata categories in the contract information data model include contract number, contract name, and contract amount. For a contract in the original data, key information such as the specific contract number, contract name, and contract amount is extracted. The contract number is then compared with the key information under the contract number data category of the project information data model, where the contract number data category does not belong to the metadata categories of the project information data model. Since the data category of the specific contract number in the contract information data model is a metadata category, that key information is determined to be the correct information. When the contract number in the project information data model is inconsistent with the contract number in the contract information data model, the contract number in the project information data model is replaced by the one in the contract information data model; that is, the specific data in the metadata category of the contract information data model is taken as the correct information and used to update other data inconsistent with it.
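The two checking steps (determine the correct information from the metadata category, then update inconsistent key information elsewhere) can be sketched as follows. The record layout, the `source_id` field and the `reconcile` function are assumptions made for illustration.

```python
def reconcile(records, metadata_categories):
    """Return (reconciled records, list of changes).

    records: dicts with keys source_id, model, category, value.
    metadata_categories: {model name: set of its metadata categories}.
    """
    # Step 1: a value stored under a metadata category is taken as correct.
    correct = {}
    for record in records:
        if record["category"] in metadata_categories.get(record["model"], set()):
            correct[(record["source_id"], record["category"])] = record["value"]

    # Step 2: overwrite inconsistent values of the same original data and category.
    reconciled, changes = [], []
    for record in records:
        key = (record["source_id"], record["category"])
        if key in correct and record["value"] != correct[key]:
            changes.append((record["model"], record["category"],
                            record["value"], correct[key]))
            record = {**record, "value": correct[key]}  # copy, do not mutate input
        reconciled.append(record)
    return reconciled, changes


metadata_categories = {
    "contract information": {"contract number", "contract name", "contract amount"},
}
records = [
    {"source_id": "deal-42", "model": "project information",
     "category": "contract number", "value": "00012"},
    {"source_id": "deal-42", "model": "contract information",
     "category": "contract number", "value": "00011"},
]

reconciled, changes = reconcile(records, metadata_categories)
print(changes)
```

The sketch does not address the case where two models both declare the same category as a metadata category; how such a tie would be resolved is not specified in the text.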

In an embodiment of the present disclosure, the data network includes a first data subnet and a second data subnet that are connected to each other. Extracting the plurality of pieces of key information and establishing the data network of the original data corresponding to the key information according to the mapping relationship includes: establishing the first data subnet of the original data according to the mapping relationship between the extracted key information and different data models; and establishing the second data subnet of the original data according to the mapping relationship between the extracted plurality of pieces of key information and the data model.

For example, the same extracted key information may be from multiple original data, and for example, a specific contract number may be content in the contract information data model or content in the project information data model. And establishing the first data subnet according to the mapping relation between the key information and the data model corresponding to the key information.

For another example, the extracted key information may be multiple pieces, where the multiple pieces of key information belong to different data categories of a certain data model, and the second data subnet of the original data is established according to a mapping relationship between the key information and the different data categories of the certain data model.

According to the embodiment of the disclosure, the first data subnet and the second data subnet are established, so that the data of the original data form the correlation relationship with each other, when a user searches information, the user can obtain a plurality of original data related to the information to be searched by the user according to the data network formed by the first data subnet and the second data subnet, and the problem of information omission during searching is effectively avoided.

Fig. 6 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.

As shown in fig. 6, the data processing apparatus 600 includes a collection module 610, a construction module 620, a processing module 630, a mapping module 640, and a data network module 650.

The collection module 610 is configured to collect raw data, where the raw data has multiple categories, and different categories of raw data have an association relationship. In an embodiment, the collection module 610 may be configured to perform the operation S201 described above, which is not described herein again.

The construction module 620 is configured to build a plurality of data models corresponding to the categories of the raw data, each data model having a plurality of data categories, the data categories including metadata categories corresponding to the data models. In an embodiment, the construction module 620 may be configured to perform the operation S202 described above, which is not described herein again.

The processing module 630 is configured to establish association rules between different data models, and associate the same data categories between different data models through the association rules to generate a new data model. In an embodiment, the processing module 630 may be configured to perform the operation S203 described above, which is not described herein again.

The mapping module 640 is configured to obtain a plurality of pieces of key information of the original data, and establish a mapping relationship of the plurality of pieces of key information according to the association rule and the data model. In an embodiment, the mapping module 640 may be configured to perform the operation S204 described above, which is not described herein again.

The data network module 650 is configured to extract the plurality of key information, and establish a data network of the original data corresponding to the key information according to the mapping relationship. In an embodiment, the data network module 650 may be configured to perform the operation S205 described above, which is not described herein again.

According to an embodiment of the present disclosure, the mapping module includes a mapping submodule configured to establish a mapping relationship between the key information and the data model according to the association rule; and establishing a mapping relation between the key information and the data category according to the association rule.

According to an embodiment of the present disclosure, the data processing apparatus further includes a checking module 660, and the checking module 660 is configured to check key information of other data models according to metadata categories of the data models.

According to an embodiment of the present disclosure, the check module includes a first check submodule and a second check submodule. The first checking submodule is configured to compare key information of the same data type and from the same original data in different data models, and if the data type of the data model where the key information is located is a metadata type, the key information is determined to be correct information. The second check-up submodule is configured to update other key information from the same original data in the other data model that is inconsistent with the correct information and belongs to the same data category.

According to an embodiment of the present disclosure, the data network comprises a first data subnetwork and a second data subnetwork, and the data network module comprises a first data subnetwork module and a second data subnetwork module. The first data subnet module is configured to establish a first data subnet of the original data according to the mapping relation between the extracted key information and different data models. The second data subnet module is configured to establish a second data subnet of the original data according to the mapping relationship between the extracted plurality of key information and the data model.

According to an embodiment of the present disclosure, the processing module includes an intersection processing module and a union processing module. The intersection processing module is configured to perform intersection calculation on the same data categories among different data models through a transmission rule to generate an intersection data model. And the union processing module is configured to perform union calculation on the intersection data model and other data models through inheritance rules to generate a union data model.

According to the embodiment of the disclosure, the mapping module further comprises an obtaining submodule configured to decompose each original data and generate original information corresponding to the type of the original data; and acquiring a plurality of pieces of key information in the original information, wherein the key information corresponds to the data category.

According to an embodiment of the present disclosure, the data processing apparatus further includes a presentation module 670, and the presentation module 670 is configured to present the key information in the data network through the data presentation unit.

According to the embodiment of the disclosure, the data processing apparatus further includes a storage module 680, and the storage module 680 is configured to store the key information, the plurality of data models, and the mapping relationship of the plurality of key information in a structured data manner after the mapping relationship of the plurality of key information is established according to the association rule and the data model, and store the original data in a non-structured data manner in a data network form after a data network of the original data corresponding to the key information is established according to the mapping relationship.

Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.

For example, any plurality of the collection module 610, the construction module 620, the processing module 630, the mapping module 640, the data network module 650, the check module 660, the presentation module 670, the storage module 680, the mapping submodule, the first check submodule, the second check submodule, the first data subnet module, the second data subnet module, the intersection processing module, the union processing module, and the obtaining submodule may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of these modules and submodules may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging circuits, or in any one of the three implementations of software, hardware and firmware, or in any suitable combination of them. Alternatively, at least one of these modules and submodules may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.

Fig. 7 schematically shows a block diagram of an electronic device adapted to implement the above described data processing method according to an embodiment of the present disclosure. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. It is noted that the programs may also be stored in one or more memories other than the ROM 702 and RAM 703. The processor 701 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 700 may also include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 702 and/or the RAM 703 and/or one or more memories other than the ROM 702 and the RAM 703 described above.

Embodiments of the present disclosure also include a computer program product comprising a computer program, the computer program containing program code for performing the method provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code causes the electronic device to carry out the data processing method provided by the embodiments of the present disclosure.

The computer program, when executed by the processor 701, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A data processing method, comprising:

collecting original data, wherein the original data has a plurality of types, and the different types of the original data have an association relationship;

building a plurality of data models corresponding to the types of the original data, each of the data models having a plurality of data categories, the data categories including metadata categories corresponding to the data models;

establishing association rules among different data models according to the association relationship, and associating the same data categories among different data models through the association rules to generate a new data model;

acquiring a plurality of pieces of key information of the original data, and establishing a mapping relation of the plurality of pieces of key information according to the association rule and the data model; and

extracting the plurality of pieces of key information, and establishing a data network of the original data corresponding to the key information according to the mapping relation.

2. The data processing method of claim 1, wherein the establishing a mapping relationship of the plurality of key information according to the association rule and the data model comprises:

establishing a mapping relation between the key information and the data model according to the association rule; and

establishing a mapping relation between the key information and the data category according to the association rule.
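
As a non-limiting illustration, the two mapping relations of claim 2 might be sketched in Python as follows. The `KeyInfo` record, the `MODELS` and `ASSOCIATED` tables, and the function `map_key_info` are hypothetical names introduced only for this sketch, and the association rule is assumed here to be a simple set of data categories shared between data models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyInfo:
    source_model: str  # data model matching the type of the original data
    category: str      # data category the item belongs to
    value: str


# Hypothetical data models: model name -> data categories it contains.
MODELS = {
    "project": {"project_id", "project_name"},
    "contract": {"contract_id", "project_id", "amount"},
}

# Hypothetical association rule: categories shared between models.
ASSOCIATED = {"project_id"}


def map_key_info(items, models, associated):
    """Return item -> models and item -> (model, category) mappings."""
    to_models, to_categories = {}, {}
    for item in items:
        if item.category in associated:
            # The rule links this category across every model that has it.
            targets = sorted(m for m, cats in models.items() if item.category in cats)
        else:
            targets = [item.source_model]
        to_models[item] = targets
        to_categories[item] = [(m, item.category) for m in targets]
    return to_models, to_categories


items = [
    KeyInfo("contract", "project_id", "P001"),
    KeyInfo("contract", "amount", "100"),
]
to_models, to_categories = map_key_info(items, MODELS, ASSOCIATED)
print(to_models[items[0]])  # ['contract', 'project']
```

In this sketch an item whose category is covered by the association rule maps to every data model containing that category, while any other item maps only to the data model of its own original data.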

3. The data processing method according to claim 2, wherein after the establishing the mapping relationship of the plurality of key information according to the association rule and the data model, the method further comprises:

verifying the key information of other data models according to the metadata category of the data model.

4. The data processing method of claim 3, wherein the verifying the key information of other data models according to the metadata category of the data model comprises:

comparing key information of the same data category and from the same original data in different data models, and if the data category of the data model where the key information is located is a metadata category, determining the key information as correct information; and

updating other key information in other data models which belongs to the same data category, comes from the same original data, and is inconsistent with the correct information.
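
As a non-limiting illustration, the verification of claim 4 might be sketched in Python as below. The record layout (`source_id`, `model`, `category`, `value`), the `METADATA` table, and the function `verify` are assumptions made for this sketch; a copy of a value is treated as correct when it sits in the data model that holds its category as a metadata category.

```python
# Hypothetical table: metadata categories held by each data model.
METADATA = {
    "project": {"project_name"},
    "contract": {"contract_amount"},
}


def verify(records, metadata):
    """Return records with inconsistent copies replaced by the correct value."""
    # Pass 1: the copy held under a metadata category is the correct information.
    correct = {}
    for r in records:
        if r["category"] in metadata.get(r["model"], set()):
            correct[(r["source_id"], r["category"])] = r["value"]
    # Pass 2: update copies in other models that disagree with it.
    updated = []
    for r in records:
        key = (r["source_id"], r["category"])
        if key in correct and r["value"] != correct[key]:
            r = {**r, "value": correct[key]}  # copy, so the input is not mutated
        updated.append(r)
    return updated


records = [
    {"source_id": "S1", "model": "project", "category": "project_name", "value": "Alpha"},
    {"source_id": "S1", "model": "contract", "category": "project_name", "value": "Alfa"},
]
checked = verify(records, METADATA)
print(checked[1]["value"])  # Alpha
```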

5. The data processing method according to claim 2, wherein the data network comprises a first data subnetwork and a second data subnetwork, the first data subnetwork and the second data subnetwork being connected,

the extracting of the plurality of key information and the establishing of the data network of the original data corresponding to the key information according to the mapping relation comprise:

establishing the first data subnetwork of the original data according to the mapping relation between the extracted key information and different data models; and

establishing the second data subnetwork of the original data according to the mapping relation between the extracted plurality of key information and one data model.
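
As a non-limiting illustration, the two connected subnetworks of claim 5 might be sketched in Python as sets of edges between key information items. The item layout `(model, source_id, category, value)` and the function `build_subnets` are hypothetical. The sketch assumes that a cross-model edge exists when two items in different data models share a data category and a value, and that an edge within one data model exists when two items come from the same original data.

```python
from itertools import combinations


def build_subnets(items):
    """items: dict mapping item id -> (model, source_id, category, value)."""
    first, second = set(), set()
    for a, b in combinations(sorted(items), 2):
        model_a, source_a, cat_a, val_a = items[a]
        model_b, source_b, cat_b, val_b = items[b]
        if model_a != model_b and (cat_a, val_a) == (cat_b, val_b):
            first.add((a, b))   # same category and value across data models
        elif model_a == model_b and source_a == source_b:
            second.add((a, b))  # items of one original data within one model
    return first, second


items = {
    "k1": ("project", "P-rec", "project_id", "P001"),
    "k2": ("project", "P-rec", "project_name", "Alpha"),
    "k3": ("contract", "C-rec", "project_id", "P001"),
    "k4": ("contract", "C-rec", "amount", "100"),
}
first, second = build_subnets(items)
print(sorted(first))   # [('k1', 'k3')]
print(sorted(second))  # [('k1', 'k2'), ('k3', 'k4')]
```

In this example the two subnetworks are connected because items `k1` and `k3` appear in both.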

6. The data processing method of any of claims 1 to 5, wherein the association rules comprise transmission rules and inheritance rules;

the associating the same data category between different data models through the association rule to generate a new data model includes:

performing intersection calculation on the same data categories among different data models through the transmission rule to generate an intersection data model; and

performing union calculation on the intersection data model and other data models through the inheritance rule to generate a union data model.
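
As a non-limiting illustration, the intersection and union calculations of claim 6 might be sketched in Python by treating each data model as a set of data category names. This set representation, the example category names, and the functions `intersection_model` and `union_model` are assumptions made for this sketch only.

```python
def intersection_model(model_a, model_b):
    """Transmission rule (as assumed here): keep categories common to both models."""
    return model_a & model_b


def union_model(intersection, *other_models):
    """Inheritance rule (as assumed here): merge the intersection with other models."""
    return intersection.union(*other_models)


# Hypothetical data models expressed as sets of data categories.
project = {"project_id", "project_name", "supplier_id"}
contract = {"contract_id", "project_id", "supplier_id"}
supplier = {"supplier_id", "supplier_name"}

shared = intersection_model(project, contract)
merged = union_model(shared, supplier)
print(sorted(shared))  # ['project_id', 'supplier_id']
print(sorted(merged))  # ['project_id', 'supplier_id', 'supplier_name']
```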

7. The data processing method of any of claims 1 to 5, wherein the acquiring a plurality of pieces of key information of the original data comprises:

decomposing each piece of original data to generate original information corresponding to the type of the original data; and

and acquiring a plurality of pieces of key information in the original information, wherein the key information corresponds to the data categories.

8. The data processing method according to any one of claims 1 to 5, further comprising presenting the key information in the data network by a data presentation unit.

9. The data processing method according to any one of claims 1 to 5, wherein after the mapping relationship of the plurality of pieces of key information is established according to the association rule and the data model, the method further comprises storing the key information, the plurality of data models, and the mapping relationship of the plurality of pieces of key information in a structured data manner.

10. The data processing method according to any one of claims 1 to 5, wherein after establishing the data network of the original data corresponding to the key information according to the mapping relationship, the method further comprises storing the original data in a data network form as unstructured data.

11. The data processing method of any of claims 1 to 5, wherein the original data comprises at least one of project information, agreement information, contract information, supplier information, and cost benefit information.

12. A data processing apparatus comprising:

a collecting module configured to collect original data, wherein the original data has a plurality of types, and the different types of the original data have an association relationship;

a building module configured to build a plurality of data models corresponding to the types of the original data, each of the data models having a plurality of data categories, the data categories including metadata categories corresponding to the data models;

a processing module configured to establish association rules among different data models, and associate the same data categories among different data models through the association rules to generate a new data model;

a mapping module configured to acquire a plurality of pieces of key information of the original data and establish a mapping relation of the plurality of pieces of key information according to the association rule and the data model; and

a data network module configured to extract the plurality of pieces of key information and establish a data network of the original data corresponding to the key information according to the mapping relation.

13. An electronic device, comprising:

one or more processors; and

storage means for storing executable instructions which, when executed by the one or more processors, implement a data processing method according to any one of claims 1 to 11.

14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, implement a data processing method according to any one of claims 1 to 11.

15. A computer program product, wherein the product stores a computer program which, when executed, is capable of implementing a data processing method according to any one of claims 1 to 11.