CN111813770B

CN111813770B - Data model construction method and device and computer readable storage medium

Info

Publication number: CN111813770B
Application number: CN202010913750.3A
Authority: CN
Inventors: 武正彪
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2021-01-19
Anticipated expiration: 2040-09-03
Also published as: CN111813770A

Abstract

The invention relates to big data technology and discloses a method for constructing a data model. The method comprises: obtaining preset type data from a plurality of preset data sources; converting the preset type data into standardized data and constructing a standardized data pool based on the standardized data; determining a plurality of virtual labels and the data extraction rule corresponding to each virtual label; extracting the standardized data corresponding to each virtual label from the standardized data pool according to the data extraction rules, and constructing a virtual label pool based on that data; selecting core data from the standardized data pool and the virtual label pool, and constructing a plurality of core data domains based on the core data; and extracting data to be processed from the core data corresponding to each core data domain, and combining the data to be processed to form a plurality of fusion data domains. The invention addresses the problems of inconsistent statistical criteria and difficult data acquisition in the process of collecting smart city data.

Description

Data model construction method and device and computer readable storage medium

Technical Field

The invention relates to the technical field of big data, in particular to a method and a device for constructing a data model, electronic equipment and a computer-readable storage medium.

Background

A smart city uses various information technologies and innovative concepts to connect and integrate the city's systems and services, so as to improve the efficiency of resource use, optimize city management and services, and improve citizens' quality of life.

With the continuous development of science and technology, smart city data is continuously gathered and accumulated. The deep application of technologies such as big data, artificial intelligence and blockchain strengthens the integration and unified enablement of common data capabilities, which has become an inevitable choice for eliminating data silos and supporting coordination among upper-layer services across departments and regions. However, smart city data integration has long lacked effective theoretical and technical methods, and no unified method for constructing a smart city data model has been established. As a result, smart city data collection suffers from inconsistent statistical criteria and difficult data acquisition.

Disclosure of Invention

The main object of the invention is to provide a method and a device for constructing a data model, an electronic device and a computer-readable storage medium, so as to solve the problems of inconsistent statistical criteria and difficult data acquisition in the smart city data collection process.

In order to achieve the above object, the present invention provides a method for constructing a data model, the method comprising:

acquiring preset type data from a plurality of preset data sources, and constructing a collection combined data pool based on the preset type data;

converting preset type data in the collection combined data pool into standardized data, and constructing a standardized data pool based on the standardized data;

determining a plurality of virtual labels and data extraction rules corresponding to the virtual labels, extracting standardized data corresponding to the virtual labels from the standardized data pool according to the data extraction rules corresponding to the virtual labels, and constructing the virtual label pool based on the standardized data corresponding to the virtual labels;

selecting core data from the standardized data pool and the virtual label pool, and constructing a plurality of core data domains based on the core data;

and extracting data to be processed from the core data corresponding to each core data domain, and combining the data to be processed to form a plurality of fusion data domains.

Optionally, the preset type data includes a plurality of fields, and the converting the preset type data in the collection combined data pool into standardized data includes:

selecting a field to be processed from the preset type data;

reading the field name of the field to be processed, and searching a corresponding standard data element in a preset standard data element library according to the field name;

reading the name, definition, identification, representation and permissible values of the data element from the found standard data element, and acquiring the definition, identification, representation and permissible values of the field to be processed from the collection combined data pool;

comparing the field name, definition, identification, representation and permissible values of the field to be processed with the read name, definition, identification, representation and permissible values of the data element, respectively, to determine whether a difference exists;

when a difference exists, modifying the field name, definition, identification, representation and permissible values of the field to be processed according to the read name, definition, identification, representation and permissible values of the data element.

Optionally, the extracting, according to the data extraction rule corresponding to each virtual tag, the standardized data corresponding to each virtual tag from the standardized data pool, and constructing the virtual tag pool based on the standardized data corresponding to each virtual tag includes:

extracting the structured query statement corresponding to each virtual tag from each data extraction rule;

querying standardized data corresponding to each virtual label in the standardized data pool according to the structured query statement corresponding to each virtual label;

judging whether the data extraction rules corresponding to the virtual labels comprise analysis sub-rules or not;

when the data extraction rule corresponding to one virtual label comprises an analysis sub-rule, analyzing and processing the standardized data corresponding to the virtual label according to the analysis sub-rule, and storing the data obtained by analysis as the associated data corresponding to the virtual label;

when the data extraction rule corresponding to one virtual label does not comprise an analysis sub-rule, storing the standardized data corresponding to the virtual label as the associated data corresponding to the virtual label;

and performing database modeling operation based on the associated data corresponding to each virtual label to obtain a virtual label pool.

Optionally, the core data domains include one or more of a population information data domain, a legal unit data domain, a city component data domain, an electronic license data domain, a macro-economic data domain, a spatial geographic data domain, and a social credit data domain.

Optionally, the fusion data domains include one or more of an economic operation fusion data domain, a public service fusion data domain, a safety and stability maintenance fusion data domain, a city management fusion data domain, a carrier environment fusion data domain, and a social security fusion data domain.

Optionally, the obtaining preset type data from a plurality of preset data sources and constructing a collection combined data pool based on the preset type data includes:

establishing communication connection with a plurality of preset data sources, acquiring preset type data from each preset data source, and acquiring a data model of each preset data source;

and constructing a collection combined data pool based on the preset type data and the data model of each preset data source.

Optionally, the constructing a collection combined data pool includes:

determining a data structure and a data relationship of preset type data corresponding to each preset data source according to the data model of each preset data source;

and according to the preset type data and the data structure and data relation corresponding to the preset type data, performing database modeling operation to obtain a collection combined data pool.

In order to solve the above problem, the present invention further provides an apparatus for constructing a data model, the apparatus comprising:

the data collecting module is used for acquiring preset type data from a plurality of preset data sources and constructing a collecting combined data pool based on the preset type data;

the standardization module is used for converting preset type data in the collection combined data pool into standardized data and constructing a standardized data pool based on the standardized data;

the virtual label pool building module is used for determining a plurality of virtual labels and data extraction rules corresponding to the virtual labels, extracting standardized data corresponding to the virtual labels from the standardized data pool according to the data extraction rules corresponding to the virtual labels, and building the virtual label pool based on the standardized data corresponding to the virtual labels;

the core data domain building module is used for selecting core data from the standardized data pool and the virtual label pool and building a plurality of core data domains based on the core data;

and the fusion data domain construction module is used for extracting data to be processed from the core data corresponding to each core data domain, and combining the data to be processed to form a plurality of fusion data domains.

In order to solve the above problem, the present invention also provides an electronic device, including:

a memory storing at least one instruction; and

a processor that executes the instruction stored in the memory to implement the above method for constructing a data model.

In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one instruction is stored, and the at least one instruction is executed by a processor in an electronic device to implement the above-mentioned data model building method.

According to the invention, data from a plurality of preset data sources is collected, and screening, combination and analysis are performed on the collected data. This solves the problems of inconsistent statistical criteria and difficult data acquisition in the smart city data collection process, improves the efficiency and convenience with which users obtain data, greatly reduces unnecessary data redundancy in city management, enables the reuse of calculation results, and greatly reduces storage and computation costs in a big data system.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown in these drawings without creative effort.

Fig. 1 is a schematic flow chart of a method for constructing a data model according to an embodiment of the present invention;

Fig. 2 is a block diagram of an apparatus for constructing a data model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the internal structure of an electronic device implementing a method for constructing a data model according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

The invention provides a construction method of a data model. Fig. 1 is a schematic flow chart of a method for constructing a data model according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.

In this embodiment, the method for constructing the data model includes:

step S10, acquiring preset type data from a plurality of preset data sources, and constructing a collection combined data pool based on the preset type data.

For example, the step S10 includes steps S11, S12 (not shown in the figure):

step S11, establishing communication connection with multiple preset data sources, obtaining preset type data from each preset data source, and obtaining a data model of each preset data source.

In detail, after establishing communication connection with each preset data source, a data model (the data model includes a conceptual model, a logical model, and a physical model) corresponding to each preset data source may be obtained, and preset type data may be collected in each preset data source, or a uniform WebService interface may be constructed, and some preset data sources may actively upload the preset type data through the WebService interface.

The preset data source may be set according to a specific application scenario, for example, a public information platform of a government organization (e.g., an industry and commerce bureau, a national tax bureau, a local tax bureau, a labor and social security bureau, a national resource bureau, a bidding center, a customs, a court, a public security bureau, etc.), a database of a financial institution (e.g., a bank, an insurance company, etc.), and the like.

The preset type data may be data of various types, and one or more types may be selected as the preset type data according to the specific application scenario, for example: city population information (such as the name, age, political affiliation, birth date, contact telephone number and home address of city residents); city enterprise information (such as enterprise name, registered address, operating address and registered capital); city economic information (such as the GDP of the city and each district, fixed asset investment, consumer market sales data, major project investment, foreign trade, finance, and related index data of each industry); geospatial information (such as geological and mineral resources, marine environment state, geological and seismic structure, farmland and grassland conditions, wetlands and deserts, water source and water system distribution, urban and rural construction planning, integrated traffic layout, airspace and air routes, network resource distribution, key hydraulic engineering distribution, administrative divisions and place names, postal codes and addresses, and geographic data resource information); and city component information (such as city appearance and environment, landscaping, road traffic, public facilities, plots, districts, buildings, and basic house information).

And step S12, constructing a collection combined data pool based on the preset type data and the data model of each preset data source.

The collection combined data pool may be a predetermined storage space (e.g., a database) for storing the preset type data acquired from the preset data sources, and the preset type data may be stored in the collection combined data pool in the form of data tables.

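Constructing the collection combined data pool includes: determining the data structure and data relationships of the preset type data corresponding to each preset data source according to the data model of that preset data source; and performing a database modeling operation according to the preset type data and its corresponding data structure and data relationships to obtain the collection combined data pool.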
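As a minimal sketch of steps S11 and S12, the following assumes that the collection combined data pool is an in-memory SQLite database and that each preset data source has already delivered its data model (reduced here to a list of column names) together with its rows. The source names, table prefix and fields are hypothetical and are not taken from the embodiment.

```python
import sqlite3

# Hypothetical preset data sources: name -> (column names, rows).
# The column list stands in for the data structure read from each
# source's data model; real sources would be reached over a connection.
SOURCES = {
    "public_security": (["name", "birth_date"], [("Zhang San", "1990-01-01")]),
    "commerce_bureau": (["enterprise_name", "registered_address"],
                        [("Acme Ltd", "1 Main Road")]),
}

def build_collection_pool(sources):
    """Store each source's preset type data as one table in the pool."""
    pool = sqlite3.connect(":memory:")
    for source_name, (columns, rows) in sources.items():
        table = f"raw_{source_name}"
        column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
        pool.execute(f'CREATE TABLE "{table}" ({column_defs})')
        placeholders = ", ".join("?" for _ in columns)
        pool.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    pool.commit()
    return pool

pool = build_collection_pool(SOURCES)
tables = [r[0] for r in pool.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Keeping one table per source preserves each source's original structure, which leaves the conversion to standardized fields to step S20.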

And step S20, converting the preset type data in the collection combined data pool into standardized data, and constructing a standardized data pool based on the standardized data.

In this embodiment, if the preset type data is stored in the collection combined data pool in the form of data tables, the collection combined data pool includes a plurality of preset type data tables. Each preset type data table includes at least one field (for example, some preset type data tables use columns as fields, so that one column is one field), and the field attributes of each field are also stored in the collection combined data pool. The field attributes include: definition, identification, representation, and permissible values.

The standardized data pool comprises a plurality of standardized data tables for storing standardized data and table information corresponding to each standardized data table, wherein the table information comprises a table name, a data table type, a storage mode, a storage position, creation time, a field name, a field type, a field attribute and the like.

Converting preset type data in the collection combined data pool into standardized data, wherein the method comprises the following steps:

selecting a field to be processed from the preset type data; reading the field name of the field to be processed, and searching a preset standard data element library for the corresponding standard data element according to the field name; reading the name, definition, identification, representation and permissible values of the data element from the found standard data element, and acquiring the definition, identification, representation and permissible values of the field to be processed from the collection combined data pool; comparing the field name, definition, identification, representation and permissible values of the field to be processed with the read name, definition, identification, representation and permissible values of the data element, respectively, to determine whether a difference exists; and, when a difference exists, modifying the field name, definition, identification, representation and permissible values of the field to be processed according to the read name, definition, identification, representation and permissible values of the data element.

For example, the converting the preset type data in the collection combined data pool into the standardized data may further include steps S21-S26 (not shown in the figure):

Step S21, selecting fields from the preset type data one by one as the field to be processed.

Step S22, reading the field name of the field to be processed, and searching the preset standard data element library for the corresponding standard data element according to the field name.

The preset standard data element library comprises a plurality of standard data elements.

A data element is a unit of data whose definition, identification, representation and permissible values are specified by a set of attributes. A standard data element is a data element defined according to a preset standard. The preset standard may be a national, local, or industry data element standard, or a data element standard set for a specific application scenario, which is not limited in the present invention.

Step S23, reading the name, definition, identification, representation and permissible values of the data element from the found standard data element, and obtaining the definition, identification, representation and permissible values of the field to be processed from the collection combined data pool.

Step S24, comparing the field name, definition, identification, representation and permissible values of the field to be processed with the read name, definition, identification, representation and permissible values of the data element, respectively, to determine whether there is a difference; when there is no difference, executing step S26, and when there is a difference, executing step S25.

Step S25, modifying the field name, definition, identification, representation and permissible values of the field to be processed according to the read name, definition, identification, representation and permissible values of the data element, and proceeding to step S26.

Step S26, determining whether any field in the preset type data has not yet been selected; if yes, returning to step S21, and if not, ending the process.
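The loop of steps S21 to S26 can be sketched as follows. This is only an illustration under stated assumptions: the `DataElement` class, the `ALIASES` table used to match a source field name to a standard data element, and the sample values are all hypothetical, and leaving unmatched fields unchanged is a choice made for the sketch rather than something the embodiment specifies.

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str
    definition: str
    identification: str
    representation: str
    permissible_values: str

# Hypothetical standard data element library, keyed by standard name.
STANDARD_LIBRARY = {
    "gender": DataElement("gender", "Gender of a natural person", "DE0001",
                          "1-digit code", "0,1,2,9"),
}
# Hypothetical mapping from source field names to standard names.
ALIASES = {"sex": "gender"}

def standardize_fields(fields, library, aliases):
    """Align each field with its standard data element (steps S21-S26)."""
    standardized = []
    for field in fields:                                   # S21: one field at a time
        key = aliases.get(field.name, field.name)
        standard = library.get(key)                        # S22: look up by field name
        if standard is not None and field != standard:     # S23-S24: compare attributes
            field = DataElement(standard.name, standard.definition,
                                standard.identification, standard.representation,
                                standard.permissible_values)  # S25: adopt the standard
        standardized.append(field)   # assumption: unmatched fields are kept unchanged
    return standardized              # S26: the loop ends when no field is left

raw_fields = [DataElement("sex", "Sex", "F12", "text", "M,F"),
              DataElement("nickname", "Informal name", "F13", "text", "any")]
result = standardize_fields(raw_fields, STANDARD_LIBRARY, ALIASES)
```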

Step S30, determining a plurality of virtual tags and data extraction rules corresponding to the virtual tags, extracting standardized data corresponding to the virtual tags from the standardized data pool according to the data extraction rules corresponding to the virtual tags, and constructing a virtual tag pool based on the standardized data corresponding to the virtual tags.

In this embodiment, the standardized data is stored in the standardized data pool in the form of a data table, and the data table storing the standardized data may be referred to as a standardized data table. Before the step S30, steps S31, S32 (not shown in the figure) are further included:

step S31, reading the table information of each standardized data table from the standardized data pool, extracting the metadata corresponding to each standardized data table from the table information, and constructing a metadata pool based on the extracted metadata.

The metadata includes attribute categories, such as basic attributes (e.g., name, type, size), permission attributes (e.g., read-write permission), semantic attributes (e.g., semantic subject), feature attributes (e.g., features that serve as unique identifiers) and spatio-temporal attributes (e.g., modification time and address). The specific attribute categories can be determined according to the specific application scenario.

The extracting metadata corresponding to each standardized data table from the table information includes:

The field name and/or field attributes of each field of each standardized data table are extracted from the table information, and the attribute category described by each field is determined according to its field name and/or field attributes. For example, if the field name of a field is "name", the attribute category described by the field is determined to be the basic attribute. After the attribute categories described by all fields of each standardized data table have been determined, the identification information of each standardized data table is acquired and stored in association with all the attribute categories corresponding to that table, thereby constructing the metadata pool.
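A minimal sketch of this metadata pool construction is given below. It assumes that the attribute category is decided from the field name alone by a simple lookup; the `CATEGORY_RULES` table, the "unclassified" fallback and the table identifiers are hypothetical.

```python
# Hypothetical rules mapping field names to attribute categories.
CATEGORY_RULES = {
    "basic": {"name", "type", "size"},
    "permission": {"read_write_permission"},
    "spatio_temporal": {"modification_time", "address"},
}

def classify_field(field_name):
    """Return the attribute category described by a field name."""
    for category, names in CATEGORY_RULES.items():
        if field_name in names:
            return category
    return "unclassified"  # assumption: fallback for unknown field names

def build_metadata_pool(table_info):
    """Associate each table's identification with its attribute categories."""
    return {table_id: {classify_field(f) for f in field_names}
            for table_id, field_names in table_info.items()}

# Table information reduced to: table identification -> field names.
table_info = {"T1": ["name", "address"], "T3": ["read_write_permission"]}
metadata_pool = build_metadata_pool(table_info)
```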

Step S32, determining the data relation between fields and the data relation between field values, and constructing a data relation pool based on the data relation.

The data relationships include: an inclusion relationship (e.g., Guangdong and Shenzhen are in an inclusion relationship), a monitoring relationship (e.g., a quality monitor and a device are in a monitoring relationship), and the like. The specific data relationships can be set as needed.

In this embodiment, the determining the plurality of virtual tags and the data extraction rule corresponding to each virtual tag includes: acquiring a plurality of virtual tags and the data extraction rule corresponding to each virtual tag from a predetermined virtual tag set; or, in response to a virtual tag creation instruction, receiving a virtual tag name and a data extraction rule input by a user, and creating the corresponding virtual tag based on the virtual tag name and the data extraction rule. The virtual tags may be set according to the specific application scenario; for example, the virtual tags may be: personal basic information, personal education information, personal marital information, personal business information, personal loan information, enterprise basic information, enterprise listing information, enterprise legal dispute information, and the like.

The extracting the standardized data corresponding to each virtual tag from the standardized data pool according to the data extraction rule corresponding to each virtual tag, and constructing the virtual tag pool based on the standardized data corresponding to each virtual tag, includes steps S33 to S38 (not shown in the figure):

step S33, extracting the structured query statement corresponding to each virtual tag from each data extraction rule.

Wherein the structured query statement may be a query statement containing a plurality of query conditions.

Step S34, according to the structured query statement corresponding to each virtual label, querying the standardized data corresponding to each virtual label in the standardized data pool.

When querying the standardized data corresponding to each virtual tag in the standardized data pool, if the query condition includes an attribute type condition and/or a data relationship condition, the query condition may be parsed by using the metadata in the metadata pool and/or the data relationship in the data relationship pool, and then the standardized data corresponding to each virtual tag is queried in the standardized data pool by using the parsed query condition.

For example, if a query condition is to query the basic attributes of all enterprises in city A, the metadata pool is first queried to find that the standardized data tables whose metadata contains the basic attribute have the identification information T1, T2 and T5, and the data relationship pool is queried to find that city A contains the three regions B, C and D. The query condition is then parsed as: query the basic attributes of the enterprises in regions B, C and D in the standardized data tables with the identification information T1, T2 and T5.
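The city A example can be sketched as a lookup in two small dictionaries. The structures of `METADATA_POOL` and `RELATION_POOL` below are assumptions made for illustration; the embodiment does not prescribe how either pool is stored.

```python
# Hypothetical metadata pool: table identification -> attribute categories.
METADATA_POOL = {"T1": {"basic"}, "T2": {"basic", "permission"},
                 "T3": {"semantic"}, "T4": {"permission"}, "T5": {"basic"}}
# Hypothetical data relationship pool: (relationship, subject) -> objects.
RELATION_POOL = {("contains", "city A"): ["B", "C", "D"]}

def parse_condition(attribute_category, place):
    """Resolve a category and a place into concrete tables and regions."""
    tables = sorted(table_id for table_id, categories in METADATA_POOL.items()
                    if attribute_category in categories)
    # If the place has no known sub-regions, query the place itself.
    regions = RELATION_POOL.get(("contains", place), [place])
    return {"tables": tables, "regions": regions}

parsed = parse_condition("basic", "city A")
```

The parsed result names the tables T1, T2 and T5 and the regions B, C and D, which can then be turned into concrete query conditions against the standardized data pool.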

Step S35, it is determined whether the data extraction rule corresponding to each virtual tag includes an analysis sub-rule.

Step S36, when the data extraction rule corresponding to one virtual tag includes an analysis sub-rule, according to the analysis sub-rule, analyzing the standardized data corresponding to the virtual tag, and storing the data obtained after the analysis as the associated data corresponding to the virtual tag.

The analysis sub-rule may be set as required. For example, if the virtual tag is "earliest registration time of the phone number" and the standardized data queried in step S34 for the virtual tag includes the historical registration times of each phone number, the analysis sub-rule may be set to: find the earliest of all the historical registration times corresponding to each phone number and take it as the earliest registration time of that phone number.
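The earliest-registration-time sub-rule could look like the sketch below, assuming the queried data arrives as (phone number, registration time) pairs and that times are ISO 8601 date strings, so that string comparison matches chronological order. The sample records are made up.

```python
def earliest_registration(records):
    """Return the earliest registration time for each phone number."""
    earliest = {}
    for phone, registered_at in records:
        # ISO 8601 dates sort chronologically when compared as strings.
        if phone not in earliest or registered_at < earliest[phone]:
            earliest[phone] = registered_at
    return earliest

records = [("13800000000", "2019-05-01"),
           ("13800000000", "2017-03-15"),
           ("13900000000", "2020-01-20")]
associated_data = earliest_registration(records)
```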

Step S37, when the data extraction rule corresponding to one virtual tag does not include the analysis sub-rule, storing the standardized data corresponding to the virtual tag as the associated data corresponding to the virtual tag.

Step S38, based on the associated data corresponding to each virtual label, perform database modeling operation to obtain a virtual label pool.

For example, in the virtual tag pool, one virtual tag may correspond to one or more data tables, while one data table corresponds to only one virtual tag; the associated data corresponding to a virtual tag is stored in the data table(s) corresponding to that virtual tag.
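Steps S33 to S38 can be sketched end to end as follows, assuming the standardized data pool is an SQLite database and a data extraction rule is a query string plus an optional analysis function. The `ExtractionRule` class, the `enterprise` table and its rows are hypothetical, and the virtual tag pool is simplified to a dictionary instead of a modeled database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ExtractionRule:
    query: str                                          # structured query statement
    analysis: Optional[Callable[[list], list]] = None   # optional analysis sub-rule

def build_virtual_tag_pool(standardized_pool, rules):
    """Map each virtual tag to its associated data."""
    tag_pool = {}
    for tag, rule in rules.items():
        rows = standardized_pool.execute(rule.query).fetchall()  # S33-S34
        if rule.analysis is not None:                            # S35
            rows = rule.analysis(rows)                           # S36
        tag_pool[tag] = rows                                     # S36/S37
    return tag_pool                                              # S38, simplified

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE enterprise (name TEXT, region TEXT, listed INTEGER)")
db.executemany("INSERT INTO enterprise VALUES (?, ?, ?)",
               [("Acme", "B", 1), ("Globex", "C", 0)])

rules = {
    "enterprise basic information": ExtractionRule(
        "SELECT name, region FROM enterprise ORDER BY name"),
    "enterprise listing information": ExtractionRule(
        "SELECT name, listed FROM enterprise ORDER BY name",
        analysis=lambda rows: [(name,) for name, listed in rows if listed]),
}
tag_pool = build_virtual_tag_pool(db, rules)
```

The first rule has no analysis sub-rule, so its query result is stored directly; the second filters the result before storing it.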

Step S40, selecting core data from the standardized data pool and the virtual label pool, and constructing a plurality of core data domains based on the core data.

In this embodiment, the core data may also be selected from the standardized data pool, the metadata pool, the data relationship pool, and the virtual tag pool. The core data comprises a plurality of core data tables, and association mappings can be established among the core data tables through primary and foreign keys.
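As an illustration of associating core data tables through primary and foreign keys, the sketch below uses SQLite with two hypothetical tables, `legal_unit` and `license`, linked by a credit code column. The table and column names and the sample values are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.execute("CREATE TABLE legal_unit (credit_code TEXT PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE license (
    license_id INTEGER PRIMARY KEY,
    credit_code TEXT NOT NULL REFERENCES legal_unit(credit_code),
    license_type TEXT)""")
db.execute("INSERT INTO legal_unit VALUES ('CODE-001', 'Acme Ltd')")
db.execute("INSERT INTO license VALUES (1, 'CODE-001', 'business license')")

# The foreign key lets the two core data tables be joined.
rows = db.execute("""SELECT u.name, l.license_type
                     FROM legal_unit u JOIN license l USING (credit_code)""").fetchall()

# A license that points at a non-existent legal unit is rejected.
try:
    db.execute("INSERT INTO license VALUES (2, 'CODE-999', 'permit')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```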

The core data domains comprise one or more of a population information data domain, a legal unit data domain, a city component data domain, an electronic license data domain, a macro-economic data domain, a spatial geographic data domain and a social credit data domain.

The population information data domain mainly integrates public security household registration data, census data, officer data, social security data, personnel relationship data and the like, and can be extended with data on health, income, marriage, social security, assistance, poverty, disability, migration, death and the like.

The legal unit data domain includes: data related to institutional legal persons, business legal persons, enterprise legal persons, corporate legal persons (agricultural private agencies) and other organizations established according to law, each identified by its unified social credit code.

The city component data domain includes: data on city buildings, such as the number, position and structure of the buildings and the elevators installed in them, and component data on other facilities such as public facilities, landscaping, city appearance and environment, and road traffic. It is used for determining the distribution, registration management, inspection and maintenance of the various components in the city.

The electronic license data domain includes: license data formed through electronic license information collection, license production and issuance, shared use, supervision and management and the like in the government affairs management and government service activities of entities such as administrative organs, public institutions, social organizations, enterprises and citizens.

The macro-economic data domain includes: finance and taxation, consumption and investment, imports and exports, economic trends, energy conservation and emission reduction, intellectual property and the like.

The spatial geographic data domain includes: administrative divisions, marine environments, land and mineral resources, water sources and water systems, natural resources, ecological environments, geographical landforms, disasters and disaster conditions.

The social credit data domain includes: consumption records, investment, personal credit and the like.

Step S50, extracting data to be processed from the core data corresponding to each core data domain, and combining the data to be processed to form a plurality of fused data domains.

For example, the fusion data domains comprise one or more of an economic operation fusion data domain, a public service fusion data domain, a safety and stability maintenance fusion data domain, a city management fusion data domain, a carrier environment fusion data domain and a social security fusion data domain.

The data combination rule differs for each fusion data domain. For example, the core data of the legal unit data domain, the macro-economic data domain and the social credit data domain is associated through the enterprise's unified social credit code to form a basic economic portrait of the enterprise; public opinion information such as financial report information and economic news text is crawled from the internet; and the basic economic portrait and the public opinion information are combined to form the economic operation fusion data domain.
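The association by social credit code can be sketched as a keyed merge. The example assumes each core data domain is reduced to a dictionary from credit code to attributes, and it further assumes that the crawled public opinion information has already been matched to a credit code, which the embodiment does not describe. All names and values are hypothetical.

```python
def fuse_by_credit_code(*domains):
    """Merge the attributes that several domains hold for each credit code."""
    fused = {}
    for domain in domains:
        for code, attributes in domain.items():
            fused.setdefault(code, {}).update(attributes)
    return fused

# Hypothetical extracts from three core data domains.
legal_unit = {"CODE-001": {"name": "Acme Ltd"}}
macro_economy = {"CODE-001": {"tax_paid": 120000}}
social_credit = {"CODE-001": {"credit_rating": "A"}}
# Hypothetical crawled text, assumed to be matched to a credit code already.
public_opinion = {"CODE-001": {"news": ["Acme publishes annual report"]}}

portrait = fuse_by_credit_code(legal_unit, macro_economy, social_credit)
economic_operation_domain = fuse_by_credit_code(portrait, public_opinion)
```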

The method comprises the steps of extracting resources such as natural person identity information and legal person unit basic information from a population information data field and a legal person unit data field, building a unified identity authentication system by relying on electronic signatures, electronic seals, electronic certificates, electronic files and the like, forming a public service fusion data field, and providing efficient and convenient public services for enterprises and masses.

Data are extracted from core data domains such as the urban component data domain, the spatial geographic data domain, the population information data domain, the macroeconomic data domain and the social credit data domain, so that people, places, objects, situations, events and organizations are brought into a grid. Behavior tracks, social relations, public sentiment and the like are then monitored and analyzed centrally to form the safety and stability maintenance fused data domain, which provides strong support for police departments in command decision-making and intelligence analysis.

Data are extracted from core data domains such as the population information data domain, the legal unit data domain and the urban component data domain; the extracted production-factor data, such as capital, land, labor, technology, information and knowledge, are integrated to support coordinated development of the whole region; and information on the three main organizational forms within the urban area, namely governments, private sectors and non-profit organizations, is fused to form the city management fused data domain.

Social, economic, political, legal and other elements that influence enterprise activities are extracted from core data domains such as the population information data domain, the legal unit data domain, the urban component data domain, the macroeconomic data domain, the spatial geographic data domain and the social credit data domain, and real-time data such as economic development conditions, fiscal and tax revenue and social employment conditions are fused with them to form the carrier and business environment fused data domain.

Data on public education, employment and entrepreneurship, social insurance, medical and health care, social services, housing security, public culture and sports, career arrangement, services for persons with disabilities, targeted poverty alleviation, epidemic prevention and control and the like are extracted from core data domains such as the population information data domain, the urban component data domain and the spatial geographic data domain, and are fused to form the social security fused data domain.
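The relationship between fused data domains and the core data domains they draw on, as listed in the preceding paragraphs, can be summarized as a lookup table. The sketch below is illustrative only: the identifier names are hypothetical, the source lists are read from the examples above (which use "such as" and are therefore not exhaustive), and the function merely gathers the relevant core data without applying any domain-specific combination rule.

```python
# Hypothetical identifiers; source lists follow the examples in the description.
FUSED_DOMAIN_SOURCES = {
    "economic_operation": ["legal_unit", "macro_economy", "social_credit"],
    "public_service": ["population_information", "legal_unit"],
    "safety_and_stability": [
        "urban_component", "spatial_geography", "population_information",
        "macro_economy", "social_credit",
    ],
    "city_management": ["population_information", "legal_unit", "urban_component"],
    "business_environment": [
        "population_information", "legal_unit", "urban_component",
        "macro_economy", "spatial_geography", "social_credit",
    ],
    "social_security": ["population_information", "urban_component", "spatial_geography"],
}


def build_fused_domains(core_domains, sources=FUSED_DOMAIN_SOURCES):
    """Collect, for each fused domain, the records of the core domains it draws on."""
    fused = {}
    for fused_name, core_names in sources.items():
        # A missing core domain contributes an empty record list.
        fused[fused_name] = {name: core_domains.get(name, []) for name in core_names}
    return fused
```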

In this embodiment, preset type data is obtained from a plurality of preset data sources, and a collection combined data pool is constructed based on the preset type data. The preset type data in the collection combined data pool is then converted into standardized data according to a predetermined mapping rule, and a standardized data pool is constructed based on the standardized data. Next, a plurality of virtual tags and the data extraction rule corresponding to each virtual tag are determined, the standardized data corresponding to each virtual tag is extracted from the standardized data pool according to that rule, and a virtual tag pool is constructed based on the extracted data. Finally, core data is selected from the standardized data pool and the virtual tag pool, a plurality of core data domains are constructed based on the core data, data to be processed is extracted from the core data corresponding to each core data domain, and the data to be processed is combined to form a plurality of fused data domains. Compared with the prior art, the invention collects data from a plurality of preset data sources and screens, combines and analyzes the collected data. This solves the problems of inconsistent statistical calibers and difficult data acquisition in smart city data collection, improves the efficiency and convenience with which users obtain data, greatly reduces unnecessary data redundancy in city management, allows calculation results to be reused, and greatly reduces storage and computation costs in a big data system.

FIG. 2 is a functional block diagram of the data model building apparatus according to the present invention.

The data model building device 100 of the present invention may be installed in an electronic device. According to the implemented functions, the data model building device may include a data aggregation module 101, a normalization module 102, a virtual tag pool building module 103, a core data domain building module 104, and a fused data domain building module 105. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.

In the present embodiment, the functions regarding the respective modules/units are as follows:

The data aggregation module 101 is configured to obtain preset type data from a plurality of preset data sources, and construct a collection combined data pool based on the preset type data.

For example, the obtaining preset type data from a plurality of preset data sources and constructing a collection combined data pool based on the preset type data includes:

First, a communication connection is established with each of the plurality of preset data sources, preset type data is obtained from each preset data source, and the data model of each preset data source is obtained.

Then, a collection combined data pool is constructed based on the preset type data and the data model of each preset data source.

Constructing the collection combined data pool comprises the following steps:

The standardization module 102 is configured to convert the preset type data in the collection combined data pool into standardized data, and construct a standardized data pool based on the standardized data.

In this embodiment, if the preset type data is stored in the collection combined data pool in the form of a data table, the collection combined data pool includes a plurality of preset type data tables, each preset type data table includes at least one field (for example, some preset type data tables take columns as fields, and one column is one field), and the field attributes of each field are also stored in the collection combined data pool, where the field attributes include: definition, identification, representation, and permission values.
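A field and its four stored attributes can be modeled as a small record type. The following is a minimal sketch under stated assumptions: the class name, the attribute types, and the interpretation of "permission values" as an enumerated set of allowed values are hypothetical and are not defined by the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAttributes:
    """Attributes stored for one field of a preset type data table."""
    definition: str               # what the field means
    identification: str           # identifier (name or code) of the field
    representation: str           # how values are represented, e.g. a data type
    permitted_values: tuple = ()  # allowed values; empty means unrestricted (assumption)

    def allows(self, value):
        """Return True if the value is acceptable for this field."""
        return not self.permitted_values or value in self.permitted_values
```

For instance, a hypothetical `gender` field with `permitted_values=("M", "F")` would accept `"M"` and reject any other value, while a field with no permitted values listed would accept everything.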

The virtual tag pool building module 103 is configured to determine a plurality of virtual tags and data extraction rules corresponding to the virtual tags, extract standardized data corresponding to the virtual tags from the standardized data pool according to the data extraction rules corresponding to the virtual tags, and build a virtual tag pool based on the standardized data corresponding to the virtual tags.

In this embodiment, the standardized data is stored in the standardized data pool in the form of a data table, and the data table storing the standardized data may be referred to as a standardized data table.

In this embodiment, the data model building apparatus 100 further includes a metadata pool building module and a data relationship pool building module (not shown in the figure), where:

The metadata pool building module is configured to read the table information of each standardized data table from the standardized data pool, extract the metadata corresponding to each standardized data table from the table information, and construct a metadata pool based on the extracted metadata.

The data relationship pool building module is configured to determine the data relationships between fields and the data relationships between field values, and to construct a data relationship pool based on those data relationships.
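The work of the metadata pool building module can be sketched as follows, assuming purely for illustration that the standardized data pool is a SQLite database. The patent does not name a database engine, and the function name and the shape of the returned metadata are hypothetical. The sketch reads the table information of every table and keeps the column name, declared type and primary-key flag as metadata.

```python
import sqlite3


def build_metadata_pool(conn):
    """Return {table name: [column metadata, ...]} for every table in the database."""
    pool = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        pool[table_name] = [
            {"field": col[1], "type": col[2], "primary_key": bool(col[5])}
            for col in columns
        ]
    return pool
```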

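Extracting the standardized data corresponding to each virtual tag from the standardized data pool and constructing the virtual tag pool comprises the following steps A1-A6 (not shown in the figure):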

Step A1, extracting the structured query statement corresponding to each virtual tag from each data extraction rule.

Step A2, querying the standardized data pool for the standardized data corresponding to each virtual tag according to the structured query statement corresponding to that virtual tag.

Step A3, determining whether the data extraction rule corresponding to each virtual tag includes an analysis sub-rule.

Step A4, when the data extraction rule corresponding to a virtual tag includes an analysis sub-rule, analyzing the standardized data corresponding to the virtual tag according to the analysis sub-rule, and storing the data obtained from the analysis as the associated data corresponding to the virtual tag.

The analysis sub-rule may be set as required. For example, if the virtual tag is "earliest registration time of phone number" and the standardized data queried in step A2 for that virtual tag comprises the historical registration times of each phone number, the analysis sub-rule may be set to: find the earliest historical registration time among all historical registration times corresponding to each phone number, and take it as the earliest registration time of that phone number.

Step A5, when the data extraction rule corresponding to a virtual tag does not include an analysis sub-rule, storing the standardized data corresponding to the virtual tag as the associated data corresponding to the virtual tag.

Step A6, performing a database modeling operation based on the associated data corresponding to each virtual tag to obtain the virtual tag pool.
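Steps A1-A6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the standardized data pool is assumed to be a SQLite database, each data extraction rule is assumed to be a dictionary with a `query` entry and an optional `analysis` callable, and the database modeling operation of step A6 is replaced by simply returning an in-memory dictionary. All names are hypothetical.

```python
import sqlite3


def build_virtual_tag_pool(conn, extraction_rules):
    """Map each virtual tag to its associated data."""
    pool = {}
    for tag, rule in extraction_rules.items():
        query = rule["query"]                  # A1: structured query statement of the rule
        rows = conn.execute(query).fetchall()  # A2: query the standardized data pool
        analyze = rule.get("analysis")         # A3: does the rule include an analysis sub-rule?
        # A4: store analyzed data; A5: otherwise store the queried data unchanged.
        pool[tag] = analyze(rows) if analyze else rows
    return pool                                # A6 stand-in: a dictionary, not a database model


def earliest_per_phone(rows):
    """Analysis sub-rule: keep the earliest registration time for each phone number."""
    earliest = {}
    for phone, registered_at in rows:
        if phone not in earliest or registered_at < earliest[phone]:
            earliest[phone] = registered_at
    return earliest
```

The sub-rule compares registration times as strings, which is only valid when they are stored in a sortable format such as ISO 8601 dates.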

A core data domain constructing module 104, configured to select core data from the standardized data pool and the virtual tag pool, and construct a plurality of core data domains based on the core data.

The fused data domain constructing module 105 is configured to extract data to be processed from core data corresponding to each core data domain, and combine the data to be processed to form a plurality of fused data domains.

In this embodiment, preset type data is obtained from a plurality of preset data sources, and a collection combined data pool is constructed based on the preset type data. The preset type data in the collection combined data pool is then converted into standardized data, and a standardized data pool is constructed based on the standardized data. Next, a plurality of virtual tags and the data extraction rule corresponding to each virtual tag are determined, the standardized data corresponding to each virtual tag is extracted from the standardized data pool according to that rule, and a virtual tag pool is constructed based on the extracted data. Finally, core data is selected from the standardized data pool and the virtual tag pool, a plurality of core data domains are constructed based on the core data, data to be processed is extracted from the core data corresponding to each core data domain, and the data to be processed is combined to form a plurality of fused data domains. Compared with the prior art, the invention collects data from a plurality of preset data sources and screens, combines and analyzes the collected data. This solves the problems of inconsistent statistical calibers and difficult data acquisition in smart city data collection, improves the efficiency and convenience with which users obtain data, greatly reduces unnecessary data redundancy in city management, allows calculation results to be reused, and greatly reduces storage and computation costs in a big data system.

Fig. 3 is a schematic structural diagram of an electronic device implementing the method for constructing a data model according to the present invention.

The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a data model building program 12, stored in the memory 11 and executable on the processor 10.

The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data model building program, but also to temporarily store data that has been output or is to be output.

The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., a data model building program, etc.) stored in the memory 11 and calling data stored in the memory 11.

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.

Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.

For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is generally used for establishing a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The data model building program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:

Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again. It is emphasized that the virtual tag may also be stored in a node of a blockchain in order to further ensure the privacy and security of the virtual tag.

Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
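The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain. The sketch below is not a blockchain platform: it has no consensus mechanism, peer-to-peer transmission or distributed storage, and the block layout and function names are hypothetical. It shows only that each block stores the hash of its predecessor, so that altering an earlier payload is detectable.

```python
import hashlib
import json

GENESIS_PREVIOUS_HASH = "0" * 64  # assumed placeholder for the first block


def make_block(payload, previous_hash):
    """Create a block whose hash covers its payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "previous_hash": previous_hash}, sort_keys=True)
    return {
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


def verify_chain(blocks):
    """Check that every block links to its predecessor and that its hash is intact."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in blocks:
        expected = make_block(block["payload"], block["previous_hash"])["hash"]
        if block["previous_hash"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True
```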

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for constructing a data model, the method comprising:

extracting data to be processed from core data corresponding to each core data domain, and combining the data to be processed to form a plurality of fusion data domains;

the extracting standardized data corresponding to each virtual label from the standardized data pool according to the data extraction rule corresponding to each virtual label, and constructing the virtual label pool based on the standardized data corresponding to each virtual label, includes:

2. The method for constructing a data model according to claim 1, wherein the preset type data includes a plurality of fields, and the converting the preset type data in the collection combined data pool into standardized data includes:

selecting a field to be processed from the preset type data;

3. The method of constructing a data model according to any one of claims 1 to 2, wherein the core data domains include one or more of a population information data domain, a legal unit data domain, an urban component data domain, an electronic license data domain, a macroeconomic data domain, a spatial geographic data domain, and a social credit data domain.

4. The method according to any one of claims 1 to 2, wherein the converged data domain comprises one or more of an economic operation converged data domain, a public service converged data domain, a safety and stability converged data domain, a city management converged data domain, a carrier environment converged data domain, and a social security converged data domain.

5. The method for constructing a data model according to claim 1, wherein the obtaining preset type data from a plurality of preset data sources and constructing a collection combined data pool based on the preset type data comprises:

6. The method of constructing a data model of claim 5, wherein constructing the collection combined data pool comprises:

7. An apparatus for constructing a data model, the apparatus comprising:

the fusion data domain building module is used for extracting data to be processed from the core data corresponding to each core data domain and combining the data to be processed to form a plurality of fusion data domains;

8. An electronic device, characterized in that the electronic device comprises:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of constructing a data model according to any one of claims 1 to 6.

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a method of constructing a data model according to any one of claims 1 to 6.