CN112395366A - Data processing and creating method and device of distributed database and electronic equipment - Google Patents

Info

Publication number
CN112395366A
Authority
CN
China
Prior art keywords
data
user
distributed
data table
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910763294.6A
Other languages
Chinese (zh)
Inventor
任海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910763294.6A
Publication of CN112395366A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Abstract

The embodiment of the invention provides a data processing and creating method and device of a distributed database and electronic equipment, wherein the method comprises the following steps: acquiring data characteristics of a data table to be created by a user; generating a data distribution strategy according to the data characteristics; and generating a distributed data model for processing a data table according to the data distribution strategy. The embodiment of the invention can reduce the modeling threshold of the distributed database, abstract the modeling process into the information interaction and feature extraction process, fully utilize the modeling capability of the data service platform to establish a proper distributed data model for the user, and enable the non-professional user to more conveniently use the distributed data service provided by the data service platform.

Description

Data processing and creating method and device of distributed database and electronic equipment
Technical Field
The application relates to a data processing and creating method and device of a distributed database and electronic equipment, and belongs to the technical field of computers.
Background
With the development of distributed data processing technology, users can build their own databases and run various data services on top of the services provided by a data service platform. For example, MPP (Massively Parallel Processing) technology is widely used in distributed database systems. In a shared-nothing database cluster, each node has its own disk storage system and memory system, and service data is divided among the nodes according to the database model and application characteristics. The data nodes are connected to one another through a dedicated network or a general-purpose commercial network and compute cooperatively to provide database services as a whole.
In such distributed data applications, users need to build a distributed data model on the database platform, and the modeling process requires domain knowledge and experience. For example, in the prior art, a user often has to learn the principles of the MPP database through technical documents, training, and the like, and acquire certain optimization skills for such a database, before being able to construct a suitable distributed data model independently. This approach is very inconvenient for the user.
Disclosure of Invention
The embodiment of the invention provides a data processing and creating method and device of a distributed database and electronic equipment, so that a user can conveniently construct the distributed database.
In order to achieve the above object, an embodiment of the present invention provides a data processing method for a distributed database, including:
acquiring data characteristics of a data table to be created by a user;
generating a data distribution strategy according to the data characteristics;
and generating a distributed data model for processing a data table according to the data distribution strategy.
The embodiment of the invention also provides a method for creating the distributed database, which comprises the following steps:
responding to a request of a user for establishing a data table, and acquiring field characteristics of the data table to be established by the user;
acquiring the service requirement characteristics of the data table;
generating a data distribution strategy according to the field characteristics and the service demand characteristics;
and generating one or more distributed data models for processing the data table according to the data distribution strategy, and recommending the one or more distributed data models to the user.
An embodiment of the present invention further provides a data processing apparatus for a distributed database, including:
the first characteristic acquisition module is used for acquiring the data characteristics of a data table to be created by a user;
the first strategy generation module is used for generating a data distribution strategy according to the data characteristics;
and the model creating module is used for generating a distributed data model for processing the data table according to the data distribution strategy.
The embodiment of the present invention further provides a device for creating a distributed database, including:
the second characteristic acquisition module is used for responding to a request of a user for establishing a data table and acquiring field characteristics of the data table to be established by the user;
the third characteristic acquisition module is used for acquiring the service requirement characteristics of the data table;
the second strategy generating module is used for generating a data distribution strategy according to the field characteristics and the service demand characteristics;
and the model recommending module generates one or more distributed data models for processing the data table according to the data distribution strategy and recommends the distributed data models to the user.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
and the processor is used for operating the program stored in the memory so as to execute the data processing method of the distributed database.
According to the data processing and creating method and device for the distributed database and the electronic device, the data distribution strategy is generated by obtaining the data characteristics of the data table to be created by the user, and then the distributed data model is created for the user.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
FIG. 1 is a schematic diagram of an application scenario in which a user creates a data table according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a data processing method of a distributed database according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for creating a distributed database according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data processing apparatus of a distributed database according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a device for creating a distributed database according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The technical solution of the present invention is further illustrated by some specific examples.
In an MPP (Massively Parallel Processing) distributed database system, each node in a shared-nothing cluster has its own disk storage system and memory system. Service data is divided among the nodes according to the database model and application characteristics, and the data nodes are connected to one another through a dedicated network or a general-purpose commercial network and compute cooperatively to provide database services as a whole. A shared-nothing database cluster has advantages such as full scalability, high availability, high performance, excellent cost performance, and resource sharing. However, to use an MPP-based distributed database system, a user needs to perform data modeling. Data modeling means abstracting and organizing various kinds of real-world data and determining the scope covered by the database, the organization form of the data, and so on, until the data is converted into an actual database.
In the embodiment of the invention, the data service platform guides the user according to the data characteristics and requirements of the user, so that a suitable distributed data model can be automatically established or recommended to the user, and the user can use the data model more conveniently.
As shown in fig. 1, which is a schematic view of an application scenario in which a user establishes a data table according to an embodiment of the present invention, the user may create the data table in a distributed database system provided by a data service platform according to a business requirement of the user. As shown in the figure, based on the technical solution of the embodiment of the present invention, a user can be guided to input some data characteristics in the process of establishing a data table by the user, so that the data service platform can analyze the characteristics of user data and user requirements, etc. As shown in the figure, the following characteristic information can be obtained through information interaction with a user: data volume, common filter query fields, common UV fields, lifecycle, data deletion time fields, primary key fields, common association query fields, and the like. Based on the information, the data service platform can select a suitable data model processing strategy such as a data distribution strategy, a life cycle management strategy and the like to construct a distributed data processing model, further create a data table meeting the user requirement, and then process the data table of the user by using the distributed data processing model.
After the data table is established, data access and data processing services can be provided externally. Each data table, or a group of data tables, may be regarded as a distributed database. In a distributed database system, each data table is managed based on a corresponding distributed data model. A data table can be, for example, an access log of a website system that records the daily access data of the website system, or an office database of an enterprise that records the daily office data of the enterprise and provides office data access and processing services for its employees.
Based on the technology provided by the embodiment of the invention, the data service platform can provide diversified distributed database modeling services for a large number of shops. The data service platform can pre-construct some basic distributed data models and corresponding data tables based on industry characteristics and shop type characteristics. When a user accesses the data service platform, the basic distributed data models and data tables can be recommended according to basic information such as the industry the user is in and the type of shop the user operates. For ordinary shop users, the basic distributed data models and data tables can meet their requirements. For shop users with a large scale or complex business requirements, the platform can further interact with the users to obtain more business requirement information and customize personalized distributed data models and data tables for them.
In addition, when the distributed data model and the data table are built for the user, data characteristics of multiple dimensions can be combined. Besides basic data characteristics such as field characteristics, data volume, and access volume, these can include data characteristics drawn from service requirements, such as UV (unique visitor) requirement information, commonly used association field information, and commonly used filter field information. Different dimensions of the shop can also serve as the basis for customizing the distributed data model and the data table, such as the shop's historical data access volume, the data characteristics of the whole industry the shop belongs to, and the data characteristics of similar shops. The service content handled by the shop's customer service staff can also be used as a data characteristic. For example, if most of the customer service content of a certain shop concerns travel booking services, a targeted distributed data model and data table can be constructed based on this characteristic, so that booking and planning become faster and more efficient.
Having introduced the application scenario and the basic principle of the embodiment of the present invention for creating a distributed database for a user, the following will further describe the technical solution of the embodiment of the present invention with reference to several embodiments.
Example one
To address the above need, an embodiment of the present invention provides a data processing method for a distributed database. The method may be executed on a data service platform so as to provide a distributed data service for a user. FIG. 2 is a schematic flow diagram of the data processing method for a distributed database according to the embodiment of the present invention. As shown in FIG. 2, the method includes:
S101: Acquire the data characteristics of the data table to be created by the user. The data characteristics referred to here may include field characteristics, service requirement characteristics, and the like. The field characteristics are the most basic characteristics of a data table and may specifically include a field list and field types; these characteristics generally exist in any type of data table.
In addition to basic characteristics such as the field characteristics, the data characteristics of the data table may also include service requirement characteristics, which usually arise from the service characteristics of different users. Here, the service requirement characteristics may include one or more of UV (unique visitor) requirement information, common association field information, common filter field information, a service primary key, and field priority information. For example, on some e-commerce platforms, it is often necessary to count the number of effective daily visits to a certain page or piece of information, where multiple visits by the same user within the same day are recorded as one visit. This is the UV requirement information. The generated distributed data model therefore needs to deduplicate the access data of the e-commerce platform's users in order to obtain the UV figure.
In addition, the data characteristics of the data table may further include the size of the data volume and/or the cluster size. The data service platform can judge whether the data table to be established by the current user needs to be partitioned, how to partition and the like according to the size of the data volume and/or the cluster scale. In addition, the data characteristics of the data table may also include life cycle characteristics of the data, and these pieces of information determine the time for which the corresponding data in the data table needs to be retained, and the like.
Further, the data characteristics of the data table may be obtained when a request from the user to establish the data table is received; that is, the operation of obtaining the data characteristics of the data table to be created may be performed in response to the request. Specifically, the field list, the field types, and the like may be obtained from the user request, or the user may be guided to fill them in step by step in a dialog box. For the requirement characteristics, the user may be further guided to fill them in after setting the field list and the field types; for example, the user may provide the requirement characteristics, the data volume, and the cluster size through information interaction.
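The characteristics collected in step S101 can be pictured as a single record. The following is a minimal sketch in Python, not the platform's actual data structure: the class name `TableFeatures`, its attribute names, and the `unknown_fields` check are all hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableFeatures:
    """Hypothetical container for the data characteristics gathered in S101."""
    fields: Dict[str, str]                                     # field name -> field type
    filter_fields: List[str] = field(default_factory=list)     # common filter fields, by priority
    join_fields: List[str] = field(default_factory=list)       # common association fields
    uv_fields: List[str] = field(default_factory=list)         # fields used for UV statistics
    primary_key: Optional[str] = None                          # service primary key
    data_volume_rows: Optional[int] = None                     # expected data volume
    cluster_nodes: Optional[int] = None                        # cluster size
    lifecycle_field: Optional[str] = None                      # field that drives data deletion
    lifecycle_days: Optional[int] = None                       # retention period

    def unknown_fields(self) -> List[str]:
        """Return referenced field names that are missing from the field list."""
        referenced = self.filter_fields + self.join_fields + self.uv_fields
        referenced += [f for f in (self.primary_key, self.lifecycle_field) if f]
        return [name for name in referenced if name not in self.fields]


features = TableFeatures(
    fields={"user_id": "string", "page_id": "string", "access_date": "string"},
    filter_fields=["page_id", "access_date"],
    uv_fields=["user_id"],
    lifecycle_field="access_date",
    lifecycle_days=30,
)
print(features.unknown_fields())
```

Every attribute other than `fields` is optional, which mirrors the point that the user may supply only part of the information.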
S102: and generating a data distribution strategy according to the data characteristics. The data distribution strategy is used for uniformly distributing data on different nodes, so that the balanced query work is operated locally as much as possible, and the data query tasks can be uniformly distributed on all the nodes after being disassembled. The data distribution strategy is basic information for constructing a distributed data model, and the data distribution strategy as described above may include the number of data partitions, the size of the data partitions, the level of the data partitions, and the like. For generating the data distribution policy, after the field features of the data table to be created are obtained, a basic data distribution policy may be constructed, for example, one of the fields is selected to be partitioned based on a field list and a field type provided by a user, for example, data partitioning is performed according to employee numbers.
In addition, as described above, in addition to the basic field characteristics, the data characteristics of the data table may include the size of the data volume and/or the cluster size, and the data service platform may make a judgment based on these information, and for some data tables with smaller data size or smaller cluster size, it may not perform partitioning, but use a relatively simple storage policy. For the data table needing partitioning, the number of the data partitions and/or the size of the data partitions can be selected according to the size of the data volume, the cluster size and the like, so that the data in the data table can be efficiently managed. For the life cycle characteristics of the aforementioned data, based on the characteristics, the data service platform can make a life cycle management strategy for each item of data in the data table, periodically screen the data, update or delete the data according to the life cycle of the data, and the like, so as to improve the operation efficiency of the database. In addition, the lifecycle characteristics of the data may also involve the formulation of data distribution policies, e.g., data with close lifecycles placed in the same partition, thereby facilitating data lifecycle-based management.
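The decision of whether to partition, and into how many partitions, can be sketched as a simple rule. This is an illustrative sketch only: the function name `choose_partition_count` and both thresholds are assumptions that do not come from the source, which does not specify concrete values.

```python
import math


def choose_partition_count(
    rows: int,
    nodes: int,
    min_rows_to_partition: int = 1_000_000,        # assumed threshold
    target_rows_per_partition: int = 5_000_000,    # assumed target size
) -> int:
    """Return 0 when the table should stay unpartitioned, else a partition count."""
    # Small tables or single-node clusters use the simple storage policy.
    if rows < min_rows_to_partition or nodes <= 1:
        return 0
    needed = math.ceil(rows / target_rows_per_partition)
    # Round up to a multiple of the node count so partitions spread evenly.
    return max(nodes, math.ceil(needed / nodes) * nodes)


print(choose_partition_count(10_000, 8))
print(choose_partition_count(100_000_000, 8))
```

Rounding to a multiple of the node count is one possible way to keep the load balanced; other schemes would serve equally well.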
In addition, the aforementioned service requirement characteristics belong to some personalized characteristics of a user establishing a data table, and a reasonable distributed data model needs to be constructed by the data service platform according to the requirement characteristics so as to meet personalized requirements of different users.
For example, in the above step S101, the user may be guided to provide the following information: whether the user has UV requirements, where the user can provide the field most frequently used for UV queries or several UV fields ranked by priority; the user's common filter fields, where the user can provide the most frequently filtered field or several common filter fields ranked by priority; the user's service primary key; the priority information of each field; and the life cycle and other information of the data. The user's requirement characteristics are confirmed through this information interaction. The user does not need to provide all of the information; providing only part of it is also feasible.
The requirement information enables the data service platform to apply more targeted partition strategies and to formulate multi-level partition strategies, life cycle management strategies, and the like. For example, a primary data distribution policy may be created according to the priority order common filter field > common association field > UV query field > service primary key, and a secondary or higher-level partition policy may be generated according to the life cycle. In the database field, life cycle management governs the flow of data through its whole life cycle, for example, the entire process from creation and initial storage to deletion upon expiry. Multi-level partitioning means that, when the data table is built, the data can be sliced according to a chosen key value, for example, partitioned hierarchically by time range or numerical range. When the data is queried, the database can then prune partitions according to the partition filter conditions, which yields higher query performance, and the partitions can also be used for data life cycle management.
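The priority order described above (common filter field > common association field > UV query field > service primary key) can be expressed as a short selection routine. The sketch below is hypothetical: the dictionary keys and the function name `choose_partition_keys` are illustrative names, not part of the described platform.

```python
from typing import Dict, Optional

# Feature categories in descending priority, following the order in the text.
PRIORITY = ("filter_fields", "join_fields", "uv_fields", "primary_key")


def choose_partition_keys(features: Dict[str, object]) -> Dict[str, Optional[str]]:
    """Pick a primary distribution field by priority and a secondary one by life cycle."""
    primary = None
    for category in PRIORITY:
        value = features.get(category)
        # A category may hold a single field name or a priority-ordered list.
        candidates = [value] if isinstance(value, str) else list(value or [])
        if candidates:
            primary = candidates[0]
            break
    # The secondary partition follows the life cycle field, if one was given.
    return {"primary": primary, "secondary": features.get("lifecycle_field")}


order_table = {
    "join_fields": ["shop_id"],
    "uv_fields": ["buyer_id"],
    "primary_key": "order_id",
    "lifecycle_field": "order_date",
}
print(choose_partition_keys(order_table))
```

Because no common filter field is supplied in this example, the routine falls through to the common association field `shop_id`.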
It should be noted that the processing policies such as the data distribution policy and the lifecycle management policy may be extracted from a policy library of the data service platform, the data service platform may preset a plurality of processing policies and store the processing policies in the policy library, the processing policies are selected and recommended to the user according to preset rules according to data characteristics of a data table provided by the user, and the processing policies in the distributed data model may be finally determined in an interactive manner with the user. Of course, a machine learning model may also be adopted, and based on a large number of samples of the matching relationship between the data features of the data table and the data model processing strategies, the machine learning model is trained, so as to learn the matching relationship between the data features of the data table and the data model processing strategies, and thus, the processing strategies of the appropriate distributed data model can be selected according to the data features of the data table provided by the user.
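A rule-based policy library of the kind mentioned above could be sketched as a list of preset policies, each paired with a rule that decides when it applies. The policy names, the rules, and the row threshold below are assumptions made for illustration; the source does not list the contents of the policy library.

```python
from typing import Callable, Dict, List, NamedTuple


class Policy(NamedTuple):
    name: str
    applies: Callable[[Dict], bool]   # preset rule evaluated against the data features


# Hypothetical preset policies stored in the policy library.
POLICY_LIBRARY: List[Policy] = [
    Policy("no_partition", lambda f: f.get("rows", 0) < 1_000_000),
    Policy("hash_distribution", lambda f: f.get("rows", 0) >= 1_000_000),
    Policy("lifecycle_partition", lambda f: f.get("lifecycle_days") is not None),
]


def recommend_policies(features: Dict) -> List[str]:
    """Return the names of all preset policies whose rule matches the features."""
    return [policy.name for policy in POLICY_LIBRARY if policy.applies(features)]


print(recommend_policies({"rows": 5_000_000, "lifecycle_days": 30}))
```

The returned list corresponds to the policies recommended to the user, who then confirms the final choice interactively.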
S103: and generating a distributed data model for processing the data table according to the data distribution strategy. It should be noted that, for building the distributed data model, the data distribution policy is the most basic policy, and in addition, other processing policies may also be included, such as a lifecycle management policy, and in practical applications, a required policy may be flexibly selected to build the distributed data model according to characteristics and requirements of user data.
As described in the modeling process, one or more data distribution policies or lifecycle management policies may be generated in step S102, and these policies may be further provided to the user for judgment, and after the user confirms or selects the policies, the policies may be selected or confirmed, and a distributed data model is generated, through which a data table may be created on the data service platform for the user.
After the data table is created, the user can perform various data processing based on the distributed database system on the data table according to the requirement. For example, the user may associate the created data table with the service program of the user, so as to perform data writing, querying, statistics and other services.
The following further describes the technical solution of the embodiment of the present invention by an example of a specific application scenario.
For example, when an e-commerce enterprise user uses the data service provided by the data service platform, the user needs to establish an access log (i.e. the data table to be created) to record visits by customers to the sales page or sales website of the e-commerce enterprise. The main requirement is the statistics of daily PV and UV. PV (page view) means that each click or refresh by a user is counted once. UV (unique visitor) means that each user account or terminal that accesses the site is one visitor, and multiple accesses within one day are counted only once.
Based on such a situation, the access log may include the following field information: the customer's user identification (which may be a user account or terminal number, a customer's device identification number, such as a device unique identifier (UDID), an anonymous device identifier (OAID), a developer anonymous device identifier (VAID), and an application anonymous device identifier (AAID), etc.), a page access identification (such as an SPM (super position model) number), an access time, etc. After the access log is established, each access by the customer is recorded in the access log.
A typical application scenario for this access log is that the user needs to count the daily PV and UV of the website or page. This requirement can be met by processing the data in the access log. PV statistics: the total number of records is obtained by summarizing the access records in the access log. UV statistics: the access log is deduplicated on the customer's user identification field, and the total number of records remaining after deduplication is obtained.
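The PV and UV statistics just described can be illustrated on a handful of records. This is a minimal in-memory sketch; the record keys `user_id`, `page_id`, and `access_date` and the function name `daily_pv_uv` are assumed names, and a real deployment would run the equivalent aggregation inside the distributed database.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_pv_uv(records: Iterable[Dict[str, str]]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(page_id, access_date): (PV, UV)} for the given access records."""
    pv = defaultdict(int)
    visitors = defaultdict(set)
    for record in records:
        key = (record["page_id"], record["access_date"])
        pv[key] += 1                              # PV: every access counts
        visitors[key].add(record["user_id"])      # UV: deduplicate by user identification
    return {key: (pv[key], len(visitors[key])) for key in pv}


access_log = [
    {"user_id": "u1", "page_id": "home", "access_date": "20190601"},
    {"user_id": "u1", "page_id": "home", "access_date": "20190601"},
    {"user_id": "u2", "page_id": "home", "access_date": "20190601"},
    {"user_id": "u1", "page_id": "home", "access_date": "20190602"},
]
print(daily_pv_uv(access_log))
```

On 20190601 the page receives three accesses from two distinct users, so the PV is 3 and the UV is 2.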
The above information can be obtained by information interaction with the user when the user creates the data table as the access log. Based on the analysis of the details of the access log and the query scenario requirements, the following data characteristic elements of the access log can be obtained:
Filter conditions: page access identification and access time;
Statistic field: the customer's user identification.
From these feature elements, a data model processing strategy can be generated as follows: the primary partition of the MPP model is done according to the user identification (a hash distribution algorithm may be used) and the secondary partition of the MPP model is done according to the access time (format e.g., 20190601) and is used for lifecycle management such as retention of 30-day history data. After the processing policy of the data model is determined, a distributed data model and an access log may be created.
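The two-level strategy for the access log can be sketched as follows. The sketch assumes an MD5-based hash for the primary partition and date strings in the `20190601` format for the secondary partition; the function names and the choice of hash are illustrative and are not stated in the source.

```python
import hashlib
from datetime import date, timedelta
from typing import Iterable, List


def primary_partition(user_id: str, num_partitions: int) -> int:
    """Hash distribution: map a user identification to a primary partition index."""
    # A stable digest is used because Python's built-in hash() varies between runs.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def expired_partitions(
    date_partitions: Iterable[str], today: date, retention_days: int = 30
) -> List[str]:
    """Return secondary (date) partitions that fall outside the retention window."""
    cutoff = (today - timedelta(days=retention_days)).strftime("%Y%m%d")
    # Date strings in YYYYMMDD format compare correctly as plain strings.
    return sorted(p for p in date_partitions if p < cutoff)


print(primary_partition("u1", 8))
print(expired_partitions({"20190501", "20190601", "20190615"}, date(2019, 6, 30)))
```

Dropping a whole expired date partition is typically cheaper than deleting rows one by one, which is why the secondary partition doubles as the unit of life cycle management.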
After the distributed data model corresponding to the access log is established, a record is generated in the access log for each access by a customer to the e-commerce user's site, and the record is partitioned and managed over its life cycle according to the distributed data model. The e-commerce user can run statistical queries based on the distributed data model. For example, the customer access records for a certain page on a certain day are filtered out by page access identification and access time; this part of the access log is then summarized to obtain the PV data, and the result of deduplicating this part of the log by user account is summarized to obtain the UV data.
According to the data processing method of the distributed database, a data distribution strategy is generated by obtaining the data characteristics of the data table to be created by the user, and a distributed data model is then established for the user. This process lowers the modeling threshold of the distributed database by abstracting the modeling process into information interaction and feature extraction. It makes full use of the modeling capability of the data service platform to establish a suitable distributed data model for the user, so that non-professional users can more conveniently use the distributed data service provided by the data service platform.
In addition, the embodiment of the invention also provides a method for creating the distributed database, and the distributed data model can be recommended to the user for the user to select based on the information interaction between the data service platform and the user so as to assist the user in establishing the data table. As shown in fig. 3, which is a schematic flowchart of a method for creating a distributed database according to an embodiment of the present invention, the method includes:
S201: In response to a user's request to create a data table, acquire the field characteristics of the data table to be created, where the field characteristics may include a field list and field types. This step mainly collects the most basic information about the data table.
S202: Acquire the service requirement characteristics of the data table. The service requirement characteristics are personalized characteristics of the user's service and may include, for example, UV requirement information and/or commonly used associated field information and/or commonly used filtering field information and/or a service primary key and/or life cycle characteristics of the data. This step mainly collects the personalized requirement information for the data table.
S203: Generate a data distribution strategy according to the field characteristics and the service requirement characteristics. A lifecycle management strategy can also be generated in addition to the data distribution strategy.
S204: Generate one or more distributed data models for processing the data table according to the data distribution strategy, and recommend the one or more distributed data models to the user. After receiving the recommended distributed data models, the user can choose among them to find the model that best suits the user's own requirements. When only one model is recommended, the user can still choose whether to use it. After the user makes a selection, the selection can be fed back to the data service platform to trigger the data table creation process of step S205.
S205: In response to the user's selection of a recommended distributed data model, create a data table corresponding to that distributed data model. The data table can be created by the generated distributed data model itself, or the data service platform can generate the data table from the information obtained in the previous steps and then associate it with the corresponding distributed data model.
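Steps S201 to S205 can be sketched as a small pipeline. Everything below is a hypothetical illustration: the class and function names are invented here, and the heuristic (commonly associated fields become candidate primary partition keys, the first commonly filtered field becomes the secondary partition key) is an assumption rather than the rule the patent prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableRequest:
    fields: Dict[str, str]                                   # S201: field name -> field type
    join_fields: List[str] = field(default_factory=list)     # S202: commonly associated fields
    filter_fields: List[str] = field(default_factory=list)   # S202: commonly filtered fields
    retention_days: Optional[int] = None                     # S202: life cycle characteristic


@dataclass
class DistributedDataModel:
    primary_key_field: str
    secondary_key_field: Optional[str]
    retention_days: Optional[int]


def recommend_models(req: TableRequest) -> List[DistributedDataModel]:
    """S203 + S204: derive a strategy and return one candidate model per primary key."""
    candidates = [f for f in req.join_fields if f in req.fields]
    if not candidates:
        candidates = list(req.fields)[:1]  # assumed fallback: first declared field
    models = []
    for primary in candidates:
        secondary = next((f for f in req.filter_fields
                          if f in req.fields and f != primary), None)
        models.append(DistributedDataModel(primary, secondary, req.retention_days))
    return models


def create_table(name: str, model: DistributedDataModel) -> Dict[str, object]:
    """S205: create a table description associated with the model the user selected."""
    return {"table": name, "model": model}
```

A caller would pass the user's answers in a `TableRequest`, show the list returned by `recommend_models` to the user, and call `create_table` with whichever model the user picks.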
With this method for creating a distributed database, one or more distributed data models can be generated through interaction with the user and offered for selection, which assists the user in building a data table and better meets the user's data processing requirements.
Example two
Fig. 4 is a schematic structural diagram of a data processing apparatus of a distributed database according to an embodiment of the present invention. The processing apparatus may be disposed on the foregoing data service platform and includes:
A first characteristic obtaining module 11, configured to obtain the data characteristics of a data table to be created by a user. The data characteristics may include field characteristics, as well as UV requirement information and/or commonly used associated field information and/or commonly used filtering field information and/or a service primary key and/or field priority information and/or life cycle characteristics of the data, and the like.
A first policy generating module 12, configured to generate a data distribution strategy according to the data characteristics and, further, a lifecycle management strategy and the like. The data distribution strategy is the basic information for constructing a distributed data model and may include the number of data partitions, the scale of the data partitions, the level of the data partitions, and the like. How the data model processing strategy is generated from the various data characteristics has been explained in the foregoing embodiments.
A model creation module 13, configured to generate a distributed data model for processing the data table according to the data distribution strategy.
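As a rough illustration of what the first policy generating module 12 might compute, the sketch below derives the number and scale of data partitions from the data volume and the cluster scale. The 16 GB target partition size, the rounding rule, and all names are assumptions made for this example and are not specified by the patent.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataCharacteristics:
    field_types: Dict[str, str]  # field name -> field type
    data_volume_gb: float        # expected data volume (unit assumed here)
    cluster_nodes: int           # cluster scale


@dataclass
class DistributionStrategy:
    num_partitions: int          # number of data partitions
    partition_size_gb: float     # scale of each data partition
    levels: int                  # level of data partitioning (1 = single level)


TARGET_PARTITION_GB = 16.0  # hypothetical upper bound on the size of one partition


def generate_strategy(ch: DataCharacteristics, levels: int = 1) -> DistributionStrategy:
    """Pick enough partitions to stay under the target size, spread evenly over nodes."""
    by_size = math.ceil(ch.data_volume_gb / TARGET_PARTITION_GB)
    # Round up to a multiple of the node count so every node holds the same number.
    per_node = math.ceil(by_size / ch.cluster_nodes)
    num = max(ch.cluster_nodes, per_node * ch.cluster_nodes)
    return DistributionStrategy(num, ch.data_volume_gb / num, levels)
```

Rounding the partition count up to a multiple of the node count is one simple way to keep the load balanced; other policies are equally possible.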
After the data table is created, the user can perform various kinds of data processing on it through the distributed database system as needed.
In the data processing apparatus of the distributed database according to this embodiment, a data distribution strategy is generated from the data characteristics of the data table that the user wants to create, and a distributed data model is then established for the user. This lowers the modeling threshold of the distributed database: the modeling process is abstracted into information interaction and feature extraction, and the modeling capability of the data service platform is fully used to build a suitable distributed data model for the user, so that non-professional users can more conveniently use the distributed data services provided by the data service platform.
In addition, an embodiment of the present invention further provides an apparatus for creating a distributed database. Based on information interaction between the data service platform and the user, the apparatus can recommend distributed data models for the user to select from, so as to assist the user in establishing a data table. Fig. 5 is a schematic structural diagram of an apparatus for creating a distributed database according to an embodiment of the present invention. The apparatus includes:
A second feature obtaining module 21, configured to acquire, in response to a user's request to create a data table, the field characteristics of the data table to be created, where the field characteristics may include a field list and field types.
A third feature obtaining module 22, configured to acquire the service requirement characteristics of the data table, where the service requirement characteristics may include UV requirement information and/or commonly used associated field information and/or commonly used filtering field information and/or a service primary key and/or life cycle characteristics of the data.
A second policy generating module 23, configured to generate a data distribution strategy according to the field characteristics and the service requirement characteristics.
A model recommending module 24, configured to generate one or more distributed data models for processing the data table according to the data distribution strategy and to recommend the one or more distributed data models to the user.
After receiving the recommended distributed data models, the user can choose among them to find the model that best suits the user's own requirements. When only one model is recommended, the user can still choose whether to use it. After the user makes a selection, the selection can be fed back to the data service platform to trigger the data table creation process. Accordingly, the apparatus may further include:
A data table creating module 25, configured to create, in response to the user's selection of a recommended distributed data model, a data table corresponding to that distributed data model.
With this apparatus for creating a distributed database, one or more distributed data models can be generated through interaction with the user and offered for selection, which assists the user in building a data table and better meets the user's data processing requirements.
The processing procedure, the technical principles, and the analysis of the technical effects are described in detail in the foregoing embodiments and are not repeated here.
Example three
The foregoing embodiments describe the method flow and the apparatus structure according to embodiments of the present invention. The functions of the method and the apparatus can be implemented by an electronic device. Fig. 6 is a schematic structural diagram of the electronic device according to an embodiment of the present invention, which specifically includes a memory 110 and a processor 120.
The memory 110 is used for storing a program.
In addition to the programs described above, the memory 110 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 110 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 120, coupled to the memory 110, is used for executing the program in the memory 110 to perform the operation steps of the data processing method of the distributed database and the creation method of the distributed database described in the foregoing embodiments.
Furthermore, the processor 120 may also include various modules described in the foregoing embodiments to perform data processing of the distributed database and creation of the distributed database, and the memory 110 may be used, for example, to store data required for the modules to perform operations and/or output data.
The processing procedure, the technical principles, and the analysis of the technical effects are described in detail in the foregoing embodiments and are not repeated here.
Further, as shown in Fig. 6, the electronic device may further include a communication component 130, a power component 140, an audio component 150, a display 160, and other components. Only some of the components are shown schematically in the figure, which does not mean that the electronic device includes only the components shown.
The communication component 130 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, a mobile communication network (2G, 3G, 4G/LTE, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 130 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 130 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 140 provides power to the various components of the electronic device. The power component 140 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 150 is configured to output and/or input audio signals. For example, the audio component 150 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in the memory 110 or transmitted via the communication component 130. In some embodiments, the audio component 150 also includes a speaker for outputting audio signals.
The display 160 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be performed by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A data processing method of a distributed database comprises the following steps:
acquiring data characteristics of a data table to be created by a user;
generating a data distribution strategy according to the data characteristics;
and generating a distributed data model for processing a data table according to the data distribution strategy.
2. The method of claim 1, wherein obtaining data characteristics of a data table to be created by a user, and generating a data distribution policy according to the data characteristics comprises:
responding to a request of a user for establishing a data table, and acquiring field characteristics of the data table to be established by the user;
and generating the data distribution strategy according to the field characteristics.
3. The method of claim 2, wherein the field characteristics include a field list and a field type.
4. The method of claim 1, wherein obtaining data characteristics of a data table to be created by a user, and generating a data distribution policy according to the data characteristics comprises:
responding to a request of a user for establishing a data table, and acquiring data characteristics of the data table to be established by the user, wherein the data characteristics comprise field characteristics, the size of data volume and/or cluster scale;
and generating a data distribution strategy according to the data characteristics, wherein the data distribution strategy at least comprises the number of data partitions and/or the sizes of the data partitions.
5. The method of claim 1, wherein obtaining data characteristics of a data table to be created by a user, and generating a data distribution policy according to the data characteristics comprises:
responding to a request of a user for establishing a data table, and acquiring data characteristics of the data table to be established by the user, wherein the data characteristics comprise field characteristics and service requirement characteristics;
and generating a data distribution strategy according to the data characteristics, wherein the data distribution strategy at least comprises a multilevel data partition strategy.
6. The method of claim 1, wherein obtaining data characteristics of a data table to be created by a user, and generating a data distribution policy according to the data characteristics comprises:
responding to a request of a user for establishing a data table, and acquiring data characteristics of the data table to be established by the user, wherein the data characteristics comprise field characteristics and life cycle characteristics of data;
and generating, according to the data characteristics, a data distribution strategy and a life cycle management strategy.
7. A method for creating a distributed database, comprising:
responding to a request of a user for establishing a data table, and acquiring field characteristics of the data table to be established by the user;
acquiring the service requirement characteristics of the data table;
generating a data distribution strategy according to the field characteristics and the service requirement characteristics;
and generating one or more distributed data models for processing the data table according to the data distribution strategy, and recommending the one or more distributed data models to the user.
8. The method of claim 7, further comprising:
creating, in response to a user's selection of a recommended distributed data model, a data table corresponding to the selected distributed data model.
9. A data processing apparatus of a distributed database, comprising:
the first characteristic acquisition module is used for acquiring the data characteristics of a data table to be created by a user;
the first strategy generation module is used for generating a data distribution strategy according to the data characteristics;
and the model creating module is used for generating a distributed data model for processing the data table according to the data distribution strategy.
10. The apparatus of claim 9, wherein the data characteristics comprise field characteristics and service requirement characteristics.
11. An apparatus for creating a distributed database, comprising:
the second characteristic acquisition module is used for responding to a request of a user for establishing a data table and acquiring field characteristics of the data table to be established by the user;
the third characteristic acquisition module is used for acquiring the service requirement characteristics of the data table;
the second strategy generating module is used for generating a data distribution strategy according to the field characteristics and the service requirement characteristics;
and the model recommending module is used for generating one or more distributed data models for processing the data table according to the data distribution strategy and recommending the one or more distributed data models to the user.
12. The apparatus of claim 11, further comprising:
and the data table creating module is used for responding to the selection of the recommended distributed data model by the user and creating a data table corresponding to the distributed data model.
13. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the data processing method of the distributed database according to any one of claims 1 to 6.
14. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the method of creating a distributed database of claim 7 or 8.
CN201910763294.6A 2019-08-19 2019-08-19 Data processing and creating method and device of distributed database and electronic equipment Pending CN112395366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910763294.6A CN112395366A (en) 2019-08-19 2019-08-19 Data processing and creating method and device of distributed database and electronic equipment

Publications (1)

Publication Number Publication Date
CN112395366A true CN112395366A (en) 2021-02-23

Family

ID=74603350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910763294.6A Pending CN112395366A (en) 2019-08-19 2019-08-19 Data processing and creating method and device of distributed database and electronic equipment

Country Status (1)

Country Link
CN (1) CN112395366A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination